An Ansible Adventure: Using the Python API to list unmanaged packages

I’ve been trying to get into the habit of managing an ansible playbook to keep track of any manual installations or config tweaks I do on my local machine. No matter how vigilant you are though, there will always be packages you install that slip through the cracks.

It would be great, for example, if you could run an ansible playbook which runs all the relevant install tasks, and then provides you with a list of the “unmanaged packages” that are not handled by the playbook:

$ ansible-playbook site.yml -i inventory.yml --ask-sudo-pass --tags package-installs
...
UNMANAGED PACKAGE LIST ************************************************
localhost:
  - sqlite3
  - libsigc++-2.0-0v5
  - build-essential
  - jekyll
  - libxkbcommon-x11-0
192.168.1.3:
  - net-tools
  - packer

You can imagine how such a script might be useful if you want to ensure that all system package installations are performed using the playbook, or to audit systems to verify that only packages managed by the playbook are installed.

What follows is an interesting exercise (adventure if you will) of trying to create such a script using internal ansible python objects. I hesitate to call this an API (even though the official docs do), as that would imply stable, object level references and documentation - but there is none of that, and even what little official documentation there is uses examples with plenty of calls to methods starting with _. Thankfully the code is easy to read and understand, and I certainly learned a lot about how Ansible works along the way. I hope you do too.

Before we begin I should mention that all the code snippets use internal objects from Ansible v2.9. These are very likely to change between versions so I do not expect them to work for other versions. I have also only focused on making this work on Ubuntu/Debian, as this is what I use personally.

0. Background

For this exercise, I wanted the end result to be a standalone python script with no other file dependencies, which could wrap any target playbook and report the unmanaged packages at the end.

This limited my options somewhat, and I arrived at this approach:

  1. Create a set of tasks to find manually installed packages on Ubuntu/Debian
  2. Create a script which can execute any playbook using the Python API
  3. Prepend the tasks defined in the short playbook from step 1 to the target playbook executed by the script in step 2.
  4. Create a callback plugin that inspects the result of each successful task in the target playbook, keeps track of any packages that would be installed, and compares this against the list of manually installed packages to generate the list of unmanaged packages for each host.

1. Find manually installed packages (on Debian/Ubuntu)

The first step is to figure out which packages have been installed manually on the system. We are only interested in packages that have been manually installed by the user - e.g. by running apt-get install <package name>. For this, all we have to do is call on our trusty package management system of choice to filter out packages installed as dependencies. I’ll only be covering Ubuntu and Debian systems, as I use Ubuntu, but a similar solution[1] exists for other distros. Judging by the number of duplicate questions on this topic on Stack Overflow (e.g. see this[3], this[4], this[5] and this[6]), this is a common task people want to do.

Here are a few ways of doing it:

  1. Check manually installed packages using apt
   $ apt list --manual-installed
   # But this returns the following warning:
   ## WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

   # So instead, lets use this, which returns a more reliable output:
   $ apt-mark showmanual
   $ apt-mark showmanual | wc -l # yields => 264 on my machine

apt marks all dependencies of manually installed packages as “automatic”, so that if you run apt-get autoremove, it removes any dangling dependencies. This is useful because it can provide a list of packages that were installed by the user manually, rather than by the system automatically to satisfy dependencies. However, apt also marks packages that were installed as part of the Ubuntu install as manual, which is not ideal for our use case.

  2. Look through the apt history files (/var/log/apt/history.log.*.gz) and check for install entries:
   $ zcat /var/log/apt/history.log.*.gz | cat - /var/log/apt/history.log | grep -Po 'apt-get install (?!.*--reinstall)\K.*'
   $ !! | wc -l # yields => 25 on my machine

One disadvantage of this option is that on most systems logrotate may have already cleaned up these logs, so there is no guarantee they’ll still be there to check. The main problem though is that we’d also need to check which packages were removed manually and disregard the package if the uninstall happened after the install. This is probably do-able, but not worth my time.

Both of these options are not really satisfactory by themselves. The first option, however, can be improved by filtering out all of the packages that were installed as part of the Ubuntu install process. If the install is recent, logs can sometimes be found at /var/log/installer/, but this is not reliable as they may have been removed, and they are not available on all versions. A good enough solution is to grab the manifest of the installation packages from releases.ubuntu.com:

#!/bin/bash
UBUNTU_VERSION="$(lsb_release -a 2> /dev/null | grep "Description" | grep -Po 'Ubuntu \K(.*) ')"
UBUNTU_RELEASE="$(lsb_release -a 2> /dev/null | grep "Release" | cut -f 2)"
UBUNTU_INSTALL_TYPE="desktop"
# (non-auto installed packages) - (pre-installed packages from ubuntu release manifest)
comm -13 <(wget "http://releases.ubuntu.com/releases/$UBUNTU_RELEASE/ubuntu-$UBUNTU_VERSION-$UBUNTU_INSTALL_TYPE-amd64.manifest" -q -O - | cut -f 1 | sort -u) <(apt-mark showmanual | sort -u)
!! | wc -l # => yields 71

This is still not perfect, as it reports packages such as libsigc++-2.0-0v5 and libido3-0.1-0 which I definitely did not install manually, but it’s probably accurate enough for our purposes.
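The core of the shell pipeline above is just a set difference. As a rough Python equivalent (the function name is mine, not part of any tool), with fetching the manifest and calling apt-mark left to the caller:

```python
def real_manual_packages(showmanual_output, manifest_text):
    """Packages apt-mark reports as manual, minus those shipped in the release manifest."""
    manual = set(showmanual_output.split())
    # The manifest is tab-separated: "<package>\t<version>"
    preinstalled = {line.split("\t")[0] for line in manifest_text.splitlines() if line.strip()}
    return sorted(manual - preinstalled)
```

Feeding it the output of `apt-mark showmanual` and the downloaded manifest text yields the same list as the `comm -13` invocation.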

Converting the above script to an ansible playbook yields the following:

- hosts: all
  gather_facts: yes
  become: false
  tasks:
  - name: package_check - get ubuntu version
    set_fact:
      ubuntu_version: "{{ ansible_lsb.get('description') | regex_findall('[0-9\\.]+') }}"

  - name: package_check - get ubuntu manifest packages
    shell: "set -o pipefail && wget 'http://releases.ubuntu.com/releases/{{ ansible_distribution_version }}/ubuntu-{{ ubuntu_version.0 }}-desktop-amd64.manifest' -q -O - | cut -f 1 | sort -u"
    args:
      executable: /bin/bash
    register: ubuntu_manifest_packages
  
  - name: package_check - get manually installed packages
    shell: "set -o pipefail && apt-mark showmanual | sort -u"
    args:
      executable: /bin/bash
    register: manual_installed_packages
  
  - name: package_check - get real manually installed packages
    set_fact:
      manual_packages: "{{ manual_installed_packages.stdout_lines | difference(ubuntu_manifest_packages.stdout_lines) }}"

We use set -o pipefail with the shell module to ensure that the pipeline raises a non-zero exit code if any of the commands within the pipeline fail. This means that we need to use the /bin/bash executable (as the default shell ansible uses is /bin/sh, which does not support the pipefail option). Note also that we saved the final list of manual packages within ansible’s facts, so that it can be easily retrieved when this playbook is executed with other plays.
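To see why pipefail matters here, compare the exit codes of a failing pipeline under the two settings (a quick sketch, assuming /bin/bash is available):

```python
import subprocess

# Without pipefail, the pipeline's exit code is that of the last command,
# so the failure of `false` is silently swallowed by `sort`:
plain = subprocess.run(["/bin/bash", "-c", "false | sort"])

# With pipefail, any failing stage fails the whole pipeline:
strict = subprocess.run(["/bin/bash", "-c", "set -o pipefail && false | sort"])

print(plain.returncode, strict.returncode)  # 0 1
```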

2. Running an Ansible Playbook using the Python API

Now that we’ve figured out how to check installed packages and created a playbook to do so, we need to find a way to execute this playbook along with our target playbook on the same hosts. The usual and sensible way is to include the play within the playbook:

- import_playbook: otherplays.yaml

However, for our use case, we want a standalone script with no other file dependencies. This requires us to use ansible’s internal python objects to customise the execution of both of the playbooks.

2.1 Argument Parsing

Firstly, we need our script to act as a wrapper for ansible-playbook: it should accept all of the arguments ansible-playbook accepts, and pass these on to the ansible context so they can be used by our playbook executor.

When you run any ansible command, it calls a stub cli script. This stub then invokes the appropriate cli object depending on the command. For example, ansible-playbook invokes the ansible.cli.playbook.PlaybookCLI class. All of the CLI objects can be found in ansible.cli.* and they each extend the ansible.cli.CLI interface.

Our script will only ever execute playbooks, so we just need to invoke the PlaybookCLI class and run the parse() method to parse the cli arguments:

import sys
from ansible import context
from ansible.module_utils._text import to_text
from ansible.cli.playbook import PlaybookCLI

# Read all the arguments from the cli
args = [to_text(a, errors='surrogate_or_strict') for a in sys.argv]

# Pass them to the playbook cli object
pb_cli = PlaybookCLI(args)

# Parse the arguments
pb_cli.parse()

# The context should now contain all of our parsed arguments
print(context.CLIARGS)

We also need to ensure that any playbook that is executed, is run using --check mode (also known as dry-run), so that nothing is actually changed on the hosts.

When the arguments are parsed into context.CLIARGS, they are stored in an ImmutableDict class. It is definitely possible to hack around it to modify the contents and add --check if necessary. But, it is easier to just do this before the args are parsed:

...
# Read all the arguments from the cli
args = [to_text(a, errors='surrogate_or_strict') for a in sys.argv]
# Ensure dry-run option is specified
if '--check' not in args:
    args.append('--check')
...
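Incidentally, this is why we patch the raw argument list rather than the parsed result: context.CLIARGS behaves like a read-only mapping. As a rough stand-in for Ansible’s ImmutableDict (an assumption - I’m using Python’s built-in MappingProxyType here to illustrate, not the real class), item assignment simply raises:

```python
from types import MappingProxyType

# A read-only mapping, analogous to how context.CLIARGS behaves after parsing.
# (Stand-in only: Ansible's real ImmutableDict lives in
# ansible.module_utils.common.collections.)
cliargs = MappingProxyType({'check': False, 'tags': ()})
try:
    cliargs['check'] = True
    mutated = True
except TypeError:
    # Mutation is rejected, hence appending --check before parsing
    mutated = False
print(mutated)  # False
```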

2.2 Preparing for Execution

Next, we prepare to execute the playbook by performing all of the necessary pre-execution actions Ansible performs such as filtering hosts from the inventory file, checking to make sure playbook files exist, parsing playbook yaml files to extract tasks, loading any plugins, handling passwords, substituting in any variable values etc.

Luckily for us, all of this work is done for us in the run() method of ansible.cli.playbook.PlaybookCLI. We will add this to the script as is (from v2.9.1):

import os
import stat
from ansible.errors import AnsibleError
from ansible.module_utils._text import to_text, to_bytes
from ansible.cli import CLI
from ansible.utils.collection_loader import AnsibleCollectionLoader, get_collection_name_from_path, set_collection_playbook_paths
from ansible.plugins.loader import add_all_plugin_dirs
from ansible.utils.display import Display

display = Display()
...

### This section is copied directly from: https://github.com/ansible/ansible/blob/stable-2.9/lib/ansible/cli/playbook.py (def run()), with minor modifications to enable it to work outside of a class

# manages passwords
sshpass = None
becomepass = None
passwords = {}

# initial error check, to make sure all specified playbooks are accessible
# before we start running anything through the playbook executor
b_playbook_dirs = []
for playbook in context.CLIARGS['args']:
    if not os.path.exists(playbook):
        raise AnsibleError("the playbook: %s could not be found" % playbook)
    if not (os.path.isfile(playbook) or stat.S_ISFIFO(os.stat(playbook).st_mode)):
        raise AnsibleError("the playbook: %s does not appear to be a file" % playbook)

    b_playbook_dir = os.path.dirname(os.path.abspath(to_bytes(playbook, errors='surrogate_or_strict')))
    # load plugins from all playbooks in case they add callbacks/inventory/etc
    add_all_plugin_dirs(b_playbook_dir)
    
    b_playbook_dirs.append(b_playbook_dir)

set_collection_playbook_paths(b_playbook_dirs)

playbook_collection = get_collection_name_from_path(b_playbook_dirs[0])

if playbook_collection:
    display.warning("running playbook inside collection {0}".format(playbook_collection))
    AnsibleCollectionLoader().set_default_collection(playbook_collection)

# don't deal with privilege escalation or passwords when we don't need to
if not (context.CLIARGS['listhosts'] or context.CLIARGS['listtasks'] or
        context.CLIARGS['listtags'] or context.CLIARGS['syntax']):
    (sshpass, becomepass) = pb_cli.ask_passwords()
    passwords = {'conn_pass': sshpass, 'become_pass': becomepass}

# create base objects
loader, inventory, variable_manager = pb_cli._play_prereqs()

# Fix this when we rewrite inventory by making localhost a real host (and thus show up in list_hosts())
CLI.get_host_list(inventory, context.CLIARGS['subset'])

# flush fact cache if requested
if context.CLIARGS['flush_cache']:
    pb_cli._flush_cache(inventory, variable_manager)

### End of Section

2.3 Executing the playbook

Now that we have everything in place to execute the playbook, we can go ahead and execute it. Ansible uses the PlaybookExecutor class to perform this set of actions.

We can refer back to the PlaybookCLI class to see how it is called:

from ansible.executor.playbook_executor import PlaybookExecutor
...
# create the playbook executor, which manages running the plays via a task queue manager
pbex = PlaybookExecutor(playbooks=context.CLIARGS['args'],
                        inventory=inventory,
                        variable_manager=variable_manager,
                        loader=loader,
                        passwords=passwords)
results = pbex.run()

We don’t really need to do anything with the results, because all of the useful information we need will be reported to stdout, and we can capture this using Callback Plugins (more on this in section 3).

If we save this script as check_packages.py and run it, it should work just like ansible-playbook. For example, if we used it to run our playbook which checks manually installed packages, we should get:

$ python3 check_packages.py manual_packages.yml --ask-become-pass -i hosts
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'

PLAY [localhost] *******************************************************

TASK [Gathering Facts] *************************************************
ok: [localhost]

TASK [set_fact] ********************************************************
ok: [localhost]

TASK [shell] ***********************************************************
changed: [localhost]

TASK [shell] ***********************************************************
changed: [localhost]

TASK [set_fact] ********************************************************
ok: [localhost]

PLAY RECAP *************************************************************
localhost: ok=5    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0 

Great! We’ve literally just created a crappier version of ansible-playbook. How is this helpful I hear you ask? Let’s find out.

2.4 Executing Plays

The whole point of figuring out how to execute a playbook using the Python API was to execute the playbook for finding manually installed packages (from section 1) along with any target playbook.

If we don’t want to read the playbook tasks from a yaml file, the alternative is to skip the yaml parsing, and just provide the executor with the parsed tasks which we can hardcode in our script.

Lets have a look at how Ansible parses the playbook file into the internal data structure the executor uses. The ansible.parsing.dataloader.DataLoader class is responsible for parsing the yaml into json that the executor can execute. We can get the parsed json with a bit of debugging:

# 1. Temporarily save the manually installed packages playbook yaml file somewhere
import json
from ansible.playbook import Playbook
# 2. Load the playbook file
pb = Playbook.load(context.CLIARGS['args'][0], variable_manager=variable_manager, loader=loader)
# Debug: Print the json parsed from the yaml file by the DataLoader object
print(json.dumps(pb._loader.__dict__['_FILE_CACHE']['/path/to/playbook/file/package_check.yml'][0], indent=2))

This should give us the following output:

{
  "hosts": "all",
  "gather_facts": true,
  "become": false,
  "tasks": [
    {
      "name": package_check - get ubuntu version,
      "set_fact": {
        "ubuntu_version": "{{ ansible_lsb.get('description') | regex_findall('[0-9\\.]+') }}"
      }
    },
    {
      "name": package_check - get ubuntu manifest packages,
      "shell": "set -o pipefail && wget 'http://releases.ubuntu.com/releases/{{ ansible_distribution_version }}/ubuntu-{{ ubuntu_version.0 }}-desktop-amd64.manifest' -q -O - | cut -f 1 | sort -u",
      "args": {
        "executable": "/bin/bash"
      },
      "register": "ubuntu_manifest_packages"
    },
    {
      "name": package_check - get manually installed packages,
      "shell": "set -o pipefail && apt-mark showmanual | sort -u",
      "args": {
        "executable": "/bin/bash"
      },
      "register": "manual_installed_packages"
    },
    {
      "name": package_check - get real manually installed packages,
      "set_fact": {
        "manual_packages": "{{ manual_installed_packages.stdout_lines | difference(ubuntu_manifest_packages.stdout_lines) }}"
      }
    }
  ]
}

You may have noticed that the json above has the hosts: all directive. This means that these tasks will run on all of the hosts defined in the inventory file. And because of the Implicit Localhost behaviour, if a local connection is not specified in the inventory file, the tasks will not run on localhost. This is not exactly the behaviour we want, but we need to cover more ground to be able to fix it, so we’ll come back to this in section 3.7.

We now have the tasks - how do we pass them to the executor to run? The only example in the ansible python API docs provides us with the answer, including some helpful comments. First we convert the json above to a python dict and hardcode it in our script as the play_source variable. Next, we copy a few lines from the example in the docs to get:

## Copied from: https://docs.ansible.com/ansible/latest/dev_guide/developing_api.html
import shutil
from ansible import constants as C
from ansible.executor.task_queue_manager import TaskQueueManager
from ansible.playbook.play import Play

# Create play object, playbook objects use .load instead of init or new methods,
# this will also automatically create the task objects from the info provided in play_source
play = Play().load(play_source, variable_manager=variable_manager, loader=loader)

# Run it - instantiate task queue manager, which takes care of forking and setting up all objects to iterate over host list and tasks
tqm = None
try:
    tqm = TaskQueueManager(
              inventory=inventory,
              variable_manager=variable_manager,
              loader=loader,
              passwords=passwords
          )
    result = tqm.run(play) # most interesting data for a play is actually sent to the callback's methods
finally:
    # we always need to cleanup child procs and the structures we use to communicate with them
    if tqm is not None:
        tqm.cleanup()

    # Remove ansible tmpdir
    shutil.rmtree(C.DEFAULT_LOCAL_TMP, True)

## End of Section
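For reference, the hardcoded play_source dict is just the json from earlier converted to Python literals. Abbreviated here to the first task (the two shell tasks and the final set_fact follow the same shape):

```python
play_source = {
    "hosts": "all",
    "gather_facts": True,
    "become": False,
    "tasks": [
        {
            "name": "package_check - get ubuntu version",
            "set_fact": {
                "ubuntu_version": "{{ ansible_lsb.get('description') | regex_findall('[0-9\\.]+') }}"
            },
        },
        # ... the remaining package_check tasks from the json above go here ...
    ],
}
```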

If we run this however, our tasks will be skipped. This is because we are running this right after running a PlaybookExecutor object that has used the same inventory, variable_manager and loader objects. If we peek at the code for the PlaybookExecutor, we can see that some of these objects are cleaned up after the run:

...
        finally:
            if self._tqm is not None:
                self._tqm.cleanup()
            if self._loader:
                self._loader.cleanup_all_tmp_files()

This causes ansible to lose state. To regain it, we need to reinstantiate these objects before creating the TaskQueueManager object:

# create base objects
loader, inventory, variable_manager = pb_cli._play_prereqs()

Before we can call this section done, there is one other loose end to tie up. Since we are not explicitly passing in any cli arg options to the TaskQueueManager, it will use the options that have been parsed into the ansible context (context.CLIARGS['args']).

In section 2.1 we made sure that the context args contains --check, but now we want to run the manual installation check tasks outside of --check mode. Remember that context.CLIARGS['args'] is an ImmutableDict, so we can’t edit it in place. Instead, we can get the currently parsed options as a plain dictionary, modify it and create a new ImmutableDict object to override the current context:

from copy import copy
from ansible.module_utils.common.collections import ImmutableDict
...
# Get the cli options from the current context
options = context.CLIARGS._store
play_options = copy(options)

# Ensure --check option is disabled
if play_options.get('check') != False:
    play_options['check'] = False

# Ensure all tags are cleared
if play_options.get('tags'):
    play_options['tags'] = ()
if play_options.get('skip_tags'):
    play_options['skip_tags'] = ()

# Override the current context
context.CLIARGS = ImmutableDict(**play_options)

tqm = None
try:
    tqm = TaskQueueManager(
...

The script should now execute the playbook tasks in --check mode and the manually installed package tasks without the --check option.

3. Writing a Callback Plugin

The final piece of the puzzle is to gather the results of the tasks from each host and print them in a pretty format at the end. Ansible provides callbacks at each stage of execution for developers to execute custom code. These are called Callback Plugins:

Callback plugins enable adding new behaviors to Ansible when responding to events.

You can activate a custom callback by either dropping it into a callback_plugins directory adjacent to your play, inside a role, or by putting it in one of the callback directory sources configured in ansible.cfg.

We don’t really want to create or modify the ansible config or create additional directories if we can avoid it. Ideally this script should be runnable without creating any artifacts or modifying the hosts in any way. Instead, we can define our plugin within our script and override the default stdout plugin with ours. Let’s go through this step by step:

3.1 Defining a new Callback Plugin

The Ansible developer docs provide examples of various callback plugins, but no real object-level api reference. Luckily the code is straightforward and easy to follow. To implement a Callback Plugin, we need to extend the CallbackBase abstract class. The docstring within the class says:

This is a base ansible callback class that does nothing. New callbacks should use this class as a base and override any callback methods they wish to execute custom actions.

There is no real documentation on what triggers each of the callback methods, but most can be guessed from the name. For example, v2_runner_on_ok is called when a task completes successfully on a host, v2_runner_on_failed when a task fails, and v2_playbook_on_stats just before the final play recap is displayed.

This is probably a good time to mention that I am running Ansible v2.9.1, and all the code referenced here is for that version. The methods above are also specifically for versions > 2, but they will call the relevant v1 methods if necessary.

3.2 Plugin Naming

The callback plugin class must be named CallbackModule, as this is what the executor expects plugins to be called when reading them from one of the plugin directories. This is not mandatory, however, if we are using the internal plumbing to execute the playbook ourselves (more on this in the next section).

Let’s define the basics of any callback plugin following one of the examples:

__metaclass__ = type
from ansible.plugins.callback import CallbackBase

class CallbackModule(CallbackBase):
    '''
    Prints package information to stdout
    '''

    CALLBACK_VERSION = 1.0
    CALLBACK_TYPE = 'stdout'
    CALLBACK_NAME = 'package_check'

    def __init__(self):
        super(CallbackModule, self).__init__()
        # Store the packages installed by the playbook ("managed") and the
        # manually installed packages ("manual") per host, e.g.:
        # {
        #     "localhost": {
        #         "manual": {'steam', 'vim'},
        #         "managed": {'steam'}
        #     },
        #     "192.168.1.40": {
        #         "manual": {'ntp', 'sshd'},
        #         "managed": {'sshd', 'ntp'}
        #     }
        # }
        self.package_info = {}

It is prudent to define CALLBACK_TYPE because the docs say:

You can only have one plugin be the main manager of your console output. If you want to replace the default, you should define CALLBACK_TYPE = stdout in the subclass

3.3 Plugin Documentation

Now let’s add some documentation for the plugin before the class definition:

DOCUMENTATION = '''
    callback: package_check
    type: stdout
    short_description: report unmanaged manually installed packages
    version_added: historical
    description:
        - This provides a summary of all packages installed on the system not managed by the playbook
'''

class CallbackModule(CallbackBase):
   ...

This will appear when a user searches for the plugin documentation using ansible-doc. For example, check out the docs for the default callback plugin:

ansible-doc -t callback default

We won’t see the doco for our plugin using ansible-doc yet because we haven’t saved it to one of the ansible plugin directories (e.g. ~/.ansible/plugins/). We don’t plan on doing this, but it is probably best practice anyway, in case we decide to later on.

3.4 Override Plugin Callback Methods

We want the behaviour of our module to be exactly the same as the default stdout callback plugin, with the exception of the v2_runner_on_ok and v2_playbook_on_stats methods (the latter is called when all tasks are complete and stats are about to be displayed).

So, instead of implementing all the other methods ourselves, we can just extend the default plugin and override the above methods. So our plugin class now looks like:

__metaclass__ = type
from ansible.plugins.callback.default import CallbackModule as DefaultCallbackModule
...

class CallbackModule(DefaultCallbackModule):
    '''
    Prints package information to stdout
    '''
...

Now to override our methods. First lets look at the callback for a successful task. We are interested in any successful package installs performed by a task in the target playbook. To filter these out, we can have a look at the object that is passed to the callback method as the result arg. This is a TaskResult object. Having a look through the code, the self._task_fields object seems promising, and could have the info we’re looking for to filter our task results. If we create a simple override method as follows and run an example playbook with it:

    def v2_runner_on_ok(self, result):
        # Print any warnings
        self._handle_warnings(result._result)

        # Debug: print the task result fields object
        # (add `from pprint import pprint` to the script's imports)
        pprint(result._task_fields)

It should yield something like the following for a package install:

TASK [install steam] ***********************************************
{'action': 'apt',
 'any_errors_fatal': False,
 'args': {'_ansible_check_mode': True,
          '_ansible_debug': False,
          '_ansible_diff': False,
          '_ansible_keep_remote_files': False,
          '_ansible_module_name': 'apt',
          '_ansible_no_log': False,
          '_ansible_remote_tmp': '~/.ansible/tmp',
          '_ansible_selinux_special_fs': ['fuse',
                                          'nfs',
                                          'vboxsf',
                                          'ramfs',
                                          '9p',
                                          'vfat'],
          '_ansible_shell_executable': '/bin/sh',
          '_ansible_socket': None,
          '_ansible_string_conversion_action': 'warn',
          '_ansible_syslog_facility': 'LOG_USER',
          '_ansible_tmpdir': '/home/moebius/.ansible/tmp/ansible-tmp-1594038862.479424-8355-23378989975275/',
          '_ansible_verbosity': 0,
          '_ansible_version': '2.9.10',
          'name': 'steam',
          'state': 'latest',
          'update_cache': True},
 'async': 0,
 'async_val': 0,
 'become': True,
 'become_exe': None,
 'become_flags': None,
 'become_method': 'sudo',
 'become_user': None,
 'changed_when': [],
 'check_mode': True,
 'collections': [],
 'connection': 'smart',
 'debugger': None,
 'delay': 5,
 'delegate_facts': None,
 'delegate_to': None,
 'diff': False,
 'environment': [{}],
 'failed_when': [],
 'ignore_errors': None,
 'ignore_unreachable': None,
 'loop': None,
 'loop_control': None,
 'loop_with': None,
 'module_defaults': [],
 'name': 'install steam',
 'no_log': None,
 'notify': None,
 'poll': 15,
 'port': None,
 'register': None,
 'remote_user': None,
 'retries': 3,
 'run_once': None,
 'tags': ['games', 'package'],
 'throttle': 0,
 'until': [],
 'vars': {},
 'when': []}

We can see that the action attribute provides the module name of the task, which is useful to filter task info for package installs.

    def v2_runner_on_ok(self, result):
        # Print any warnings
        self._handle_warnings(result._result)

        # Name of the host the task ran on
        h = result._host.get_name()

        # Filter for package installs
        if result._task_fields.get('action') == 'apt':
            # We only care about installs (not uninstalls)
            if result._task_fields.get('args', {}).get('state') != 'absent':
                # Add the package name to the list of managed packages
                package_name = result._task_fields.get('args', {}).get('name')
                if package_name:
                    # Build up a de-duplicated list of managed packages for each host
                    # (the list can be unordered, hence the use of sets)
                    if h not in self.package_info:
                        self.package_info[h] = {
                            "manual": set(),
                            "managed": set()
                        }
                    # Handle multiple packages being installed
                    if isinstance(package_name, list):
                        self.package_info[h]["managed"].update(package_name)
                    # Handle single package installs
                    elif isinstance(package_name, str):
                        self.package_info[h]["managed"].add(package_name)
                else:
                    print('Got empty package name')
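The filtering above boils down to a small predicate. Pulled out as a standalone helper (the function name is mine, not part of the plugin), it is easy to test in isolation:

```python
def packages_from_task(task_fields):
    """Return the package name(s) an apt task would install, or [] if none."""
    if task_fields.get('action') != 'apt':
        return []
    args = task_fields.get('args', {})
    # Ignore uninstalls
    if args.get('state') == 'absent':
        return []
    name = args.get('name')
    # The apt module accepts a single name or a list of names
    if isinstance(name, list):
        return name
    if isinstance(name, str):
        return [name]
    return []
```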

Now to handle the additional tasks we run to get manually installed packages. We can parse these from the package_check - get real manually installed packages task from section 1:

...
        # Filter for the manually installed packages task
        if result._task_fields.get('name') == 'package_check - get real manually installed packages':
            h = result._host.get_name()
            packages = result._task_fields.get('args', {}).get('manual_packages', [])
            if h not in self.package_info:
                self.package_info[h] = {
                    "manual": set(),
                    "managed": set()
                }
            # Record the manually installed packages for this host
            self.package_info[h]["manual"].update(packages)

The callback plugin will now build up a data structure containing packages that are installed by the playbook, as well as manually installed packages for each successful play (i.e host).
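With both sets populated per host, the unmanaged list is just a set difference. A toy example (the data is made up) of the computation the final report performs:

```python
# The shape of the structure the callback plugin accumulates
package_info = {
    "localhost": {
        "manual": {"steam", "vim", "jekyll"},
        "managed": {"steam", "vim"},
    },
}

# Unmanaged = manually installed but not handled by the playbook
unmanaged = {host: sorted(v["manual"] - v["managed"])
             for host, v in package_info.items()}
print(unmanaged)  # {'localhost': ['jekyll']}
```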

3.5 Using the custom callback plugin

So now that we have a (hopefully) working plugin, how do we test it with our script?

First we must define the callback class within the script, directly after the imports. Then we must initialise the plugin:

# Initialise our custom callback
results_callback = CallbackModule()

Then, we have to ensure that the PlaybookExecutor (used to run the target playbook - see 2.3) uses this plugin:

# create the playbook executor, which manages running the plays via a task queue manager
pbex = PlaybookExecutor(playbooks=context.CLIARGS['args'],
                        inventory=inventory,
                        variable_manager=variable_manager, loader=loader,
                        passwords=passwords)
# Ensure the playbook executor uses our custom callback plugin by setting the stdout_callback option within the internal TaskQueueManager Object
pbex._tqm._stdout_callback = results_callback
results = pbex.run()

We also have to ensure that when we execute the plays for manually installed package checks (see 2.4), we also use this plugin by passing the stdout_callback option to the TaskQueueManager:

...
    tqm = TaskQueueManager(
              inventory=inventory,
              variable_manager=variable_manager,
              loader=loader,
              passwords=passwords,
              stdout_callback=results_callback,  # Use our custom callback
          )
...

The final step is to actually print out the summary of unmanaged packages being built up in the callback plugin:

# Print banner title
results_callback._display.banner("UNMANAGED PACKAGE LIST")
# Print summary of results per host
for hostname, value in results_callback.package_info.items():
    # Print hostname
    print("%s:" % stringc(hostname, "green"))
    # Print unmanaged packages
    for p in value['manual'] - value['managed']:
        print("  - %s" % p)

If we run this, we should finally get what we are after:

$ python3 check_packages.py -i hosts site.yml --tags package --ask-become-pass -v
BECOME password: 

PLAY [localhost] *******************************************************

TASK [Gathering Facts] *************************************************
[WARNING]: Failure using method (v2_runner_on_start) in callback plugin (<__main__.CallbackModule object at 0x7f9654d9e9e8>): 'show_per_host_start'

TASK [Install tmux] ***************************************************

TASK [Install latest version of docker] ********************************

TASK [firefox : ubuntu | ensure packages are installed] ***************
changed: [localhost] => (item=['firefox', 'firefox-locale-en', 'firefox-locale-es'])

TASK [install git] ****************************************************

TASK [misc : Install useful packages] *********************************

TASK [os : Install unattended-upgrades] *******************************

TASK [python : Install Python] ****************************************

TASK [install steam] **************************************************

TASK [themes : install gnome tweaks] **********************************

TASK [vim : install VIM] **********************************************

TASK [vlc : Install VLC and Video Codecs] *****************************

PLAY RECAP ************************************************************
localhost                  : ok=12   changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   


PLAY [localhost] ******************************************************

TASK [Gathering Facts] ************************************************

TASK [package_check - get ubuntu version] *****************************

TASK [package_check - get ubuntu manifest packages] *******************

TASK [package_check - get manually installed packages] ****************

TASK [package_check - get real manually installed packages] ***********

UNMANAGED PACKAGE LIST ************************************************
localhost:
  - gtk-im-libthai
  - build-essential
  - liblvm2app2.2
  - libpinyin-data
  - libdebian-installer4
  - apt-transport-https
  - chromium-browser
  - shellcheck
  - bcmwl-kernel-source
  - libpython-stdlib
  - libxkbcommon-x11-0
  - grub-efi-amd64-signed
  - linux-signed-generic
  - libtimezonemap1
  - libgtk2.0-0
  - libido3-0.1-0
  - libchewing3
  - jekyll
  - python-apt
  - zlib1g-dev
  - ruby
  - hugo
  - libdevmapper-event1.02.1
  - libatkmm-1.6-1v5
  - libgail18
  - linux-headers-generic
  - sqlite3
  - gconf2
  - libglibmm-2.4-1v5
  - google-chrome-stable
  - libm17n-0
  - libpangomm-1.4-1v5
  - packer
  - icaclient
  - libsigc++-2.0-0v5
  - liblvm2cmd2.02
  - python-ldb
  - steam:i386
  - conky
  - python-pip
  - shim-signed
  - libhangul1
  - libotf0
  - virtualenv
  - libdbusmenu-gtk4
  - libpinyin13
  - gir1.2-xkl-1.0
  - net-tools
  - libcairomm-1.0-1v5
  - code
  - libreadline5
  - libgail-common
  - speedtest-cli
  - libgtkmm-2.4-1v5
  - libllvm6.0

3.6 Using Verbosity to Control Output

We now have a working script, but I think we can improve it. We are only really interested in the summary of packages, and in reports of any failures or unreachable hosts. It would be nice if we could prevent everything else from being displayed unless the user provides the verbose flag.

Handily, there is a global Display() object available through the CallbackBase object which provides a verbosity flag based on the args in the current context. So we can override the callback methods responsible for printing lines we are not interested in, and have them do nothing unless this flag is set:

    # Callback for displaying stats at the end of a playbook run
    # Prints: PLAY RECAP
    def v2_playbook_on_stats(self, stats):
        if self._display.verbosity:
            super(CallbackModule, self).v2_playbook_on_stats(stats)

    # Callback for when a task starts
    # Prints: TASK [install steam] **********************
    def v2_playbook_on_task_start(self, task, is_conditional):
        if self._display.verbosity:
            super(CallbackModule, self).v2_playbook_on_task_start(task, is_conditional)

    # Callback for when a task item completes successfully
    # Prints: changed: [localhost] => (item=['firefox', 'firefox-locale-en', 'firefox-locale-es'])
    def v2_runner_item_on_ok(self, result):
        if self._display.verbosity:
            super(CallbackModule, self).v2_runner_item_on_ok(result)
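
The gating pattern is simple enough to demonstrate without Ansible at all. In this runnable sketch, `Display` and `BaseCallback` are minimal stand-ins (not the real Ansible classes) for the verbosity flag and the noisy default callback:

```python
# Stand-in for Ansible's Display: just carries a verbosity level
class Display:
    def __init__(self, verbosity=0):
        self.verbosity = verbosity

# Stand-in for the noisy default stdout callback: records what it "prints"
class BaseCallback:
    def __init__(self, display):
        self._display = display
        self.lines = []

    def on_task_start(self, name):
        self.lines.append("TASK [%s]" % name)

# Our override: delegate to the parent only when a -v flag was given
class QuietCallback(BaseCallback):
    def on_task_start(self, name):
        if self._display.verbosity:
            super().on_task_start(name)

quiet = QuietCallback(Display(verbosity=0))
quiet.on_task_start("install steam")
print(quiet.lines)  # [] -- suppressed without -v

verbose = QuietCallback(Display(verbosity=1))
verbose.on_task_start("install steam")
print(verbose.lines)  # ['TASK [install steam]']
```

The same delegate-or-skip shape applies to each of the overridden v2_* methods above.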

3.7 Restricting Tasks to Successful Hosts

There is one last loose end to tie up before we can call it a day. You may remember that back in section 2.4 we were using the directive hosts: all to run the tasks associated with manual installs. This is not ideal, because we only want these tasks to run on the hosts which the target playbook successfully installed packages on. We want to ignore any hosts that have been skipped, or are unreachable. We have already built up such a list of hosts in the callback module object, so we can use it to restrict the hosts on which the tasks are run:

# Get list of hosts that the playbook successfully ran install tasks on
hosts = list(results_callback.package_info.keys())

play_source = {
  "hosts": hosts,
  "gather_facts": True,
  "become": False,
  "tasks": [
    ...

And voila! We have finally achieved the mildly useful. You can find the complete script here. If you’ve made it this far, I hope you’ve learned a bit about the guts of Ansible along the way. I know I have.