Resolution for Ansible Tower 3.3 - Job Template hang: "Resource is being used by running jobs"

by Joshua Andrews


Earlier this year, several SovLabs clients reported an issue after upgrading to Ansible Tower 3.3 (see Ansible release notes here): deleting a machine fails with the error “Resource is being used by running jobs” if there are playbooks running against that inventory. We have confirmed that Ansible Tower 3.4 still exhibits the same behavior. SovLabs release 2019.5 GA includes a resolution for this issue. See the SovLabs release 2019.5 GA Release Notes here.

 

The Issue As Reported by SovLabs Customers

Several SovLabs customers reported that Ansible Tower Job Template workflows were hanging at the machineDisposing stage when deprovisioning a machine. Some digging revealed that the Ansible Tower API returned the error message “Resource is being used by running jobs” whenever a user tried to delete a machine, through either the GUI or the API, while a playbook was running against the inventory. Interestingly, any playbook running against the inventory, even one that only slept, caused the issue to occur.

error - using GUI to delete machine from inventory with playbook running

The screenshot above shows the error that occurs when using the GUI to delete a machine from an inventory with a playbook running.

API to delete a machine from a running inventory

The screenshot above shows using the API to delete a machine from a running inventory.

Note that we see the issue only when using a static inventory. We do not see this behavior when using a dynamic inventory. We immediately rolled out a hotfix (2018.3.5 release notes) to help prevent a race condition in which vRA removes the VM before Tower does, and we contacted Red Hat to determine the root cause.


New, Expected Ansible Behavior

We worked with Red Hat to determine that this is new, expected behavior in Ansible Tower 3.3 for static inventories. To combat 500 errors thrown when a host is removed from a running inventory, the API now returns a 409 (Conflict) error and prevents removing a host when the inventory holding that host is busy. Tower cannot currently determine whether a running or scheduled play involves the hosts being deleted. Allowing a delete to occur when a play is running or scheduled against the host would result in an unrecoverable state.
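To make the 409 behavior concrete, here is a minimal sketch of a client that attempts the delete and treats 409 as "inventory busy, try again later" rather than as a failure. It assumes the Tower v2 REST endpoint /api/v2/hosts/<id>/ and OAuth2 bearer-token authentication; the function name, the return values, and the injectable opener parameter are hypothetical and are not taken from the SovLabs module.

```python
import urllib.error
import urllib.request


def delete_host(base_url, host_id, token, opener=urllib.request.urlopen):
    """Try to delete a host from Tower.

    Returns "deleted" on success and "busy" when Tower answers 409,
    which is the response described above for a busy inventory.
    Any other HTTP error is re-raised to the caller.
    """
    # Assumed endpoint layout and auth scheme; adjust to your Tower setup.
    request = urllib.request.Request(
        f"{base_url}/api/v2/hosts/{host_id}/",
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with opener(request):
            return "deleted"
    except urllib.error.HTTPError as err:
        if err.code == 409:
            # Inventory is in use by a running job: not fatal, retry later.
            return "busy"
        raise
```

A caller that receives "busy" can record the host and retry once the running jobs have finished, instead of blocking the deprovisioning workflow.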

We worked in conjunction with Red Hat to create a method of disabling hosts in inventory so that new playbooks cannot be scheduled against disabled hosts, and we leveraged this method into the SovLabs solution.
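As an illustration of what disabling a host can look like at the API level, the sketch below builds a PATCH request that sets a host's enabled flag to false. It assumes the Tower v2 endpoint /api/v2/hosts/<id>/ and bearer-token authentication; the helper name is hypothetical, and this is not necessarily the exact method SovLabs and Red Hat settled on.

```python
import json
import urllib.request


def build_disable_request(base_url, host_id, token):
    """Build (but do not send) a request that marks a host as disabled."""
    # Tower host records carry an "enabled" flag; setting it to False
    # is assumed here to keep new jobs from targeting the host.
    body = json.dumps({"enabled": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v2/hosts/{host_id}/",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

The request can then be sent with urllib.request.urlopen(request). Building it separately from sending it keeps the payload easy to inspect and test.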


The SovLabs Solution

We have enhanced the existing SovLabs Ansible Tower Module to account for this new behavior in Tower. Specifically, when this issue occurs, we keep a list of the VMs that need to be removed. After running our business logic for Ansible Tower, we check that list and attempt to remove any hosts on it. We also have a scheduled task that checks the list every 10 minutes and attempts to remove any hosts that remain.
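The retry approach described above can be sketched roughly as follows. This is an illustrative outline only, not the SovLabs implementation: the function name and the try_delete callback (assumed to return True once Tower accepts the delete) are hypothetical.

```python
# Interval matching the 10 minute schedule described above.
SWEEP_INTERVAL_SECONDS = 10 * 60


def sweep_pending_removals(pending_host_ids, try_delete):
    """Attempt to delete every pending host once.

    try_delete is a caller-supplied callable that returns True when the
    host was removed and False when Tower refused (for example with 409).
    Returns the hosts that are still pending after this pass.
    """
    still_pending = []
    for host_id in pending_host_ids:
        if not try_delete(host_id):
            still_pending.append(host_id)
    return still_pending
```

A scheduler would call sweep_pending_removals every SWEEP_INTERVAL_SECONDS and store the returned list as the new pending set, so hosts drop off the list as soon as the inventory is no longer busy.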

We will continue to work with Red Hat to incorporate any changes they make to resolve this in Ansible Tower.

If you have upgraded to Ansible Tower 3.3 and have experienced this issue, please open a ticket with our Customer Success Team.

