I have been managing one production-grade AWX instance, hosted on Kubernetes, which is executing a couple of hundred jobs every day. After a few months of usage, it was found that the disk where the persistent data of our Ansible AWX instance is located was running out of space.
The reason for this disk clogging was that there was no clean-up configured for AWX jobs history. Around 35K AWX jobs were accumulated since its inception. After doing some research online, I found the following solutions to this issue.
Solution 1# Enable AWX managing job for cleaning the job history
AWX UI comes with pre-configured management jobs including one for AWX job details cleanup. This job can be configured to keep the job history for a specific time period and clean up the rest. The retention period of AWX job history depends on the use case and varies on the environment. In our environment, keeping one month of job history is sufficient which can be stored within the provided storage capacity in our environment.
Sometimes, there is an issue with the Cleanup details job that is unable to free up the disk space. In order to get the space back, the following operation is needed on the Postgres database.
Solution#2 Postgres DB vacuuming to retrieve disk space
After inspecting the Postgres database, it was found that one specific table namely main_jobevent was around 16GB in size and still growing. This table stores the ansible logs for each job runs on the AWX. If there isn’t any need to keep the AWX job history at all, the Postgres database comes with a vacuum operation that needs to run on main_jobevent table.
SELECT pg_size_pretty( pg_total_relation_size('main_jobevent') ); # pg_size_pretty #---------------- # 16 GB # (1 row)
After vacuuming: Note: To return space to the OS, use
VACUUM FULL main_jobevent; SELECT pg_size_pretty( pg_total_relation_size('main_jobevent') ); # pg_size_pretty #---------------- # 4915 MB # (1 row)
Bonus tip# Backup dump by excluding jobevent table
Postgres database backup takes a snapshot of the running database which also includes the main_jobevent table. If there isn’t any need to backup AWX job history, this table can be excluded from the Postgres database dump. There is an option
pg_dump tool which excludes any given table from the dump by matching on a pattern (i.e. it allows wildcards):
pg_dump — clean — create -h $host -U $username -d $database -p $port -F custom — exclude-table-data=’main_jobevent_*’ | gzip > $BACKUP.gz
This exclude option reduced the backup size of our AWX size by 90%.
Writer // KUMORION BLOG //
Enthusiastic DevOps learner and sometimes like to write about his experiences for community awareness.