Template Job or Scheduled Job Fails - Creating Events Dirs Permission Error

Mindwatering Incorporated

Author: Tripp W Black

Created: 10/30 at 05:00 PM

 

Category:
Linux
RH AAP

Issue:
After a reboot, Ansible Automation Platform (AAP) fails to run some on-demand template jobs or scheduled jobs with the error:
Error: creating events dirs: mkdir /run/user/<awxuseraccountid>: permission denied

The issue typically appears after some or all reboots of AAP 2.x (any point release).


View log of error:
1. Login
$ ssh myadminid@aapdev.mindwatering.net
<enter password>
$ sudo su -
<enter password>

2. View the failure w/in the user service:
[root@appdev ~]# systemctl status user@<awxuseraccountid>.service


Workaround:
1. Login to the controller appliance:
$ ssh myadminid@aapdev.mindwatering.net
<enter password>
$ sudo su -
<enter password>

2. Create the missing folder:
Note: Replace the <awxuseraccountid> with the actual user ID (e.g. 990) in the error message.
[root@appdev ~]# id -u awx
<note the awx user account id>
example: 990
[root@appdev ~]# mkdir /run/user/<awxuseraccountid>
example: mkdir /run/user/990
[root@appdev ~]# chown awx:awx /run/user/<awxuseraccountid>
[root@appdev ~]# chmod 700 /run/user/<awxuseraccountid>
[root@appdev ~]# systemctl restart user@<awxuseraccountid>.service

Red Hat Tech-note 7050672 indicates this "solution" only works until the next reboot, since /run is a tmpfs that is recreated at boot. It is really a workaround.
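For convenience, the workaround steps above can be sketched as one small helper script. This is a sketch, not part of the tech-note: the fallback uid 990 is an assumed example, and the script defaults to dry-run so it only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch of the manual workaround above. DRY_RUN=1 (default) only prints
# the commands; set DRY_RUN=0 on the real controller to execute them.
set -u
AWX_UID="$(id -u awx 2>/dev/null || echo 990)"   # 990 is an assumed example uid
RUNDIR="/run/user/${AWX_UID}"
DRY_RUN="${DRY_RUN:-1}"
PLAN=""
run() {
    if [ "$DRY_RUN" = "1" ]; then
        PLAN="${PLAN}would run: $*
"
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}
run mkdir -p "$RUNDIR"
run chown awx:awx "$RUNDIR"
run chmod 700 "$RUNDIR"
run systemctl restart "user@${AWX_UID}.service"
```

Run it once with the default dry-run to review the plan, then with DRY_RUN=0 to apply it.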


Possible Solution:
Tech-note 7050672 gives two different possible solutions for two different root causes.

Option A. If the cause is the missing /run/user/<awxuseraccountid> directory (the folder creation/user rights case above), enable a persistent (lingering) user session across reboots:
[root@appdev ~]# id -u awx
<note the awx user account id>
example: 990
[root@appdev ~]# loginctl enable-linger <awxuseraccountid>
(or loginctl enable-linger 990)

Notes:
- The tech-note mentions loginctl enable-linger <awxuseraccountid>. According to logind.conf(5), enabling linger for a user excludes that user's processes from being killed/cleaned up even when KillUserProcesses=yes is set. Lingering also starts the user's service manager (and its /run/user/<awxuseraccountid> runtime directory) at boot, and the setting persists across reboots.
- To check whether lingering is enabled for a specific user, enter the following command: # loginctl show-user <awxuseraccountid>, and look at the Linger property in the output.
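A guarded sketch of that linger check (uid 990 is the example from above; the fallback output is an assumption for hosts without systemd-logind):

```shell
# Sketch: report whether lingering is enabled for the awx uid (990 assumed).
# Guarded so it degrades gracefully on hosts without systemd-logind.
AWX_UID=990
if command -v loginctl >/dev/null 2>&1; then
    LINGER="$(loginctl show-user "$AWX_UID" --property=Linger 2>/dev/null || echo 'Linger=unknown')"
else
    LINGER="Linger=unknown"   # no logind on this host
fi
echo "$LINGER"
```

Linger=yes means Option A has already been applied for that user.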

Option B. If the error is caused by running out of per-user inotify instances and/or watches (which surfaces as "Too many open files"):
1. Determine if the inotify limit is the issue by looking at the awx user service log:
[root@appdev ~]# id -u awx
<note the awx user account id>
example: 990
[root@appdev ~]# systemctl status user@<awxuseraccountid>.service
<read result>

If the issue relates to these limits, and they were exceeded while awx was processing job executions, errors like the following appear in the status output and/or the service log:
example errors:
Failed to create timezone change event source: Too many open files
Failed to allocate manager object: Too many open files
...
user@<awxuseraccountid>.service: Failed with result 'protocol'.

2. Temporarily raise the limits to confirm that increasing them helps:
[root@appdev ~]# sysctl -w fs.inotify.max_user_instances=1024
[root@appdev ~]# sysctl -w fs.inotify.max_user_watches=242696

3. Permanently update to persist across reboots:
[root@appdev ~]# echo "fs.inotify.max_user_instances=1024" > /etc/sysctl.d/01-fs_notify_tweak.conf
[root@appdev ~]# echo "fs.inotify.max_user_watches=242696" >> /etc/sysctl.d/01-fs_notify_tweak.conf
[root@appdev ~]# sysctl --system

Note:
- The tech-note specifies raising only max_user_instances, not max_user_watches.
- To keep this update from being overwritten by OS/package upgrades, use a unique file name under /etc/sysctl.d/. For example: /etc/sysctl.d/01-fs_notify_tweak.conf, /etc/sysctl.d/appdevawxuser.conf, or /etc/sysctl.d/local.conf.
- Disabling an instance in the AAP Controller GUI does NOT stop scheduled jobs from running on that instance. The disable option doesn't do what we would think it does.
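Equivalently, the two echo lines in step 3 can be written as one heredoc. In this sketch the file defaults to a temporary path for safety; on the real host, point CONF at a unique name under /etc/sysctl.d/ (e.g. /etc/sysctl.d/01-fs_notify_tweak.conf) and then run sysctl --system.

```shell
# Sketch: write both inotify settings in one pass. CONF defaults to a temp
# file for safety; override it with the real /etc/sysctl.d/ path on the host.
CONF="${CONF:-$(mktemp)}"
cat > "$CONF" <<'EOF'
fs.inotify.max_user_instances=1024
fs.inotify.max_user_watches=242696
EOF
# sysctl --system    # uncomment on the real host to apply the new limits
```

Using a single heredoc avoids the classic mistake of truncating the file when a second > redirect is used instead of >>.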

4. (OPTIONAL) Restart the user service to confirm that the limit is high enough:
[root@appdev ~]# systemctl restart user@<awxuseraccountid>.service
<typically displays nothing for successful restart>
[root@appdev ~]# systemctl status user@<awxuseraccountid>.service
<verify status is okay>

5. View the current inotify limits:
[root@appdev ~]# sysctl fs.inotify
<view results>

example:
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 1024
fs.inotify.max_user_watches = 242696






