
Linux Agent - not running Application monitor bash script correctly post 2020.2 upgrade

Hey everyone!

Before I open a support ticket, I wanted to see if the community has observed anything like what I'm seeing now with the application monitors.

Base state:

We have an application monitor that is watching an app running in a Docker container. As part of that, I found a bash script on here that parses the "docker info" output, specifically the number of containers in running, paused, and stopped status, and the number of images on the host. It's a very simple script:

#!/bin/bash
# Capture the full "docker info" output as a single JSON object.
JSON="$(/usr/bin/docker info --format '{{json .}}')"
# Comma-separated fields 3-6 hold the running, paused, stopped, and image counts.
echo "$JSON" | awk -F ',' '{print $3}' | awk -F ':' '{print "Message.running:"$1,"\nStatistic.running: "$2}'
echo "$JSON" | awk -F ',' '{print $4}' | awk -F ':' '{print "Message.paused:"$1,"\nStatistic.paused: "$2}'
echo "$JSON" | awk -F ',' '{print $5}' | awk -F ':' '{print "Message.stopped:"$1,"\nStatistic.stopped: "$2}'
echo "$JSON" | awk -F ',' '{print $6}' | awk -F ':' '{print "Message.images:"$1,"\nStatistic.images: "$2}'

This code is executed with "sudo" because docker needs elevated permissions to return this data.
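
As an aside, the script above depends on the order of the fields in the JSON, which could shift between Docker versions. A minimal sketch of an alternative, assuming the Go template fields ContainersRunning, ContainersPaused, ContainersStopped, and Images are available in your Docker version; the helper names emit_stat and report_counts are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: print one Message/Statistic pair in the format
# the script monitor expects.
emit_stat() {
    local name="$1" label="$2" value="$3"
    printf 'Message.%s:"%s"\nStatistic.%s: %s\n' "$name" "$label" "$name" "$value"
}

# Hypothetical helper: read four space-separated counts from stdin
# (running paused stopped images) and print a pair for each.
report_counts() {
    local running paused stopped images
    # Fail if there is no input at all, e.g. because docker returned an error.
    read -r running paused stopped images || return 1
    emit_stat running ContainersRunning "$running"
    emit_stat paused  ContainersPaused  "$paused"
    emit_stat stopped ContainersStopped "$stopped"
    emit_stat images  Images            "$images"
}
```

On a Docker host, the counts would be piped in by name rather than by position: /usr/bin/docker info --format '{{.ContainersRunning}} {{.ContainersPaused}} {{.ContainersStopped}} {{.Images}}' | report_counts. This does not change the sudo problem described below; it only makes the parsing less fragile.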

What's broken now

As of 11 AM, we are getting reports that the script is failing with a down status. That time coincides with our upgrade to 2020.2, which included upgrading the agents on our machines.

Checking the agent logs, I see this:

20/07/01 11:01:03.136 PID: 23274 TID: 140327164147520 [INFO] scriptrunner - Execution of command (subprocess method): sudo /bin/bash /tmp/APM_p3g4v6cp 
20/07/01 11:01:05.108 PID: 23274 TID: 140327164147520 [WARNING] scriptrunner - Script error output:
We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

sudo: no tty present and no askpass program specified
20/07/01 11:01:05.110 PID: 23274 TID: 140327164147520 [WARNING] LinuxScriptProbe - Script output does not contain expected field. Value for 'paused' is missing.
20/07/01 11:01:05.110 PID: 23274 TID: 140327164147520 [WARNING] LinuxScriptProbe - Script output does not contain expected field. Value for 'running' is missing.
20/07/01 11:01:05.110 PID: 23274 TID: 140327164147520 [WARNING] LinuxScriptProbe - Script output does not contain expected field. Value for 'images' is missing.
20/07/01 11:01:05.110 PID: 23274 TID: 140327164147520 [WARNING] LinuxScriptProbe - Script output does not contain expected field. Value for 'stopped' is missing.

Now, I recognize that text block as our "first time sudo" control string. It appears the first time an account uses sudo in our Linux environments, before prompting for a password. So it's strange that it is showing up now.

I checked back in the logs, and saw that before the update, I got the correct data (this is the run right before the above snippet):

20/07/01 10:56:03.259 PID: 22327 TID: 140420145358656 [WARNING] scriptrunner - Script error output:
Message.running:"ContainersRunning" 
Statistic.running: 1
Message.paused:"ContainersPaused" 
Statistic.paused: 0
Message.stopped:"ContainersStopped" 
Statistic.stopped: 0
Message.images:"Images" 
Statistic.images: 3

 

So that very much indicates something changed with the upgrade. And according to my Linux sysadmin, something did:

 

Jul  1 10:56:03 atnexus01a sudo:    root : TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/bin/bash /tmp/APM_dzazuspg
Jul  1 10:56:03 atnexus01a sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Jul  1 10:56:03 atnexus01a sudo: pam_unix(sudo:session): session closed for user root
Jul  1 11:01:03 atnexus01a sudo: pam_unix(sudo:auth): conversation failed
Jul  1 11:01:03 atnexus01a sudo: pam_unix(sudo:auth): auth could not identify password for [swiagent]
Jul  1 11:01:03 atnexus01a sudo: pam_succeed_if(sudo:auth): requirement "uid >= 1000" not met by user "swiagent"
Jul  1 11:01:05 atnexus01a sudo: swiagent : user NOT in sudoers ; TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/bin/bash /tmp/APM_p3g4v6cp
Jul  1 11:02:00 atnexus01a sudo: pam_unix(sudo:auth): conversation failed
Jul  1 11:02:00 atnexus01a sudo: pam_unix(sudo:auth): auth could not identify password for [swiagent]
Jul  1 11:02:00 atnexus01a sudo: pam_succeed_if(sudo:auth): requirement "uid >= 1000" not met by user "swiagent"
Jul  1 11:02:02 atnexus01a sudo: swiagent : user NOT in sudoers ; TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/bin/bash /tmp/APM_s25ye73g

 

So according to this, before the upgrade, the agent was running the script as "root." After the upgrade, it started using "swiagent" - an account I didn't even realize was there. Further, it's not part of wheel, or otherwise set up to have elevated permissions.
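
For anyone wanting to check their own servers for the same change, here is a rough sketch that boils sudo log lines like the ones above down to time, invoking user, and outcome. The function name sudo_callers is made up, and it only looks for the "NOT in sudoers" marker, so other kinds of sudo failures would still show as "allowed":

```shell
#!/bin/bash
# Hypothetical helper: read syslog-style sudo lines on stdin and print
# "<time> <invoking user> <allowed|denied>" for each logged command.
sudo_callers() {
    awk '/ sudo: / && /COMMAND=/ {
        # The invoking user is the first word after the "sudo:" tag.
        for (i = 1; i <= NF; i++) {
            if ($i == "sudo:") { user = $(i + 1); break }
        }
        # Simplification: only the "NOT in sudoers" marker counts as denied.
        status = ($0 ~ /NOT in sudoers/) ? "denied" : "allowed"
        print $3, user, status
    }'
}
```

Run as root with the log on stdin, for example sudo_callers < /var/log/secure (that path is an assumption for RHEL-style systems; other distributions log sudo elsewhere). For the entries above it would list root as allowed at 10:56:03 and swiagent as denied at 11:01:05 and 11:02:02.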

 

$ sudo lslogins swiagent
[sudo] password for [REDACTED]:
Username:                           swiagent
UID:                                998
Gecos field:
Home directory:                     /opt/SolarWinds/Agent
Shell:                              /sbin/nologin
No login:                           yes
Password is locked:                 no
Password not required:              no
Login by password disabled:         yes
Primary group:                      swiagent
GID:                                994
Hushed:                             no
Password changed:                   2019-Dec03
Running processes:                  1

Last logs:
15:37:44 sudo[43968]: pam_unix(sudo:auth): conversation failed
15:37:44 sudo[43968]: pam_unix(sudo:auth): auth could not identify password for [swiagent]
15:37:44 sudo[43968]: pam_succeed_if(sudo:auth): requirement "uid >= 1000" not met by user "swiagent"

 

So what do I need help with?

Well, a lot... but in this case:

The immediate fix would be to make swiagent a member of the wheel group so that it can run sudo. But we'd have to make this change on every Linux server going forward.
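
A narrower variant of that immediate fix might be a dedicated sudoers rule instead of wheel membership. This is only a sketch under assumptions I have not confirmed with SolarWinds: that the agent always runs as swiagent, and that it always invokes scripts as /bin/bash /tmp/APM_<random>, as in the logs above. The helper name sudoers_rule is made up:

```shell
#!/bin/bash
# Hypothetical helper: print a sudoers rule that lets one account run
# commands matching a pattern as root without a password prompt.
sudoers_rule() {
    local user="$1" pattern="$2"
    printf '%s ALL=(root) NOPASSWD: %s\n' "$user" "$pattern"
}

# Assumed account name and script path pattern, taken from the logs above.
sudoers_rule swiagent '/bin/bash /tmp/APM_*'
```

The printed line could be saved to a file under /etc/sudoers.d/ and checked with visudo -cf on that file before relying on it. Note that the wildcard is broad: anything able to act as swiagent could run any matching script as root, so this would need a security review, and it would still have to be rolled out to every server.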

I tried setting the agent to use different credentials (we have a primary monitor credential that the node uses to talk to the system), but that didn't appear to do anything.

Long term, though, we'd like to set this up according to best practices. Specifically, we'd like the agent to authenticate and perform what it needs to, ideally using admin permissions that we can control, so that each department keeps control over elevated permissions on its boxes.

Has anyone seen anything like this? Did I miss something in the 2020.2 release notes?
