
SAM 6.3 Upgrade: Agent Update Analysis

Hello All,

Just wanted to share my experience with the SAM Windows agent, so that anyone planning an upgrade can be extra cautious!

Today I upgraded NPM 12.01 and SAM 6.3 at one of my customer's sites during non-business hours, early in the morning. As usual, the upgrade of NPM and SAM went smoothly without any issues. I was excited to see the HA settings and the new Linux agent.

Once the web console was up, the real problem started. Around 350 agents started updating all at once, and it took almost 4 hours for the status of all 350 agents to change from "Update in progress" to "OK". I thought I was in the clear, but then a different problem appeared: I started getting complaints that e-mail alerts did not contain the proper information.

The CPU on SERVER12345 is currently running at 98 %. The top 10 processes running at the time of this poll are listed below:

Unable to get list of processes - Unable to schedule job on agent node 827 - required APM plugin is missing or not installed properly

So I understood that there was a problem with the agent plugin. When I checked the agents, the agent status was OK, but the plugin status was stuck in either the Pending or the Approved state.

Agent Plugin Status

A few were in Pending status and a few were in Approved status.


On around 50 agents the plugin installed properly; the rest were stuck. Then I started noticing a latency issue. When I checked the interface utilization, I was shocked to see it had grown by more than 100% since the agents started updating. I hoped the agent plugins would install without further issues, so I waited for an hour, and slowly each agent started updating its plugins properly. In total, the agent and plugin update took nearly 5 hours at this site, because these servers were in a different domain and on a different network altogether.

So if you plan for agent updates, please consider these points:

  1. Perform the upgrade during non-business hours.
  2. Update the agents in batches if you have poor network connectivity.
  3. Temporarily turn off "Allow automatic agent updates" before starting the upgrade.
  4. Allow adequate time, depending on the number of agents.
  5. Turn off the alerts if you have auto-ticketing enabled for CPU and memory stats.
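
For point 2, here is a minimal sketch of how a list of agents could be split into batches before updating them by hand. The server names and the `plan_batches` helper are made up for illustration; this only plans the groups and does not call any SolarWinds API.

```python
def plan_batches(agent_names, batch_size):
    """Split a list of agent names into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        agent_names[start:start + batch_size]
        for start in range(0, len(agent_names), batch_size)
    ]


# Hypothetical agent names, not taken from any real environment.
agents = [f"SERVER{number:03d}" for number in range(1, 8)]

for batch_number, batch in enumerate(plan_batches(agents, 3), start=1):
    print(f"Batch {batch_number}: {', '.join(batch)}")
```

Each batch can then be updated and checked (agent status and plugin status) before moving on to the next one.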

Once the plugins were installed properly, my network returned to normal and SAM started rocking!! Share your thoughts and cases; they might be useful for someone who is planning to upgrade.

  • Thank you for the analysis. What do you recommend for a site with 30 agents on high-speed links?

  • I would recommend against disabling automatic agent updating as previous versions are not compatible with the latest. Those agents will essentially be in a non-functional state until such time as they are updated to support the new version of Orion. Updating those agents would then be a tedious manual process.

    As a general rule I do agree with your statement about performing upgrades after hours. This advice is good not just for SolarWinds software, but really for any major upgrade you're performing.

    Agent upgrades run ten parallel threads, meaning that only 10 agents should be upgrading at the same time. As soon as one finishes, another one will start, until all agents have been successfully upgraded. In environments where WAN bandwidth is scarce, the number of parallel threads can be reduced to limit the potential for link saturation. This will of course lengthen the time it takes for all agents to upgrade, so be sure to take that into account.

    • Open c:\Program Files (x86)\SolarWinds\Orion\AgentManagement\SolarWinds.AgentManagement.ServiceCore.dll.config
    • Find <agentManagementServiceConfiguration … /> around line 10
    • Add: maxParallelPluginUploads="3"
      • to get <agentManagementServiceConfiguration maxParallelPluginUploads="3" … />
      • where "…" are any existing attributes that were there before the change
    • Save changes
    • Restart "SolarWinds Orion Module Engine"

    This limits the number of parallel plugin uploads to 3. Pick another number if 3 does not suit your environment. Just don't use "0".
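
    If you would rather script the edit than make it by hand, something along these lines should work as a rough sketch. The sample XML is a made-up stand-in for the real file (the attribute `someExistingAttribute` and the `<configuration>` wrapper are assumptions, as is the helper name); only the element name and `maxParallelPluginUploads` come from the steps above. Note that `xml.etree.ElementTree` drops comments and the XML declaration when re-serializing, so back up the file first.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real .dll.config; the actual file has more content.
SAMPLE_CONFIG = """<configuration>
  <agentManagementServiceConfiguration someExistingAttribute="value" />
</configuration>"""


def set_max_parallel_plugin_uploads(config_xml: str, limit: int) -> str:
    """Return config_xml with maxParallelPluginUploads set to limit."""
    if limit < 1:
        raise ValueError("limit must be at least 1; do not use 0")
    root = ET.fromstring(config_xml)
    # iter() also checks the root itself, so the element may sit anywhere.
    element = next(root.iter("agentManagementServiceConfiguration"), None)
    if element is None:
        raise LookupError("agentManagementServiceConfiguration not found")
    # Existing attributes are left untouched; only this one is added or replaced.
    element.set("maxParallelPluginUploads", str(limit))
    return ET.tostring(root, encoding="unicode")


updated = set_max_parallel_plugin_uploads(SAMPLE_CONFIG, 3)
print(updated)
```

    The service restart from the last step is still required afterwards.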

    Alternatively you can disable agent automatic updating entirely and rely on Group Policy or another form of package distribution like SolarWinds Patch Manager to upgrade the agent software if those are faster or more reliable.

  • squinsey will have more info, as he mentioned the following to me, but I believe the agent auto-update size has also increased massively in this version because it has all of the features enabled by default (NetPath, etc.). So rather than a 25 MB or so installer, it's 100 MB? The manual installer on my system is 23.7 MB, so I'm not sure whether the issue is with the other system or with the auto-update.

  • Thanks aLTeReGo, I'll drop ours to 5 and see how it goes in future.

    I assume this dll.config is updated in each revision? Can we get the value to be user configurable via a DB entry?

  • Ours are currently around 80 MB each.

    All "Agent initiated" agents show the NetPath plugin as not installed correctly / requiring an upgrade.

    It's confirmed that the NetPath plugin issue is being addressed, and hopefully an update will be released soon.

  • Hi aLTeReGo,

    Thanks. Yes, I am aware that agent upgrades run only 10 parallel threads, but as I mentioned, all the agents were updating at the same time and were completely stuck updating the agent plugins. I had no choice but to wait for them to complete on their own. That is why I suggested turning off the automatic agent update.

    Please see the charts below from the agent upgrade process. Do you think this is normal if it's updating 10 agents at a time?

    [Chart: peak during agent upgrade]

    [Chart: normal trend]

  • A 10 Mbps jump in bandwidth utilization across 10 agents equates to a 1 Mbps download rate per agent. That is neither unusual nor unreasonable. Again, to reduce this further, consider limiting the number of agents that upgrade in parallel. This will extend the time it takes to upgrade all agents, but will limit the impact the upgrade process has on your available WAN bandwidth.
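
    As a back-of-the-envelope sketch of that trade-off, the snippet below estimates peak bandwidth and total duration for different parallel limits. The inputs are borrowed from this thread (350 agents, roughly 80 MB per agent, about 1 Mbps per agent); the function name and the simple "wave" model are my own assumptions. In reality a new agent starts as soon as one finishes, and the per-agent rate may change with the number of parallel uploads, so treat the numbers as rough guidance only.

```python
import math


def estimate_upgrade(agents: int, package_mb: float, parallel: int, per_agent_mbps: float):
    """Return (peak_mbps, hours) for a simple wave-by-wave upgrade model."""
    # Megabytes to megabits, divided by the per-agent rate, gives seconds per agent.
    seconds_per_agent = package_mb * 8 / per_agent_mbps
    # Agents are assumed to upgrade in full waves of `parallel` agents each.
    waves = math.ceil(agents / parallel)
    peak_mbps = min(agents, parallel) * per_agent_mbps
    hours = waves * seconds_per_agent / 3600
    return peak_mbps, hours


for parallel in (10, 5, 3):
    peak, hours = estimate_upgrade(
        agents=350, package_mb=80, parallel=parallel, per_agent_mbps=1.0
    )
    print(f"parallel={parallel}: peak ~{peak:.0f} Mbps, ~{hours:.1f} h")
```

    Lowering the limit cuts the peak proportionally and stretches the duration by about the same factor, which is the point to weigh when picking a value.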

    The 1.5 version of the Agent dependencies was unusually large, and that should not be a frequent occurrence with future upgrades. What was likely consuming the majority of the bandwidth during that time is the updated version of the .NET Framework the Agent is dependent upon. If the monitored endpoints were receiving regular Windows Updates, this dependency should already have been satisfied and the agents would not have needed to download it from the Orion server itself. The Agent self-satisfies all its own dependencies and does not download them at all if they already exist on the managed endpoint.

  • The NetPath plugin issue seems different. Do you mind sending us the Orion Server and Agent (the one that has the install warning) diagnostics? I will PM the link. Thanks!

  • Does this limitation apply to auto-updates only or does it also limit the number of parallel updates from the Manage Agents screen?

    Also, is this system wide or is it on a per poller basis? I have seen that the update is pushed from the agent's poller.

  • mlandman  wrote:

    Does this limitation apply to auto-updates only or does it also limit the number of parallel updates from the Manage Agents screen?

    It applies to both

    Also, is this system wide or is it on a per poller basis? I have seen that the update is pushed from the agent's poller.

    It's specific to each polling engine.