SolarWinds Orion Core 2010.2.0, APM 3.5, IPSLAMGR 3.5.1, NPM 10.1, NTA 3.7, IVIM 1.0.0 - UDT is waiting to be added.
I am trying to work through the scenario for a scheduled maintenance outage of our SolarWinds environment (1 x NPM/APM/IPSLA/NTA, 3 x Additional Pollers, 1 x Web Console and 1 x SQL2K8 DB, all running W2K8R2), and I'd like to understand how other large sites approach Windows, MSSQL and SolarWinds update activities. I appreciate that some of this may be covered in the documentation, but so far I'm not a big fan of the SolarWinds documentation: I find it bloated in some areas and completely missing things I think are important in others. I also don't particularly like the way it is structured.
The questions I have include:
1. How do you structure your updates? For example, do you apply all Windows updates on a regular schedule? Do you apply all the SolarWinds application updates at the same time, or update individual applications at different times to isolate changes and reduce risk?
2. When do you normally do your updates? During business hours, after hours, etc.?
3. Should all polling be stopped during maintenance windows?
4. What is the best way to prevent alerts during a maintenance window?
5. Any tips or tricks for the best way to recover a SolarWinds environment, or tips or tricks to prevent it needing to be recovered in the first place?
6. Any "bigger picture" standard tips or tricks for troubleshooting/fixing an environment undergoing updates? Which logs to look at, how to test functionality, any standard responses (e.g. uninstall/reinstall the Job Engine, run the Configuration Wizard, etc.)? SolarWinds support may not always be available when needed, so some basic techniques could be very useful.
7. Any other important things that might be worth suggesting?
I'm fairly new at this and have had limited exposure to doing SolarWinds updates. I would like to put some process around this activity and would appreciate any thoughts or suggestions the community may have.