Hello Thwack community! Long time lurker, first time poster.
My company is planning how to implement monitoring for a customer that has 5 sites. We are still very much in the beginning phases, trying to decide on products and architecture. I am a big fan of SolarWinds and have worked with it for a few years, but I’m only familiar with a 2-site active/active architecture.
I’m having a hard time understanding how to fulfill this customer requirement: “each of the 5 customer sites need to operate in isolation from one another”. All 5 sites are within a 245 mile area, and each site has nodes that need to be monitored. Some of my coworkers recommend adding Nagios, but I want to see if we can do it with just SolarWinds because Nagios is trash.
Example: 4 sites go down. I’m thinking that, for the customer, the last remaining site “operating while isolated” means:
1) Within minutes or seconds, the monitoring tool keeps recording (i.e. polling) metrics for any device still up during the disaster and keeps saving them to storage (i.e. a database). What if the SolarWinds database was at one of the 4 sites that went down? Does this mean we need a database at each site? If we get 4x high availability licenses, will that come with 4x additional polling engines that remain dormant until a disaster?
2) All staff located at the last site still up can log in to the monitoring console to view what is happening in the environment. If we get 4x high availability licenses, will that come with the ability to view the monitoring console if 4 out of 5 sites go down? I read documentation that says “It does not protect your databases or your additional web servers”, but the word “additional” confuses me.
We want these modules: NPM, NTA, NCM, UDT, NTM, and SAM, but we can get away with only NPM and SAM being available at all 5 sites if need be.
About 700 nodes when you add up all 5 sites
About 9000 elements