2 Replies Latest reply: Jun 7, 2012 9:12 AM by stephen.black

How to correctly monitor HTTP content - Several different URLs with no dedicated servers.

stephen.black

Hope this does not come off too dumb but here is an issue that has been KILLING us.

Late last night I finally figured out the cause, and I believe I have found a solution but I wanted to get some input from the community before I start pushing out that change.

The effort will be quite significant to say the least.

So here is what I am trying to accomplish:

 

We have various web promotion pages that reside within our main URL.

Say our main is www.helloworld.com.

Currently we have the following promotion pages monitored.

www.helloworld.com/super

www.helloworld.com/special

www.helloworld.com/deals

www.helloworld.com/requests

www.helloworld.com/bonus

www.helloworld.com/hope

 

The main page resides behind an F5 and is served by a four-server backend web environment.

 

The main page was added as a node with a fixed IP address that always resolves to the main VIP of the site.

Due to our F5 configuration I am unable to reach each web server's copy of a page by its own IP. That would be the ideal way to monitor, but it is not possible per our net guys at this time.

So the node monitor points to www.helloworld.com.

I then added each promo page as a single HTTP content monitor to this parent node. I did not combine each URL into a single template with multiple HTTP content component monitors so I could easily unmanage each site when necessary.
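As a rough illustration of this layout (not how SAM implements it), each promo page can be thought of as an independent check: fetch the URL through the VIP, look for an expected string, and skip any page that is currently unmanaged. In this sketch the names `fetch_body`, `check_content`, `poll_pages`, and the expected strings are hypothetical; only the helloworld.com URLs come from the post.

```python
from typing import Callable, Dict, Optional, Set
from urllib.request import urlopen


def fetch_body(url: str, timeout: float = 10.0) -> str:
    """Fetch a page over HTTP and return its body as text (real network call)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def check_content(url: str, expected: str,
                  fetch: Callable[[str], str] = fetch_body) -> bool:
    """Return True if the page loads and its body contains the expected text."""
    try:
        return expected in fetch(url)
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def poll_pages(monitors: Dict[str, str],
               unmanaged: Set[str] = frozenset(),
               fetch: Callable[[str], str] = fetch_body) -> Dict[str, Optional[bool]]:
    """Check each page in turn; unmanaged pages are reported as None."""
    results: Dict[str, Optional[bool]] = {}
    for url, expected in monitors.items():
        if url in unmanaged:
            results[url] = None  # skipped, so it can never raise an alarm
        else:
            results[url] = check_content(url, expected, fetch)
    return results
```

Keeping one entry per page is what makes it easy to unmanage a single promo page without touching the rest.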

This worked fine for several months. The number of promo pages kept growing, and today we have about 25 HTTP content monitors (one per page), all attached to the parent URL.

For the past few months we have been seeing bogus alarms for a single poll cycle just about every 20 minutes, like clockwork. For months we have been digging through event logs and perf counters trying to find the cause, but nothing was conclusive.
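One way to put a number on the "like clockwork" impression is to export the alarm times and look at the gaps between them. The sketch below is only an illustration: the function names, the two-minute tolerance, and the sample timestamps are assumptions, not data from our environment.

```python
from datetime import datetime, timedelta
from statistics import pstdev
from typing import List


def alarm_gaps_minutes(timestamps: List[datetime]) -> List[float]:
    """Minutes between consecutive alarms, after sorting by time."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]


def looks_periodic(timestamps: List[datetime], tolerance_minutes: float = 2.0) -> bool:
    """True if the gaps between alarms vary by no more than the tolerance.

    Uses the population standard deviation of the gaps as the spread measure.
    """
    gaps = alarm_gaps_minutes(timestamps)
    if len(gaps) < 2:
        return False  # not enough alarms to talk about a pattern
    return pstdev(gaps) <= tolerance_minutes


# Hypothetical alarm times with gaps of 20, 20 and 21 minutes.
start = datetime(2012, 6, 1, 0, 5)
sample = [start + timedelta(minutes=m) for m in (0, 20, 40, 61)]
print(alarm_gaps_minutes(sample), looks_periodic(sample))
```

A tight spread around one interval points toward something scheduled (a job, a timer, a recycle) rather than load on the site itself.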

A month ago I set up identical monitors from another data center across the country, and we saw very similar if not exactly the same behavior with the false alarms.

So last night I added something new. I took an external node monitor (google.com) and then added a single HTTP Content template to the google node.

I then assigned each of the current http content monitors to that node but all within a single application template.

Since adding the monitors this way the old monitors are still alarming about every 20 minutes as always but the new monitors are running rock solid with no alerts being recorded.

I know for sure the sites are online and stable due to another monitoring tool I have doublechecking. So it seems that how these monitors were added is the cause of our false alarms.

 

My question is this: what is the correct way to add a large number of HTTP content monitors to a VIP? Using the successful manner makes unmanaging individual pages a challenge and makes it impossible to schedule unmanagement in advance.

I checked the admin guide but was not able to find an example of this exact situation. I may be missing something basic here. Any suggestions would be most welcome.

The image on the left is what I added that is working great.

The image on the right is the old monitors that are having false alarms at random.

 

[Attached images: working.jpg (left), broke.jpg (right)]

  • Re: How to correctly monitor HTTP content - Several different URLs with no dedicated servers.
    aLTeReGo

    I personally would recommend polling these as a single application since polling will occur serially. You may be finding that your F5 device is limiting the number of parallel page accesses coming from a single IP address.
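A simple way to test the parallel-access theory outside of the monitoring tool is to run the same checks from one machine twice, once one at a time and once all at once, and compare the results. This is only a sketch under assumptions: `check` is a hypothetical callable that returns True when a page is healthy, and the function names and worker count are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def poll_serial(urls: List[str], check: Callable[[str], bool]) -> Dict[str, bool]:
    """Run the checks one after another, so only one request is in flight."""
    return {url: check(url) for url in urls}


def poll_parallel(urls: List[str], check: Callable[[str], bool],
                  workers: int = 8) -> Dict[str, bool]:
    """Run the checks concurrently from the same source machine."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(check, urls)))


def parallel_only_failures(serial: Dict[str, bool],
                           parallel: Dict[str, bool]) -> List[str]:
    """URLs that passed when polled serially but failed when polled in parallel."""
    return sorted(u for u, ok in serial.items() if ok and not parallel.get(u, False))
```

If pages fail only in the parallel run, a per-source connection limit on the load balancer becomes a more plausible suspect; if both runs behave the same, the cause is more likely elsewhere.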

    • Re: How to correctly monitor HTTP content - Several different URLs with no dedicated servers.
      stephen.black

      Thanks for the info, aLTeReGo. I also thought that might be a better way to monitor the URLs, so I put out duplicate templates: one set up the way it is above and one with each URL added as a component. The results were identical. We were seeing blips that turned red on a regular cycle of about 20 minutes, every hour. The timing was so perfect every time that we knew it had to be something deeper than just how we set the monitors up.

      After a great deal of work with the support folks at SolarWinds we were able to determine it was a bug in SAM 5.0 related to .NET (or some magic stuff like that which I don't really understand). I have the patch in hand and am upgrading to SAM 5.0.1 tonight and will be applying the Buddy Drop fix. Hopefully this will address our issue.

      While reviewing the server with our internal support team we noticed that TCP offload was not set correctly. This was apparently causing our Orion web console to be very slow and to hang at times. After fixing this setting we are seeing snappy user experiences, and I am hopeful the patch tonight will address our HTTP Content issues.

       

      Thanks as always for your feedback and help. You are one of the true assets here on Thwack.

       

      Respectfully,

      Stephen