NasaGeek

Comments

  • We had a scheduled power outage in our data center this weekend, meaning our boxes would get a free power cycle. We brought them up last night and within 2 hours the IO went nuts. I saw this before leaving and recycled the SQL service on the SQL box and it is still running stable today. Avg Disk Queue Length is holding at a…
  • 18 basic and 4 advanced. IOS change, we look for packet loss at certain areas, high utilization on others, cpu etc. Nothing that has caused us problems before. I can get you a more detailed list if you need. When I went to check in System Manager, the left pane blinks like it sometimes does, but this time it has not…
  • I'm not sure about Orion doing that, but you can easily do it with cron, Perl and Net-SNMP, and if you wanted to graph it you could throw the results to MRTG. I do several things like this that you just can't do with Orion. Perhaps Orion will become more powerful and allow us to give it results that it can then graph. You…
  • I was not on SP5 for 9.5. So we did this, and then restarted all the services. The problem cleared after this. We also noticed that this error was associated with our Custom Pollers not polling. I suspect that the issue was a simple bug and the restarting of the services corrected it.
  • Not DNS either. This just happened recently and we have not made any changes to the system. I am opening a ticket. I will post the result from that when this is resolved.
  • We had the IO staying at 30. We tried all the usual stuff to bring it back, including starting and stopping the SQL service. The only thing that worked today was stopping and restarting the NPM service on the second polling engine.
  • If I understand correctly, this would also solve my issue with force monitoring interfaces not in an up/up status. Having to come to a seemingly quasi-supported forum to ask for future requests seems a bit unprofessional to me. I am about to start rocking the boat, not here obviously. See my post in this forum "Forced…
  • This problem is not just isolated to 2k8. I have a 2k3 server that stopped sending email as soon as we upgraded to 9.5 SP5.
  • Thank you. It is a good feature that gives the users much more control. I do like the behavior to automatically not monitor "down" interfaces. The automation this provides is invaluable, but to add this control would make Orion that much more powerful, and would allow us to monitor interfaces like the Cisco SPAN ports.
  • No syslog and no traps.
  • Turning on Adaptive Read Ahead has brought the IO back to nominal levels. However, this raises the question: why is Orion performing so much more reading with 9.1 than 8.5?
  • Yes, we have SP1. Before this, when the maintenance would hang, the ADQL was in the 40s.
  • Put a sniffer on that interface to see what it is. However, I have never done this and could not tell you how to do it. Look here. wiki.wireshark.org/.../Loopback
  • The one I was thinking of is the following, but this raises the question of which one is used and whether one takes precedence over the other. The file is SWNetPerfMon.db, also in the root folder. And the line to modify is: ! Database Command timeout in seconds CommandTimeout=300 EDIT: Web.config is in the Inetpub Solarwinds root…
  • Controllers as in hard drive? Only one with two back planes. It is the PERC 5/i. No netflow, no other modules. 
  • The firewall is not running. netstat -na | find /i "17777" TCP 0.0.0.0:17777 0.0.0.0:0 LISTENING TCP 127.0.0.1:1789 127.0.0.1:17777 ESTABLISHED TCP 127.0.0.1:17777 127.0.0.1:1789 ESTABLISHED TCP our-solarwinds-box:1788 our-secondary-poller:17777 ESTABLISHED TCP our-solarwinds-box:2547 our-secondary-poller:17777 ESTABLISHED…
  • After looking at the issue again, I noticed that we were not losing any data points. I looked further into the disk IO and found that the high average count is related to reads, not writes as I first assumed. The write ADQL on this disk is .09, and this is why we are not losing data points. I do not have 'read ahead'…
  • We have had many issues with the polling engine and the database. One of them was very similar: our DB maintenance ran all night (12 hours), and during that time we lost essentially all polled data. We were able to get it down to 0.5-1 hours, and we now lose minimal data. First let me complain about SW support. Most of…
  • Bounce. I am calling support today to ask for this feature and comment on the lack of response from the employees on this forum.
  • I found out that Juniper is going to fix this down/testing issue; it should not, in fact, have been "expected behavior". However, we still have the Cisco SPAN ports that report as Up/Down. I have seen other posts where people would like this feature. Any response from a Solarwinds employee would be appreciated.
  • Thanks for your help. Yes, I have the 'all' selected. I have an almost similar alert for a custom poller rate that works fine with Numeric Status is Greater than.... but I was trying to get an alert anytime this counter moves. If the 'Has Changed' does not work with custom pollers, is there another trick to solve this? I'm…
  • I had the same issue of not being able to log in. We are using NT domain logins for NPM. When you add NT domain users in NPM you have to include a fake password so NPM will accept the domain/user id. You use your domain id and password for NPM logins, but we found you must use the fake password when logging into Orion…
  • Support was unable to duplicate our issue. I suspect it has to do with differences in our server load. All federal government agencies are required to implement security benchmarks on their systems. If I am correct in my suspicions, this workaround may help others with similar benchmark requirements.…
  • 1.3.6.1.4.1.26866.1 - Gigamon GigavueMP http://www.gigamon.com/
  • Edit: Nevermind
  • Solarwinds still can't monitor Cisco SPAN ports. It would be nice to be able to force NPM to record these ports' statistics.
  • See if you are losing data during that time. We have to run our own Rebuild Index jobs to keep our database clean. Every time we run them we get the failover message. We don't get failover for the Solarwinds nightly maintenance. Since we don't lose any data, we just ignore it. I wish Solarwinds had chosen a different…
  • Cool, I did not know about this tool. I ran it and there was nothing in the AlertTestLog.csv. Good, if we see this again, I will run this to see if anything shows up. Thanks for the quick reply.
  • We saw this on hardware that was more than enough to handle the job. Support did not figure this out, but we traced the problem to some BSD devices we were polling. Once we removed these, the application became stable again. I have noticed that the performance is worse in 8.5.1 than it was with 8.1. It got better and now…