quote: Originally posted by Network_Guru: In this situation, the remote polling engine has its own SQL server. Customers then will remotely roll-up the data they need via queries to the remote SQL server.
quote:When you say "Customers then will remotely roll-up the data"
quote: Originally posted by Network_Guru: Depends on the bandwidth & latency of your WAN link. Generally it is not a good idea:
Well, it appears I am in this exact situation now. The DB will be separated from the secondary poller via an OC12 WAN link with 40ms latency.
Does anyone have any experience with this? What is the performance like? Any issues with the DB or other services timing out?
Our secondary poller is across a WAN link and before we upgraded our WAN we had a latency of 40 - 50ms and didn't have a single problem with any timeouts.
It will also depend on how frequently you're polling and gathering stats.
Jon
Hi Jon,
Ever since moving the DB to another site with the Primary polling engine/web server, my secondary polling engine has 1 - 2 hour gaps in any SNMP collected data. ICMP stats are fine & are collected & written to the DB every 5 minutes. Of course Solarwinds Tier 3 support was no help & used the standard disclaimer - "we do not support remote polling engines separated from the DB".
I'm sure there is a problem with the install, or perhaps something else I have missed. Documentation on setting up a secondary polling engine is sparse, to say the least...
Any words of advice would be greatly appreciated.
My GUESS at what Don? was saying was that your DB and poller would be at the remote location and do everything locally to that site. You would then use some type of reporting tool or SQL Query to gather the data you need from the remote sites (and merge them with local data queries) to populate your reports. You would have to be pretty fluent in SQL querying to accomplish this. This is what I will be doing within the next 6 months and will try to post on successes/issues.
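To make the roll-up idea concrete, here is a minimal sketch of running the same query against each site's database and merging the rows into one report. Everything in it is an assumption for illustration: the `site_stats` table, its columns, the node names and the sample values are hypothetical, and in-memory SQLite databases stand in for the real local and remote SQL servers.

```python
import sqlite3

def make_demo_db(samples):
    """Build an in-memory database standing in for one site's SQL server."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; a real polling database will have its own schema.
    conn.execute("CREATE TABLE site_stats (node TEXT, cpu_load REAL)")
    conn.executemany("INSERT INTO site_stats VALUES (?, ?)", samples)
    return conn

def fetch_site_rows(conn, site):
    """Run the per-site summary query and tag each row with the site name."""
    rows = conn.execute(
        "SELECT node, AVG(cpu_load) FROM site_stats GROUP BY node ORDER BY node"
    ).fetchall()
    return [(site, node, avg) for node, avg in rows]

# Made-up sample data for two sites.
local = make_demo_db([("core-sw1", 20.0), ("core-sw1", 40.0)])
remote = make_demo_db([("edge-rtr1", 10.0), ("edge-rtr1", 30.0), ("edge-rtr2", 50.0)])

# Merge the per-site results into one report.
report = fetch_site_rows(local, "local") + fetch_site_rows(remote, "remote")
for row in report:
    print(row)
```

The point of this layout is that only the small summarized result set crosses the WAN, while all polling writes stay local to each site.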
Larry
Hi Larry:
I think many of us will be interested in your successes/issues. In our environment all data collection is centralized, so we do not see any need to use distributed polling.
However, our concern is from a disaster recovery perspective: losing the building that houses our SQL cluster and production Orion poller.
G'luck
Dale
NG... I'm in the same boat as you. We have 2 pollers and the main SQL server in our Midwest data center (A pollers). I have 2 remote pollers in my DR data center (B pollers) on the east coast. The data centers are connected via P2P connections with a latency of 35ms. My graphs for the B pollers are horrible. Most of the 24-hour graphs are a bunch of dots... no real lines. My graphs for the A sites are much better, but I do have some gaps over the day.
When I designed this I talked to SW, and even though they too said it was not really a good idea, they felt the latency was good enough. I'm to the point where I may have to move all polling off the B pollers to the A pollers. I really don't want to do this... and I want to exhaust all avenues before jumping ship.
My setup has all polling done every 120 seconds with statistic polling done every 5 minutes. Maybe dropping the polling for status back to 5 minutes would help? My A pollers have 3500 & 2700 elements respectively and my B pollers have 1000 elements apiece. I ran the poller tuning exe and the pollers are tuned to more than Orion recommends... not by much. The servers themselves are fine... no issues. The separate MS SQL server looks OK, except the physical memory utilization (reported by Orion) is 96% with the CPU running between 20-40%.
Any other suggestions? Thanks.
BB
Here's a little more history on my setup & issues Bryan;
I started with a single all-in-one server SLX V8.1 installation in Toronto - 7000+ elements. This server was maxed out, but usable - no gaps in charts, just slow loading pages. I built a new DB server in Edmonton - Win2k3 x64 4-way 8Gb Ram MS SQL2005 SE. I also built a new Orion Web & Polling server next to the DB server in Edmonton.
I migrated the 5Gb DB from Toronto to Edmonton, uninstalled SLX8.1 & installed the SLX8.1 poller on the Toronto server. I then installed full blown SLX8.1 on the new server in Edmonton - migrated over 2000 elements onto the new server. The polling on this new server is more aggressive than the old, now remote poller, & has no issues whatsoever.
The latency from DB to remote poller is 45ms.
The bandwidth is currently 45Mbps, but soon to be OC12.
Polling: 789 Nodes, 3958 Interfaces, 76 Volumes, 4823 Total Elements
Average polling intervals: 5 minutes for ICMP, CPU & Memory; 15 minutes for Interface stats
There are no gaps in the ICMP charts at all, only the SNMP graphs have gaps - 1 to 2 hours apart. CPU utilization for the OrionNetPerfMon service has dropped dramatically from ~ 20% to 2% average. Only on a restart of the service do I see the CPU hit 25% & then settle back down to 2% after 5-10 mins. I suspect the polling engine is not actually polling at the intervals set in System Manager.
Some troubleshooting I've done so far:
Netflow shows average bandwidth utilization for SQL traffic (TCP-1433) between remote poller & DB at 300kbps.
Changed the Packet Size= setting in SWNetPerfMon.db file from 4096 to 8192.
I installed Wireshark on my remote poller & monitored SNMP & ICMP traffic to a single node.
ICMP is sent every 5 minutes as per setting in Sysmanager, but SNMP is polled at sporadic intervals.
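For anyone wanting to check a capture the same way, here is a rough sketch of flagging missed polls from a list of request times exported from a sniffer. The `find_gaps` helper, the `slack` factor and the sample timestamps are all made up for illustration; they are not from my actual capture.

```python
def find_gaps(timestamps, interval_s, slack=1.5):
    """Return (previous, current) pairs spaced wider than interval_s * slack."""
    ordered = sorted(timestamps)
    limit = interval_s * slack
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]

# Hypothetical SNMP request times (seconds) for one node, 5-minute interval expected.
snmp_times = [0, 300, 600, 4500, 4800]
gaps = find_gaps(snmp_times, interval_s=300)
print(gaps)
```

Running the same check on the ICMP timestamps should return an empty list if ICMP really is going out every 5 minutes.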
Synopsis:
The gaps are due to the remote poller not polling at set intervals as specified in the System Manager settings. What I found in the sniffer capture is the poller must poll the DB server for the current data in the table/element it needs to poll. It then polls the required NodeID & InterfaceID OID and then writes the data to the DB server. I suspect it does not receive the current data from the DB server in a timely fashion, so it just does not poll the OID & you end up with missing polls.
One strange thing - after changing this to a remote poller only & removing 2000 elements & then re-running the polls per second tuning utility, it recommended settings which were double that of the original SLX install settings. Very strange indeed. After uninstalling this app & running a repair on the polling install & re-installing the polls-per-second tuner app, it now recommends a more "normal" setting.
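A back-of-the-envelope check of why latency could matter here, as a sketch only. It assumes the worst case where every element needs its DB round trips one after another: one read before the poll and one write after, so 2 round trips per element. That count and the fully serialized behaviour are assumptions, not something confirmed from the capture. The element count and latency are the figures from this setup.

```python
def serialized_cycle_seconds(elements, round_trips_per_element, rtt_ms):
    """Time for one full pass if every DB round trip happens back to back."""
    return elements * round_trips_per_element * rtt_ms / 1000.0

# 4823 elements, 2 assumed round trips each, 45 ms WAN latency.
cycle = serialized_cycle_seconds(4823, 2, 45)
# Same load with an assumed 1 ms LAN latency, for comparison.
lan_cycle = serialized_cycle_seconds(4823, 2, 1)

print(f"WAN: {cycle:.0f} s per pass, LAN: {lan_cycle:.0f} s per pass")
```

Under those assumptions a single pass over the WAN takes longer than a 300-second polling interval, while the same pass on a LAN finishes in seconds. A real poller almost certainly runs requests in parallel, so treat this as an upper bound that shows the direction of the effect, not a measurement.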
Further steps to resolve this issue:
The remote poller is running Win2k3 SP1 - upgrade to SP2 tonight, as the other 2 servers are at SP2.
Play with the polls-per-second tuning & double the settings.
Do some more packet captures & compare to captured data from the Primary poller which is working correctly.
QoS (tag) the SQL traffic between the remote poller & DB.
Upgrade WAN to OC12 (in 10 days).
I'm certain that this can be made to work (Jon Chill mentioned that he has it working with 40ms latency).