14 Replies Latest reply on Dec 3, 2009 3:51 PM by Congo

    Any problems running Orion on blade servers?

    warbird

      I am looking for general opinions on whether running polling engines on blade servers is a good idea.

        • Re: Any problems running Orion on blade servers?
          branfarm

          I have been running Orion NPM/NCM on HP Blades for 2 years now, and I have never had a problem. 

          One thing you will need to consider is how much space you need for the database and whether your blade platform offers fast enough disk I/O for your environment. Since I run on HP blades I only have two hard drives available; this hasn't been a problem for me since I poll fewer than 1000 elements. HP does offer storage blades that act as direct-attached storage, but I haven't had a need. Even if your database is massive, or you have an existing SAN infrastructure you'd like to use for your database, most (if not all) blade vendors offer hardware to integrate into your environment: iSCSI, FC, etc.

          If you do use HP blades, HP sells Fusion-IO drives (branded as HP IO Accelerator) that offer 80GB, 160GB, and I think 320GB of SSD storage in the HP mezzanine card format. I've used the IO Accelerator for other applications and it works great.

            • Re: Any problems running Orion on blade servers?
              warbird

              No worries there.  I already have a very large Orion implementation and am looking to install a 3rd polling engine.  My boss asked about the possibility of moving to blade servers for just the polling engines, hence the question.  I have a stand alone server for the db, with the actual db living on a very speedy SAN.

            • Re: Any problems running Orion on blade servers?
              jtimes

              I have a large Orion deployment: Orion Server, HSB engine, 9 Polling Engines, and a monster DB server are all on blade servers. They have been running there for the last three years. No issues.

              All are HP Blade Servers - E5450 (2HTx3.00 GHz) with 16 GB RAM; same hardware for the DB, but 32 GB RAM
              The NICs are teamed and set to fault-tolerant.  Nothing fancy ;) 

                • Re: Any problems running Orion on blade servers?
                  kiwi

                  Hi jtimes,

                  I am quite interested in knowing more about the sizing of your setup with blade servers. I am undecided about whether to go for HP blades instead of an HP ProLiant DL360, which supports up to 6 disk spindles for the DB server.

                  My setup is 6000 elements with 50 NTA (NetFlow) interfaces and default polling intervals. It consists of an Orion NPM/NTA server and a DB server.

                  What are the exact specs of your blade DB server, especially the disks? How many disks? RAID 1+0? Disks with an accelerator or not? What model/type of disks are used for the DB? Is the disk I/O OK?

                  Thanks

                  • Re: Any problems running Orion on blade servers?
                    byrona


                    I am impressed and interested in your setup!

                    Can you tell us what your breakdown of monitored elements looks like, what (if any) additional software modules you are using, and what kind of stuff you are monitoring?

                    Thanks for sharing what you can!

                      • Re: Any problems running Orion on blade servers?
                        jtimes

                        The db server is a "standard" HP Blade with two physical drives. The first drive is partitioned into two (1-OS, 1-Apps); the second drive only has NetPerfMon.db, the sys db files, and the page file. The DBs get backed up daily. NetPerfMon is around 50GB.
                        SQL 2005 SP3, all tweaked out, with 27GB of memory dedicated to SQL.

                        Nightly maint takes roughly 45-60 minutes.
                        Retaining Detailed for 7 days
                        Retaining Hourly for 31 days
                        Retaining Daily for 90 days
                        Retaining Events for 32 days
                        *Retaining Syslog for 32 days - not listed in swdebugMaintenance.log

                        Breakdown of monitored elements:
                        Network Elements: 160771 (total)
                        Nodes: 7300
                        Interfaces: 152480
                        Volumes: 991

                        Polling Intervals:
                        Check response time and status of each node every 240 seconds
                        Check the status of each interface every 160 seconds
                        Check status of each volume every 520 seconds
                        Rediscover each Node, Interface, and Volume every 60 minutes

                        Statistics:
                        Collect statistics from each Node every 10 minutes
                        Collect statistics from each Interface every 5 minutes
                        Collect statistics from each Volume every 15 minutes
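
For a rough sense of scale, the element counts and intervals listed above can be turned into an average polling rate. This is only a back-of-the-envelope sketch: it assumes polls are spread evenly over each interval and evenly across the 9 polling engines mentioned earlier in the thread, and it counts one "poll" per element per interval rather than actual SNMP or ICMP packets. The function and variable names are illustrative, not anything from Orion.

```python
# Element counts and intervals as quoted in the post above.
ELEMENTS = {"nodes": 7300, "interfaces": 152480, "volumes": 991}
STATUS_INTERVAL_S = {"nodes": 240, "interfaces": 160, "volumes": 520}
STATS_INTERVAL_S = {"nodes": 600, "interfaces": 300, "volumes": 900}

# Assumption: load is shared evenly by the 9 polling engines.
POLLING_ENGINES = 9


def polls_per_second(counts, intervals_s):
    """Average polls per second if every element is polled once per interval."""
    return sum(counts[kind] / intervals_s[kind] for kind in counts)


total_elements = sum(ELEMENTS.values())
status_rate = polls_per_second(ELEMENTS, STATUS_INTERVAL_S)
stats_rate = polls_per_second(ELEMENTS, STATS_INTERVAL_S)
per_engine_rate = (status_rate + stats_rate) / POLLING_ENGINES

if __name__ == "__main__":
    print(f"total elements:        {total_elements}")
    print(f"status polls/sec:      {status_rate:.1f}")
    print(f"statistics polls/sec:  {stats_rate:.1f}")
    print(f"polls/sec per engine:  {per_engine_rate:.1f}")
```

Under those assumptions this works out to roughly 985 status checks and 520 statistics collections per second overall, or about 167 polls per second per engine. Interfaces dominate both figures, so the interface intervals are the setting that matters most for load.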

                        I can't post the numbers for each device type, but here are the various manufacturers: 3Com, Adtran, Avaya, Cisco, F5 Labs, HP, IBM, net-snmp Linux, Nokia, Sun, Tandberg, Visual Networks, Windows

                        Lots of UDP(s) for the F5's and Nokia's

                        200+ clients using Orion 24x7

                        Orion NPM is the only module run

                        Still on 9.0 SP2, because of all the customization rework, oh and having to do each upgrade action 9 times...

                          • Re: Any problems running Orion on blade servers?
                            warbird

                            Your netperfmon db lives on a single drive? As in, 1 spindle? During a shutdown/restart of all your engines, what is the average disk queue length on that drive? Are you running 64-bit SQL, or 32-bit SQL with AWE, to be able to allocate the 27 GB of RAM to SQL?

                            • Re: Any problems running Orion on blade servers?
                              byrona

                              Thanks so much for this information.  Are the 200+ clients all accessing the Orion web interface?

                                • Re: Any problems running Orion on blade servers?
                                  kiwi

                                  Thank you very much jtimes for the details.

                                  Congo, could you please share your setup details with us? I mean the same details as the ones jtimes gave, especially the DB server, its disks, etc. NTA or APM only?

                                  • Re: Any problems running Orion on blade servers?
                                    jtimes

                                    Yes, it's an SSD drive, and startup and shutdown is... let's just say very intense. During daily operations the average disk queue length is between 0.011 and 10. I have seen very large spikes during monthly report generation, but nothing that would upset the typical client experience in Orion. Honestly I don't dwell on it (disk queue length) because the web and db performance is more than acceptable.

                                    32-bit SQL; AWE is required to allocate that much RAM.

                                    No, not all 200+ clients get on at the same time.

                                  • Re: Any problems running Orion on blade servers?
                                    Congo

                                    Kiwi,

                                    I have an SLX machine running the website, APM, and IP SLA. This is a 2x Quadcore w/ 16gb of ram. A bit of overkill. It polls 3500 elements, ~40 IP SLA, 660 APMs, ~200 UnDPs, and a LOT of traps.

                                    Next I have a poller on hardware, 1x Quadcore w/ 8gb of ram. It polls 3 nodes, but totals 10,600 elements across those 3 nodes.

                                    Last poller is a virtual machine, we give it 2x 3.ghz cores and 2gb of ram. It polls 7070 elements.

                                    Our DB server is physical hardware. 2x Quadcore w/ 16gb of ram. 6x 146gb 15k rpm in (unfortunately) RAID 6. x64 OS and x64 SQL.

                                    For polling the default is 300s/300s/300s with a rediscovery of 1440 minutes. However, many devices and interfaces are more critical and set to 120s instead.

                                    Statistics are similar, 10m/9m/15m although many interfaces collect every minute.

                                    Retention is 35d/35d/365d/30d. I do the 35d retention on detailed statistics for monthly 95th percentile compositions.
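
To illustrate why a full month of detailed samples matters for a 95th percentile figure, here is a minimal sketch using the nearest-rank method. It is not how Orion computes the value internally; the function name `percentile_95` and the choice of nearest-rank are assumptions made for the example.

```python
def percentile_95(samples):
    """Nearest-rank 95th percentile of a list of numeric samples.

    Returns the smallest sample such that at least 95% of all samples
    are less than or equal to it.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # 1-based rank = ceil(0.95 * n), done in integer arithmetic
    # to avoid floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# A 30-day month of 5-minute samples: 30 days * 24 hours * 12 per hour.
SAMPLES_PER_MONTH = 30 * 24 * 12
# Number of top samples that sit above the 95th percentile rank.
IGNORED_PEAKS = SAMPLES_PER_MONTH - (95 * SAMPLES_PER_MONTH + 99) // 100

if __name__ == "__main__":
    print(percentile_95(list(range(1, 101))))
    print(SAMPLES_PER_MONTH, IGNORED_PEAKS)
```

With 5-minute samples, a 30-day month has 8640 data points and the top 432 of them (36 hours' worth) fall above the 95th percentile. The calculation only behaves this way on the raw samples, which is presumably why detailed retention is kept longer than a calendar month here: once data has been averaged into hourly summaries, the short peaks the percentile is supposed to rank are already smoothed out.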

                                    Manufacturers: Adtran, APC, Cisco, Dell, HP, Linux, Polycom, Vmware, Windows    to name a few.

                                    Fully patched across all systems for NPM, IP SLA, APM.

                                    We have approximately 100 users, and my APM tells me that I have an average of 20 connections.

                              • Re: Any problems running Orion on blade servers?
                                timf

                                We have been running pollers on both HP stand-alone servers and HP blade servers with no issue. Our SQL DB is on an HP blade server with SAN disk. I have done LOTS of performance monitoring of the SQL server hardware, especially disk I/O, which is often where a SQL bottleneck truly is. I've got our database running on several different LUNs on the SAN to optimize performance. I've got the main SQL install on one LUN, another LUN for the main orionslx database file, another LUN for the tempdb, another LUN for the transaction log, and a final LUN where I separated out the orionslx filegroups for our NetFlow data (orionslx1-4.mdf). There was a HUGE speed increase when separating these NetFlow filegroups. Each SAN LUN was set up as RAID 1+0, and each one was represented by a different drive letter on the machine, of course.

                                I was using perfmon and was looking at current disk queue length and % disk time. All the additional LUNs created a very noticeable performance increase.

                                Hope this helps.
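
For anyone wanting to do the same kind of check, here is a small sketch of how disk queue length samples might be judged. It uses a common rule of thumb (a sustained average queue length above about 2 per spindle suggests the disk is a bottleneck); that threshold, the function names, and the sample values are assumptions for illustration, and on a SAN LUN the real spindle count is often hidden, so treat the result as a hint only.

```python
from statistics import mean


def queue_per_spindle(queue_samples, spindles):
    """Return (average, peak) disk queue length divided by spindle count."""
    if spindles < 1:
        raise ValueError("spindles must be at least 1")
    if not queue_samples:
        raise ValueError("need at least one sample")
    return mean(queue_samples) / spindles, max(queue_samples) / spindles


def looks_disk_bound(queue_samples, spindles, threshold=2.0):
    """True if the average queue length per spindle exceeds the threshold.

    The default threshold of 2.0 is a rule of thumb, not a hard limit.
    """
    average, _peak = queue_per_spindle(queue_samples, spindles)
    return average > threshold


if __name__ == "__main__":
    # Hypothetical samples, e.g. copied by hand from a perfmon log.
    samples = [1.0, 2.0, 3.0, 6.0]
    print(queue_per_spindle(samples, 1), looks_disk_bound(samples, 1))
    print(queue_per_spindle(samples, 4), looks_disk_bound(samples, 4))
```

The same samples look bad on a single drive and comfortable on a four-disk volume, which is one way to see why spreading the database files over more LUNs can help.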

                                  • Re: Any problems running Orion on blade servers?
                                    warbird

                                    timf, that helps a lot.  Thanks.  I was particularly curious about running pollers in a mixed environment of standalone servers and blades.

                                    Sounds like you went above and beyond with optimizing your db. I moved my db to a large EVA SAN and, using AWE for 32-bit SQL, bumped available memory to 6 GB. Those 2 things are what did the trick for us in removing the I/O bottleneck. If we opt to run NTA again in the future, I may ping you and ask how you went about splitting out the NetFlow data. Sounds interesting.