
Any problems running Orion on blade servers?

I am looking for general opinions on whether running polling engines on blade servers is a good idea or not.

  • I have been running Orion NPM/NCM on HP blades for two years now, and I have never had a problem. 

    One thing you will need to consider is how much space you need for the database, and whether your blade platform will offer fast enough disk I/O for your environment (see the rough sizing sketch below).  Since I run on HP blades I only have two HDs available -- this hasn't been a problem for me since I poll < 1000 elements.  HP does offer storage blades that act as direct-attached storage, but I haven't had a need.  Even if your database is massive, or you have an existing SAN infrastructure you'd like to use for your database, most (if not all) blade vendors offer hardware to integrate into your environment -- iSCSI, FC, etc.

    If you do use HP blades, HP sells Fusion-io drives (branded as the HP IO Accelerator) that offer 80 GB, 160 GB, and I think 320 GB of SSD storage in the HP mezzanine card format.  I've used the IO Accelerator for other applications and it works great.
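    As a very rough way to put numbers on the "how much space and how much disk I/O" question, here is a minimal back-of-the-envelope sketch. The per-record size and average statistics interval are assumptions for illustration only, not measured Orion figures, so treat the output as an order-of-magnitude guide.

```python
# Very rough capacity estimate for an Orion-style statistics database.
# The bytes-per-record and interval figures below are assumptions for
# illustration, not measured values; plug in your own counts and retention.

elements = 1000            # monitored elements (nodes + interfaces + volumes)
stat_interval_s = 9 * 60   # assumed average statistics-collection interval
detail_days = 7            # detailed-statistics retention in days
bytes_per_record = 200     # assumed average detailed row size, incl. indexes

records_per_day = elements * (86400 / stat_interval_s)
detail_gb = records_per_day * detail_days * bytes_per_record / 1e9
writes_per_sec = elements / stat_interval_s

print(f"~{records_per_day:,.0f} detailed records per day")
print(f"~{detail_gb:.1f} GB of detailed data at {detail_days}-day retention")
print(f"~{writes_per_sec:.1f} statistic writes/sec sustained (before rollups)")
```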

  • No worries there.  I already have a very large Orion implementation and am looking to install a third polling engine.  My boss asked about the possibility of moving to blade servers for just the polling engines, hence the question.  I have a stand-alone server for the DB, with the actual database living on a very speedy SAN.

  • I have a large Orion deployment:  Orion Server, HSB engine, 9 Polling Engines, and a monster DB server, all on blade servers.  Been running there for the last three years with no issues.

    All are HP Blade Servers - E5450 (2HTx3.00 GHz) with 16 GB RAM; same hardware for the DB, but 32 GB RAM
    The NICs are teamed and set to fault-tolerant.  Nothing fancy ;) 

  • Hi jtimes,

    I am quite interested in knowing more about the sizing of your setup with blade servers. I am hesitating whether to go for HP blades instead of the HP ProLiant DL360, which supports up to 6 disk spindles for the DB server.

    My setup is 6000 elements with 50 NTA (NetFlow) interfaces and default polling intervals, running an Orion NPM/NTA server and a DB server.

    What are the exact specs, especially the disks, of your blade DB server? How many disks? RAID 1+0? Disks with accelerator or not? What model/type of disks are used for the DB? Is the disk I/O OK?

    Thanks



  • I am impressed and interested in your setup!

    Can you tell us all what your monitored-element breakdown looks like, what (if any) additional SW modules you are using, and what kinds of devices you are monitoring?

    Thanks for sharing what you can!

  • The DB server is a "standard" HP blade with two physical drives.  The first drive is partitioned in two (one for the OS, one for applications); the second drive only has the NetPerfMon DB, the system DB files, and the page file.  The DBs get backed up daily.  NetPerfMon is around 50 GB.
    SQL Server 2005 SP3, all tweaked out, with 27 GB of memory dedicated to SQL.

    Nightly maintenance takes roughly 45-60 minutes.
    Retaining Detailed statistics for 7 days
    Retaining Hourly statistics for 31 days
    Retaining Daily statistics for 90 days
    Retaining Events for 32 days
    *Retaining Syslog for 32 days - not listed in swdebugMaintenance.log

    Monitored element breakdown (a rough polls-per-second estimate from these numbers is sketched at the end of this post):
    Network Elements: 160,771
    Nodes: 7,300
    Interfaces: 152,480
    Volumes: 991

    Polling Intervals:
    Check response time and status of each node every 240 seconds
    Check the status of each interface every 160 seconds
    Check status of each volume every 520 seconds
    Rediscover each Node, Interface, and Volume every 60 minutes

    Statistics:
    Collect statistics from each Node every 10 minutes
    Collect statistics from each Interface every 5 minutes
    Collect statistics from each Volume every 15 minutes

    I can't post the numbers for each device type, but here are the various manufacturers:  3Com, Adtran, Avaya, Cisco, F5 Labs, HP, IBM, net-snmp Linux, Nokia, Sun, Tandberg, Visual Networks, Windows 

    Lots of UDPs for the F5s and Nokias

    200+ clients using Orion 24x7

    Orion NPM is the only module we run

    Still on 9.0 SP2, because of all the customization re-work -- oh, and having to repeat each upgrade action 9 times...
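    Here is a minimal back-of-the-envelope sketch of the polling load implied by the counts and intervals above. The even 10-way split (main Orion server plus the 9 additional engines) is an assumption, and real SNMP packet rates will differ since a single request can carry several values, so treat it as a rough gauge only.

```python
# Back-of-the-envelope polling load from the element counts and intervals
# quoted above.  Purely illustrative arithmetic; actual SNMP traffic will
# differ because a single request can carry several values.

counts = {"nodes": 7300, "interfaces": 152480, "volumes": 991}

# (job, element count, interval in seconds) for status checks and statistics
jobs = [
    ("node status",          counts["nodes"],      240),
    ("interface status",     counts["interfaces"], 160),
    ("volume status",        counts["volumes"],    520),
    ("node statistics",      counts["nodes"],      600),
    ("interface statistics", counts["interfaces"], 300),
    ("volume statistics",    counts["volumes"],    900),
    ("rediscovery",          sum(counts.values()), 3600),
]

total = 0.0
for name, count, interval in jobs:
    rate = count / interval          # polls per second for this job type
    total += rate
    print(f"{name:22s} {rate:8.1f} polls/sec")

pollers = 10  # assumption: main Orion server plus the 9 additional engines
print(f"{'total':22s} {total:8.1f} polls/sec  (~{total / pollers:.0f} per poller)")
```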

  • Your NetPerfMon DB lives on a single drive?  As in, one spindle?  During a shutdown/restart of all your engines, what is the average disk queue length on that drive?  Are you running 64-bit SQL, or 32-bit and utilizing AWE, to be able to allocate the 27 GB of RAM to SQL?
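    For context, the 32-bit AWE route asked about above is switched on through sp_configure, alongside the max server memory cap (64-bit instances need only the cap). Below is a minimal, hypothetical sketch using Python and pyodbc; the server name and the 27 GB figure are placeholders, and the SQL Server service account also needs the "Lock Pages in Memory" privilege for AWE to take effect.

```python
# Hypothetical sketch: cap SQL Server 2005 memory at ~27 GB and enable AWE.
# AWE matters only on 32-bit builds; 64-bit instances need just the cap.
# The server name is a placeholder, not taken from the thread.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=ORION-DB01;Trusted_Connection=yes",
    autocommit=True,  # sp_configure/RECONFIGURE should not run inside a transaction
)
cur = conn.cursor()

for option, value in [
    ("show advanced options", 1),
    ("awe enabled", 1),                     # 32-bit SQL 2005 only; needs a service restart
    ("max server memory (MB)", 27 * 1024),  # leave headroom for the OS on a 32 GB box
]:
    cur.execute("EXEC sp_configure ?, ?", option, value)
    cur.execute("RECONFIGURE")

print("Memory options updated; restart the SQL service for AWE to apply.")
```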

  • I'm shocked also.  One spindle for polling that tight and that many elements?  Is this one drive a Fusion-io?  : )

  • Thanks so much for this information.  Are the 200+ clients all accessing the Orion web interface?

  • We have been running pollers on both HP stand-alone servers and HP blade servers with no issues.  Our SQL DB is on an HP blade server with SAN disk.  I have done LOTS of performance monitoring of the SQL server hardware, especially disk I/O, which is often where a SQL bottleneck truly is.  I've got our database running on several different LUNs on the SAN to optimize performance:
    The main SQL install on one LUN
    The main orionslx database file on another LUN
    tempdb on another LUN
    The transaction log on another LUN
    A final LUN where I separated out the orionslx filegroups for our NetFlow data (orionslx1-4.mdf)
    There was a HUGE speed increase when separating these NetFlow filegroups.  Each SAN LUN was set up as RAID 1+0, and each one is represented by a different drive letter on the machine, of course.

    I was using Perfmon to watch Current Disk Queue Length and % Disk Time.  All the additional LUNs created a very noticeable performance increase (a small counter-sampling sketch follows below). 

    Hope this helps.
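    If you want to reproduce that kind of per-LUN watching from the command line, here is a minimal sketch that samples the same two counters with Windows' built-in typeperf tool; the drive letters are placeholders for however your own LUNs are mapped.

```python
# Minimal sketch: sample per-LUN disk counters with Windows' built-in typeperf.
# The drive letters below are placeholders; substitute your own SAN LUN letters.
import subprocess

LUN_DRIVES = ["E:", "F:", "G:", "H:", "I:"]  # hypothetical LUN drive letters

counters = []
for drive in LUN_DRIVES:
    counters.append(rf"\LogicalDisk({drive})\Current Disk Queue Length")
    counters.append(rf"\LogicalDisk({drive})\% Disk Time")

# 12 samples at 5-second intervals, written to a CSV you can chart afterwards
subprocess.run(
    ["typeperf", *counters, "-si", "5", "-sc", "12",
     "-f", "CSV", "-o", "lun_io.csv", "-y"],
    check=True,
)
print("Samples written to lun_io.csv")
```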