I've asked this question internally, and the answer is not that simple. Running in 64-bit mode actually adds overhead, and unless the program specifically needs 64-bit addressing, it's often slower in 64-bit than in 32-bit mode.
So the answer to your question is that it's not a priority except where it solves a specific problem. If we need a service to access more memory than is addressable with 32 bits, for instance, we might compile in 64-bit mode, but otherwise, it's not a priority.
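The overhead mentioned here is mostly pointer width: every pointer doubles from 4 to 8 bytes on a 64-bit build, inflating caches and working sets. A quick illustrative sketch in Python (not Orion's actual code, just a demonstration of the general effect):

```python
import struct
import sys

# Pointer size for the running interpreter: 4 bytes on a 32-bit
# build, 8 bytes on a 64-bit build.
ptr_size = struct.calcsize("P")
print("pointer size:", ptr_size, "bytes")

# Pointer-heavy structures pay the cost on every reference, so a
# 64-bit build of the same program has a larger working set even
# though it does no extra work.
refs = [None] * 1000
print("list of 1000 references:", sys.getsizeof(refs), "bytes")
```

On a 64-bit interpreter the list of references costs roughly twice what it would on a 32-bit one, which is the kind of penalty being described.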
It would seem to me that all the Orion apps would benefit from the larger memory capacity of a 64-bit OS. Perhaps that has nothing to do with the painful slowness of using Orion via the web interface, but it'd be nice to have more RAM and be able to use it without resorting to the cheap trick of PAE.
From one of our architects:
Orion is fully supported on 64-bit Windows. You can put Orion on 2008 x64 and install as much RAM as you can carry. Physical Address Extensions may or may not be a “cheap trick,” but you definitely don’t need it to run Orion on a >4GB system.
I researched this extensively when I was having performance issues ~6 months ago. Correct me if I'm wrong, but because Orion is a 32-bit application, even when it is run on 64-bit Windows, it will still have a maximum of only 2 GB of addressable memory. Right? It has something to do with the way 64-bit Windows emulates 32-bit mode for those apps that require it.
If that is correct, then the net gain is very small, as the OS would be able to use the other memory but Orion would still be limited to 2 GB. However, if they were to compile and release a 64-bit version of Orion, then it would be a whole new ballgame.
This is mostly correct. A 32-bit process is limited to a 2 GB address space even when running on a 64-bit OS. However, Orion is not just one process. It is many: NetPerfMonService can be using 2 GB at the same time that the website (w3wp.exe is the ASP.NET host process) is using 2 GB at the same time that the Job Scheduler, Job Engine, and each Job Engine Worker process (there can be many of these if necessary) are all each using their own 2 GB. And that's not even counting the Alerting Engine, Syslog, Traps, and especially SQL Server (which of course ought to be on its own box if you are even thinking about this question).
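To make that arithmetic concrete, here is a hypothetical tally in Python (the process names come from the post above; the worker count is an assumption, since there can be many) showing how a deployment of 32-bit processes can collectively address far more than 2 GB:

```python
GB = 2 ** 30

# Each 32-bit process gets its own 2 GB user address space, and the
# caps multiply across the separate Orion processes named above.
# The worker count of 4 is a made-up example.
processes = {
    "NetPerfMonService": 1,
    "w3wp.exe (website)": 1,
    "Job Scheduler": 1,
    "Job Engine": 1,
    "Job Engine Worker": 4,
}
total = sum(processes.values()) * 2 * GB
print(f"combined addressable memory: {total // GB} GB")  # 16 GB
```

And that still excludes the Alerting Engine, Syslog, Traps, and SQL Server.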
And even if it is running exclusively 32-bit processes, 64-bit Windows can productively use lots of extra RAM for disk caching. Don't underestimate that.
For some applications, > 2 GB address space is a key part of scalability. At this point, we don't believe that Orion needs it, especially considering that if a program doesn't actually need > 2 GB of address space, there is a performance penalty (not huge, but not negligible either) to using 64-bit pointers everywhere.
If you are hitting the 2 GB limit in any of your Orion processes, let's work on what needs to be tweaked to get it under control.
Thanks a lot for the in depth explanation!
Oh, and yes, in my case we have the SQL Server on its own dedicated system with the DB itself backed by a very beefy SAN. I had run into the average disk queue length problem on the DB server, and this solution has worked exceptionally well for us. Our SQL Server is a 32-bit system and we were also having memory issues with it, so I enabled AWE (which SQL Server uses to address memory beyond the 32-bit limit), allowing it to consume up to 6 GB.
Those two things combined solved all of our hardware-related performance issues. It has been blazing fast since. :) I had always wondered why the Orion apps were not compiled for 64-bit. It is good to have a solid answer to that. So thanks again.
I have (inherited) an existing NPM 9.5 SP4 installation.
The system is:
NPM Server: Server 2003 SE, 3 GHz Xeon, 4 GB RAM
SQL Server: Server 2003 SE, 3 GHz Xeon, 4 GB RAM, on a VMware instance, with other SQL DBs besides NetPerfMon running.
2nd Poller: Server 2003 SE, 3 GHz Xeon, 4 GB RAM
NPM, NTA, VoIP, Network Atlas, and APM running on the primary NPM server.
Network Elements: 7575
Polling Engines: 2
Licensed Component Monitors: 4172
There are 5855 elements on the primary NPM server, with the remaining 1720 on the 2nd Poller.
Performance is VERY poor. Initial page load with NPM is 30 seconds, minimum.
Navigating to the main NFT page from the NPM page is > 200 seconds.
I have the opportunity to reconfigure the entire system, so I want to place the SQL DB on its own instance with 8-16 GB RAM.
In addition, I think moving APM to the 2nd poller and increasing RAM to 8GB may help.
1) Should I increase RAM on the NPM server to 8GB? 16GB?
2) Would it be better to add a 3rd Poller license and server JUST for APM and load balance the Main and 2nd pollers?
3) What is the average disk queue length issue with SQL and how do I address it?
Thanks in advance...
Please confirm that all your servers run 64-bit Windows? I am running 32-bit Windows on all my servers; however, I think one of your problems has got to be the SQL server. How many hard drives does it have, what are their specs, and what sort of RAID are they configured in? Do the DBs you mentioned live on that system, or are they backed by a SAN of some sort?
Fire up Perfmon (the built-in Windows tool) on your SQL server and add the Avg. Disk Queue Length counter. The important number to watch is the average. The common recommendation is that the average remain below 2 × the number of physical hard drives. They also recommend RAID 10, and I highly recommend it as well, based on my experience.
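As a quick sanity check, that rule of thumb can be expressed as a tiny Python helper (the function name and example numbers here are illustrative):

```python
def disk_queue_ok(avg_queue_length, physical_disks):
    """Rule of thumb: sustained Avg. Disk Queue Length should stay
    below 2 per physical disk in the array."""
    return avg_queue_length < 2 * physical_disks

# Example: a 4-disk array has a threshold of 8.
print(disk_queue_ok(5.0, 4))   # within the threshold -> True
print(disk_queue_ok(12.0, 4))  # sustained backlog -> False
```

If the second case describes your server, the disks can't keep up with the I/O SQL Server is generating, which matches the symptoms discussed in this thread.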
I believe they also recommend against VMs for the SQL server, but someone else will have to confirm that one.
Also run perfmon on your primary polling engine and observe CPU/mem usage.
Given the number of elements you have, and depending on what numbers you find for the avg disk queue length on your SQL server, I recommend the following:
1) Move either the other DBs or the Orion NetPerfMon DB so that NetPerfMon has its own server with its own dedicated RAID 10 array of fast disks.
2) Use the load balancer tool and move a bunch of elements over to your secondary polling engine. I have a hunch that your primary engine may be a bit overloaded.
My situation was very similar to yours in that I inherited an Orion installation that was performing so poorly it was basically broken. I had to take the two steps mentioned above to get things working. It has been humming along quite nicely since. I did end up moving my NetPerfMon DB to a VERY beefy SAN, even though they generally recommend against SAN solutions for the DB.
I hoped you'd reply, as it sounded like we had similar issues: inheriting a "broken" installation.
They're all 32-bit OSes, and I know having the SQL DB on a VM instance is absolutely NOT recommended.
I am going to recommend they move the DB to a standalone server, go to Server 2003 Enterprise, increase RAM to 8 or 16GB, enable AWE, and use RAID.
In a former life, I had an NPM 9.0 installation with MANY more nodes monitored, backed by a SAN, and it ran like lightning. So well that I was appalled at how horribly THIS installation runs when I walked in last week. For all intents and purposes, it's useless.
In addition, there's some "need versus want" thinking going on here that I need to address for APM and NFT. For instance, do they REALLY need to watch every server, regardless of role? EVERY Cisco router, instead of JUST the core and edge routers?
Today, I am instrumenting and collecting performance data from the existing configuration to validate my recommendations for the future. Right now, they have a Ford F-150 installation (with a cracked block) that they want Ferrari F40 performance from.
Thanks for the sanity check and advice!
np. I have also had to evaluate the importance of monitoring every single interface on every single node. A nice middle ground may be to monitor every node but only the interfaces that are trunk ports or very important server ports.
We have been struggling with performance issues. I had our SQL server (which was already on 2K8 64-bit with SQL 2005) rebuilt with the transaction logs on a different drive. This made a significant improvement. Additionally, we rebuilt our SLX server on 2K8 64-bit with 6 GB of RAM. This helped a great deal. What would really help is if NPM, and especially the website, became multi-threaded.