I think your problems come not from the technical solution but from ensuring that your procedures and policies are well defined, communicated, and understood by the staff. Roles and responsibilities should be clear.
In your example, before you went live on the system you should have run a number of incident-handling tests to ensure that, in the event of a failure, everyone understands what they should do.
"Nobody seemed to care" = "Nobody understood their role and the role of the technical solution"
I am not sure of your specific environment, and it really depends on whether the technical solution has been implemented as part of the service provision and fully endorsed by management, or whether it has simply been installed as one more tool among the many that people are already using.
As part of our monitoring service we chose two monitoring solutions at the time (Nagios for applications and SolarWinds for the network) and then integrated a Nagios view into the main SolarWinds portal. This was then made available to all the support staff and the helpdesk, with appropriate training.
Any alerts coming out of those two solutions were directed to the relevant teams, who had responsibility for responding to the alert.
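As a rough illustration of what "directed to the relevant teams" can mean in practice, the routing can be as simple as a lookup from an alert's category to the team that owns it. This is a minimal sketch, not our actual configuration; the category names and addresses are hypothetical:

```python
# Minimal sketch of category-based alert routing (hypothetical names,
# not a real config). Each alert carries a category set by the source
# system, and a lookup table decides which team gets notified.

ROUTES = {
    "application": "app-support@example.com",   # Nagios application checks
    "network":     "network-ops@example.com",   # SolarWinds network alerts
}
DEFAULT_ROUTE = "helpdesk@example.com"          # anything uncategorized

def route_alert(alert: dict) -> str:
    """Return the address of the team responsible for this alert."""
    return ROUTES.get(alert.get("category", ""), DEFAULT_ROUTE)

if __name__ == "__main__":
    alert = {"category": "network", "host": "core-sw-01", "msg": "link down"}
    print(route_alert(alert))  # -> network-ops@example.com
```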
If people don't trust the system because it has a reputation for being unreliable and they think it is crying wolf, then make sure you are comfortable with the system yourself, and then relaunch it, with appropriate training, as the de facto alerting system.
@Miron - "nobody seemed to care" - kinda like:
There was an important job to be done and Everybody was sure that Somebody would do it.
Anybody could have done it, but Nobody did it.
Somebody got angry about that because it was Everybody's job.
Everybody thought that Anybody could do it, but Nobody realized that Everybody wouldn't do it.
It ended up that Everybody blamed Somebody when Nobody did what Anybody could have done.
Another thing to try is to find out why they feel it is unreliable. Do they get too many alerts? Are they informed when an alert resets? Does their job depend on keeping the network operational? (Especially if they were alerted to the issue before a user complaint.)
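If "too many alerts" is the suspicion, it is worth measuring that before relaunching anything. Here is a quick sketch that tallies the noisiest alert sources, assuming you can export your alert history to a CSV with a "source" column (the column name is an assumption; adjust it to whatever your NMS actually exports):

```python
# Quick sketch for quantifying alert noise from an exported alert log.
# Assumes a CSV with a "source" column, one row per alert.

import csv
from collections import Counter

def noisiest_sources(path, top_n=10):
    """Return the top_n (source, alert_count) pairs from the export."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["source"]] += 1
    return counts.most_common(top_n)

if __name__ == "__main__":
    for source, count in noisiest_sources("alerts.csv"):
        print(f"{count:6d}  {source}")
```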
I think Miron might be right: accountability is key.
I have to agree with Miron: your problem isn't technical. If you abandon Orion and move to a different solution, you will likely have the same problem.
We have Orion create tickets in our ticketing system, assigning those tickets to the appropriate groups. Those groups are then accountable for handling those tickets appropriately as part of their job. If they don't handle the tickets appropriately, they have not done their job properly, and that is then handled as an HR issue.
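For anyone wanting to wire up something similar, the bridge itself is small: Orion's alert actions can run an external program or send an HTTP request when an alert fires. The ticketing endpoint, field names, and group-to-queue mapping below are all hypothetical, so treat this as a sketch of the pattern rather than our implementation:

```python
# Sketch of an alert-to-ticket bridge, invoked by an NMS alert action.
# The ticketing API URL, payload fields, and group names are made up;
# substitute your real ticketing system's API.

import json
import urllib.request

TICKET_API = "https://ticketing.example.com/api/tickets"  # hypothetical

# Map the alerting node's group to the team queue that owns the ticket.
GROUP_QUEUE = {
    "core-network": "network-ops",
    "servers":      "sysadmin",
}

def create_ticket(alert: dict) -> None:
    payload = {
        "queue":   GROUP_QUEUE.get(alert["group"], "helpdesk"),
        "subject": f'[{alert["severity"]}] {alert["node"]}: {alert["message"]}',
        "body":    json.dumps(alert, indent=2),
    }
    req = urllib.request.Request(
        TICKET_API,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print("ticket created:", resp.status)

if __name__ == "__main__":
    create_ticket({"group": "core-network", "severity": "CRITICAL",
                   "node": "core-rtr-01", "message": "node down"})
```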
In my experience, even with the best NMS platforms you will still get false positives. The important things are to minimize the false positives and to make sure your team is trained to understand that some will still occur; even so, it is critically important to properly investigate every incident your NMS alerts on. If you can't do that, why have an NMS at all?
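One common way to cut false positives is to require a condition to persist for several consecutive polls before alerting, so a single dropped ping never pages anyone; Nagios exposes this as max_check_attempts, and most NMSs have an equivalent setting. The toy sketch below shows the idea with a hypothetical notifier stub; in practice you would tune the built-in threshold rather than write your own:

```python
# Toy illustration of alert de-bouncing: only raise an alert after a
# check has failed N polls in a row. Most NMSs have this built in.

from collections import defaultdict

FAIL_THRESHOLD = 3            # consecutive failures before alerting
_fail_counts = defaultdict(int)

def record_check(host: str, ok: bool) -> None:
    if ok:
        _fail_counts[host] = 0          # recovery resets the counter
        return
    _fail_counts[host] += 1
    if _fail_counts[host] == FAIL_THRESHOLD:
        raise_alert(host)               # fire once, at the threshold

def raise_alert(host: str) -> None:    # stand-in for your real notifier
    print(f"ALERT: {host} failed {FAIL_THRESHOLD} consecutive checks")

if __name__ == "__main__":
    # One blip (poll 1-2, then recovery) never alerts; the sustained
    # failure starting at poll 4 alerts exactly once, at poll 6.
    for ok in (False, False, True, False, False, False, False):
        record_check("core-sw-01", ok)
```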
Hope this helps!