
Orion Events Removed Status

Dears,

    We are using Orion 9.5.1 with SQL Server 2005 SP3 on Windows Server 2008 R2 x64.

For the past three months we have faced a problem: all nodes are shown as removed in the Event Console although they are still in our database, and at the same time the server stops sending emails.

The problem clears when we restart the server or rerun the Configuration Wizard.

The picture below shows what I mean.

Please, we need to get rid of this problem, as we have to interrupt our monitoring server nearly every week.

Thanks in advance

  • Hi Shetoshandasa. 

    Please follow the steps below. 

    Please make sure you log in as a local administrator and have a recent backup of the database.

    Please run a repair of your Information Service as follows:

    Open C:\Documents and Settings\All Users\Application Data\SolarWinds\Installers (on a 2008 system, this will be under C:\ProgramData\Solarwinds\Installers)

    Run the InformationService.msi and select Repair.

    Also, please run a repair of your JobEngine as follows:

    Open C:\Documents and Settings\All Users\Application Data\SolarWinds\Installers (on a 2008 system, this will be under C:\ProgramData\Solarwinds\Installers)

    Run the JobEngine.msi and select Repair.

    Please use the following steps to repair a corrupted JobEngine database file. This file will be rewritten automatically when you restart your services, and this will not affect any of your stored information, settings or statistics.

    Shut down all Orion services 

    - Open c:\documents and settings\all users\application data\solarwinds\jobengine\Data

    - Make a copy of the JobEngine35 - Blank.sdf file.

    - Rename the JobEngine35.sdf file to JobEngine35.sdfold

    - Rename your copy of JobEngine35 - Blank.sdf to JobEngine35.sdf

    - Right click the new JobEngine35.sdf and open Properties.

    - Uncheck the read-only attribute.

    Start up all of your Orion services again
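
The file swap in the steps above can be sketched as a small script. This is only an illustration of the rename and copy sequence, not a SolarWinds tool: the function name `reset_jobengine_db` is hypothetical, the file names are taken from the steps above, and the data directory must be passed in by you (on a 2008 system it may sit under C:\ProgramData rather than C:\Documents and Settings). It assumes all Orion services are already stopped.

```python
import os
import shutil
import stat
from pathlib import Path

# File names as given in the steps above.
BLANK_NAME = "JobEngine35 - Blank.sdf"
LIVE_NAME = "JobEngine35.sdf"
BACKUP_NAME = "JobEngine35.sdfold"


def reset_jobengine_db(data_dir):
    """Swap the live JobEngine database for a copy of the blank template.

    Hypothetical helper: call it only while all Orion services are stopped.
    Returns the path of the new live file.
    """
    data_dir = Path(data_dir)
    blank = data_dir / BLANK_NAME
    live = data_dir / LIVE_NAME
    backup = data_dir / BACKUP_NAME

    if not blank.is_file():
        raise FileNotFoundError(f"blank template not found: {blank}")
    if backup.exists():
        # Refuse to overwrite an earlier backup silently.
        raise FileExistsError(f"backup already exists: {backup}")

    # Keep the old database under the .sdfold name.
    if live.exists():
        live.rename(backup)

    # Copy (not move) the template so it stays available for next time.
    shutil.copyfile(blank, live)

    # Clear the read-only attribute on the new file.
    os.chmod(live, stat.S_IREAD | stat.S_IWRITE)
    return live
```

The function refuses to run if a JobEngine35.sdfold file already exists, so an earlier backup is never overwritten by accident.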

    Please log on to the Orion website and go to Admin > Polling Settings > Enable Baseline Calculation (advanced).

    If this does not fix the issue, please repair Orion using Add/Remove Programs: select "Orion NPM" and click Change; it will give you a "Repair" option.

    Once you are done with the repair, please reboot the Orion server.

     

    *********** 

    I would strongly recommend that you upgrade to NPM 10.2. This Orion NPM release introduces a number of significant enhancements to improve performance and provide greater stability in more demanding environments.

     

    www.solarwinds.com/.../releaseNotes.htm

     

    **********

    Upgrade path 

    http://knowledgebase.solarwinds.com/kb/questions/1888/How+to+upgrade+SolarWinds+Orion+products#npm

    Let me know if this helps, or if you require any further information.

  • Dear GoldTipu,

        Many thanks for your fast response. I'll plan an outage to follow the steps above and will report back.

    Another question: what is the difference between the two .sdf files?

  • Dear Malik,

         I hope you still remember this thread. For weeks after trying your solution everything seemed to be fine, but this morning the problem occurred again.

    I think there is a problem between the NPM service (the Network Performance Monitor service) and the database: when the problem occurs, the server stops sending emails, which from my point of view supports this idea.

    I tried ending the Network Performance Monitor process and starting it again, and found that it began to rediscover all nodes, generating a lot of events like "Node Added".

    Is there any other solution I can try to fix this problem?

  • I want to add that we are now running Orion 9.5.1 and SQL Server 2005 on Windows Server 2008 R2 x64 in a Hyper-V virtual machine.

    It works well all the time except when this problem occurs.

    We were previously using VMware ESX, but we faced a lot of problems and false alerts, which forced us to go back to Hyper-V.

    To be clear, we also tried installing the same database with the same version of Orion on a physical HP server, and the problem occurs there too.

    Please, we badly need a solution for this problem.

  • Dears,

         Is there any solution?

  • Hi Sheto,

    As Malik suggested in his first reply, your best next step is to upgrade to version 10.2 (soon to be 10.3). I realize this may be an issue due to lapsed maintenance or something else, but there really have been a lot of improvements and bug fixes since 9.5.1.

    In my case, the Information Service v1 also had many issues (as Malik also pointed out). Upgrading to 10.2 will address this by giving you Information Service v2.

    Regards,

    Chris

  • This issue occurs when the NetPerfMon service is unable to refresh elements from the Orion database. The service keeps its own internal representation of all Orion elements (nodes, interfaces, ...) in memory and tries to sync it with the Orion database every 30 s. I've seen cases where the SQL query used for refreshing elements returns zero rows and reports no error during a DB connectivity outage. The service then acts as if all elements were removed and generates those "removed" events. Once the query returns data again, the service loads all elements back into memory and generates those "added" events. A simple restart of the NetPerfMon service should fix the issue; there is no need to run the Configuration Wizard (provided SQL is up and running).

    I recommend upgrading to Orion NPM 10.2 as well. We were able to completely replace the old legacy VB 6.0 NetPerfMon service with brand new .NET Framework based polling services (Collector).
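
The failure mode described above can be illustrated with a toy model. This is not the NetPerfMon code (which is a VB 6.0 service); the function names, the event tuples and the `guard_empty` option are all assumptions made up for this sketch. It shows how comparing an in-memory set of node IDs against a refresh result that silently came back empty produces a burst of "removed" events followed by a burst of "added" events, and how treating an unexpected empty result as suspect would avoid that.

```python
def diff_elements(cached_ids, fresh_ids):
    """Return (removed, added) IDs between the cache and a fresh query result."""
    cached, fresh = set(cached_ids), set(fresh_ids)
    return cached - fresh, fresh - cached


def refresh(cached_ids, fresh_ids, guard_empty=False):
    """Apply one refresh cycle and return (new_cache, events).

    With guard_empty=True, an empty result while the cache is non-empty is
    treated as a suspect read: the cache is kept and no node events are raised.
    """
    cached = set(cached_ids)
    fresh = set(fresh_ids)
    if guard_empty and cached and not fresh:
        return cached, [("RefreshSkipped", None)]
    removed, added = diff_elements(cached, fresh)
    events = [("Node Removed", n) for n in sorted(removed)]
    events += [("Node Added", n) for n in sorted(added)]
    return fresh, events


cache = {1, 2, 3}

# Cycle 1: the query silently returns zero rows during a DB outage,
# so every cached node is reported as removed.
cache, events = refresh(cache, [])
print(events)

# Cycle 2: connectivity is back, so every node is reported as added again.
cache, events = refresh(cache, [1, 2, 3])
print(events)
```

Calling `refresh({1, 2, 3}, [], guard_empty=True)` instead keeps the cache intact and returns a single "RefreshSkipped" marker, which is one possible way such a sync loop could be hardened.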

  • FormerMember (in reply to shetoshandasa)

    I saw this happening more often until I migrated to v10.2.

    The new version seems to have worked out some of the bugs that were causing this.

    You could also be having issues with NPM talking to the SQL server: if they get jammed up, it drops all the nodes, and when communication is restored it goes right to adding them back, as you are seeing.

    If your Orion server is overworked, look at moving the website off the Orion server to a web server of its own.

    If the SQL server is overworked, you will need to have fewer databases on the server holding the Orion database.

    It just takes time to figure out where the jam is occurring.
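
One way to start narrowing down where the jam occurs is to log when the SQL server stops being reachable from the Orion server, and compare those timestamps with the "removed" events. The sketch below is only a rough aid under stated assumptions: the function names are hypothetical, port 1433 is the SQL Server default and may differ in your setup, and a successful TCP connection does not prove that queries return data, so it can only catch network-level outages.

```python
import socket
import time
from datetime import datetime


def sql_port_reachable(host, port=1433, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(host, port=1433, interval=30.0, cycles=None):
    """Probe repeatedly and print a timestamped line whenever the state changes.

    cycles=None means run until interrupted.
    """
    last = None
    n = 0
    while cycles is None or n < cycles:
        up = sql_port_reachable(host, port)
        if up != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            state = "reachable" if up else "UNREACHABLE"
            print(f"{stamp} {host}:{port} {state}")
            last = up
        n += 1
        if cycles is None or n < cycles:
            time.sleep(interval)
```

For example, `watch("sql-server.example.local")` (a placeholder host name) would probe every 30 seconds, matching the refresh interval mentioned earlier in the thread, and print a line only when the state flips.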