
Orion Server

This template assesses the status of Windows services related to SolarWinds Orion servers.

Prerequisites: WMI access to the target server.

Credentials: Windows Administrator on the target server.

Monitored Components:

SolarWinds Orion Job Engine

This monitor returns the CPU and memory usage of the SolarWinds Orion Job Engine service. This service is used to perform recurring work. This service creates various Job Engine Worker processes for scalability and robustness. The job engine writes information about each job to its database.

SolarWinds Orion Module Engine

This monitor returns the CPU and memory usage of the SolarWinds Orion Module Engine service. This service is used to talk to the database.

SolarWinds Orion Job Scheduler

This monitor returns the CPU and memory usage of the SolarWinds Orion Job Scheduler service. The Job Scheduler service dispatches work to local and/or remote job engines.

SolarWinds Syslog Service

This monitor returns the CPU and memory usage of the SolarWinds Syslog service. This service is responsible for logging events in log files.

SolarWinds Alerting Service V2

This monitor returns the CPU and memory usage of the SolarWinds Alerting Service V2. This service is responsible for evaluating alert conditions, triggering alerts and running alert actions.

SolarWinds Alerting Engine

This monitor returns the CPU and memory usage of the SolarWinds Alerting Engine service. This service is responsible for Advanced Alerting.

SolarWinds Website

This component monitor tests a web server's ability to accept incoming sessions and transmit the requested page. It can optionally search the delivered page for specific text strings and pass or fail the test based on that search. By default, it monitors TCP port 80.
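As a sketch only, a rough Python equivalent of what this component does: fetch a page over HTTP on a given port (80 by default) and optionally require a text string in the response. The function name `check_website` and its parameters are hypothetical illustrations, not the component's actual implementation; the sketch uses only the standard library.

```python
import urllib.error
import urllib.request
from typing import Optional


def check_website(host: str, port: int = 80, path: str = "/",
                  expected_text: Optional[str] = None,
                  timeout: float = 10.0) -> bool:
    """Hypothetical probe: True if the page is served (HTTP 200) and,
    when expected_text is given, the page body contains that string."""
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a 4xx/5xx response.
        return False
    # Optional content check: pass or fail based on the text search.
    return expected_text is None or expected_text in body
```

If the Orion web console listens on a different port, pass it explicitly, for example `check_website("yourorion.server.ext", port=8787)`.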

SolarWinds Job Engine v2

This monitor returns the CPU and memory usage of the SolarWinds Job Engine v2 service. This service is used to perform recurring work. This service creates various Job Engine Worker processes for scalability and robustness. The job engine writes information about each job to its database.

SolarWinds Collector Service

This monitor returns the CPU and memory usage of the SolarWinds Collector service. This service takes part in data synchronization between the poller and the Orion database.

SolarWinds Collector Data Processor

This monitor returns the CPU and memory usage of the SolarWinds Collector Data Processor service. This service is responsible for volume and node data synchronization between the Collector and the Standard Poller.

SolarWinds Collector Management Agent

This monitor returns the CPU and memory usage of the SolarWinds Collector Management Agent service. This service takes part in data synchronization between the Collector and the Standard Poller.

SolarWinds Collector Polling Controller

This monitor returns the CPU and memory usage of the SolarWinds Collector Polling Controller service. This service takes part in data synchronization between the Collector and the Standard Poller.

SolarWinds Information Service

This monitor returns the CPU and memory usage of the SolarWinds Information service. This service is used by websites to talk to the database. It also handles communication between the pollers.

SolarWinds Information Service V3

This monitor returns the CPU and memory usage of the SolarWinds Information service V3. This service is used by websites to talk to the database. It also handles communication between the pollers.

SolarWinds JMX Bridge

This monitor returns the CPU and memory usage of the SolarWinds JMX Bridge service. The JMX Bridge is only used if you are monitoring Java Application Servers such as WebSphere, WebLogic, or Apache Tomcat via JMX.

Note: By default this monitor is disabled.

SolarWinds Trap Service

This monitor returns the CPU and memory usage of the SolarWinds Trap service. This service is responsible for catching and logging trap events.

File Count Monitor - JET Files

This monitor returns the number of JET files in C:\Windows\Temp. Too many of these files prevent new database connections and cause polling to halt, so the value should be less than 65,530. These files can be deleted. They usually remain in the system only because an application that used them to access a database crashed and the files were not properly deleted. No more than about 65K of these files should be in this folder.
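A minimal sketch of the count this monitor performs, assuming JET temporary files follow a `JET*.tmp` naming pattern (an assumption; adjust the pattern to match your files). A broader pattern such as `*.tmp` would also count unrelated temp files, which is the issue raised in the comments below. The helper names are hypothetical.

```python
import glob
import os

# Threshold quoted in the template description.
JET_FILE_LIMIT = 65_530


def count_jet_files(folder: str = r"C:\Windows\Temp",
                    pattern: str = "JET*.tmp") -> int:
    """Count regular files in `folder` whose names match `pattern`.
    The default pattern is an assumption about how JET temp files are named."""
    matches = glob.glob(os.path.join(folder, pattern))
    return sum(1 for path in matches if os.path.isfile(path))


def jet_status(count: int, limit: int = JET_FILE_LIMIT) -> str:
    """Return "OK" below the limit and "CRITICAL" at or above it."""
    return "OK" if count < limit else "CRITICAL"
```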

MSMQ Messages in Queue

This is the total number of Message Queuing messages that currently reside in the selected queue. When the Data Processor receives more results into MSMQ than it is able to process and pass to the Standard Poller, the queue keeps growing. The queue length should be near 0 almost all of the time. Some spikes may appear, but the Data Processor needs to be able to drain the queue quickly; otherwise it will not be able to handle database outages or maintenance. (Standard Poller performance depends significantly on database performance.)

Note: Before using this counter, set the correct instance, which begins with:
<HOSTNAME>\private$\solarwinds\collector\processingqueue

where <HOSTNAME> is the hostname of the target server (without the angle brackets), for example APMhost. By default, the instance is set to: <HOSTNAME>\private$\solarwinds\collector\processingqueue\solarwinds.node.hardwarehealth.wmi
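The hostname substitution described in this note can be sketched as follows. `msmq_instance` is a hypothetical helper; the default queue suffix is the one quoted in the note.

```python
from typing import Optional

BASE_PATH = r"\private$\solarwinds\collector\processingqueue"
DEFAULT_QUEUE = "solarwinds.node.hardwarehealth.wmi"


def msmq_instance(hostname: str, queue: Optional[str] = DEFAULT_QUEUE) -> str:
    """Build the MSMQ Queue counter instance for `hostname`.
    Pass queue=None to get just the processingqueue prefix."""
    host = hostname.strip().strip("<>")  # tolerate a pasted "<APMhost>"
    instance = host + BASE_PATH
    if queue:
        instance += "\\" + queue
    return instance


print(msmq_instance("APMhost"))
# APMhost\private$\solarwinds\collector\processingqueue\solarwinds.node.hardwarehealth.wmi
```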

All available instances can be found by running the perfmon utility and looking for the “Messages in Queue” counter in the “MSMQ Queue” category.

Note: This monitor is disabled by default.

Perfmon DPPL Avg. Time to Process Item

This monitor returns the average time, in seconds, needed to process one item. A value of 1 means one item is processed per second; 0.01 means 100 items per second. The returned value should be as low as possible.
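The relationship described here is throughput = 1 / (average seconds per item). A small sketch, with a hypothetical function name:

```python
def items_per_second(avg_seconds_per_item: float) -> float:
    """Convert the DPPL 'Avg. Time to Process Item' value (seconds per item)
    into an approximate throughput in items per second."""
    if avg_seconds_per_item <= 0:
        raise ValueError("average time per item must be positive")
    return 1.0 / avg_seconds_per_item


# The two examples from the description: 1 s/item and 0.01 s/item.
for value in (1.0, 0.01):
    print(f"{value} s/item = {items_per_second(value):.0f} items/s")
```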

Perfmon DPPL Waiting Items

This monitor returns the number of items that have been pulled from the message queue but are waiting for other results to be processed. This value should be less than 40. If it holds at or above 40, this may indicate slow database response times, other performance issues, or many down elements.
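“Holding at or above 40” describes a sustained condition rather than a single spike. One way to express that, assuming samples taken at a regular polling interval and a hypothetical requirement of three consecutive samples, is:

```python
from typing import Iterable


def held_at_or_above(samples: Iterable[float], threshold: float = 40,
                     min_consecutive: int = 3) -> bool:
    """True if at least `min_consecutive` consecutive samples are >= threshold.
    min_consecutive=3 is an illustrative choice, not a template setting."""
    run = 0
    for value in samples:
        # Extend the current run, or reset it when a sample drops below.
        run = run + 1 if value >= threshold else 0
        if run >= min_consecutive:
            return True
    return False
```

Isolated spikes (for example `[10, 45, 12, 50]`) do not trigger it; a run such as `[40, 41, 42]` does.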

MSMQ Folder Size

This monitor returns the MSMQ folder size. The value should be less than 800 MB. The maximum MSMQ size is 1 GB; if this limit is reached, polling will stop working correctly.
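A sketch of this check, assuming the MSMQ storage folder is at its usual default location, `C:\Windows\System32\msmq\storage` (verify on your server), and treating 1 GB as 1024 MB. The 800 MB warning level and 1 GB limit come from the description; the function names are hypothetical.

```python
import os

MB = 1024 * 1024
WARNING_BYTES = 800 * MB   # "should be less than 800 MB"
LIMIT_BYTES = 1024 * MB    # "maximum MSMQ size is 1 GB"


def folder_size(path: str) -> int:
    """Total size in bytes of all files under `path`, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # The file was removed or is inaccessible; skip it.
                continue
    return total


def msmq_folder_status(size_bytes: int) -> str:
    """Map a folder size onto OK / WARNING / CRITICAL."""
    if size_bytes >= LIMIT_BYTES:
        return "CRITICAL"
    if size_bytes >= WARNING_BYTES:
        return "WARNING"
    return "OK"


# Example (the path is an assumption):
# print(msmq_folder_status(folder_size(r"C:\Windows\System32\msmq\storage")))
```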

To increase the MSMQ size limit, open Computer Management > Features > Message Queuing, right-click, and change the MSMQ 1 GB message storage limit to 1.5 GB. On Windows Server 2003, this setting is found under the Storage section.

See: http://knowledgebase.solarwinds.com/kb/questions/3510/Microsoft+Message+Queue+Fills+Directory+with+O....

Process Monitor - SWJobEngineWorker2.exe

This monitor returns the number of Job Engine worker processes and their CPU and memory usage. A value of 10 or lower is acceptable. If the returned value is 100 or greater, there may be problems with jobs hanging.

Job Engine v2: Jobs Queued

This monitor returns the number of jobs waiting for execution due to insufficient resources. This value should be less than 10 at all times.

Job Engine v2: Jobs Lost

This monitor returns the number of lost jobs. This value should be zero at all times.

Job Engine v2: Jobs Running

This monitor returns the number of jobs currently running.

Job Engine v2: Worker Processes

This monitor returns the number of worker processes used. A value of 20 or lower is acceptable. If the returned value is 100 or greater, there may be problems with jobs hanging.
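The two thresholds above divide the reading into three bands. A sketch, where the label for the middle band is illustrative and not a status defined by the template:

```python
def classify_worker_processes(count: int, acceptable_max: int = 20,
                              problem_min: int = 100) -> str:
    """Map a worker-process count onto the bands described above."""
    if count <= acceptable_max:
        return "acceptable"
    if count >= problem_min:
        return "possible hung jobs"
    return "elevated"  # illustrative label for the range in between


# The SWJobEngineWorker2.exe process monitor uses 10 as its acceptable
# maximum, so it would be called as classify_worker_processes(n, acceptable_max=10).
```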

Job Scheduler v2: Average Execution Delay

This monitor returns the average delay, in milliseconds, between the time a job is scheduled to execute and the time it actually executes. This value should be less than 100,000 (100 seconds).
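To make the metric concrete: for each job, the delay is the actual start time minus the scheduled time, and the monitor value is the average of those delays in milliseconds. The sketch below assumes you have (scheduled, actual) timestamp pairs; it is not how the Job Scheduler computes the counter internally.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

THRESHOLD_MS = 100_000  # 100 seconds, per the description


def average_delay_ms(runs: Iterable[Tuple[datetime, datetime]]) -> float:
    """Average of (actual - scheduled) over all runs, in milliseconds."""
    delays = [(actual - scheduled) / timedelta(milliseconds=1)
              for scheduled, actual in runs]
    return sum(delays) / len(delays) if delays else 0.0
```

Two jobs that start 1 s and 3 s late average 2,000 ms, well under the 100,000 ms threshold.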

Job Scheduler v2: Results Notified Error

This monitor returns the number of errors that occurred when sending the results back. This value should be zero at all times.
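If the underlying performance counter is cumulative (as the comment about the "Count statistic as difference" option further down suggests), its raw value never returns to zero even when no new errors occur, so a "should be zero" threshold would stay in alert. Counting the statistic as a difference reports the change between polls instead. A minimal sketch of that transformation, where the reset handling is an assumption:

```python
from typing import List, Sequence


def as_differences(samples: Sequence[int]) -> List[int]:
    """Convert cumulative counter readings into per-poll deltas.
    Assumption: a drop means the counter was reset (e.g. a service
    restart), so the new reading itself is taken as the delta."""
    deltas = []
    for prev, curr in zip(samples, samples[1:]):
        deltas.append(curr - prev if curr >= prev else curr)
    return deltas
```

A running total of `[100, 100, 103, 103, 2]` becomes `[0, 3, 0, 2]`: only the polls in which new errors occurred are non-zero.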

RabbitMQ Service Monitor

This monitor returns information about RabbitMQ services running on a node with the Windows operating system.

SSL Listeners Port Monitor

This monitor checks TCP port 5671, on which RabbitMQ listens for SSL connections. This port is controlled by the ssl_listeners setting in RabbitMQ.
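A basic reachability check for the SSL listener, as a sketch: it only confirms that something accepts TCP connections on the port (5671 by default) and does not perform a TLS handshake. The function name is hypothetical.

```python
import socket


def port_is_listening(host: str, port: int = 5671, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable.
        return False
```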

RabbitMQ Folder Size

This monitor returns the size of the Orion RabbitMQ folder. If the folder keeps growing, RabbitMQ is writing undelivered messages to disk, or the machine is under memory pressure.

Note: This monitor is disabled by default.

SWIS PubSub Messages Queued

This is the total number of messages that currently reside in the SWIS PubSub queue. When publishers send more messages than subscribers are able to process, or if there are message delivery issues, the RabbitMQ queue keeps growing. The size of the queue should be near 0 almost all of the time. Some spikes may appear, but SWIS needs to be able to drain the queue quickly.

Note: This monitor is disabled by default.

Last updated: 1-12-2017

Comments

Excellent! Thanks bronx.

This is a fantastic template!

Nicely done. This should be deployed out of the box by default!

Bronx, what was changed from 12/11/2013 to 12/18?

Corrected the description for the 2nd to last monitor - that's it.

Bronx, is there any way to expose the version change log to the community? I had the same question when the Exchange templates were last updated.

Yeah, I like this idea - I follow templates that are important to me and having some kind of changelog for them would be nice, and falls in line with my idea http://thwack.solarwinds.com/ideas/3182

That's a good idea. I'll have to think of something that I could do version to version. I suppose every time I update a template I could say in the header, "This version fixes..."
FYI, most updates are definitional and the monitors rarely change.

I wouldn't think it's your fault; a lot of people don't always post what they updated.

Even an indication of "last change to the actual monitors themselves" that excludes simple definition changes would be a solid reference point.

Probably obvious but my focus is short this morning.  Does this work well with additional pollers too?  Additional web servers?

Thanks.  We really needed this.

Jim

This template could be used against an Additional Poller or Additional Web Server, though certain components would need to be disabled since they do not exist on those servers. A perfect example of this would be the "SolarWinds Alerting Engine" service, which runs only on the primary Orion server. I suggest applying the template to your Additional Pollers and Web Servers, then disabling any components in the assigned application that show a "Down" state.

I think it makes sense to monitor other Orion instances for core services such as Job Engine, Polling Controller, and Data Processor.

Orion itself is not able to poll this information when such core services are down...

Question: Job Scheduler v2: Results Notified Error is always high for me and always in alert, and when I have issues and mention it, it's kind of brushed off. What can I check to see why it's high, or how can I completely resolve it?

This was an error within the application template that should be corrected in the latest version of this template posted here. The "Count statistic as difference" option was not set properly in previous version of this template. That has since been corrected.

I started to use this template and I'm getting warnings related to Job Engine v2: Worker Processes. In the description I found this:

"This monitor returns the number of worker processes used. A value of 10 or lower is acceptable. If the returned value is 100 or greater, there may be problems with jobs hanging."

But what actions can I take? How can I complete the diagnosis?

Are you using the latest version of this template posted here or are you using the version included out-of-the-box with SAM?

The one posted here

aLTeReGo, something I only thought about today: is there anything additional to monitor, or that could be included here, for stacked pollers? For example, something to make sure that both stacked pollers are operating?

Stacked pollers are treated no differently than standard pollers. Their metrics are consolidated and reflected within the same performance counters monitored by this template.

So I was sent this way by the video on optimizing SolarWinds performance. I downloaded the latest version provided here.

Two questions:

1.) When we have an issue what are the troubleshooting tips to resolve them?

2.) My app has never gone "online", the SolarWinds Website component is listed as down (which obviously it isn't since I am in it). What do I need to do to fix this?

/applause

Mattwolf, this is explained at the top of the page and/or on each individual component. There are statements which highlight a bit of what is normal and what isn't.

That aside, every environment is different, and you will need some understanding of your server and what you're experiencing to figure out where to troubleshoot. Some components raise warnings based on the predefined values that, in my environment, are not actually warning/critical conditions, so I end up modifying the values accordingly.

If the website is down, maybe you want to check the configuration of that component (edit the monitor) and see what it is doing to verify.

The template includes Expert Knowledge which explains potential problems, as well as recommended remediation for when a threshold is exceeded. If your website status component is reporting a "Down" status, then it's very likely your website is operating on a different port. To update the port, edit the assigned application, expand the "SolarWinds Website" component, and update the port with the one you use to access your Orion web console, e.g. "8787" if you access your Orion server via http://yourorion.server.ext:8787

Hey guys - Just upgraded to 11.5.2 last night and ran Orion Platform v2015.1.2 - Hot Fix 3 and I'm seeing this today.  Should I be alarmed?  Any tips on troubleshooting?

[Screenshot: JSV2.JPG]

I believe this is discussed in some detail in the following thread

Job Scheduler v2: Results Notified Error

Does anyone else find that the JET monitoring actually matches all tmp files in the temporary directory?

This is discussed in the following thread:

Orion Server Health: File Count Monitor - JET Files

Great post. Awesome information.

My primary server is always in the warning threshold, and frequently in the Critical threshold for Job Engine v2: Jobs Running.

So: A) Is this detrimental to performance?

B) Can it be fixed and how?

By default there are no thresholds defined for Job Engine v2: Jobs Running, so this should not be in either a warning or critical state unless you've defined your own thresholds for this component.


Thanks!  Looks like at some point I told it to use baselines.  I've updated it.

Nice work. I realized our Additional Polling Engines were not being monitored for Orion applications, even though four of them had stopped communicating with the main engine after recent security patching. I added all of our SolarWinds servers to our NOC dashboard, since there is now something to watch for.

Nice work, thanks for posting.

thanks for posting this... fills a gap in my environment!

It would be a good idea to also monitor your Ephemeral Ports to make sure they don't get exhausted.

Ephemeral Port Monitor

We also split out this template into one for APEs and one for the Primary as the APEs do not use all the same services.

How about something like AppInsight for Orion?

https://thwack.solarwinds.com/ideas/8022#comment-323542

This has now been resolved

Original settings are

[Screenshot: original settings (jetoriginal.PNG)]

Change these to

[Screenshot: new settings (jetnew.PNG)]

It looks like I'm late to the party, but what a great template... Thanks!

Version history: Revision 1 of 1. Last update: 02-05-2013 08:23 AM