Job Scheduler v2: Results Notified Error

So I recently applied the following template to monitor the performance of my SolarWinds environment: Orion Server.apm-template

I have concerns about one specific component of this template, Job Scheduler v2: Results Notified Error. The documentation describes it as follows:

Job Scheduler v2: Results Notified Error

This monitor returns the number of errors that occurred when sending the results back. This value should be zero at all times.

The value should be zero at all times, yet I have values over 10K on one of my polling engines. In fact, 3 of my 4 servers show a value greater than 0. However, the value isn't incrementing on any of them. Can someone explain in more detail what this counter measures? Could these be old errors? Is there a way to clear the counter and see whether any new errors develop? What particular problem does this indicate, and how do I go about resolving it?

Any help would be appreciated.


I'm also still getting high counts after restarting the required Orion services. Any further resolution steps would be really appreciated.


This counter represents a serious underlying issue in my environments.

When I get this alert, it's a big deal that requires immediate attention. Statistics collection stops for many of our agents and SNMP devices.

A service restart or reboot is always required for me.

Just my $0.02, in case you have a real issue vs. a non-critical statistical counter issue with the monitor itself.


Hi everyone,

I have the same problem. I have "Count as Difference" set to True and am still receiving the error, as msawyer said. The statistic value is about 107K. How can I solve this problem? I couldn't find any solution on the internet. Any ideas?

Thanks for your time.


So, this metric shows the number of failed attempts to deliver polling job results from the Job Engine v2 service to a registered job result consumer. The consumer usually runs in the Collector Polling Controller service or the Orion Module Engine service. If active polling jobs are producing results and the result consumer is not running (not accepting results), you can observe quite fast growth of this counter. The job engine has a buffer for results and a retry mechanism for result delivery, so short-term growth of this counter (e.g., during a service restart, when the consumer service takes longer to start than the job engine) is not a real problem. However, because this buffer is limited, permanent growth of this counter means that polling data are being discarded and there will be gaps in historical data.

The value of this counter should be cleared by restarting the Job Engine v2 service.
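To illustrate the difference between short-term and permanent growth described above, here is a minimal toy model. It assumes a fixed-capacity result buffer, one retry per tick while the consumer is down, and a cumulative failed-attempt counter. The class and function names are hypothetical; this is a sketch of the behaviour, not the actual Job Engine v2 implementation.

```python
from collections import deque


class ResultDeliveryModel:
    """Hypothetical model of a job engine buffering polling results.

    Assumptions: fixed-capacity buffer, oldest result dropped when full,
    one failed attempt counted per tick while the consumer is down.
    Not the real Job Engine v2 code.
    """

    def __init__(self, capacity):
        self.buffer = deque()
        self.capacity = capacity
        self.failed_attempts = 0  # cumulative, like the monitored counter
        self.discarded = 0        # results lost because the buffer was full
        self.delivered = 0

    def produce(self, result):
        if len(self.buffer) >= self.capacity:
            # Buffer full: drop the oldest result, leaving a gap in history.
            self.buffer.popleft()
            self.discarded += 1
        self.buffer.append(result)

    def try_deliver(self, consumer_up):
        # Deliver everything buffered; if the consumer is down, count one
        # failed attempt and keep the results for a later retry.
        while self.buffer:
            if not consumer_up:
                self.failed_attempts += 1
                return
            self.buffer.popleft()
            self.delivered += 1


def simulate(total_ticks, down_ticks, capacity):
    # One polling result per tick; the consumer is down on the given ticks.
    model = ResultDeliveryModel(capacity)
    for tick in range(total_ticks):
        model.produce(tick)
        model.try_deliver(consumer_up=tick not in down_ticks)
    return model


short = simulate(10, {2, 3}, capacity=5)
long = simulate(10, set(range(2, 10)), capacity=5)
print(short.failed_attempts, short.discarded)  # 2 0
print(long.failed_attempts, long.discarded)    # 8 3
```

In the short outage the counter grows but every result is eventually delivered; in the sustained outage it keeps growing and results start being discarded once the buffer is full.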

I just restarted the service and the counter reset. Thanks.


It is very probably a cumulative error counter, so it would be logical to define it as "Count as Difference" so that it warns only when its value grows. Does it drop to zero when you restart the Orion services on the given engine?
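A rough sketch of what "count as difference" means for a cumulative counter like this one. It assumes the first poll has no baseline (so no value is produced yet) and that a drop in the raw value, such as after a Job Engine v2 restart, is treated as a reset. The function names are hypothetical and this is not SAM's documented algorithm.

```python
def count_as_difference(samples):
    """Turn raw cumulative counter samples into per-poll differences.

    Hypothetical sketch: the first sample has no baseline (None); a drop
    in the raw value is treated as a counter reset, so the new raw value
    is taken as the growth since the reset.
    """
    deltas = []
    previous = None
    for value in samples:
        if previous is None:
            deltas.append(None)
        elif value < previous:
            deltas.append(value)  # counter was reset (e.g. service restart)
        else:
            deltas.append(value - previous)
        previous = value
    return deltas


def should_warn(deltas, threshold=0):
    # Warn only if the counter grew during the most recent polling interval.
    latest = deltas[-1] if deltas else None
    return latest is not None and latest > threshold


stuck = count_as_difference([0, 781, 781, 781])
print(stuck, should_warn(stuck))  # [None, 781, 0, 0] False

after_reset = count_as_difference([10240, 10240, 0, 3])
print(after_reset, should_warn(after_reset))  # [None, 0, 0, 3] True
```

With this approach, a counter that jumped once and then stayed flat stops alerting after the next poll, while new errors after a reset still raise a warning.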


I've logged the fact that the "Count Statistic as Difference" option is not set for this component monitor as a bug in this template. It is being tracked internally as FB141719.

aLTeReGo,

I have "Count as Difference" set to True, yet I am still receiving this error. Any suggestions?


On which metric do you have the 'Count Statistic as Difference' enabled? Also note that the effect is not immediate. You will need to wait until the next scheduled poll before the value will be updated.


FYI, I'm running 10.7 with SAM 6.1, and the template has not changed for this.

My counter went from 0 yesterday to 781 today, and it is now stuck at 781.

Of course, it is flagged as a critical problem in the Orion Server template.

I'm modifying the template for this now.


Correct. The latest template version is posted here on Thwack, but templates are not automatically updated/replaced via product upgrades.