Job Engine v2 Failing

I keep having to restart the Job Engine v2 service. Is there a lasting solution to this?

Sometimes we don't know when the service stops, and this causes polling to stop as well. Then I have customers calling me to say the real-time reports they are getting aren't correct.

Parents
  • is 100% correct. If you have a flapping service or stability issues, you need to reach out to support, as there could be several different reasons why that is happening. With the Job Engine, you'll also want to make sure you don't have latency or performance issues that are causing your queues to fill. Support can help you pin down the exact reason and, if necessary, dig into diagnostics to resolve the problem.

  • Do you know if there is an auto-fix process that's come in recently? I see a Service page in the UI under Settings that seems new, and I can't seem to keep services stopped right now.

  • Hi Adam, that page gives you access to the platform services without an RDP session, but its purpose is the same as the "old fashioned" way of accessing the SolarWinds Platform Services Manager (formerly the Orion Service Manager). It's kind of like process and service management for Windows servers in SAM.

    There are steps that can be taken to repair the various services, if necessary, but it's best to identify the underlying cause first so there isn't a repeat. To my knowledge, there isn't an "auto-fix" for an issue like this one beyond repair functions, mostly because these kinds of problems can be caused or influenced by environmental problems (like latency issues), and until those types of problems are resolved, it wouldn't stay "fixed." I know that the self-correction and robustness of the services and underlying functions have been improved as part of optimization over the last few years, but I wouldn't necessarily call that something new or an auto-fix beyond what the platform has always done as part of its long-standing self-correct functions.

    I'm a little confused by your statement that you can't "keep the services stopped." All services or a few? I confess I'm pretty old school and typically use the RDP session and service manager, mostly because I can. LOL. You are accessing through the webpage, so not everything is fully closed when you stop services, and that may have an influence because of dependent Windows services, though that feels like a bit of a stretch.

Children
  • So in one of my environments we're using VMAN but don't really need Recommendations. The recs service obviously takes quite a few resources, and I used to be able to force the job to run on another poller by leaving it disabled on the other one. The primary box is running hot, which is the reason it's disabled. I've found recently that the system account has been starting it and setting it back to auto, which has tanked the box a few times. I think this came in with one of the late 2022 patches, but I've not read anything about it.
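
    For anyone who wants to at least notice when a service stops or has its start type flipped back to auto, here is a minimal sketch (not a fix for the underlying cause, which support still needs to look at). It shells out to the standard Windows `sc query` and `sc qc` commands and parses the STATE and START_TYPE lines. The service name `SWJobEngineSvc2` and the helper names are assumptions for illustration; confirm the real service name on your own server before relying on it.

```python
import re
import subprocess
from typing import Optional

# Assumption: short name of the Job Engine v2 Windows service.
# Verify it on your own server (for example in services.msc) before use.
SERVICE_NAME = "SWJobEngineSvc2"


def parse_sc_field(sc_output: str, field: str) -> Optional[str]:
    """Return the symbolic value of a field from sc.exe output.

    sc.exe prints lines such as:
        STATE              : 4  RUNNING
        START_TYPE         : 4   DISABLED
    """
    pattern = rf"^\s*{re.escape(field)}\s*:\s*\d+\s+(\w+)"
    match = re.search(pattern, sc_output, re.MULTILINE)
    return match.group(1) if match else None


def check_service(name: str = SERVICE_NAME) -> dict:
    """Query the current state and configured start type (Windows only)."""
    state_out = subprocess.run(
        ["sc", "query", name], capture_output=True, text=True
    ).stdout
    config_out = subprocess.run(
        ["sc", "qc", name], capture_output=True, text=True
    ).stdout
    return {
        "state": parse_sc_field(state_out, "STATE"),
        "start_type": parse_sc_field(config_out, "START_TYPE"),
    }
```

    Calling `check_service()` from a scheduled task and alerting when `state` is not `RUNNING` (or when `start_type` is no longer `DISABLED` on a box where you disabled the service on purpose) would at least tell you about the problem before a customer does.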