
Integrating Monitoring Tools into ServiceNow Through Event Management

Why are we discussing this?

I didn't see any similar discussions or posts.  I'm not sure how many of you are using the event management suite in ServiceNow or something similar, but I wanted to put this out there to help you avoid some of the pitfalls we ran into when we rolled it out.  There are a lot of nice features available to help reduce outages and increase efficiency for your team and (if you have one) the EOC/NOC.  We started down this path to stop multiple incidents from being created for the same outage due to alerts from multiple tools.  By having these alerts go through an event management process, we can now correlate the alerts, display them on a service map, and generate only a single incident.

Definitions you need to know

  • Event Management – The process responsible for managing events throughout their lifecycle. Event management is one of the main activities of IT operations. It is a way to consolidate all events/alerts from disparate monitoring systems in one place, both to give you more information and to reduce noise for your teams.  Not all events should become alerts, and not all alerts should become incidents.
  • Event – A change of state that has some significance for the management of an IT service or a configuration item.  These records can vary greatly in their importance from “telling you of the addition of a device to monitoring” to “telling you a Data Center is offline”.
  • Alert – A notification that a threshold has been breached, something has changed, or a failure has occurred. Monitoring tools create and manage alerts.  The event management process manages the lifecycle of an alert.  An alert must have first been an event.
  • Incident – An unplanned interruption to an IT service or reduction in the quality of an IT service. Failure of a configuration item that has not yet affected service is also an incident.
  • Noise – Alerts that are unneeded, duplicated, or correlated to a larger issue.
  • Signal – Unique alerts that are usable, actionable, and result in either the creation of an incident or automated remediation.

Why ServiceNow?

There are other tools that can do many of the same things that I will be talking about here.  I am focusing on ServiceNow because I have experience building out this integration from monitoring tools to ServiceNow.  Regardless of the tool you use to handle event management, the discussion should still help your journey.

Why would I want to use an event management tool?

While tools have their own ways of handling events, alerts, and incident creation, they do not talk to each other.  Unless you have a single tool handling all your monitoring, you will likely run into issues where two or more tools generate an incident for the same thing.  This is avoidable using tools like the Event Management module inside ServiceNow to reduce noise.  Other benefits include the ability to track alerts for devices, have change records mute alerts during the change window, look for trends, create reports, and of course build automation to eliminate repetitive tasks caused by alerts (e.g. a service hung and needs to be restarted).

Where do I start?

When you have multiple tools handling your monitoring for the enterprise, which tool should you start with?  Well, that depends on a lot of factors.  Is there a tool that’s super easy to integrate, is there a tool that has the most reliable alerts, or is there a tool that most of IT is using?  To throw some buzzworthy phrases at you: “Don’t try to boil the ocean”, “Get the low-hanging fruit”, and “What gets you the most bang for your buck”.  Meaning simply this: start small, since you can always expand; knock out the easy stuff if it makes sense; and ultimately pick what provides the greatest benefit with the least amount of effort.  I will leave that decision to you, but I will tell you what we did to get where we are today.

We started with a single tool that was hosting a lot of our monitoring.  There was no out of the box connector for it in ServiceNow and that tool was sending emails to open incidents.  We found that while we were unable to pull data from that tool using an API call, it did support sending the data through API.  We built out the integration so that when the tool generates an alert, it sends this to ServiceNow via the API to the event table.  This allowed me to then build rules on how to handle these events.
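To make the push model concrete, here is a minimal sketch of what "send the alert to the event table via the API" can look like. It assumes the ServiceNow Table API endpoint for the em_event table; the instance URL, the helper name build_event_request, and the sample field values are all hypothetical, and authentication is left out. It only builds the request, it does not send anything.

```python
import json
import urllib.request

# Hypothetical instance URL; replace with your own instance.
INSTANCE = "https://example.service-now.com"


def build_event_request(source, node, resource, severity, description, message_key):
    """Build (but do not send) a POST that would create one event record."""
    payload = {
        "source": source,            # which monitoring tool raised the event
        "node": node,                # hostname of the affected device
        "resource": resource,        # what on the device has the issue
        "severity": str(severity),   # numeric severity, sent as a string
        "description": description,
        "message_key": message_key,  # ties repeats and clears to one alert
    }
    return urllib.request.Request(
        url=f"{INSTANCE}/api/now/table/em_event",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Example values are made up for illustration.
example = build_event_request(
    source="ExampleTool",
    node="web01",
    resource="CPU",
    severity=2,
    description="High CPU on web01",
    message_key="web01-CPU",
)
# urllib.request.urlopen(example) would send it once credentials are added.
```

Building the request separately from sending it makes it easy to inspect the payload in a test before pointing the tool at a real instance.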

The next decision was to integrate with SCOM. There was an out of the box integration for this one and it was in place fast.  The same process followed: cleaning up noisy events and building rules to provide better alerts.

Various other tools used an email integration to event management.  The emails were sent as plain text in JSON format.  These were fairly easy, but not a preferred method because they add points of failure: our (or the tool vendor's) email servers and ServiceNow's email servers.

Next in the line was SolarWinds.  SolarWinds was interesting for two reasons.  The first was that there was an out of the box connector.  The second was SolarWinds had a plugin for ServiceNow integration.  What I came to find was that the plugin was for incident creation and not event creation and the out of the box connector worked but needed tweaking. 

I will explain more later in the lessons learned section.  We found a few issues along the way, but it came together nicely.  I am now able to build reports for teams that show their health as reported from all tools, associate devices in a change window to their alert and mark it as in maintenance, build a customer experience dashboard, and (thanks to the work of our CMDB guy) we can feed these alerts to service maps.

What is the plan moving forward?

There are other features we haven’t started playing with yet.  Operational Intelligence would be the feature I am most interested in pursuing.  This is a portion of the suite that collects the metric data from your tools, looks for anomalies, and proactively alerts based on machine learning.

What lessons did you learn?

Integrating alerts was not as simple as we originally thought it would be.  I want to go over the lessons learned for each of the tools and end on what we learned about the ServiceNow platform itself.  Hopefully sharing this information will save you from the same issues when you do your integration.

Having Vistara send the events via an API call to ServiceNow worked well.  We had a few instances of their API service dying, but over two years that’s not bad. When we started receiving the events from Vistara in ServiceNow, we found many of these weren’t actionable and built rules to silence them.  The remaining events then became alerts.  For the alerts, I made a different set of rules that provided additional information for our EOC (Enterprise Operations Center) about the alert.  That information could be things like a knowledge base article that tells them how to fix the issue, who to contact, if this should become an incident, what the severity of the incident should be, and much more.

SCOM was a bit of a pain.  We found with our instance there were two different places we had to build out the integration.  To pull the alerts from SCOM we had the connector hit one of our web servers.  The metric data was not accessible from the same server, so that connector had to point directly at the DB server.  This worked well until security locked down the ports and we couldn’t connect to it anymore.  The alerting has since been moved to email integration to work around the security “features” blocking our API connection, and we had to disable the metrics collection altogether.

The email integrations are a stopgap until the monitoring these tools provide is moved to SolarWinds.  The plus side is that these are easy to customize and quick to set up, but the flip side is that they have additional points of failure. Another issue we have encountered is getting the clear messages to work.  This comes down to the message key.  A message key is what you will use in event management to separate different occurrences of the same issue and to have the clear messages associate properly with the triggered alert.  If you run into this issue, work with the team sending the email to agree on a unique message key for their alerts.
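As a rough illustration of why the message key matters for clears, the sketch below keeps open alerts in a dictionary keyed by message key. It is a toy model, not ServiceNow's implementation: the function name apply_event is hypothetical, and it assumes a severity of 0 marks a clear message.

```python
def apply_event(open_alerts, event):
    """Open, update, or close an alert based on the event's message key."""
    key = event["message_key"]
    if event["severity"] == 0:  # assumption: severity 0 marks a clear message
        # A clear only closes the alert that shares its message key.
        open_alerts.pop(key, None)
    else:
        # A repeat of the same key updates the existing alert instead of adding one.
        open_alerts[key] = event
    return open_alerts


alerts = {}
apply_event(alerts, {"message_key": "web01-CPU", "severity": 2})
# This clear uses a different key, so the alert above stays open.
apply_event(alerts, {"message_key": "web01-cpu-clear", "severity": 0})
still_open = "web01-CPU" in alerts
# This clear reuses the trigger's key, so the alert closes.
apply_event(alerts, {"message_key": "web01-CPU", "severity": 0})
```

If the sending team generates a fresh key for the clear email, you get the first behavior: the alert never closes on its own.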

The SolarWinds connector pulls data from the event table in SolarWinds instead of the alerts.  Events in SolarWinds can be triggered either by the thresholds assigned to the machine directly or by something forcing the system to write to the event log.  This means that you will need to add a filter to hide the noise you were already filtering by setting up alert actions in SolarWinds.  One of the ways we combatted that was to block any alert without an eventType of 5000 or 5001.  Event type in SolarWinds is a number that identifies what triggered the event.  A 5000 event says that an alert rule caused an entry to be written to the event log.  A 5001 event type says the issue is cleared.  That simple change in ServiceNow stopped over 9000 additional noisy alerts per day.  The biggest thing we found is that the “swEventId” does not make a good message key.  This forced us to create our own message key using the initial event time field.  An example would be like this JSON piece: {"initial_event_time":"5/19/2018 16:43:00", "netObjectId":"10053"} becomes a message key of 2018.
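Here is a small sketch of both ideas: dropping anything that is not a 5000 or 5001 event, and deriving a message key from the initial event time and the net object ID. The field names come from the JSON piece above; the function names and the exact key layout (timestamp digits, a dash, then the object ID) are my own assumptions for illustration, not the format we used in production.

```python
from datetime import datetime

# Event types written by alert rules: 5000 = triggered, 5001 = cleared.
ALERT_EVENT_TYPES = {5000, 5001}


def is_alert_event(event):
    """Keep only events that an alert rule wrote to the event log."""
    return int(event["eventType"]) in ALERT_EVENT_TYPES


def build_message_key(event):
    """Combine the initial event time and object ID into one key.

    The key layout here is an assumption chosen for the example.
    """
    when = datetime.strptime(event["initial_event_time"], "%m/%d/%Y %H:%M:%S")
    return f"{when:%Y%m%d%H%M%S}-{event['netObjectId']}"


# Sample events; the eventType of 14 is just an arbitrary non-alert value.
events = [
    {"eventType": "5000", "initial_event_time": "5/19/2018 16:43:00", "netObjectId": "10053"},
    {"eventType": "14", "initial_event_time": "5/19/2018 16:44:10", "netObjectId": "10053"},
]
keys = [build_message_key(e) for e in events if is_alert_event(e)]
```

Because the trigger and the clear for one occurrence share the same initial event time and object, they end up with the same key, while a later occurrence on the same object gets a new one.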

ServiceNow has several connectors out of the box; however, I would still recommend having someone who can program in JavaScript go through the default connector or build your connector.  I would not change the default connector, but instead make a copy of the default if you want to make changes.  Code patches could overwrite your changes to a default connector definition.  Here are a few other quick hints:

  • Map out which fields you want to use on the alert form before you start
    • Default for node is the hostname
    • Default for resource is what on the device/application has an issue (e.g. CPU for a High CPU alert)
    • Type needs to be something in the CMDB (e.g. server/application/network)
    • Severities need to be in number format (1-5 Exception to Informational)
    • Custom fields can be added to the alert form
  • Event and Alert Rules can let you change the entire alert message and field data
  • Technical services are for things like Exchange not a custom monitor
  • Discovered Services are for service maps
  • Manual Services can be for custom monitored service
  • If you plan to use the dashboard to display the health of your services, ensure that all services are using the default numbered Business criticality as the non-standard criticalities will break the dashboard display
  • Link KB articles to alerts and provide instructions for handling the alert or how to fix the issue
  • You can build out automation around what to do with alerts
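On the severity hint above: most tools send a word, so something has to turn it into a number before or while the event is processed. The sketch below is one way to do that on the sending side. The label names on the left are hypothetical tool-side values and the numeric scale assumes 1 is the most severe and 5 is informational, so check the severity choices in your own instance before copying it.

```python
# Hypothetical tool-side labels mapped to a 1-5 numeric scale.
SEVERITY_BY_LABEL = {
    "critical": 1,
    "major": 2,
    "minor": 3,
    "warning": 4,
    "info": 5,
}


def to_numeric_severity(label, default=5):
    """Translate a tool's severity label into a number.

    Unknown labels fall back to the default (informational).
    """
    return SEVERITY_BY_LABEL.get(label.strip().lower(), default)


warning_level = to_numeric_severity("Warning")
critical_level = to_numeric_severity(" CRITICAL ")
```

Falling back to informational for unknown labels keeps a typo in a tool's template from paging anyone, at the cost of possibly hiding a real problem, so you may prefer a louder default.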

Where does this leave us?

The ITOM suite has provided us a wealth of information to improve our services, increase first to know, and identify trends to avoid issues in the future. While we encountered some issues when we began this journey, the destination was well worth the trouble.  The key takeaway: planning what data you want to collect and how you want to use this information is critical to making it successful.

Have any of you run into these or other issues?  Do you have any other suggestions/comments/concerns?

46 Replies

@dmartzall Just to recap again on the severity part. So in order for me to correctly differentiate Warning and Critical alerts, I first need to send the Severity variable as part of the event message, correct?

And then in SNOW, what would i need to modify?


this m clear...


Nice article. I took the same approach in our environment while integrating SolarWinds with ServiceNow. Besides event IDs 5000 and 5001, we applied many other things, like filtering based on keywords in the message field and on priority. ServiceNow is flexible enough to identify the patterns that you pass in the event message field. We also did a parallel sync from SolarWinds to ServiceNow using the SolarWinds SDK to map other fields in the ServiceNow event form and then pass them into the catalog form and incident form. It was like mirroring the data: what you have in SolarWinds is also available in ServiceNow for further processing.

That allowed us to do more complex filtering without any performance issues, and it is easy to manage. Today it's running great and everything is automated. Thanks for sharing this post. I will also write up my setup shortly on thwack.


Thanks for sharing this...

On the SolarWinds part, you mentioned event IDs 5000 and 5001, which we weren't aware of, or rather didn't check in much detail...

So if we need to block anything apart from these events, how is that achieved? Can you give a very high-level overview?


There are a few ways you can block specific SolarWinds event types within ServiceNow from becoming alerts.  You can modify the JavaScript on the connector definition, which is probably the cleanest way since it prevents ServiceNow from ever receiving these events.  The downside to doing this is that you need to have a good understanding of JavaScript to make the changes to the script.  The easier way is to use the event rules.  To use the event rule to do this:

  1. Use the search bar on the side navigation panel to search for "event rule": pastedImage_8.png
  2. Click "Event Rules"
  3. Click "New": pastedImage_9.png
  4. The first tab "Event Rule Info" should look similar to this: pastedImage_6.png
  5. The second tab "Event Filter" should look something like this: pastedImage_2.png
  6. Ensure that the rule is active and click "Submit": pastedImage_10.png

With that filter in place, you will only see the events from SolarWinds that had "alert actions" to become alerts within ServiceNow.  Please let me know if you need any additional information.

Strange. I tried this with an AND and an OR to drop all but 5000 and 5001 and it wasn't working, but today it seems to be.  Thanks for the quick reply.


Is this now working for you?  I have been having the same issue, where events outside of 5000 and 5001 are not being ignored.


Yes, it's working for me.  I found that editing event rules sometimes does not work.

Try deleting the rule and recreating.


Will do thanks 🙂


perfect, i think this might help us...

i will check with my team and provide the feedback...


Sure thing.  I'm happy to help.


@dmartzall Need your help again. Have you tested datastore alerts using this method of integration? I was shocked to find yesterday that even though SolarWinds records the event, it doesn't get pulled by ServiceNow... I tried all possible things, but it doesn't work...


I didn't run into any issues with alerts from the SRM module.  Are you using that to monitor the datastore or another module?



@dmartzall Not the SRM module... I am referring to monitoring of ESX datastores... alerts for these don't get captured by ServiceNow...


Double-check that you have an alert rule set up in SolarWinds for that event.  That will ensure that you have an alert with an event id of 5000 or 5001.


@dmartzall Already verified all of that... the alert is visible in the console and the type is 5000... but ServiceNow still doesn't pull it.

In fact, I have a case registered with ServiceNow for this... it has a very big impact for us...


@pratikmehta003 The next thing to check is the events that show up in ServiceNow.  It could be there is an event rule that is stopping it from becoming an alert.  If it is not showing up under events, the next thing would be to validate the SolarWinds job in ServiceNow to make sure nothing in the script is ignoring those events.


@dmartzall It's not reaching SNOW at all. Yes, that's what I'm getting checked by ServiceNow. From what I understand, there is only one built-in script that pulls events from the SolarWinds DB, so there is definitely something the script is not pulling, but what that is can only be checked and confirmed by ServiceNow support.

In fact, I already had one call with them today and checked some debug logs, but nothing was found for those two, even though we generated some alerts in SolarWinds during that time...


@pratikmehta003 You can use Postman to do a manual check that the event is being stored and can be pulled the way SNOW would pull it.  If you can see the event with a manual pull from Postman, then you know the issue is on the SNOW side.  If, however, you do not see it there, then the issue could be with the alert rule you built in SolarWinds.
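If you would rather script that check than click through Postman, the sketch below builds the kind of URL you could paste into either. It assumes the SolarWinds Information Service REST query endpoint on port 17778 and the Orion.Events entity; the host name, the helper name build_swis_query_url, and the column list are assumptions to adjust for your environment, and it does not reflect what the ServiceNow connector itself runs. Authentication and certificate handling are left out.

```python
from urllib.parse import urlencode


def build_swis_query_url(host, event_types=(5000, 5001), limit=20):
    """Build a query URL listing recent events of the given types."""
    types = ", ".join(str(int(t)) for t in event_types)
    swql = (
        f"SELECT TOP {int(limit)} EventID, EventTime, EventType, NetObjectID, Message "
        f"FROM Orion.Events WHERE EventType IN ({types}) "
        "ORDER BY EventTime DESC"
    )
    base = f"https://{host}:17778/SolarWinds/InformationService/v3/Json/Query"
    return base + "?" + urlencode({"query": swql})


# Hypothetical host name for illustration.
url = build_swis_query_url("orion.example.local")
```

If the datastore alert shows up in that result with an event type of 5000, the event exists on the SolarWinds side and the investigation moves to the pull in ServiceNow.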


@dmartzall Any idea how to pass the variable for the swEventId information in the alert, which gets pulled in like below?

I'm trying to use POST and I want to pass this since it's a unique variable.





@pratikmehta003 Are you looking for what the name of the alert rule is? The swEventId should be getting passed by default.
