The release of Orion version 2019.4 brings a lot of excitement to the SolarWinds Service Desk team. It introduces an integration that enables a closed-loop workflow, which converts alerts detected by Orion into service desk tickets and updates the Orion alert as each ticket is resolved. By streamlining this process, IT pros can react faster when performance issues or outages are detected. This expedites resolution and helps IT ensure the availability of the services that employees rely on to stay productive.
My good friend, tony.johnson, put together a great article on how to implement the integration, but we also wanted to share how you can get the most value from it. Let’s look at how you can configure your alerts and your service desk for optimal results!
The SolarWinds Orion and SolarWinds Service Desk Integration
Before we jump into the configuration options, let’s talk about the value this integration brings to your IT operations. The core capability automatically converts alerts into tickets. This makes things much easier for IT pros, but that is only part of the story. The integration also:
- Brings together IT operations and service information to improve visibility of employee-impacting issues, helping your teams react to and resolve them faster
- Improves operational efficiency by automating bi-directional communication between SolarWinds Orion and SolarWinds Service Desk
- Captures all alert data into your service records, allowing you to report on alert-generated incident trends and your team's efficiency in resolving these types of issues
To take full advantage of the integration’s capabilities, you will need to properly configure both systems. Fortunately, this can be accomplished relatively easily. The three-step process below outlines a best practice approach to implementing this integration.
Step One: Game Planning
Although this step may seem like a no-brainer, we cannot stress its importance enough. At many organizations, the teams working in the Orion Platform differ from those working in the service desk. They have different roles, responsibilities, priorities, and processes that they follow. By formalizing what you are trying to accomplish with this integration, you can drive better alignment and accountability across teams.
Keep in mind that this step may not require you to reinvent the wheel. The Orion Platform provides hundreds of pre-configured alerts, many of which you may already have activated. Now it’s just a matter of discussing which alerts you want sent to your service desk and how those tickets should be processed. A great way to accomplish this step is to have a classic whiteboard session. Some key questions to ask in this session are:
- What types of alerts do we want sent to the service desk?
- How should we categorize them?
- Who should we assign them to?
- How do we prioritize individual tickets?
- Who should we notify when an alert-based ticket is created?
- Do we want to set individual SLA rules on the alert-based tickets?
- What information and attributes of the alert should be included in that ticket? The general rule is to include all beneficial attributes. Not only could this information help you diagnose the issue, but it can also be used to automatically route, categorize, and prioritize the ticket.
It is important to note that the answers to these questions can vary based on the types of alerts you are sending to the service desk. For example, the desired outcomes for alerts generated by Network Performance Monitor (NPM) could vary greatly from those for Server and Application Monitor (SAM). Throughout this post, we will focus on a specific scenario, but keep in mind that the flexibility of both Orion and SolarWinds Service Desk allows this integration to support many use cases.
Example Scenario: Active Directory Replication Failure
The Problem: Like many organizations, our company runs several mission-critical applications that our employees rely on to get their work done. We use Active Directory (AD) to ensure the right users have the proper access levels to the applications essential to their positions. To help us manage AD, we use Server and Application Monitor (SAM) coupled with AppInsight for Active Directory for deeper visibility into this critical system. However, we have more than one domain controller, and if replication fails or is delayed, users may not be able to log in to their applications. To help address this, we want to escalate AD-generated alerts for replication failures to our service desk to provide better visibility and quicker resolutions.
The Whiteboard Session:
|What types of alerts do we want to be sent to the service desk?|Active Directory Replication Failure|
|What information and attributes of the alert should be included in that ticket?|The Domain Controller Name|
|How should we categorize them?|Applications / Active Directory|
|Who should we assign them to?|Application Support Team|
|How do we prioritize individual tickets?|Critical|
|Who should we notify when an alert-based ticket is created?|Tier One Support Team|
|Do we want to set individual SLA rules on the alert-based tickets?|Yes, we want service restored within 2 hours|
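One way to keep the whiteboard decisions from getting lost is to write them down as structured data that both teams can review. The sketch below is only an illustration: the `AlertTicketPlan` class and its field names are hypothetical and are not part of Orion or SolarWinds Service Desk. The category value is taken from the automation rule built later in this post.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class AlertTicketPlan:
    """Whiteboard decisions for one alert type (hypothetical structure)."""
    alert_type: str
    attributes: tuple[str, ...]
    category: str
    assignee: str
    priority: str
    notify: str
    restore_within: timedelta


# Values copied from the whiteboard session for our example scenario.
ad_replication_plan = AlertTicketPlan(
    alert_type="Active Directory Replication Failure",
    attributes=("Domain Controller Name",),
    category="Applications/Active Directory",
    assignee="Application Support Team",
    priority="Critical",
    notify="Tier One Support Team",
    restore_within=timedelta(hours=2),
)


def summarize(plan: AlertTicketPlan) -> str:
    """Render the plan as a one-line summary for review."""
    hours = plan.restore_within.total_seconds() / 3600
    return (
        f"{plan.alert_type}: assign to {plan.assignee}, "
        f"priority {plan.priority}, notify {plan.notify}, "
        f"restore within {hours:g}h"
    )


print(summarize(ad_replication_plan))
```

Writing one such record per alert type makes it easy to spot alerts that still lack an owner, a priority, or an SLA target before you start configuring either system.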
Step Two: Configuring Orion Alerts
Now that you have a clear picture of your goals in converting an alert to a ticket, it is time to start configuring the two systems. We are going to start on the Orion Platform side, where you have two key configuration options:
- Customizing your alert attributes: Selecting the information you want included when an alert is sent.
- Adding the “Create SolarWinds Service Desk Incident” alert trigger action: Specifying that these alerts will be sent to your service desk.
Example Scenario: Active Directory Replication Failure
Let’s jump back into our use case from step one to build out our alerts.
- In the first step, we decided which attributes are to be included in the alert for “AppInsight for Active Directory: Alert me when replication fails.” We built it out to include these attributes:
- Now that you have the alert attributes set, let’s add the action to send these alerts to the service desk. Select the option below to add the action to your alert:
With the above configuration, alerts sent to your service desk will look like this:
Step Three: Configuring Your Service Desk
Now that we have our alerts configured properly, let’s start configuring the service desk. Here we will focus on three main areas:
- Building Automation rules
- Defining Service Level Agreements (SLAs)
- Creating reporting on alert-generated tickets
IT Pro Tip: When you are configuring the integration in your service desk (in the setup options), you have to designate a requester, which will be the user that all alert-generated tickets will be associated with. We recommend creating a “shell” or fake user for this requester to make it easier to configure SLAs and automation rules specific to this integration. This will also make it easier to visualize alert-generated tickets when viewing your Incident queue.
Setting Alert-Generated Incident Automation Rules
In SolarWinds Service Desk, automation rules allow you to define what actions you want to take on a ticket when it is created, commented on, or updated. These automated actions bring consistency to the way you route, prioritize, categorize, and process tickets. Setting automation rules for alert-generated tickets keeps the proper teams aware of performance issues, allowing them to quickly react to and address the situation.
Example Scenario: Automation Rule for Active Directory Replication Failure Alert
Now that we have configured the Orion side in step two, let’s build an automation rule that will triage, prioritize, and categorize the alert-generated ticket. This is a two-part process:
- First, set your conditions. When a ticket matches these conditions, the proper automated actions will take place. Here are a couple of key conditions:
- Origin: You can set conditions based on the origin of the incident, and in our case, incidents coming from “SolarWinds Orion.” This ensures the automation rules will only run for tickets generated by this integration.
- Keywords: Setting a keyword condition allows you to leverage the alert attributes we established earlier with your automation rule. In our situation, we are going to use keywords from the alert name to build out the rule.
IT Pro Tip: Using Multiple Attributes - Depending on your use case, you may want several attributes in your keyword condition when building an automation rule. To do this, you can use regular expressions for your keyword condition. For example, if you had two alert attributes you wanted to use, you could place the regular expression (\s|\S)* between them. It matches any run of characters, including line breaks, so the condition can find both keywords anywhere in the body of the incident. As an illustration, a pattern such as replication fails(\s|\S)*NEWYADDS01v would match an incident that mentions the alert name followed, anywhere later in the text, by that domain controller.
- Actions: Now select what you want your automation rule to do. For our example, I want my rule to:
- Reassign the ticket to the Application Support Team
- Categorize it as an Applications/Active Directory issue
- Update the priority to Critical
- Notify the Tier One Team that the issue is happening
Voila! Your automation rule is built.
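To make the logic of this rule concrete, here is a minimal Python sketch of the same two parts: conditions (origin plus a keyword pattern) and actions. It is a model for reasoning about the rule, not how SolarWinds Service Desk implements it. The `Incident` class, its fields, and the sample incident text are assumptions made for illustration, and the regular expression flavor used by the service desk may differ from Python's `re` module.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical, simplified ticket record."""
    origin: str
    title: str
    description: str
    assignee: str = "Unassigned"
    category: str = ""
    priority: str = "Medium"
    notified: list[str] = field(default_factory=list)


# Keyword condition: the alert name phrase followed, anywhere later in the
# text, by the domain controller name. (\s|\S)* matches any run of
# characters, including line breaks.
KEYWORDS = re.compile(r"replication fails(\s|\S)*NEWYADDS01v", re.IGNORECASE)


def matches(incident: Incident) -> bool:
    """Both conditions must hold: Orion origin and keyword match."""
    if incident.origin != "SolarWinds Orion":
        return False
    text = f"{incident.title}\n{incident.description}"
    return KEYWORDS.search(text) is not None


def apply_rule(incident: Incident) -> Incident:
    """Run the four actions from the example when the conditions match."""
    if matches(incident):
        incident.assignee = "Application Support Team"
        incident.category = "Applications/Active Directory"
        incident.priority = "Critical"
        incident.notified.append("Tier One Team")
    return incident


alert_ticket = Incident(
    origin="SolarWinds Orion",
    title="AppInsight for Active Directory: Alert me when replication fails",
    description="Domain controller: NEWYADDS01v",
)
print(apply_rule(alert_ticket))
```

Note that the pattern is order-sensitive: the alert name phrase has to appear before the domain controller name. If the order of attributes in your alert body can vary, a separate condition per keyword is the safer choice.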
IT Pro Tip: Cloning Automation Rules - You may want to build multiple automation rules for similar types of alerts. For example, you could build two automation rules for our scenario with slightly different actions:
- When the New York domain controller (NEWYADDS01v) is down, route the alert-generated tickets to the New York support team
- When the Los Angeles domain controller (LOSADDS01v) is down, route the alert-generated tickets to the Los Angeles support team
With the help of cloning capabilities, you can easily scale variations of your automation rules. This allows you to clone an existing rule and make your modifications without starting from scratch.
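The idea behind cloning can be sketched in a few lines of Python: start from one rule, copy it, and change only the fields that differ. The `AutomationRule` class and the `route` helper are hypothetical names used for illustration. In the product, cloning is done through the service desk interface.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AutomationRule:
    """Hypothetical, simplified automation rule."""
    name: str
    origin: str
    keyword: str
    assign_to: str
    priority: str


base_rule = AutomationRule(
    name="AD replication failure",
    origin="SolarWinds Orion",
    keyword="replication fails",
    assign_to="Application Support Team",
    priority="Critical",
)

# Clone the base rule and override only what differs per site.
new_york_rule = replace(
    base_rule,
    name="AD replication failure (New York)",
    keyword="NEWYADDS01v",
    assign_to="New York support team",
)
los_angeles_rule = replace(
    base_rule,
    name="AD replication failure (Los Angeles)",
    keyword="LOSADDS01v",
    assign_to="Los Angeles support team",
)


def route(rules: list[AutomationRule], origin: str, text: str) -> Optional[str]:
    """Return the assignee of the first rule whose conditions match."""
    for rule in rules:
        if origin == rule.origin and rule.keyword.lower() in text.lower():
            return rule.assign_to
    return None


site_rules = [new_york_rule, los_angeles_rule]
print(route(site_rules, "SolarWinds Orion", "Domain controller LOSADDS01v is down"))
```

Both clones keep the origin condition and the Critical priority from the base rule, which is the main benefit of cloning: shared settings stay consistent while the routing varies.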
Setting Service Level Agreements (SLAs) for Orion Alert-Generated Incidents
You can set up individual SLA rules for the incidents created by this integration to set expectations for response and resolution times associated with alert-generated tickets.
Before we get started, here are a few things to consider:
- In many cases, your SLA rules will rely on your previously developed automation rules. In the example above, the automation rule set the category and priority of the alert-generated ticket, both of which are criteria you can use for your SLA rule.
- Earlier, we shared an IT Pro Tip about creating a “shell” user to use as the default requester for this integration. That user can also be used to define the scope of your SLA rule, helping you ensure these rules will only apply to alert-generated incidents.
Example Scenario: SLA Rule for Active Directory Replication Failure Alert
When Active Directory is down, our employees cannot access the applications they need to do their jobs. For this reason, we want to set the expectation that any replication failure alert will be resolved within two hours. Let’s build out this SLA rule:
- Set your SLA target: For this example, I am setting a target of “Not resolved” within 2 hours.
- Define your scope: We will use the data points we set with our above automation rule in this section.
- Category = Application
- Subcategory = Active Directory
- Priority = Critical
- Requester = Orion Alerts
- Set your action: This is where you set actions that are triggered when the SLA breaches. For our example, we are:
- Assigning to Anthony Campbell (Director of IT)
- Escalating the ticket to Tier 3 Application Support
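The SLA rule above boils down to a scope check and a time comparison. The following Python sketch shows that logic under stated assumptions: the `Ticket` class, the field names, and the `breach_actions` helper are hypothetical, and business-hours calendars are ignored for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Ticket:
    """Hypothetical, simplified ticket record."""
    category: str
    subcategory: str
    priority: str
    requester: str
    created_at: datetime
    resolved_at: Optional[datetime] = None


# Scope and target copied from the example SLA rule.
SLA_SCOPE = {
    "category": "Application",
    "subcategory": "Active Directory",
    "priority": "Critical",
    "requester": "Orion Alerts",
}
SLA_TARGET = timedelta(hours=2)


def in_scope(ticket: Ticket) -> bool:
    """True when every scope field on the ticket matches the rule."""
    return all(getattr(ticket, key) == value for key, value in SLA_SCOPE.items())


def is_breached(ticket: Ticket, now: datetime) -> bool:
    """True when an in-scope ticket was not resolved within the target."""
    if not in_scope(ticket):
        return False
    end = ticket.resolved_at or now
    return end - ticket.created_at > SLA_TARGET


def breach_actions(ticket: Ticket, now: datetime) -> list[str]:
    """Actions from the example, returned only when the SLA is breached."""
    if not is_breached(ticket, now):
        return []
    return [
        "Assign to Anthony Campbell (Director of IT)",
        "Escalate to Tier 3 Application Support",
    ]


ticket = Ticket(
    category="Application",
    subcategory="Active Directory",
    priority="Critical",
    requester="Orion Alerts",
    created_at=datetime(2020, 1, 6, 9, 0),
)
print(breach_actions(ticket, now=datetime(2020, 1, 6, 11, 30)))
```

Because the scope includes the requester, a ticket with the same category and priority that was submitted by an employee would not trigger these breach actions.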
Similar to automation rules, you may want to build specific SLA rules for the different types of alerts that will be sent to your service desk. For example, you may have different expectations for tickets generated by networking alerts versus application alerts. This will help you set performance standards and measurable goals across the various scenarios that can impact your IT services.
Reporting on Orion Alert-Generated Tickets
The last thing we want to dive into is how you can leverage the service desk reports to get a different perspective on Orion alerts. tony.johnson said it best, “The Orion Platform gives you great information on when the alert was triggered, and when the alert is re-set, however, it is missing the details on what was done to resolve the alert.”
This is where the service desk can help. Here are a handful of reports available out-of-the-box with SolarWinds Service Desk that give you a more complete picture of how alerts are processed and resolved by your teams:
- Incident Trend Reports - View the days of the week you receive the most alerts and resolve the most alert-based incidents.
- Incident Heatmap - See which times of the day you experience the most alert-based incidents.
- Incident Throughput Report - Visualize how effective your team is at resolving alert-based incidents.
- Service Level Breach Report - Track your agents' overall SLA compliance on alert-based incidents.
IT Pro Tip: Similar to automation rules, you can use the “Incident Origin” field in the reports module. This allows you to build reports that only reflect incidents that are created by the integration.
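If you export incident data and want to reproduce the trend and heatmap views yourself, the underlying calculation is a filter on origin followed by a count per weekday or per hour. The sketch below uses made-up sample records. The dictionary keys and the "Portal" origin value are assumptions for illustration, not the service desk's export format.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up sample data for illustration only.
incidents = [
    {"origin": "SolarWinds Orion", "created_at": datetime(2020, 1, 6, 9, 15)},
    {"origin": "SolarWinds Orion", "created_at": datetime(2020, 1, 7, 14, 40)},
    {"origin": "SolarWinds Orion", "created_at": datetime(2020, 1, 13, 9, 50)},
    {"origin": "Portal", "created_at": datetime(2020, 1, 13, 10, 5)},
]


def orion_only(records: list[dict]) -> list[dict]:
    """Keep only incidents created by the integration."""
    return [r for r in records if r["origin"] == "SolarWinds Orion"]


def by_weekday(records: list[dict]) -> Counter:
    """Count incidents per day of the week (trend view)."""
    return Counter(WEEKDAYS[r["created_at"].weekday()] for r in records)


def by_hour(records: list[dict]) -> Counter:
    """Count incidents per hour of the day (heatmap view)."""
    return Counter(r["created_at"].hour for r in records)


alert_incidents = orion_only(incidents)
print(by_weekday(alert_incidents))
print(by_hour(alert_incidents))
```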
Bringing It All Together
We’ve walked through configuring both Orion and your service desk to get optimal results with this integration. Let’s tie it all together and talk through a real-world scenario.
Your Active Directory is experiencing a replication failure. An alert is generated, which is instantly converted into a service desk ticket. This ticket is prioritized as critical and assigned to the application support team.
The Tier One team is also notified that we are experiencing an AD replication issue. They are seeing tickets submitted by end users that seem related: users are unable to sign in to Salesforce.
Per our processes, a problem record is promptly created and associated with both the end-user tickets and the alert-generated ticket. This allows the application support team to consolidate all the tickets associated with this issue, giving them valuable data that could help them quickly diagnose the root cause and work towards a resolution.
At the same time, the Help Desk Manager posts an announcement to the employee service portal that we are experiencing an issue when logging into Salesforce and we are actively working on resolving the problem. Now employees are aware of the situation and no longer submitting tickets, saving Tier One from a barrage of inbound tickets in their queues.
The Application Support team figures out what the problem is and deploys a fix that resolves the issue. They then resolve the problem record, which resolves all attached tickets, including the one generated by Orion. The team was able to react fast, keep the organization informed on the situation, and quickly diagnose and resolve the issue. IT saves the day again.
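The cascade in this scenario, where resolving the problem record resolves every attached ticket, can be modeled in a few lines. This is a simplified sketch with hypothetical `Incident` and `Problem` classes. It only illustrates the relationship between the records and does not call either product.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical, simplified ticket record."""
    title: str
    origin: str
    state: str = "Open"


@dataclass
class Problem:
    """Hypothetical problem record that groups related incidents."""
    title: str
    incidents: list[Incident] = field(default_factory=list)
    state: str = "Open"

    def attach(self, incident: Incident) -> None:
        self.incidents.append(incident)

    def resolve(self) -> list[Incident]:
        """Resolve the problem and every attached incident.

        Returns the alert-generated incidents, which are the ones whose
        Orion alerts would be updated by the integration.
        """
        self.state = "Resolved"
        for incident in self.incidents:
            incident.state = "Resolved"
        return [i for i in self.incidents if i.origin == "SolarWinds Orion"]


problem = Problem(title="AD replication failure")
problem.attach(Incident("Alert me when replication fails", "SolarWinds Orion"))
problem.attach(Incident("Cannot sign in to Salesforce", "Portal"))
problem.attach(Incident("Salesforce login error", "Portal"))

from_alerts = problem.resolve()
print([i.title for i in from_alerts])
```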
Although the above scenario may be a common use case, it is only one of the many use cases this integration can support. As you begin using this integration, we would love to learn more about your use cases and the impact they have had on your team and your organization. Share your stories in the comments below!