How Can I Alert on a Backup Failure of a node

Question

I am aware of this thread: https://thwack.solarwinds.com/product-forums/network-configuration-manager-ncm/f/forum/55296/backup-failure-alerts I am also aware that the job logs can be emailed, but that is a very crude email-or-don't-email approach.
I also like the idea of the mop-up action that RichardLetts showed, and I can see that working for us in some cases but not all. We have >100 weekly backup schedules which attempt to back up >6k devices. We don't want folks manually trawling through log files, as we already get blindsided enough by emails, so we need an automated method.
Simply stated:
We need to alert on any specific node that fails to back up, so that it can be raised as an incident.

Personally speaking, I know this is likely to create more manual work (triggering a fresh backup or investigating why it is failing), but in this glorious new world of automation, this is what management want. And, let's face it, there is only so much sense you can talk to management before they start demanding stuff.

stuartd · Accepted Answer

And replying to myself because I can. As we run a rolling 7-day backup process, I just added:

AND AttemptedDownloadTime >= GETDATE()-7

which shows me only the failures from the last 7 days.
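
For anyone who wants to see what a rolling-window condition like this does before putting it into an alert, here is a minimal sketch in Python using an in-memory SQLite table. Everything in it is an assumption for illustration: the `backup_attempts` table, its `succeeded` column, the node names and the dates are made up, and SQLite is only standing in for SWQL. The only idea borrowed from the post is "keep failures whose attempted download time is within the last 7 days".

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # fixed-width format, so string comparison matches time order


def build_demo_db():
    """Create a toy table of backup attempts (hypothetical schema, not NCM's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE backup_attempts ("
        "node TEXT, succeeded INTEGER, attempted_download_time TEXT)"
    )
    conn.executemany(
        "INSERT INTO backup_attempts VALUES (?, ?, ?)",
        [
            ("sw-core-01", 0, "2024-05-14 02:00:00"),  # failed inside the window
            ("sw-edge-02", 0, "2024-04-30 02:00:00"),  # failed, but older than 7 days
            ("rtr-wan-03", 1, "2024-05-13 02:00:00"),  # succeeded, never reported
        ],
    )
    return conn


def recent_failures(conn, now, days=7):
    """Return nodes with a failed attempt at or after (now - days)."""
    cutoff = (now - timedelta(days=days)).strftime(FMT)
    rows = conn.execute(
        "SELECT node FROM backup_attempts "
        "WHERE succeeded = 0 AND attempted_download_time >= ? "
        "ORDER BY node",
        (cutoff,),
    ).fetchall()
    return [row[0] for row in rows]


# A fixed "now" keeps the example repeatable; a real check would use the current time.
now = datetime(2024, 5, 15, 12, 0, 0)
failures = recent_failures(build_demo_db(), now)
print(failures)
```

The window length is a parameter here so it can follow the schedule: with weekly jobs, 7 days means a node drops out of the results once its failed attempt is more than a week old.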

jm_sysadmin · Answer

I see where you are headed, and it makes sense. 
My NOC has a dashboard with these stats (and links to a report called NCM Backup Audit). The report is powered by the SWQL below. I don't remember where it came from: it might be a default, I might have found it here on Thwack, and heck, I occasionally even write it myself.
SELECT DISTINCT
CASE WHEN CT.NodeID IS NOT NULL THEN 'BackedUp' ELSE 'NotBackedUp' END AS NCMStatus,
N.Caption,
N.IPAddress,
N.DetailsUrl,
N.Vendor,
N.Status,
N.Icon,
NP.LoginStatus
FROM NCM.NodeProperties AS NP
JOIN Orion.Nodes N ON N.NodeID = NP.CoreNodeID
INNER JOIN Orion.Vendors OV ON N.Vendor = OV.Name
LEFT JOIN NCM.ConfigArchive CT ON NP.NodeID = CT.NodeID

So I 'know' that NCM will blank out a record from the ConfigArchive when the backup fails, which is why the report works. At least I hope it works and somebody smarter than me won't point out a huge gap in my understanding. It might happen.
To adapt the query above so that Orion alerts are happy with it, I bridge from Orion.Nodes to the NCM nodes, then to the ConfigArchive. I look only at nodes that exist in NCM but have no archive. This alert assumes you back up everything in NCM, so you might need more conditions in your WHERE clause. The whole SWQL I tried is below.
SELECT TOP 100 Nodes.Uri, Nodes.DisplayName
FROM Orion.Nodes AS Nodes
LEFT JOIN NCM.Nodes NCM ON Nodes.NodeID = NCM.CoreNodeID
LEFT JOIN NCM.ConfigArchive CA ON NCM.NodeID = CA.NodeID
WHERE CA.NodeID IS NULL AND NCM.NodeID IS NOT NULL
 And the alert might look something like this. 
 
I haven't done more than a brief test, but it seemed right here. Let me know what you think.
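
If it helps to sanity-check the join logic away from a live Orion server, here is a rough sketch of the same "exists in NCM but has no archive row" pattern, run against toy SQLite tables from Python. The table names (`orion_nodes`, `ncm_nodes`, `ncm_config_archive`), their columns and the sample nodes are stand-ins I made up for the example, not the real SWQL entities, so treat it as an illustration of the two LEFT JOINs and the IS NULL / IS NOT NULL filter only.

```python
import sqlite3


def build_demo_db():
    """Toy stand-ins for Orion.Nodes, NCM.Nodes and NCM.ConfigArchive."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orion_nodes (node_id INTEGER, display_name TEXT);
        CREATE TABLE ncm_nodes (node_id TEXT, core_node_id INTEGER);
        CREATE TABLE ncm_config_archive (node_id TEXT);

        INSERT INTO orion_nodes VALUES
            (1, 'core-sw-01'), (2, 'edge-sw-02'), (3, 'printer-03');
        -- printer-03 is monitored but not managed by NCM
        INSERT INTO ncm_nodes VALUES ('ncm-a', 1), ('ncm-b', 2);
        -- only core-sw-01 has an archived config
        INSERT INTO ncm_config_archive VALUES ('ncm-a');
        """
    )
    return conn


def nodes_missing_archive(conn):
    """Nodes that are in NCM but have no matching archive row (anti-join)."""
    rows = conn.execute(
        """
        SELECT n.display_name
        FROM orion_nodes AS n
        LEFT JOIN ncm_nodes AS m ON n.node_id = m.core_node_id
        LEFT JOIN ncm_config_archive AS ca ON m.node_id = ca.node_id
        WHERE ca.node_id IS NULL      -- no archive found
          AND m.node_id IS NOT NULL   -- but the node is in NCM
        ORDER BY n.display_name
        """
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
missing = nodes_missing_archive(conn)
print(missing)
```

In this toy data the node that is not in NCM at all is skipped by the second condition, which is the behaviour the WHERE clause in the alert query is aiming for.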
