Hi,
I'm looking to use alerts and actions to automate incident diagnosis, and was wondering whether anyone has achieved this or knows how to do the following. Currently, I have an alert set up with the following conditions:
If the node's old status != up and its new status = up, execute an NCM script (show version, sh ip bgp nei, show logging, etc.) and email the output using ${N=Alerting;M=Notes}.
This works okay and does what I want, but the script cannot run until the device is back up. That can take too long, and it stops the NOC from progressing with third-party providers or customers straight away.
I would like SolarWinds to do the following:
1. Detect that R01 is down.
2. Log into R02 and SSH across to R01 via an inside address, e.g. the HSRP management IP.
3. Run a diagnostic script on R01: sh ver, sh logging, sh ip bgp summ, etc.
4. Send an email with the script output using ${N=Alerting;M=Notes}.
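To make the intended decision logic concrete, here is a rough Python sketch of steps 1 to 3, written outside SolarWinds. Everything in it is an assumption for illustration: the `PAIRS` table, the `plan_diagnosis` function, the status strings, and the example addresses are hypothetical and are not SolarWinds or NCM features. It also shows one case the steps do not mention: if the partner router is down as well, there is nothing to hop through.

```python
# Hypothetical pair table: each router maps to its partner and to the inside
# address on which it can be reached from that partner (e.g. an HSRP
# management IP). All names and addresses are made up for illustration.
PAIRS = {
    "R01": {"peer": "R02", "hop_across_ip": "192.0.2.1"},
    "R02": {"peer": "R01", "hop_across_ip": "192.0.2.2"},
}

# Diagnostic commands taken from the post, written out in full.
DIAGNOSTICS = ["show version", "show logging", "show ip bgp summary"]


def plan_diagnosis(down_node, status, pairs=PAIRS):
    """Return where to log in and what to run, or None if no hop is possible."""
    entry = pairs.get(down_node)
    if entry is None:
        return None  # no partner defined for this node
    peer = entry["peer"]
    if status.get(peer) != "up":
        return None  # partner is unreachable too, so we cannot hop across
    return {
        "login_to": peer,                  # step 2: log into the partner
        "hop_to": entry["hop_across_ip"],  # step 2: SSH across to the down node
        "commands": list(DIAGNOSTICS),     # step 3: diagnostics to run
    }
```

For example, `plan_diagnosis("R01", {"R01": "down", "R02": "up"})` would plan a login to R02 followed by a hop to 192.0.2.1.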
The way I was thinking of doing this is to define a relationship between the nodes so SolarWinds knows they're a pair, create a custom field ${HopAcrossIP} that stores the IP used to SSH to the other device, and create the following script to call/parse the information:
ssh -l ${Username} ${Node.Y.IP}
${password}
ssh ${HopAcrossIP}
${password}
[Run Script]
exit
exit
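For anyone who wants to try the idea outside SolarWinds first, the script above can be treated as a template and filled in per node pair. The following is only a minimal Python sketch under that assumption: the placeholder names (`Username`, `Password`, `PeerIP`, `HopAcrossIP`, `Commands`) and the `render_hop_script` function are hypothetical and are not real SolarWinds or NCM macros. It uses the standard library's `string.Template`, whose `${name}` syntax happens to resemble the variables above.

```python
from string import Template

# Hypothetical template mirroring the script in the post. The placeholder
# names are illustrative only, not SolarWinds/NCM variables.
HOP_TEMPLATE = Template(
    "ssh -l ${Username} ${PeerIP}\n"
    "${Password}\n"
    "ssh ${HopAcrossIP}\n"
    "${Password}\n"
    "${Commands}\n"
    "exit\n"
    "exit\n"
)


def render_hop_script(username, password, peer_ip, hop_across_ip, commands):
    """Fill the template with one node pair's values and diagnostic commands."""
    return HOP_TEMPLATE.substitute(
        Username=username,
        Password=password,
        PeerIP=peer_ip,
        HopAcrossIP=hop_across_ip,
        Commands="\n".join(commands),  # one diagnostic command per line
    )


if __name__ == "__main__":
    # Example values are made up.
    print(render_hop_script("noc", "secret", "198.51.100.2", "192.0.2.1",
                            ["show version", "show logging"]))
```

Rendering from a single table of pairs like this would keep the manual input in one place, although it does not remove the need to maintain that table when the network changes.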
Would this be possible? Is this the best way to achieve it? I feel it would need a lot of manual data input at the start and would not adapt dynamically to changes, which could become a nightmare to manage.
Thanks
Dillon