Imagine you're tasked with creating a simple NPM page / view that does the following:
- The view into NPM needs no login and no authentication to access it, as long as requests to access it come from the internal network, not from the Internet.
- The view is completely Read-Only.
- The view shows only the sites that are down, based on the router. The list and notifications do not include any nodes behind the router that are also unreachable.
- The view's list of down sites includes only the following:
- A red node down icon
- The site name (which is inside the router's "snmp-location" field). Somehow NPM/NCM must learn this from NCM's archived configuration files.
- The WAN provider's circuit ID, which is pulled from a separate spreadsheet that is regularly updated.
- The telephone number of the WAN provider's NOC or Service Desk
- The time the node became unreachable
- A web page view of that node's Electrical Power Company's Outage Map. Or at least a link to that map.
- NPM / NCM must use the list of down routers to automatically generate some of the information above, based on information learned about the router from NCM, a spreadsheet, and perhaps other sources.
- NCM or NPM issues basic troubleshooting commands via SSH to core routers to determine if a route is present to the site that is down.
For example: if router X with IP address 10.100.3.8 is reported down, NPM/NCM automatically connects via SSH to the appropriate core routers and issues a "show ip route 10.100.3.8" command.
- The output shows whether or not there is a next-hop BGP address
- NPM/NCM notes the result in this high-level view as "A route exists for this site" or "A route is not present for this site"
- NPM sends out this information via pager or e-mail
- When a router goes unreachable, a special list of advanced / administrative / C-Level / Director Level / Managerial Level users is sent an e-mail or page, separate from ordinary notifications to Network Analysts.
- NPM's view for the high-level users should display the tasks that should be done. Perhaps show a list of important users at each affected site to be notified. Perhaps behavior or displayed information varies based on custom fields, if the site is 7x24, or perhaps it's a core or distribution site, perhaps actions vary based on the down nodes' names, etc.
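The route check described in the list above could be sketched roughly as follows. This is only a minimal illustration of the parsing step, not something NPM or NCM provides out of the box. It assumes the core routers return Cisco IOS-style output for "show ip route <address>", that the text has already been collected over SSH by some other means, and the function name `summarize_route` is hypothetical.

```python
import re

# Assumption: Cisco IOS-style replies. Other platforms word these differently.
NO_ROUTE_MARKERS = ("% Network not in table", "% Subnet not in table")
KNOWN_VIA = re.compile(r'Known via "(\w+)')


def summarize_route(output: str) -> dict:
    """Turn raw 'show ip route <ip>' text into the sentence the view displays."""
    no_route = any(marker in output for marker in NO_ROUTE_MARKERS)
    if no_route or "Routing entry for" not in output:
        return {
            "route_present": False,
            "via_bgp": False,
            "message": "A route is not present for this site",
        }
    match = KNOWN_VIA.search(output)
    # True only when the routing entry says it was learned from BGP.
    via_bgp = match is not None and match.group(1) == "bgp"
    return {
        "route_present": True,
        "via_bgp": via_bgp,
        "message": "A route exists for this site",
    }


if __name__ == "__main__":
    # Abbreviated, made-up sample text for illustration only.
    sample = 'Routing entry for 10.100.3.0/24\n  Known via "bgp 65001", distance 20'
    print(summarize_route(sample)["message"])
    print(summarize_route("% Network not in table")["message"])
```

The returned message could then be stored in a custom property or included in the alert e-mail, so the high-level view shows it next to the down site.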
I've seen some interesting NOC Views in Thwack. This discussion is designed to expand on those ideas, and help others understand how to create those NOC views, or new and improved views for C-Level people. What would you do in NPM to reduce the need to wake technical staff in the night? What part of their jobs can be automated to more quickly get the WAN vendor on top of outages?
If the Help Desk sees this screen, they should see instructions about what they must do to more quickly get the WAN provider on the case, while notifying C-Level or other specified people of the outage.
Instead of notifying a Network Analyst, the Help Desk contacts the WAN provider immediately, bypassing the need to wake a Network Analyst up. Otherwise the Network Analyst is just a middleman who would only remote in to work, verify that NPM says the site is down, and then look up the circuit ID and the phone number to call to get the WAN provider started on troubleshooting.
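To illustrate the lookup the Help Desk would otherwise wait on, here is a small sketch that joins a list of down routers with the circuit spreadsheet. It assumes the spreadsheet is exported to CSV and that the site name taken from "snmp-location" matches a "site" column. The column names (`site`, `circuit_id`, `noc_phone`), the function names, and all values in the usage example are hypothetical placeholders.

```python
import csv
import io


def load_circuits(csv_text: str) -> dict:
    """Index spreadsheet rows by site name (column names are assumptions)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["site"].strip().lower(): row for row in reader}


def build_outage_rows(down_routers: list, circuits: dict) -> list:
    """Attach circuit ID and provider NOC phone number to each down site."""
    rows = []
    for router in down_routers:
        circuit = circuits.get(router["site"].strip().lower(), {})
        rows.append({
            "site": router["site"],
            "down_since": router["down_since"],
            # Flag gaps so the Help Desk knows the spreadsheet needs updating.
            "circuit_id": circuit.get("circuit_id", "UNKNOWN"),
            "noc_phone": circuit.get("noc_phone", "UNKNOWN"),
        })
    return rows


if __name__ == "__main__":
    # Placeholder data for illustration only.
    sheet = "site,circuit_id,noc_phone\nBranch A,CKT-0001,555-0100\n"
    down = [{"site": "Branch A", "down_since": "02:14"}]
    for row in build_outage_rows(down, load_circuits(sheet)):
        print(row)
```

Matching on a normalized site name keeps the join tolerant of case and stray spaces, which tend to creep into hand-maintained spreadsheets.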