Greetings Thwack community!
Recently one of our Palo Alto firewalls had a problem with its card in slot 3. SolarWinds logged the following set of events:
| 3/20/2021 21:38 | | wdc-ent-fw01-Slot-3 Data Processor-0 Software Packet Buffers- Volume has reappeared. Data collection resumed. |
| 3/20/2021 21:38 | | wdc-ent-fw01-Slot-3 Data Processor-1 Packet Descriptors- Volume has reappeared. Data collection resumed. |
| 3/20/2021 21:38 | | wdc-ent-fw01-Slot-3 Data Processor-0 Hardware Packet Buffers- Volume has reappeared. Data collection resumed. |
| 3/20/2021 21:38 | | wdc-ent-fw01-Slot-3 Data Processor-0 Packet Descriptors- Volume has reappeared. Data collection resumed. |
| 3/20/2021 21:38 | | wdc-ent-fw01-Slot-3 Data Processor-1 Software Packet Buffers- Volume has reappeared. Data collection resumed. |
| 3/20/2021 20:47 | | wdc-ent-fw01-Slot-3 Data Processor-2 Packet Descriptors- Volume has reappeared. Data collection resumed. |
| 3/20/2021 20:47 | | wdc-ent-fw01-Slot-3 Data Processor-2 Hardware Packet Buffers- Volume has reappeared. Data collection resumed. |
| 3/20/2021 20:47 | | wdc-ent-fw01-Slot-3 Data Processor-2 Software Packet Buffers- Volume has reappeared. Data collection resumed. |
| 3/20/2021 20:24 | | wdc-ent-fw01 - ethernet3/27 · Old Slot 4 Port 25 Interface restored. Data collection restarted. |
| 3/20/2021 20:24 | | wdc-ent-fw01 - ethernet3/26 Interface restored. Data collection restarted. |
| 3/20/2021 20:24 | | wdc-ent-fw01 - ethernet3/28 · Old Slot 4 Port 26 Interface restored. Data collection restarted. |
| 3/20/2021 20:24 | | wdc-ent-fw01 - ethernet3/25 Interface restored. Data collection restarted. |
| 3/20/2021 20:10 | | wdc-ent-fw01 - ae3 · Trust Up |
| 3/20/2021 20:08 | | wdc-ent-fw01 - ae4 · Trust Up |
| 3/20/2021 20:04 | | wdc-ent-fw01-Slot-3 Data Processor-1 Packet Descriptors- Volume no longer exists. Data collection suspended. |
| 3/20/2021 20:04 | | wdc-ent-fw01-Slot-3 Data Processor-0 Hardware Packet Buffers- Volume no longer exists. Data collection suspended. |
| 3/20/2021 20:04 | | wdc-ent-fw01-Slot-3 Data Processor-0 Packet Descriptors- Volume no longer exists. Data collection suspended. |
| 3/20/2021 20:04 | | wdc-ent-fw01-Slot-3 Data Processor-1 Software Packet Buffers- Volume no longer exists. Data collection suspended. |
| 3/20/2021 20:04 | | wdc-ent-fw01-Slot-3 Data Processor-1 Hardware Packet Buffers- Volume no longer exists. Data collection suspended. |
| 3/20/2021 20:04 | | wdc-ent-fw01-Slot-3 Data Processor-0 Software Packet Buffers- Volume no longer exists. Data collection suspended. |
| 3/20/2021 20:04 | | wdc-ent-fw01 - ethernet3/27 · Old Slot 4 Port 25 (index: 224) Interface no longer exists. Data collection terminated. |
| 3/20/2021 20:04 | | wdc-ent-fw01 - ethernet3/28 · Old Slot 4 Port 26 (index: 225) Interface no longer exists. Data collection terminated. |
| 3/20/2021 20:04 | | wdc-ent-fw01 - ethernet3/26 (index: 223) Interface no longer exists. Data collection terminated. |
| 3/20/2021 20:04 | | wdc-ent-fw01 - ae4 · Trust Down |
| 3/20/2021 20:04 | | wdc-ent-fw01 - ae3 · Trust Down |
| 3/20/2021 20:04 | | wdc-ent-fw01 - ethernet3/25 (index: 222) Interface no longer exists. Data collection terminated. |
| 3/20/2021 20:03 | | wdc-ent-fw01-Slot-3 Data Processor-2 Packet Descriptors- Volume no longer exists. Data collection suspended. |
| 3/20/2021 20:03 | | wdc-ent-fw01-Slot-3 Data Processor-2 Hardware Packet Buffers- Volume no longer exists. Data collection suspended. |
| 3/20/2021 20:03 | | wdc-ent-fw01-Slot-3 Data Processor-2 Software Packet Buffers- Volume no longer exists. Data collection suspended. |
SolarWinds found nothing to generate an alert on, despite successfully logging these events. Notice that the interfaces involved were not down and the volumes did not generate errors. In addition to these issues, the panSysHAState changed.
The good news here is that a failover happened at the start of the string of events, so there was minimal impact to the users. However, for an hour and a half this firewall was inoperable, and SolarWinds saw 30 events, so there is an opportunity here to catch these the next time they happen. So what do we do to make that happen?
The first thing I have done so far is to capture changes to the panSysHAState OID. The way I did that was to create a custom property to store the HA state in, then use an alert to compare panSysHAState to the custom property. After the alert is dispatched, I let some time pass, then store the new state in the custom property. So during an HA failover, notifications go out to get people to look at the firewall.
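To make the compare-then-update logic concrete, here is a minimal sketch of it in Python. This is not SolarWinds code and does not call any SolarWinds API. The class name `HaWatcher`, the `settle_polls` delay, and the state strings "active" and "passive" are all assumptions for illustration. The `stored_state` field stands in for the custom property.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HaWatcher:
    """Hypothetical stand-in for the custom property plus the alert."""
    stored_state: Optional[str] = None  # plays the role of the custom property
    settle_polls: int = 2               # assumed delay (in polls) before storing the new state
    _pending: int = 0                   # mismatching polls seen since the alert fired
    notifications: List[str] = field(default_factory=list)

    def poll(self, polled_state: str) -> bool:
        """Feed one polled panSysHAState value; return True if the alert fires."""
        if self.stored_state is None:
            # First poll: record a baseline, nothing to compare against yet.
            self.stored_state = polled_state
            return False
        if polled_state == self.stored_state:
            # States agree, so any pending change is cancelled.
            self._pending = 0
            return False
        # States differ: notify once, on the first mismatching poll.
        fired = self._pending == 0
        if fired:
            self.notifications.append(
                f"HA state changed: {self.stored_state} -> {polled_state}"
            )
        self._pending += 1
        # After the settle period, store the new state so the alert can reset.
        if self._pending > self.settle_polls:
            self.stored_state = polled_state
            self._pending = 0
        return fired


watcher = HaWatcher()
for state in ["active", "active", "passive", "passive", "passive"]:
    watcher.poll(state)
print(watcher.notifications)
```

The delay before storing the new state is what keeps the alert from clearing itself before anyone has been notified.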
I saw another thread where someone asked for help dealing with events like this, and they were told to use "volume down" and "interface down" alerts. I would like to point out that no interfaces went down, no volumes went down.
When I see these "resource no longer exists. Data collection terminated/suspended." events, what do I even alert on to catch them? What OIDs are these coming from? Is there a way to get the details of an event so that we know how to build an alert for it?
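To show what I am trying to catch, here is a rough Python sketch that matches on the event message text. It assumes the events are exported as pipe-delimited rows like the ones above. The function names and the node-name pattern are my own guesses for illustration, and it uses no SolarWinds API. It only counts "no longer exists" messages per node.

```python
import re
from collections import Counter

# Message text that marks a resource as vanished (taken from the events above).
GONE = re.compile(r"no longer exists", re.IGNORECASE)
# Assumed naming: the node name ends at "-Slot-N" or at " - ".
NODE = re.compile(r"^(.+?)(?:-Slot-\d+\b| - )")


def parse_row(row: str):
    """Split one pipe-delimited event row into (timestamp, message)."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    return cells[0], cells[-1]


def vanished_counts(rows):
    """Count 'no longer exists' events per node name."""
    counts = Counter()
    for row in rows:
        _, message = parse_row(row)
        if GONE.search(message):
            match = NODE.match(message)
            counts[match.group(1) if match else message] += 1
    return counts


sample = [
    "| 3/20/2021 20:04 | | wdc-ent-fw01 - ae3 · Trust Down |",
    "| 3/20/2021 20:04 | | wdc-ent-fw01 - ethernet3/25 (index: 222) Interface no longer exists. Data collection terminated. |",
    "| 3/20/2021 20:03 | | wdc-ent-fw01-Slot-3 Data Processor-2 Software Packet Buffers- Volume no longer exists. Data collection suspended. |",
]
print(vanished_counts(sample))
```

Several vanished resources on one node within a minute or two is the pattern I would want an alert to fire on.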
Keoki