Level 8

JunOS Chassis alarm alerts

One of our Juniper EX9214 switches had a chassis error on FPC7. We didn't get any kind of warning from SolarWinds that there was an error on the box. I looked through the resource list and am not sure how to make sure we are getting the errors that happen on the chassis. Does anyone know how to fix this?

Currently running Orion Platform 2016.2.100, NCM 7.5.1, NPM 12.0.1. I will be upgrading our platform to the most current versions shortly, along with Server 2016/SQL 2016. Are there more options for polling statistics on JunOS in the more current versions?

Thanks,

SCJB-re0> show chassis alarms
1 alarms currently active
Alarm time               Class  Description
2018-01-24 19:52:11 PST  Major  FPC 7 Major Errors

{master}
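For anyone who wants a scripted cross-check outside Orion, here is a minimal sketch that parses the `show chassis alarms` text above into (time, class, description) tuples and counts them by color (Junos shows Major alarms as red and Minor alarms as yellow). It assumes the exact column layout shown here; `parse_chassis_alarms` and `count_by_color` are hypothetical helpers, not part of any Juniper or SolarWinds tool.

```python
import re
from collections import Counter

# One alarm row: timestamp with time zone, class, free-text description.
# Assumes the layout of the output above (hypothetical helper).
ALARM_ROW = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+)\s+"
    r"(?P<cls>Major|Minor)\s+(?P<desc>.+)$"
)

def parse_chassis_alarms(text):
    """Return a list of (time, class, description) tuples."""
    alarms = []
    for line in text.splitlines():
        m = ALARM_ROW.match(line.strip())
        if m:
            alarms.append((m["time"], m["cls"], m["desc"].strip()))
    return alarms

def count_by_color(alarms):
    """Count Major alarms as red and Minor alarms as yellow."""
    classes = Counter(cls for _, cls, _ in alarms)
    return {"red": classes["Major"], "yellow": classes["Minor"]}

sample = """1 alarms currently active
Alarm time               Class  Description
2018-01-24 19:52:11 PST  Major  FPC 7 Major Errors
"""

alarms = parse_chassis_alarms(sample)
print(alarms)                  # [('2018-01-24 19:52:11 PST', 'Major', 'FPC 7 Major Errors')]
print(count_by_color(alarms))  # {'red': 1, 'yellow': 0}
```

Screen-scraping is fragile; on a real box, structured output such as `show chassis alarms | display xml` is a sturdier source.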

SCJB-re0> show log messages | match fpc
Jan 24 19:52:11  SCJB-re0 fpc7 XMCHIP(0):XMCHIP(0): FI: Link sanity checks - Type 4, Seq Number 825, Stream 40, Link0 0x8, Link1 0x2, Link2 0x3100
Jan 24 19:52:11  SCJB-re0 alarmd[3055]: Alarm set: FPC color=RED, class="CHASSIS", reason=FPC 7 Major Errors
Jan 24 19:52:11  SCJB-re0 craftd[1913]:  Major alarm set, FPC 7 Major Errors
Jan 24 19:52:21  SCJB-re0 fpc7 XMCHIP(0):XMCHIP(0): FI: Link sanity checks - Type 1, Seq Number 1131, Stream 40, Link0 0x18, Link1 0x8, Link2 0x20
Jan 24 19:52:30  SCJB-re0 fpc7 XMCHIP(0):XMCHIP(0): FI: Link sanity checks - Type 1, Seq Number 1373, Stream 40, Link0 0x28, Link1 0x4, Link2 0x4000
Jan 24 19:52:30  SCJB-re0 fpc7 TNPC CM received unknown trigger (type Queue, id 1)
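The alarmd line above records the alarm together with its color and class. If you forward syslog to a collector, a minimal sketch of picking out those "Alarm set" events might look like this. The regular expression assumes exactly the message format shown in this log (other Junos releases may word it differently), and `alarm_set_events` is a hypothetical helper.

```python
import re

# Matches alarmd lines such as:
#   ... alarmd[3055]: Alarm set: FPC color=RED, class="CHASSIS", reason=FPC 7 Major Errors
ALARMD_SET = re.compile(
    r'alarmd\[\d+\]: Alarm set: (?P<component>\S+) color=(?P<color>\w+), '
    r'class="(?P<cls>\w+)", reason=(?P<reason>.+)$'
)

def alarm_set_events(log_text):
    """Yield (component, color, class, reason) for each 'Alarm set' line."""
    for line in log_text.splitlines():
        m = ALARMD_SET.search(line)
        if m:
            yield m["component"], m["color"], m["cls"], m["reason"].strip()

log = """Jan 24 19:52:11  SCJB-re0 alarmd[3055]: Alarm set: FPC color=RED, class="CHASSIS", reason=FPC 7 Major Errors
Jan 24 19:52:11  SCJB-re0 craftd[1913]:  Major alarm set, FPC 7 Major Errors"""

events = list(alarm_set_events(log))
print(events)  # [('FPC', 'RED', 'CHASSIS', 'FPC 7 Major Errors')]
```

The craftd line is deliberately ignored, so each alarm is counted once.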

SCJB-re0> show log chassisd | match "fpc | kernel | tnp"
……
Jan 24 16:23:50 ch_gencfg_chassis_startup_time_blob_set: chassis startup time set in kernel 1465981297.878386
Jan 24 17:23:49  ch_gencfg_chassis_startup_time_blob_set: chassis startup time set in kernel 1465981297.878384
Jan 24 18:23:48 ch_gencfg_chassis_startup_time_blob_set: chassis startup time set in kernel 1465981297.878382
Jan 24 19:23:47  ch_gencfg_chassis_startup_time_blob_set: chassis startup time set in kernel 1465981297.878380
Jan 24 19:52:11  send: red alarm set, device FPC 7, reason FPC 7 Major Errors
Jan 24 20:23:47 ch_gencfg_chassis_startup_time_blob_set: chassis startup time set in kernel 1465981297.878378
Jan 24 21:23:51 ch_gencfg_chassis_startup_time_blob_set: chassis startup time set in kernel 1465981297.878375
Jan 24 22:23:50 ch_gencfg_chassis_startup_time_blob_set: chassis startup time set in kernel 1465981297.878373
Jan 24 23:23:50  ch_gencfg_chassis_startup_time_blob_set: chassis startup time set in kernel 1465981297.878370


Here are the two UnDPs we use:

Juniper - Red Alarm Count.UnDP

Juniper - Yellow Alarm Count.UnDP

Plus the Alert definition:

Juniper Red Chassis Alarm -- Alert
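The exported alert itself isn't reproduced here. As an assumption, an alert like this fires when the polled red alarm count rises above zero and resets when it falls back to zero. A hypothetical sketch of that trigger/reset behaviour over successive polls (this is not how Orion's alert engine is implemented, just the logic it is being asked to apply):

```python
def alert_transitions(red_counts):
    """Return (poll_index, event) pairs for a trigger/reset alert.

    Assumed logic: trigger when the red alarm count becomes > 0,
    reset when it drops back to 0. Repeated polls in the same state
    produce no new event, so one incident raises one alert.
    """
    events = []
    active = False
    for i, count in enumerate(red_counts):
        if count > 0 and not active:
            active = True
            events.append((i, "trigger"))
        elif count == 0 and active:
            active = False
            events.append((i, "reset"))
    return events

# Hypothetical successive values of the red alarm count poller.
polls = [0, 0, 1, 1, 2, 0, 0]
print(alert_transitions(polls))  # [(2, 'trigger'), (5, 'reset')]
```

Note that a second alarm arriving while the first is still active (the count going from 1 to 2) does not raise a new alert in this logic, which is one reason the run-book asks staff to read `show chassis alarms` rather than trust the count alone.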

Aside: we write run-books for alerts and reference them in the alert message so staff know what to do. I am not overly specific about what to do, and the checklists are fairly short. Here is an example; note that I am asking NOC staff to use their judgement and read the messages! I am not trying to reproduce the vendor documentation.

Red Chassis Alarm

This alert triggers when the Juniper device reports that the red chassis alarm light is lit.

Red chassis alarms indicate some kind of hardware issue on Juniper devices. This is a high-level alert; there may be more specific alerts for the particular issue the device is having.

Impact to Customers

This normally indicates a serious issue that needs to be investigated. In most cases there may be no immediate impact, but it could indicate that some access points are not getting power, that one power supply has failed, or some other early warning of something major.

Remediation Steps:

  1. Check that the device is reachable.
  2. Check the device for the specific failure:
    show chassis alarms
    show log messages
  3. Use your Juniper account to search for the message and determine its impact:
    the default impact should be 3 (Low);
    if adjacent devices (access points, UPS) are impacted, increase the impact to 2.
  4. Consider whether an out-of-hours page is necessary.
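The impact rule in step 3 boils down to a small decision. A hypothetical helper that encodes it (default impact 3, raised to 2 when adjacent devices such as access points or a UPS are affected); the `Finding` structure is an illustrative assumption, and the paging decision in step 4 stays with the NOC's judgement, as the run-book intends.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """What the NOC recorded after steps 1-3 (hypothetical structure)."""
    adjacent_devices_impacted: bool  # e.g. access points or UPS affected

DEFAULT_IMPACT = 3  # 3 - Low, per the run-book
RAISED_IMPACT = 2   # used when adjacent devices are impacted

def impact_level(finding: Finding) -> int:
    """Apply the step 3 rule; a lower number means higher impact."""
    if finding.adjacent_devices_impacted:
        return RAISED_IMPACT
    return DEFAULT_IMPACT

print(impact_level(Finding(adjacent_devices_impacted=False)))  # 3
print(impact_level(Finding(adjacent_devices_impacted=True)))   # 2
```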

Escalation Info

Team xxxxx for routers and xxxx switches

Team YYYY for other switches

Level 16

Hi

SolarWinds does not poll Juniper red/yellow alarm status out of the box (OOB).

Create a Universal Device Poller (UnDP) and an alert for those two OIDs.

That will alert you on any issue that raises an alarm on the system.
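The two OIDs are most likely the red and yellow alarm counters in the JUNIPER-ALARM-MIB (jnxRedAlarmCount and jnxYellowAlarmCount). The values below are our reading of that MIB, so check them against the MIB for your Junos release before building the pollers. The sketch then shows one way to turn the two polled counts into an alert severity; the severity names and `chassis_alarm_severity` are assumptions, so map them to whatever your alert definitions use.

```python
from typing import Optional

# Scalar OIDs for the two pollers, as we read JUNIPER-ALARM-MIB
# (jnxCraftAlarms = 1.3.6.1.4.1.2636.3.4.2). Verify against your MIB.
JNX_RED_ALARM_COUNT = "1.3.6.1.4.1.2636.3.4.2.3.2.0"     # jnxRedAlarmCount.0
JNX_YELLOW_ALARM_COUNT = "1.3.6.1.4.1.2636.3.4.2.2.2.0"  # jnxYellowAlarmCount.0

def chassis_alarm_severity(red_count: int, yellow_count: int) -> Optional[str]:
    """Map polled red/yellow alarm counts to an alert severity.

    Red (major) alarms take precedence over yellow (minor) ones;
    None means no alert should fire.
    """
    if red_count > 0:
        return "critical"
    if yellow_count > 0:
        return "warning"
    return None

print(chassis_alarm_severity(1, 0))  # critical
print(chassis_alarm_severity(0, 3))  # warning
print(chassis_alarm_severity(0, 0))  # None
```

Polling counts rather than the on/off alarm state keeps the alert logic to a simple "greater than zero" test.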

Level 8

I've never set up a custom poller. Can you verify this looks correct? Could you post a screenshot of how yours is set up?

[Screenshot: pastedImage_0.png]

Thanks

Level 16

Sure, if you can wait until tomorrow...

I can send the Universal Device Poller and the alert logic.

Right now you are in the wrong place.

Level 8

Awesome that would be very helpful.

Thank you!

Level 16

Well, Richard was faster...

SolarWinds Orion Network Performance Monitor Universal Device Poller UnDP - YouTube

That YouTube video will help with exporting those two Universal Device Pollers.

To export the alert XML, you can do it from the web console under Manage Alerts.

To RichardLetts:

I will update the KB with:

show system alarms

If you don't run "request system configuration rescue save", you will get a lot of yellow alarms.
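The yellow alarm in question is the minor system alarm Junos raises when no rescue configuration has been saved (it appears in `show system alarms` as "Rescue configuration is not set"), so until that is saved on every box the yellow count is rarely zero. A minimal sketch of discounting that known alarm when reading `show system alarms` output; the set of benign descriptions is an assumption to adapt locally, `actionable_alarms` is a hypothetical helper, and the sample text is illustrative rather than captured from the device in this thread.

```python
import re

# Descriptions treated as known/benign (assumption: adjust for your estate).
BENIGN_MINOR = {"Rescue configuration is not set"}

# One alarm row: timestamp with time zone, class, description.
ROW = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+\s+(Major|Minor)\s+(.+)$")

def actionable_alarms(show_system_alarms: str):
    """Return (class, description) rows, skipping benign minor alarms."""
    rows = []
    for line in show_system_alarms.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header, count line or prompt
        cls, desc = m.group(1), m.group(2).strip()
        if cls == "Minor" and desc in BENIGN_MINOR:
            continue
        rows.append((cls, desc))
    return rows

# Illustrative output, not captured from the device in this thread.
sample = """1 alarms currently active
Alarm time               Class  Description
2018-01-25 09:00:00 PST  Minor  Rescue configuration is not set
"""
print(actionable_alarms(sample))  # []
```

Saving the rescue configuration on the device is still the real fix; filtering only hides the noise.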
