
Hard to monitor

With monitoring, we try to achieve end-to-end visibility for our services. Everything that runs for business-critical applications needs to be watched. For the usual suspects like switches, servers, and firewalls we have great success with that. But every environment has black spots on the map that nobody is taking care of. There are two main categories of reasons why something is not monitored: the organisational (not my department) and the technical.

Not my Department Problem

In IT, the different departments sometimes look only after the devices they are responsible for. Nobody has established a view over the complete infrastructure. That silo mentality ends in a lot of finger pointing and ticket ping-pong. Even more problematic are devices that are under the control of a third-party vendor or non-IT people. For example, the power supply of a building is the responsibility of facility management. In the mindset of facility management, monitoring has a completely different meaning from the one we have in IT. We have built fully redundant infrastructures. We have put a lot of money and effort into making sure that every device has a redundant power supply, only to find that it all ends in a single power cord going to a single diesel power generator that was built in the 1950s. Monitoring, for facility management, means walking to the generator twice a day and taking a look at the front panel of the machine.

Technically Hard to Monitor

And then you have the technical problems that can be a reason why something is not monitored. Here are some examples of why it is sometimes hard to implement monitoring from a technical perspective.

Ancient devices: Like the diesel power generator mentioned above, there are old devices that come from an era without any connectors that can be used for monitoring. Or it is a very old Unix or host machine. I have found all sorts of tech that was still important for a specific task. When it could not be decommissioned, it is still a dependency for a needed application or task. If it is still that important, then we have to find a way to monitor it. We need a way to connect to it, like we do with SNMP or an agent. If the device simply supports none of these connections, we can try to watch the service that is delivered through the device, or add an extra sensor that can be monitored.

In the example of the power generator, maybe we cannot watch the generator directly, but we can insert a device like a UPS that can be watched over SNMP and shows the current power output. With an intelligent PDU in every rack you can get even more granular data on the power consumption of your components. Often all the components of a rack have been replaced nearly every two years, but the rack and the power strip have been in use for 10+ years. The same is true for cooling systems. There are additional sensor bars available that feed your monitoring with data in case the cooling plant cannot deliver this data itself. With good monitoring you can react before something happens.
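The idea of watching the generator indirectly through a UPS can be sketched roughly as follows. This is only a minimal illustration, not a finished check: it assumes the UPS values have already been polled over SNMP by whatever tool you use, and the dictionary keys, the function name `assess_power`, and the 15-minute runtime threshold are hypothetical. The numeric output source values follow the standard UPS-MIB (RFC 1628); many vendors ship their own MIBs with different encodings.

```python
# Sketch: infer the state of the upstream power feed from UPS readings.
# The sample dict stands in for values already polled over SNMP; the
# polling itself is intentionally left out.

# upsOutputSource values as defined in the standard UPS-MIB (RFC 1628).
OUTPUT_SOURCE = {
    1: "other", 2: "none", 3: "normal", 4: "bypass",
    5: "battery", 6: "booster", 7: "reducer",
}


def assess_power(sample, min_runtime_minutes=15):
    """Return (severity, message) for one polled UPS sample.

    sample keys (hypothetical names, chosen for this sketch):
      output_source     -> integer upsOutputSource value
      minutes_remaining -> estimated battery runtime in minutes
      output_watts      -> current output power in watts
    """
    source = OUTPUT_SOURCE.get(sample["output_source"], "unknown")
    if source == "none":
        return ("critical", "UPS delivers no output power")
    if source == "battery":
        minutes = sample["minutes_remaining"]
        if minutes < min_runtime_minutes:
            return ("critical", f"on battery, only {minutes} min left")
        return ("warning", "on battery, upstream feed is down")
    if source == "normal":
        return ("ok", f"upstream feed present, load {sample['output_watts']} W")
    return ("warning", f"unusual output source: {source}")


if __name__ == "__main__":
    print(assess_power({"output_source": 3, "minutes_remaining": 40, "output_watts": 1800}))
    print(assess_power({"output_source": 5, "minutes_remaining": 8, "output_watts": 1750}))
```

When the UPS switches to battery, the upstream feed (utility or generator) has failed, even though the generator itself exposes no monitoring interface at all.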

It Is Passive

Another case is passive technologies like CWDM/DWDM or antennas. These can also only be monitored indirectly, through other components that are capable of proper monitoring. With GBICs that have an active measurement / DDI interface you have access to real-time data that can be fed into the monitoring. Once you have this data in your monitoring, you have a baseline and know what the attenuation across your CWDM/DWDM fibres should look like.

As a final thought, try to take a step back and figure out what is needed so that your services can run. Think in all directions and take nothing as given. Include everything you can think of, from climate and power to all dependencies of storage, network, and applications. With that in mind, take a look at your monitoring and check whether you cover everything.
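The baseline idea for passive CWDM/DWDM links mentioned above can be sketched like this. It is a rough illustration under assumptions: the transmit and receive power readings are taken to come from the optics on both ends of the link, the function names are made up for this example, and the 1.5 dB tolerance is an arbitrary illustrative value, not a recommendation.

```python
import math


def mw_to_dbm(milliwatts):
    """Convert optical power from milliwatts to dBm."""
    return 10 * math.log10(milliwatts)


def link_loss_db(tx_dbm, rx_dbm):
    """Loss over the passive path: far-end transmit power minus local receive power."""
    return tx_dbm - rx_dbm


def check_against_baseline(loss_db, baseline_db, tolerance_db=1.5):
    """Compare the current loss with the recorded baseline.

    Returns (status, drift_db). The status is "alert" when the loss has
    moved more than tolerance_db away from the baseline in either direction.
    """
    drift = loss_db - baseline_db
    status = "ok" if abs(drift) <= tolerance_db else "alert"
    return status, drift


if __name__ == "__main__":
    # Hypothetical readings: far end transmits 1.0 dBm, local end receives -6.5 dBm.
    loss = link_loss_db(1.0, -6.5)
    print(loss, check_against_baseline(loss, baseline_db=7.0))
```

The passive mux and the fibre themselves report nothing, but a drift in the computed loss points to a problem somewhere on that passive path.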

22 Comments
Level 16

Yes it is sometimes hard but not impassable....

you can monitor with RTU analog and digital inputs

so you will get how much diesel is in the tank ....

The question the CEO asks is just like an analog sensor: "how much?"

There should be balance between the 2...

"Hard To Monitor?"  It's nearly as hard to read, and that's unusual in a Thwack Blog.

The text on this post is unexpectedly disjointed.  I understand the content, but the presentation is distracting.  Maybe the output of the text editor used doesn't reproduce accurately when viewed in Chrome?

The carriage returns seem to automatically interrupt the natural flow, and even return in the middle of contractions.  It's very odd. Jumping lines in the middle of a sentence, before the logical end, is like putting a new paragraph in the middle of a sentence:

pastedImage_0.png

MVP

I think it's a cut and paste fail.

Level 9

Yes the format was not looking good. I cleaned up the HTML format.

There was a <p> in each line generated by the editor. Sorry for that.

Now the text should look good from a format standpoint. Tested with Chrome and mobile Chrome.

Thank you for your feedback.

I double what rschroeder and superfly99 say. Not the easiest read. Regardless...

   I am dealing with the 1-off's now: UPS, generators, time clocks, warehouse printers, conveyor belts, and others.

Level 12

One other possible category comes to mind:  the willfully unwatched.  Some network segments are isolated, often for valid reasons, and no one wants to let the NMS watch those segments.  Whether for reasons of security, compliance, or other concerns, the decision was made somewhere along the line to keep those so isolated that they remain unmonitored.  I can't say how often those situations occur, but if I had to guess, it's probably more common than one might think.

Level 20

We all probably have some of these hard to monitor or one off devices.  They can be a real pain to deal with for sure.

MVP

There will likely always be something unmonitored for some reason, be it legacy gear without proper interfaces, gear in an isolated network, or gear you don't own or have any access to. Then there are the out-of-the-blue requests for things that people assume you can do...

Level 17

looks good now

Level 17

I think everything that could be monitored where I am, but is not, gets a bi-weekly or monthly test.

*We'll know there is an issue when we fail the scheduled test

*I would much rather be able to work with some groups to get their devices on a watch list, rather than just a ping cycle from their desk.

Thanks for your excellent work!

+1 for not my department

Level 14

Not completely related, but the diesel generator reference brought back a memory.  Picture a massive government server room.  The kind that takes most of a floor and serves an entire section of the country.  Friday night, a big electrical storm rolls through.  We lose power, but we have a UPS.  We hear the diesel generator on the roof automatically power up.  All is good.  However, about an hour later the whole server room dies.  We can still hear the generator running, but the server room has no power.

When the facility was built, the output of the generator was never tied into the server room.  Monitoring of this system would have been helpful.

Oh no! Murphy in high action!

Assumptions will be the end of I.T. . .

Level 16

We'll know problem...

In many data centers they don't resize the environment. Sometimes it's the redundant power source and sometimes it's cooling...

MVP

inconceivable..

Naw, knowing how government and big business have less than efficient reputations, some folks might say "inevitable."

Personally, I'm favorably impressed when I see effective and efficient designs and implementations in big business and government. 

Level 14

Power and cooling are issues where we are right now.  In a relatively new (10 year old) building no less.

Level 14

Sometimes there appears to be light at the end of the tunnel.

MVP

Hopefully it is not a train headed your way....

Level 14

but "train"ing is a good thing, right?

MVP

unless it is a storm..

About the Author
I have worked for 15+ years in the networking industry, in many different sectors such as industry, car manufacturers, and government. I am a monitoring enthusiast and have done monitoring for large-scale environments. I blog at networkautobahn.com, and my recently started podcast can be found at networkbroadcaststorm.com