Halloween 2016 and the Data Center: Where the Stuff of Nightmares Lurks

Level 15

Double double toil and trouble, fire burn and cauldron bubble!

Scale of dragon, tooth of pup...turn this downtime into up!

Well, it was worth a shot…

With Halloween just around the corner, it’s gotten us thinking about how nice it would be to fix all the terrifying data center and network woes out there simply by casting a magical witch’s spell. After all, trouble always seems to be lurking just beneath the surface, ready to wreak havoc at any moment. Of course, we all know too well that when issues brew with our infrastructure or applications, there are no magical words to turn the dark of night into day. I mean, when was the last time you used eye of newt to ward off a data breach?

Of course, there are some data center and network catastrophes in which it may seem like a magic potion is the only hope. You know, the stuff that haunts your nightmares, the stuff that requires the most advanced IT sorcery to quell.

On that note, in honor of the most frightful day of the year, we want to hear what your greatest data center and network fears are. The troubles that are almost too terrible to name.

So, using the comments section below, please describe your greatest IT fears. Share your thoughts with us by October 19 and we’ll give you 250 toes of frog, err, I mean, THWACK points in return.

167 Comments
Level 12

My greatest and worst nightmare would be the motherboard on the server blowing up beyond repair, the parts taking days to come in, everything being down, and everyone screaming at me to repair it because they can't get anything done.

Level 16

My worst nightmare ever, far worse than a computer meltdown, was when I used to care for 'everything electronic' in a large office building.

The break room TV broke so I went online and ordered another one first thing in the morning, no big deal....

....Until about 2:30pm when about half of the office missed their daily soap operas.

It became very apparent what the most critical device in the company was.

Level 9

Would have to be upper management not understanding the importance of redundancy. Our ERP system is at headquarters and all plants communicate with it almost nonstop for labels and shipping information to get product out. We have already had the headquarters WAN connection die and had to wait for hardware to arrive to replace it before. I always fear the day it dies again and we have this problem, as it slows or stops shipping and brings product movement to a halt. Trying my hardest to get the money to add a secondary line for redundancy and auto fail-over.

Level 10

I definitely fear a disaster.  We keep having DR tests and builds, but we literally have nothing actually working and taking live traffic.  Even if we did, it would have to be a localized disaster (fire, tornado) to work, as our DR site is in the same metro.  Pretty convinced if we lose our DC, we will lose the business.

....and the next morning they found the hook hanging from the door handle of rack 4, row 5.....*shivers* 

Level 13

I saw this one happen at my last job a few short years ago.  It wasn't pretty.

Level 13

I agree with kdeal​. We are almost finished our DR site...the last little bit is to set up printing cheques and my biggest fear (besides someone hiding under my bed or in my closet) is that something bad will happen before I finish and am able to do a full test...

Level 9

Becoming a statistic in the hospital ransomware hit list despite our best efforts...

MVP

I am fearful of the redundant connections we have to everything not working like they are supposed to. Even though we test them throughout the year, I know there will be one time where all the crap will hit the fan, so to speak. Besides that, the other thing is the generator not kicking on one day when there is a power failure.

I have lots of fears. Some irrational; others more rational. I think my leading fear is "when" my Orion installation takes an unscheduled dirt nap - for whatever reason - because not only will "I" be flying blind, so will all of my datacenters, applications, and happy little systems. Having been through some meltdowns of various varieties - I've seen where the C-Levels run for information; straight to my NOC screens. Sure I've planned for contingencies, I've tested my backups, I have cold hardware waiting in the wings. I can hear the cautionary voices screaming, "But Jeremy - JUST GO TEST IT" but to that I retort - trying to actually convince people to prepare for war during peace time - can be exhausting so I continually try to aim small, miss small to eventually hit the bigger target. HA; I'm looking directly at you.

Level 13

At my last job, we built a new data center due to campus remodeling.  Part of the requirements we asked for were a generator and redundant AC units.  Well, they put everything into place, but over time we discovered some problems. 

The first was they tested the generator weekly, which is a good thing, but they didn't program the power switching right and switched the data center over to the generator when they tested.  This resulted in a power drop to the small building as they switched it over.  The data center was protected by UPS, but not our workstations and other electronics outside the data center.  It forced reboots of all electronic systems.  It blew out the Cisco TelePresence controller completely... twice.

The second was the AC units.  They didn't put those on the generator, so when we had our first real power outage, well, the data center heated up in under 10 minutes and things tried to auto-shutdown.  We lost some SAN hard drives in the event.

The third was also AC related.  They didn't program the fail over correctly.  The weekly switch over was working fine, but when one of the units failed, the switch over didn't happen due to a communication problem between the units.  Data Center overheating started, but since we were in the building, we heard all the server fans spin up and ran in to manually switch over the AC units.  If that had happened when we were out at night or on a weekend.... 

We had requested in our specifications that all these conditions would be met so they wouldn't happen, but somehow they still existed for one reason or another.

Level 12

Two very real, and one almost true, scenarios I fear. The first is walking in and finding the data center or the DR center under water. We have had several near misses related to water, one from torrential rainfall, with our data center that houses the main SAN. The second is coming in and finding that the roof caved in under a huge snowfall the night before, right into the DR center. Being in northeast Wisconsin, this one is a potential as well. The contractor that built the building our DR center is in has had some roof-related issues in a few of their other buildings.

MVP

Scary Clowns running around on backhoes digging all around IT shops  and street corners looking for that missed fiber or twisted pair.

Actually my last 2 jobs have had me in the flight path of an airport.  Previous was DFW international so we always had large and heavies flying overhead either on approach or takeoff.  Currently it is a small field with pilot training so most of the traffic is light but it can handle 737 sized aircraft.  Thus it is not a matter of if, but rather when we may be struck.....

My biggest fear, and it is one I used during a tabletop BCP/DR exercise, is a successful, and destructive, network hack against our corporate network during the wee hours of the morning when no one is at their desks (This company has made a conscious decision not to have a 24x7 NOC). By the time the Infrastructure Group wakes up there could be nothing left.

MVP

I don't know about a datacenter fear, but a datacenter reality.  Our old datacenter had a water main break in the server room.  The entire server room filled with water.  Luckily, we had a basement that took most of the water (a few feet of standing water that is).  We had already started building a new datacenter at this time, so the datacenter move became very important in a very short time.

Level 14

My biggest fear is loss of our SolarWinds monitoring. Take today, when one of the network team configured an ASA to send syslog into our system at some 2,000 messages a second.

The DB filled so rapidly we couldn't keep up. We filled the logs whilst trying to bulk delete the unwanted messages, and ended up having to split the database cluster and take one half of the DB offline whilst it recovered.

And in doing so proved that our HA SQL cluster wasn't quite configured correctly or tested fully.

#facepalm

Oh well, another day, another grey hair (or two).
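
To put the 2,000 messages a second mentioned above into perspective, here is a rough back-of-the-envelope sketch in Python. The 300-byte average message size is purely an assumption for illustration, not a figure measured in this incident, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def messages_per_day(rate_per_second: int) -> int:
    """Total messages received in 24 hours at a steady rate."""
    return rate_per_second * SECONDS_PER_DAY


def storage_per_day_gb(rate_per_second: int, avg_bytes_per_message: int) -> float:
    """Approximate raw payload per day in decimal gigabytes (no DB overhead)."""
    return messages_per_day(rate_per_second) * avg_bytes_per_message / 1e9


# ASSUMPTION: 300 bytes per stored syslog message; real sizes vary widely.
ASSUMED_AVG_BYTES = 300

if __name__ == "__main__":
    rate = 2000
    print(f"{messages_per_day(rate):,} messages per day")
    print(f"about {storage_per_day_gb(rate, ASSUMED_AVG_BYTES):.1f} GB per day")
```

Even under that modest size assumption, a single misconfigured device produces well over a hundred million rows a day, which is consistent with a database filling faster than a bulk delete can clear it.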

Level 13

Speaking of water mains, the local fire marshal REQUIRED water sprinklers in our data center.  Umm..  Electricity and water don't mix.  At least that marshal agreed to a higher temp failing sensor than the real data center fire protection.

OK, you DID say "show us your greatest IT FEARS", not actual personal first-hand data center events I had to clean up after . . .

1.  Water in the data center:

pastedImage_0.png

pastedImage_1.png pastedImage_2.png

2.  Data center fires:

pastedImage_3.png pastedImage_4.png

I'm also not particularly looking forward to:

  • My name or my company's name showing up in the headlines for anything negative
  • Compliance fines
  • Lost, stolen, hacked, or changed data
  • Level3 network being down :  http://www.networkworld.com/article/3127062/lan-wan/level-3-acknowledges-network-outage.html
  • And the rest are the usual--war, flood, famine, pestilence, teenagers, malware, someone going road-rage on me, anti-social events of all types, and running out of Heinz Ketchup.  I'm not too fond of the cafeteria being out of chocolate milk, either.

Level 7

Our DR tests only fail over specific parts of the application....never a full DR test...when the network crashes and the applications fail to fully fail over...that is going to be a terrible day.

Level 9

rschroeder wrote:

Pretty much this one.  We have our colocation at a Switch SUPERNAP with a 10G primary link to Level 3.  Our backup links are Switch's "blended" circuit that jumps off of multiple providers.  The problem is a LOT of the routes still go through Level 3 so when Level 3 goes down half of our connections to remote sites still don't work on our backups.  *shiver*

Other than that it would pretty much take a nuke to knock out Switch and our servers there.  Top notch facilities.

MVP

What - Level 3 go down ?

Level 12

You must have had an awesome day Tuesday then!!!

Level 9

The greatest fear I've had come true started with warning my management that the intended placement of a refrigerator-sized line-conditioner/UPS directly under the water-loop chiller (due to mandatory "advice" by the city's inspectors/structural engineers because of the weight of the batteries) was a hazard because of condensation and leaks getting into the large box of batteries, and being told by our people "that can never happen".  Less than 2 seconds later, we all heard the drip...

MVP

Reminds me of my previous life (decades ago) as an operator. Our data center was under the roof/parking deck of a portion of our building.  It leaked.

We went from plastic tarps to channel the water to barrels between the "roof" and the drop ceiling.  Eventually we went to trays the shape of ceiling tiles but about 2" deep to capture water.  In either case we would have to use a wet/dry vac to keep the water from building up.

Then we had the building across the street blow up one Sunday morning.  Reminded me of a good 4.4+ on the Richter scale.  It moved interior walls, but we did not have a single head crash in any of the DASD farms (IBM, Tandem).

Level 12

What sound followed the drip? A zzzzzZZZZT, POP, or AAAHHHHHH?!?! lol

Level 8

I have been at this for a long time and fears have ceased to be relevant to what I do. However, I have concerns that I try to ensure do not turn into nightmares.

  1. I am concerned we will need to access our disaster recovery site and won't be able to reach it.
  2. I am concerned that I will miss something, which will result in failing to protect our organization, be it from malicious agents, power failures of critical systems, etc.
  3. I am concerned that my advice will not be heeded and as a result the organization will suffer.
  4. I am concerned that I will offer bad advice, which will be heeded and as a result the organization will suffer.

This is why I strive every day to ensure each of these events has only a minimal chance of coming to pass.

Level 10

My worst nightmare working with a Disaster Recovery company is that we would have our own disaster.   With the impending Hurricane getting ready to hit the east coast, having our own Disaster could be haunting!   Cannot always account for Mother Nature and her ways of splendor!

Level 10

My worst nightmare would be a repeat of five hospitals being down for three days because of a network loop caused by a bug in our Cisco code.  Plus I had to sit in the NOC for 12 hours looking at a dashboard with everything down.

Although hackers are a more practical and immediate fear, lest we forget more possible negative events, have I mentioned The Carrington Event recently (Solar storm of 1859 - Wikipedia, the free encyclopedia ), and how today's IT infrastructure will be affected by it when it occurs again?

If you want to find something that will negatively impact virtually every aspect of your I.T. life, and your home life, and that of your family and everyone you know and communicate with, this event fits the bill pretty well.

Level 15

someone getting curious about the "Example Reboot Job" in NCM...

2016-10-05_16-39-48.jpg

Level 9

The very next sound was our Facilities Guy saying "Sonofa...." because he was the one that stood there and swore to us there would never be a leak.  The next sound after that was the thump of a trashcan being placed under the leak.  If we hadn't been there, right then, arguing with management that there was an unsafe situation, we probably would have found out after catastrophic failure and/or a fire.  A drip pan had previously been installed in the suspended ceiling... with the drain on the _high_ end (and no moisture detection installed).  There was an inch of water in a tray that was about 3' wide and 10' long, hanging by wires *right* *over* a 100A three-phase power conditioner.  The drip was because the crimped seam finally started to fail after holding water for weeks...

MVP

My fear is another large customer fooling around with IPAM and removing all of their DHCP scopes from the DHCP server. That little misclick required a server restore.

pastedImage_0.png

Level 14

Current worst fear is that our company suffers a major reportable security breach (by creepy clowns not currently in the news, the kind that hide behind screens)...

Level 8

My biggest fear now is that we'll go back to managing everything with spreadsheets.

Greatest fear, the new Cisco UCS will go down and our SAP and VMware environments running on it will go away effectively crippling the business.   Recovery is an option but the time it would take is still crippling.

Second would be a communication failure.  With the current hurricane warnings here in Florida, 10/6/2016, I was trying to let management know the seriousness of the current communication structure.  Being in rural areas, you get what you can get with internet.  We are blessed with a 50 mb fiber circuit, but that fiber is also carrying your voice, and the geniuses at Bright House Networks ran it down the power poles.  Hurricane force winds are around 74+ mph.  The current power poles are small and wooded.  Lots of trees and things that can easily fly around.  What is the likelihood that something will take down a line or pole?  Well, it's a pretty high chance.  The good news is that we won't be able to call and complain because our PRI will be down with our internet and potentially power.

Breaches are also a concern but much less than a general hardware or communication failure.

For some reason the fetal position is starting to look better and better...

rschroeder​ feeling the text term LOL....  teenagers!!!!!!!!!!!!!!!!   freaking fantastic....    I am not sure about the Ketchup, i am a BBQ sauce and ranch or honey mustard fan, never been fond of Ketchup.   Try fry sauce sometime if you are ever out west, it will change your life.   And i give you a big AMEN to the chocolate milk.   Sadly we don't have a cafeteria. 

Level 16

My son and I were discussing this last night.  They predict a 12 hour warning before a major storm, but all we can do is start printing every config and start sharpening our pencils.... 

Yes I agree with you rschroeder​ this is something we have no control over and could change our lives...

Level 16

everyone at work will realize the truth.....

Wow! A fellow SAP/Thwackster. I didn't think there were too many of us. Tell me, are you involved at all with SAP administration?

Level 10

We have our VDI environment hosted on one server.  If the motherboard failed, at least i could say "I told you so" on the way to the unemployment line.

Level 20

lolol those pictures @rschroeder

swimming in the DC!

Level 10

With Hurricane Matthew looming out in the distance, and about half of our footprint in areas under mandatory evacuation, it's quite sobering. If I had to say, the thing that scares me the most is that we have something critical hanging out there and not being backed up. Even worse, that we will need to restore something from a backup and we won't be able to. That's not my area of responsibility and I have complete confidence in my teammates doing their job, I just still can't shake that as an irrational fear.

MVP

It is better to have the fear and have it be unrealized than to be totally confident and have everything dashed, smashed, folded, spindled, and otherwise mutilated.  That way you are looking and testing for the weak link as well as being aware that there may be a weak link, knowing Murphy likes to throw a wrench and other stuff into the mix at the worst possible time.

Level 13

Fear is the path to the dark side...fear leads to anger; anger leads to hate; hate leads to...suffering!

Okay... am I the only one who is thoroughly impressed with the water-tightness of the server room in the first picture? And how about the cleanness? You can see through that water like a fishtank. That means that room is really clean. Kudos!  <golf clap!>

I don't like to be a "Chicken Little" running about screaming "The sky is falling! The sky is falling!"

But we're foolish to ignore the impact the next major solar coronal ejection will have if it is facing our Earth.

Scientists say they think there's a 12% chance of this happening in the next decade.  People bet on the lottery (pick six numbers from a pool of 49 numbers) and face odds of roughly one in 14 million.

12% odds is roughly one in eight.
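
For anyone who wants to check the arithmetic behind the two odds quoted above, here is a minimal Python sketch. The function names are made up for illustration; the 12% figure and the 6-of-49 lottery are taken from the post as stated.

```python
import math


def one_in(probability: float) -> float:
    """Convert a probability (0 < p <= 1) into 'one in N' odds."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return 1 / probability


def lottery_combinations(pool: int = 49, picks: int = 6) -> int:
    """Number of distinct tickets when choosing `picks` numbers from `pool`."""
    return math.comb(pool, picks)


if __name__ == "__main__":
    # 12% per decade works out to about one in 8.3.
    print(f"12% chance is about one in {one_in(0.12):.1f}")
    # Choosing 6 numbers from 49 gives 13,983,816 possible tickets.
    print(f"6-of-49 jackpot is one in {lottery_combinations():,}")
```

In other words, the quoted solar storm odds are well over a million times more likely than the lottery jackpot people happily bet on.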

pastedImage_0.png

White House prepares for catastrophic solar flares that could end civilization | Daily Mail Online

Time to be afraid - preparing for the next big solar storm: Kemp| Reuters

pastedImage_3.png

pastedImage_4.png

I ran into a coworker who'd done this very thing--removed the DHCP scope for a Radiology subnet in numerous sites.  As leases expired or devices moved about, health care providers were calling to ask if the network was down.

I'm not advertising Infoblox in any way, shape, or form, but Infoblox has a "Trash" folder that contains all deleted subnets / scopes, and restoring deleted subnets is literally as simple as dragging the subnet out of the Trash and dropping it back onto the active subnets view.  It's instantaneous and effective, and so simple and intuitive that anyone can do it without making a mistake.

Problem resolved with one click!  I'm impressed with Infoblox as a DHCP/IPAM solution.

MVP

are you sure the rest of the building/floor is not at the same level ?

Note the fish in the foreground . . .

Suffering leads to . . . budget increases--or employees moving on to better environments!

Level 12

Users with mapped public drives on servers and CryptoLocker....