
I hope everyone enjoyed the long holiday weekend, and that it was alert-free for you and your company. We hosted 17 people at my home, including some family members we have not seen in years. It was a good time, but tiring, so I'm going to use this week to recharge my batteries a bit.

 

Of course, that won't stop me from putting together The Actuator this week. So here's a bunch of links I found on the Intertubz that you may find interesting. Enjoy!

 

Thinking Beyond the Network Layer: Why the Entire Attack Surface Counts

If you think IoT is a security issue, wait until microservices hit you right in the DevOps.

 

Security Risks of TSA PreCheck

This is a tough one for me. I fly a lot, and I feel the TSA is fairly useless with regard to security. However, the alternatives proposed for replacing the TSA often infringe upon the right to privacy, or are overly militaristic. I'm not sure I want to give up those rights in exchange for a few extra minutes of security theater when I fly.

 

Four Cybersecurity Resolutions for 2017

This article could have just stopped at "move beyond passwords," but that advice is worth sharing as much as possible.

 

Virgin America flight delayed after passenger Wi-Fi hotspot named 'Samsung Galaxy Note 7'

There's no mention if the passengers beat this person with their complimentary SkyMall magazines.

 

Canada declares ‘high-speed’ internet essential for quality of life

Well, since they figured out healthcare already, now they can focus on the important things.

 

SQL Server on Linux: How? Introduction

Some light holiday reading for adatole, sharing some of the details of how the SQL Server team ported their code to Linux. Okay, I might be the only one who finds this interesting, not just from a tech-geek viewpoint, but from a business viewpoint, too.

 

Quite possibly my favorite gift this year:

[Image: eat-bacon.jpg]

Among your warm holiday memories probably lurks the recollection of searching through a strand of lights to find the one that takes down the entire string. Through the years, most of us have either seen this play out, or experienced that frustration firsthand.

 

The frustration, tedium, and time required for even the savviest of today’s IT professionals to unravel issues in a complex IT infrastructure can be a lot like that downed string of lights. Often, the search for that one glitch, that single, solitary issue, puts the entire system in peril.

 

That’s because today’s technology infrastructures – and federal IT infrastructure is no different – rely heavily on application stacks. These stacks are made up of layers that work in concert to run critical components supporting application delivery. The problem is that one faulty layer can adversely impact the entire operation, just like that one bad Christmas light bulb.

 

So where does one begin to tackle this complex and evolving IT infrastructure and all of its potential issues? Rather than hunting and pecking for problems, the solution is to obtain a holistic view of the entire stack – constantly monitoring all of its parts – to identify issues more effectively for quicker resolution, or even to fix problems before they arise.

 

End-to-end monitoring of the IT infrastructure is essential. There are platforms available to do just that, including solutions focused solely on critical infrastructure monitoring. Federal IT teams should therefore consider deploying solutions that provide:

 

  • Complete application performance monitoring
  • A performance view across on-premises, virtualized, and cloud components
  • Consolidated metrics
  • Alerts that allow for faster identification and resolution of issues

 

Many hands make for lighter work

 

Today’s issues are complex and require highly skilled professionals. To help ensure that your team is ready to tackle those critical issues head-on, establish cross-departmental collaborative processes so that every team member is involved in identifying problems and working toward solutions. This collaborative approach will avoid silos within a team and help ensure that the organization is stronger and better prepared to confront issues as they arise.

 

Empowerment can work wonders to enhance an IT team’s performance. It helps to keep the team functioning as a whole and collaborating to make sure issues are quickly resolved, and also that every team member feels a sense of ownership in the outcome.

 

Bring back the holiday cheer

 

The IT department’s spirits need not be dampened by a darkened string of lights! That is, not if proper processes are put into place to ensure the team is collaborating and taking a holistic approach to an agency’s complex IT infrastructure. Provide them with proper processes, empowerment, and the essential tools to monitor the network, and they’ll be in the spirit all through the year.

 

Find the full article on Government Computer News.

Here we are in week 4 of the challenge, with just one more to go. I cannot stress enough how amazed, impressed, and moved I have been by some of the responses, and how amused, informed, and even entertained by others. The heartfelt and thoughtful responses from occasional posters and regulars alike have become a cathartic way for me to spend 30 days assessing the year from various perspectives: how 2016 was for the IT industry, for my career, and for my personal growth, which so often takes a back seat to other things that appear, in the moment, to be priorities but of course are not once the clarity of hindsight kicks in.

 

I know from the comments that many of you feel the same.

 

So what caught my eye this week?

 

Day 17: Awaken

Head Geek Destiny Bertucci (Dez) offered up a poem by Naima (Micah) on the theme of the day and how this poem has inspired her. In response,

THWACK MVP rschroeder wrote, “Uffda!  That's quite a poem.  But some topics within it are too heavy for a Saturday morning.  So I'll stay cozy in my bed and wonder why I'd awaken so early on a cold and snowy northern Minnesota morning.”

 

Another THWACK MVP, jeremymayfield, had this to offer:

“When I think of awaken, I think of an inner spirit. As a Christian I feel that we awaken our spirits when we are baptized, and we have to continue to recharge and feed that spirit with scripture, service, and prayer. We can awaken our love for something and someone. We can also awaken a desire or need to be better. I find that there are times we awaken a sleeping giant in ourselves which can both inspire and drive us to greater heights than we could ever imagine.”

 

And steven.melnichuk@tbdssab.ca was far more realistic about the meaning of “awaken” in his daily life:  “my mind has been awaken to the idea that my kids wake up really early. My body, not so much.”

 

Day 18: Ask

In his fourth essay of the challenge, senior product manager Ben Garves (thegreateebzies) took us on a trip down search engine memory lane, causing silverwolf to echo the sentiments of many: 

“Oh gawd... memories...gawd now I feel old...im not crying, YOU'RE crying!”

 

Meanwhile, bsciencefiction.tv opined that

“I think in IT, as important as knowing things is knowing who to ask things. We cannot know everything, although I sometimes think my friends ssd and mikegale do know everything.”

 

And mldechjr took a biblical turn, saying,
ASK, AND YE SHALL RECEIVE
YE HAVE NOT, BECAUSE YE ASK NOT
“Although this is a biblical reference, it is true with many things in life. I found out many times when dealing with people that they might have been willing to do something or participate if they had been asked. That goes both ways: if you want to join in, ask and you will probably be welcomed with open arms. Don't be silent!”

 

Day 19: Judge

For the 19th day of the challenge, resident poet network defender captured the thoughts of many of the other posters, offering this verse:

Try not judge your fellow man

We’re all dealt a different hand,

Go walk a mile in his shoes

Try expanding your narrow views.

 

But steven.melnichuk@tbdssab.ca quoted the thoughts of one of IT’s greatest philosophers, Yoda:

“Size matters not. Look at me. Judge me by my size, do you? Hmm? Hmm. And well you should not. For my ally is the Force, and a powerful ally it is. Life creates it, makes it grow. Its energy surrounds us and binds us. Luminous beings are we, not this crude matter. You must feel the Force around you; here, between you, me, the tree, the rock, everywhere, yes. Even between the land and the ship.” - Yoda

 

And THWACK MVP rschroeder began (as he does almost every day) with a definition and then his thoughts:

Day 19: Judge: to form an opinion about, through careful weighing of evidence and testing of premises

I seek to reduce my judging other people and systems as much as possible.

Where judging must occur, I hope to do it impartially and only by fact and experience, leaving rumor or opinion or prejudice behind.

 

Day 20: Fulfill

The post for day 20 came from Anne Guidry, the editorial genius behind much of the content you see. She offered the key to her quiet brilliance when she wrote “I believe in making an effort, showing up, doing the hard work that has to be done, because work that is hard is fulfilling.”

 

This inspired mtgilmore1 to share a quote from Joel Osteen:

"We live in a culture that relishes tearing others down. It's ultimately more fulfilling, though, to help people reach their goals. Instead of feeling jealous, remember: If God did it for them, He can do it for you."

 

Meanwhile, THWACK MVP Radioteacher echoed an idea expressed by many for this day’s word:

“I am really fortunate.  At the end of most every day I see progress.  It is a very fulfilling work.”

 

And EBeach was quite pragmatic (and complimentary) when they said, “Thwack order got fulfilled in 3 days. How nice it is. Got the hoodie.”

 

Day 21: Love

Senior Director Jenne Barbour (jennebarbour) has a writing MO: take her experiences of the day and relate them to the word for that day. Which led to a post full of thoughts about food and family.

 

That, in turn, inspired tomiannelli to write, “My music collection is not larger by some measures, yet it contains 245 songs [16 hours] with Love in the title. Myriad ways of describing an intense emotion. […] Expressing love to another, just in words, can be scary, but worth the risk. Taking that leap feels like you are falling off a cliff but focusing on love will make you miss the ground and that is all it takes to fly!”

 

THWACK MVP rschroeder pulled a quote from Heinlein’s Stranger in a Strange Land: “Love is that condition in which the happiness of another person is essential to your own.”

 

But Richard Phillips caught me off guard when he opened up to the community, saying “Learn to appreciate the love(s) that you have in your life while you have them. Learn to reciprocate that love. Build memories so that you have something to hang onto once "it's gone."”

But you really need to read his whole post for context and to appreciate the elegance of his thoughts.

 

Day 22: End

Ironically, Head Geek Patrick Hubbard (patrick.hubbard) begins his challenge posting series with this word.

 

Meanwhile, network defender poesizes (is that even a word?):

It’s the end of the world as we’ve known it

The PC Police have shown it

You can’t say what you think

Or the snowflakes will sink

I think society has finally blown it.

 

But pattic quoted one of the four modern apostles (John, Paul, George, and Ringo):

And in the end the love you take

Is equal to the love you make

- Paul McCartney

 

And THWACK MVP rschroeder offered both sides of the coin, saying “The end of a hard project.  The end of a tough week.  The end of a bad time.  It's like a super hero that arrives in time to prevent me from making a big mistake.  I can THINK the thoughts, but must never give voice to them.

 

The end of a wonderful project.  The end of a great week.  The end of a fun time.  It's like a refreshing experience that lightens one's burdens and gives one strength to turn and look ahead at the next adventures awaiting us.”

 

Day 23: Begin

On the last day of week 4 of the challenge, miseri pondered what it means to begin: “To begin can mean many things - a second chance, the start of a new thing, or even to take another way on the journey you are on. I am always excited about beginnings, but we sometimes can get tired of new things when we think we have made enough use of them. The best way to live life is to view each day, each moment, and each opportunity as the beginning of a stage in your life.”

 

THWACK MVP jeremymayfield continued this line of personal introspection, saying, “How can I begin to begin? It's often said when you fail you can always begin again. I know this personally. […] To truly begin again is to let yourself be free of the pains and issues that had driven you away from where you were going, or who you were and the person you wanted to be. Take time to reflect on the truly important things in life. Make sure you appreciate what you have, and if you don't, now is your time to begin again.”

 

And THWACK MVP rschroeder wrapped up by saying,

I'd begin to see the end of conflict, not its start.

A fresh new love, not a broken heart

A fresh sunrise at the break of day

Or a full moon's rise--that's my way

 

The cure that brings hope

A new DHCP scope

A flower's new bloom,

Not democracy's doom

 

I hope you all have been enjoying these posts as much as I have. I’m looking forward to what next week has to offer.

 

  • Leon

The story so far:

 

  1. It's Not Always The Network! Or is it? Part 1 -- by John Herbert (jgherbert)
  2. It's Not Always The Network! Or is it? Part 2 -- by John Herbert (jgherbert)
  3. It's Not Always The Network! Or is it? Part 3 -- by Tom Hollingsworth (networkingnerd)
  4. It's Not Always The Network! Or is it? Part 4 -- by Tom Hollingsworth (networkingnerd)
  5. It's Not Always The Network! Or is it? Part 5 -- by John Herbert (jgherbert)
  6. It's Not Always The Network! Or is it? Part 6 -- by Tom Hollingsworth (networkingnerd)
  7. It's Not Always The Network! Or is it? Part 7 -- by John Herbert (jgherbert)

 

As 2016 draws to a close, Amanda finds that there's always still room for the unexpected. Here's the eighth—and final—installment, by Tom Hollingsworth (networkingnerd).

 

The View From The Top: James (CEO)

 

The past year has been interesting to say the least. We had a great year overall as a company and hit all of our sales goals. Employee morale seems to be high and we're ready to push forward with some exciting new initiatives as soon as we get through the holidays. I think one thing that has really stood out to me as a reason for our success is the way in which our IT staff has really started shining.

 

Before, I just saw IT as a cost center of the business. They kept asking for more budget to buy things that were supposed to make the business run faster and better, but we never saw that. Instead, we saw the continual issues that kept popping up that caused our various departments to suffer delays and, in some cases, real work stoppages. I knew that I had to make a change and get everyone on board.

 

Bringing Amanda into a leadership position was one of the best decisions I could have made. She took the problematic network and really turned it around. She took the source of all our problems and made it the source of all the solutions to them. Her investment in the right tools really helped speed along resolution time on the major issues we faced.

 

I won't pretend that all the problems in this business will ever go away. But I think I'm starting to see that developing the right people along the way can do a great job of making those problems less impactful to our business.

 

The View From The Trenches: Amanda (Sr Network Manager)

 

Change freezes are the best time of the year. No major installations or work mean maintenance tickets only ... and a real chance for us all to catch our breath. This year was probably one of the most challenging that I've ever had in IT. Getting put in a leadership role was hard. I couldn't keep my head down and plug away at the issues. I had to keep everyone in the loop and keep working toward finding ways to fix problems and keep the business running at the same time.

 

One thing that helped me more than I could have ever realized was getting the right tools in place. Too often in the past, I found myself just guessing at solutions to issues based on my own experiences. As soon as I faced a problem that I hadn't seen before, my level of capability was reset to zero and I had to start from scratch. By getting the tools that gave me the right information about the problems, I was able to reduce the time it took to get things resolved. That made the stakeholders happy. And when I shared those tools with other IT departments, they were able to resolve issues quickly as well, which meant the network stopped getting blamed for every little thing that went wrong.

 

I think in the end my biggest lesson was that IT needs to support the business. Sales, accounting, and all the other areas of this company all have a direct line into the bottom line. IT has always been more about providing support at a level that's hard to categorize. I know that James and the board would always groan when we asked for more budget to do things, but we did it because we could see the challenges that needed to be solved. By finding a way to equate those challenges to business issues and framing the discussion around improving processes and making us more revenue, I think James has finally started to realize how important it is for IT to be a part of the bigger picture.

 

That's not to say there aren't challenges today. I've already seen how we need to have some proper change control methods around here. My networking team has already implemented these ideas, and I plan on getting the CTO to pass them along to the other departments as well. Another thing that I think is critical, based on my workload, is getting the various teams here to train across roles. I saw it firsthand when James would call me for a network issue that ended up belonging to the storage or virtualization team. I learned a lot about those technologies as I helped troubleshoot. They aren't all that different from what we do. I think a little cross-training for every team would go a long way in helping us pinpoint issues when they come up, instead of dumping the problem on the nearest friendly face.

 

The View From The Middle

 

James called Amanda to his office. She went in feeling hopeful and looking forward to the new year. James and Amanda sat down with one of the other Board members to discuss some items related to Amanda's desire to cross-train the departments, as well as improving change controls and documentation. James waited until Amanda had gone through her list of discussion items. Afterwards, he opened with, "These are some great ideas, Amanda, and I know you want to bring them to the CTO. However, I just got word from him that he's going to be moving on at the end of the year to take a position at a different company. You're one of the first people outside the Board to know."

 

Amanda was a bit shocked by this. She had no idea the CTO was ready to move on. She said, "That's great for him! Leaves us in a bit of a tough spot though. Do you have someone in mind to take his spot? Mike has been here for quite a while and would make a great candidate." James chuckled as he glanced over at the Board member in the room. He offered, "I told you she was going to suggest Mike. You owe me $5."

 

James turned back to Amanda and said, "I know that Mike has been here for quite a while. He's pretty good at what he does but I don't think he's got what it takes to make it as the CTO. He's still got that idea that the storage stuff is the most important part of this business. He can't see past the end of his desk sometimes." James continued, "No, I think we're going to be opening up applications for the CTO position outside the company. There are some great candidates out there that have some experience and ideas that could be useful to the company."

 

Amanda nodded her head in agreement with James's idea.

 

James then said, "However, that doesn't fix our problem of going without a CTO in the short term. We need someone who has proven that they have visibility across the IT organization; who can respond well to problems and get them fixed while also keeping the board updated on the situation."

 

James grinned widely as he slid a folder across the table to Amanda. He said, "That's why the board and I want you to step in as the Interim CTO until we can finish interviewing candidates. Those are some big shoes to fill, but you have our every confidence. You also have the support of the IT department heads. After the way you helped them with the various problems throughout the year, they agreed that they would like to work with you for the time being. We’ll get some professional development scheduled for you as soon as possible. If you’re going to be overseeing the CTO’s office for now, we want to help you succeed with the kind of training that you’ll need. It’s not something you get fixing networks every day, but you’ll find it useful in your new role when dealing with the other department heads."

 

Amanda was speechless. It took her a few moments to find her own words. She thanked James profusely. She said, "Thank you for this! I think it's going to be quite the challenge but I know that I can help you keep the IT department working while you interview for a new CTO. I won't let you down."

 

James replied, "That's exactly what I wanted to hear. And I fully expect to see your application in the pile as well. There's nothing stopping us from taking that 'interim' title away if you're the right person for the job. Show us what you're capable of and we'll evaluate you just like the other candidates. Your experience so far shows that you've got a lot of the talents that we're looking for."

 

As Amanda stood up to leave with her new title and duties, James called after her, "Thanks for being a part of our team, Amanda. You've done a great job around here and helped show me that it's not always the network."

Only a few shopping days remaining before Christmas! I hope you and yours are settling in for a long holiday weekend. We host our families each year, meaning there will be fifteen people here on Sunday. Naturally I ordered the Roast Beast for the main dish.

 

Anyway, here's a bunch of links I found on the Intertubz that you may find interesting. Enjoy!

 

My Yahoo Account Was Hacked! Now What?

Well, for starters, you could try explaining why you are using Yahoo for email.

 

Verizon Rethinking The Yahoo Purchase Deal After Breach

If Verizon has even an ounce of intelligence at the highest levels, they will run, not walk, away from Yahoo before the end of the week. I cannot imagine they would still want to go through with this deal.

 

Over 8,800 WordPress Plugins Have Flaws: Study

Yes, and this is why I make an effort to keep my plugins to a minimum and update them frequently. Well, that was the plan until a failed update crashed my blog for two days. So I got to play WordPress admin last week. Yay.

 

AWS Launches Managed Services

Amazon takes a step into the future of IT by launching this service. I would expect a lot of this managed service to be automated and built into the cost, just as Microsoft is currently doing with Azure.

 

Cisco Is Shutting Down Its Cloud

If you are like me you will read this and think "Cisco has a cloud?", followed by "Why?"

 

DevOps Will Underpin the Fourth Industrial Revolution

Setting aside their inflated sense of self-importance, DevOps proponents now feel they are about to start an industrial revolution. The author of this article should be shown a history book on technology: DevOps isn't new, it's just a marketing buzzword. And I can only hope that as a result of this "revolution," DevOps will find a way to automate away the use of the term "DevOps," right before they automate themselves out of a job.

 

Evernote’s new privacy policy raises eyebrows

Because awful privacy violations shouldn't be limited to governments, here comes Evernote to do its part in making the world a little less safe.

 

My son having a lengthy discussion about requirements and deliverables:

[Image: EGL-xmas.jpg]

As government agencies continue their IT modernization initiatives, administrators find themselves in precarious positions when it comes to security. That's the overall sentiment expressed in the 2016 Federal Cybersecurity Survey[1]. The report found that efforts to build more modern, consolidated, and secure IT environments increase security challenges, but management tools offer a potential antidote to the threats.

 

Modernization increased IT security challenges

 

Federal administrators managing the transition from legacy to modernized infrastructure face enormous challenges. The transition creates significant IT complexity, burdening administrators who must manage old and new systems that are very different from one another.

 

Many noted that consolidation and modernization efforts increase security challenges due to incomplete transitions (48 percent), overly complex enterprise management tools (46 percent), and a lack of familiarity with new systems (44 percent). Other factors included cloud services adoption (35 percent), and increased compliance reporting (31 percent).

 

However, 20 percent believe the transition toward more modern and consolidated infrastructures ultimately will net more streamlined and secure networks. They said replacing legacy software (55 percent) and equipment (52 percent), the adoption of simplified administration and management systems (42 percent), and having fewer configurations (40 percent) will help secure networks once the arduous transition phase is complete.

 

Foreign governments tie internal threats as the chief concern

 

For the first time, respondents said that foreign governments are just as much of a cybersecurity threat as untrained internal workers. In fact, 48 percent called out foreign governments as their top threat—an increase of 10 percentage points over our 2015 survey[2].

 

That’s not to say that insider threats have been minimized. On the contrary. The number of people who feel insiders pose a major threat is still higher than it was just two years ago.

 

Investing in the right security tools can help mitigate threats

 

Patch management software is among the solutions administrators invest in and use to great effect, with 62 percent indicating that their agencies partake in the practice. Of those, 45 percent noted a decrease in the time required to detect a security breach, while 44 percent experienced a decrease in the amount of time it takes them to respond to a breach.

 

Respondents noted security information and event management (SIEM) solutions as highly effective in combating threats. While only 36 percent stated that their agencies had such tools in place, administrators who use SIEM tools felt significantly better equipped to detect just about any potential threat.

 

While a majority of respondents still feel their agencies are just as vulnerable to attacks now as a year ago, it is good to see an increase in the number of respondents who feel agencies have become less vulnerable. This is likely because administrators have become highly cognizant of potential threats and are using the proper solutions to fight them.

 

The Federal Cybersecurity Summary Report contains more statistics and is available for free. You might empathize with some of the findings and be surprised by others.

 

Find the full article on Signal.

 

Endnotes:

[1] SolarWinds Federal Cybersecurity Survey Summary Report 2016; posted February 2016; http://www.solarwinds.com/assets/surveys/cybersecurity-slide-deck.aspx

[2] SolarWinds Federal Cybersecurity Survey Summary Report 2015; posted February 2015; http://www.solarwinds.com/resources/surveys/solarwinds-federal-cybersecurity-survey-summary-report-2015.aspx

It was another incredible week for the December Writing Challenge, and I continue to be amazed, touched, and humbled by the responses everyone is sharing. You folks are bringing yourselves to these comments, and we are all enriched because of it. I’ve never been so happy and excited to distribute THWACK points.

 

Here are some of the responses that caught my eye this past week:

 

Day 10: Count

Ben Garves (thegreateebzies) was our lead contributor again for this post, waxing philosophic on the very nature of what it is to count.

 

Meanwhile, joshyaf  shared the hopeful and instructive thought:

“Count your blessings. They are numerous.

Count on others. As you are counted on.

Count to ten when you are angered. Your mother was right, it helps.”

 

Kimberly Deal took the theme in a personal direction, saying, “I just hope, that when all is said and done, that my time here counted for something good.  We've already got enough something not good.  I'd like to be something good, just this once.”

 

And jamison.jennings decided to bring us back around to the technical, with

“SELECT CONVERT(date, [DateTime]) as Date, COUNT(TrapID) as Traps
FROM dbo.Traps
GROUP BY CONVERT(date, [DateTime])
ORDER BY CONVERT(date, [DateTime]) DESC”

 

Day 11: Trust

Day 11 marked the first (but not last) lead post from Kevin Sparenberg (@KMSigma). If you use our online demo (http://demo.solarwinds.com) then you owe a debt of gratitude to Kevin and his team.

 

Peter Monaghan, CBCP, SCP, ITIL ver.3 voiced an experience common among THWACK denizens: “I have found "trust" to be a misunderstood word in IT, especially monitoring. For example, "I don't trust our monitoring." blah! blah! blah!”

 

In what has become a daily (and much welcomed) tradition in the challenge, network defender gave us a short poem:

Trust in your knowledge

You know the way,

You'll find the problem

And save the day.

 

Finally, tomiannelli gave us another detailed diagram – this time of the hierarchy of trust – and followed it up with this personal thought:

“I could not comprehend living life without explicitly trusting others. I thought how fearful a life like that must be. To wait for a person to exhibit most if not all of the attributes shown above before you trust them could take a lot of interaction and time.”

 

Day 12: Forgive

Head Geek Destiny Bertucci (Dez) returned to the lead poster role, sharing some techniques for dealing with negativity and how it has helped her grow in her life.

 

mtgilmore1 invoked Robert Frost, with the quote, "Forgive me my nonsense, as I also forgive the nonsense of those that think they talk sense."

 

miseri was one of several folks who shared an unattributed quote about the personal benefits of forgiveness: “To forgive is to set a prisoner free and discover that the prisoner was you.”

 

And mldechjr also shared some of their insights and techniques for what many of us said was a very challenging action: “Almost every time I get mad at something or someone it pays to stop, take a breath and realize how many stupid things I have done in my life.  It then becomes easier to forgive any trespass that I was upset about. Especially knowing the forgiveness I have been shown over the years.”

 

While I normally find three stand-outs per day, I couldn’t NOT include an LOTR quote from one of our THWACK MVPs, rschroeder:

Chapter 8:  The Scouring Of The Shire:

‘Very well, Mr. Baggins,’ said the leader, pushing the barrier aside. ‘But don’t forget I’ve arrested you.’

‘I won’t,’ said Frodo. ‘Never. But I may forgive you.'

 

Day 13: Remember

Many people can guess that managing the Head Geeks is a lot like wrangling five caffeinated toddlers in a room full of puppies. Fewer people know that this job falls to Jenne Barbour (jennebarbour), along with all the other responsibilities entailed in being the senior director of Marketing here at SolarWinds. Despite this, she signed up to share a memory about memories.

 

In response, imm an included both an unattributed quote: “When you feel like quitting, remember why you started.”

…and their own thought on this: “Remember, because if you don't, then no one else will remember it for you :-)”

 

Michael Kent shared some extremely apropos words from Stephen Hawking, "Remember to look up at the stars and not down at your feet. Try to make sense of what you see and about what makes the universe exist. Be curious. And however difficult life may seem, there is always something you can do, and succeed at. It matters that you don't just give up."

 

But jokerfest got the last laugh in with, “I keep forgetting to remember things.”

 

Day 14: Rest

If you have ever enjoyed any of the SolarWinds videos (whether they were the funny ones, the training, the teaser trailers for a new version of our products, or SolarWinds Lab) then you have enjoyed the work of Rene Lego (legoland) and her incredible team of video wizards. As the lead writer for day 14, Rene opened up about the challenges she faces and her complicated relationship with the concept of “rest”.

 

silverwolf took the word and turned it into an instructive acronym:

Relax – relax yourself; you don't need to rush all the time.

Enjoy – enjoy the relaxing time. Find something you like doing: maybe pick up a puzzle, or go grab your favorite book. Do you like drawing? Go draw something.

Sleep – make sure you get enough sleep. Lack of sleep is not good for your body or your mind, so go catch some zZzZzZzz's; your body needs it.

Trust – trust in yourself. Listen for the hints that your mind and body are giving you. Rest is GOOD.

 

In another unattributed quote, mtgilmore1 shared

“If you get tired, learn to rest, not quit.”

 

And desr gave us a more telling update to the old cliché, saying: “No rest for the intelligent.”

 

Day 15: Change

Diego Fildes Torrijos (jamesd85) returned for his second lead post, commenting on what is at the heart of all the massive change we see in the world (spoiler: it’s us).

 

THWACK MVP Radioteacher noted that, “Change is happening whether you adapt to it or not. Do not be left behind.”

 

And mlotter elaborated on the same idea, saying, “I think change starts with having the right mindset for that change. You can talk about change all day long, but what are you truly changing? I’ve seen people change the way they do things or the way they communicate, but when you investigate further, you find they are still doing the same things with the same outcome. They are just arriving at it a different way.”

 

And network defender added a thought wrapped in poetry:

Change is coming

It will never quit,

Learn to adapt

And get over it.

 

Day 16: Pray

Day 16 brought one of the more “charged” words in the series, and of course the THWACK community did not disappoint, offering thoughts that were profound, varied, and respectful of the beliefs of others while remaining true to themselves.

 

jamison.jennings opined that, “Prayer is simply conversation with God. In any good relationship there is conversation. You don't say a bunch of fancy words or magic incantations with your friends; you simply speak what's on your heart. Same goes with God, but remember, it's a relationship... the more the conversation, the better the relationship.”

 

Meanwhile, joshyaf offered two sides of the same coin, saying, “Praying isn't defined entirely as talking to a deity. Sometimes prayer is just reflection on yourself or the situation. I believe in my own prayer to God and it is a necessity in life for me. Though, maybe just a little insight for those that don't: meditation can be a form of prayer.”

 

And unclehooch put it succinctly with, “If you only pray when you’re in trouble… You’re in trouble!”

 

That wraps up week 3 of the challenge. Look for my week 4 wrap up sooner rather than later!

The story so far:

 

  1. It's Not Always The Network! Or is it? Part 1 -- by John Herbert (jgherbert)
  2. It's Not Always The Network! Or is it? Part 2 -- by John Herbert (jgherbert)
  3. It's Not Always The Network! Or is it? Part 3 -- by Tom Hollingsworth (networkingnerd)
  4. It's Not Always The Network! Or is it? Part 4 -- by Tom Hollingsworth (networkingnerd)
  5. It's Not Always The Network! Or is it? Part 5 -- by John Herbert (jgherbert)
  6. It's Not Always The Network! Or is it? Part 6 -- by Tom Hollingsworth (networkingnerd)

 

What happens when your website goes down on Black Friday? Here's the seventh installment, by John Herbert (jgherbert).

 

The View From Above: James, CEO

 

It's said, somewhat apocryphally, that Black Friday is so called because it's the day when stores sell so much merchandise and make so much money that it finally puts them 'in the black' for the year. In reality, I'm told it stems from the terrible traffic on the day after Thanksgiving, which marks the beginning of the Christmas shopping season. Whether it's high traffic or high sales, we are no different from the rest of the industry in that we offer some fantastic deals to our consumer retail customers on Black Friday through our online store. It's a great way for us to clear excess inventory, move less popular items, clear stock of older models prior to a new model launch, and build brand loyalty with some simple, great deals.

 

The preparations for Black Friday began back in March as we looked ahead to how we would cope with the usual huge influx of orders both from an IT perspective and in terms of the logistics of shipping so many orders that quickly. We brought in temporary staff for the warehouse and shipping operations to help with the extra load, but within the head office and the IT organization it's always a challenge to keep anything more than a skeleton staff on call and available, just because so many people take the Friday off as a vacation day.

 

I checked in with Carol, our VP of Consumer Retail, about an hour before the Black Friday deals went live. She confirmed that everything was ready, and the online store update would happen as planned at 8AM. Traffic volumes to the web site were already significantly increased (over three times our usual page rate) as customers checked in to see if the deals were visible yet, but the systems appeared to be handling this without issue and there were no problems being reported. I thanked her and promised to call back just after 8AM for an initial update.

 

When I called back at about 8.05AM, Carol did not sound happy. "Within a minute of opening up the site, our third-party SLA monitoring began alerting that the online store was generating errors some of the time, and for the connections that were successful, the Time To First Byte (how long it takes to get the first response content data back from the web server) is varying wildly." She continued, "It doesn't make sense; we built new servers since last year's sale, we have a load balancer in the path, and we're only seeing about 10% higher traffic than last year, and we had no trouble then." I asked her who she had called, and I was relieved to hear that Amanda had been the first to answer and was pulling in our on-call engineers from her team and others to cover load balancing, storage, network, database, ecommerce software, servers, virtualization, and security. This would be an all-hands-on-deck situation until it was resolved, and time was not on the team's side. Heaven only knows how much money we were losing in sales every minute the site was not working for people.

 

The View From The Trenches: Amanda (Sr Network Manager)

 

So much for time off at Thanksgiving! Black Friday began with a panicked call from Carol about problems with the ecommerce website; she said that they had upgraded the servers since last year, so she was convinced that it had to be the network that was overloaded, and that this was what was causing the problems. I did some quick checks in SolarWinds and confirmed that there were not any link utilization issues, so it really had to be something else. I told Carol that I would pull together a team to troubleshoot, and I set about waking up engineers across a variety of technical disciplines so we could make sure that everybody was engaged.

 

I asked the team to gather a status on their respective platforms and report back to the group. The results were not promising:

  • Storage: no alerts
  • Network: no alerts
  • Security: no alerts relating to capacity (e.g. session counts / throughput)
  • Database: no alerts, CPU and memory a little higher than normal but not resource-limited.
  • Load Balancing: No capacity issues showing.
  • Virtualization: All looks nominal.
  • eCommerce: "The software is working fine; it must be the network."

 

I had also asked for a detailed report on the errors showing up in our SLA measurement tool so we knew what our customers might be seeing. Surprisingly, rather than outright connection failures, the tool reported receiving a mixture of 504 (Gateway Timeout) errors and TCP resets after the request was sent. That information suggested that we should look more closely at the load balancers, as a 504 error occurs when the load balancer can't get a response from the back-end servers in a reasonable time period. As for the hung sessions, that was less clear. Perhaps there was packet loss between the load balancer and those servers, causing sessions to time out?
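As a side note, the essence of a probe like that third-party SLA monitor can be sketched in a few lines of standard-library Python. This is only a minimal illustration, not the monitoring product itself, and the timeout is an assumed value:

```python
# Minimal external SLA probe: one GET request, returning the HTTP status
# and an approximation of the time to first byte (TTFB). A 504 means the
# load balancer gave up waiting on a back-end server; a healthy status
# with a high TTFB means the back end answered, but slowly.
import time
import http.client
from urllib.parse import urlparse

def probe(url, timeout=5.0):
    """Return (status_code, ttfb_seconds) for a single GET of `url`."""
    parts = urlparse(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80,
                                      timeout=timeout)
    try:
        start = time.monotonic()
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()          # returns once headers arrive,
        ttfb = time.monotonic() - start    # so this approximates TTFB
        resp.read()                        # drain the body
        return resp.status, ttfb
    finally:
        conn.close()
```

Run in a loop against the store's VIP, something like this would reproduce what the monitor reported: a mix of 504s and successful responses with wildly varying TTFB.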

 

The load balancer engineers dug into the VIP statistics and were able to confirm that they did indeed see incrementing 504 errors being generated, but they didn't have a root cause yet. They also revealed that of the 10 servers behind the ecommerce VIP, one of them was taking fewer sessions over time than the others, although the peak concurrent session load was roughly the same as the other servers. We ran more tests to the website for ourselves but were only able to see 504 errors, and never a hung/reset session. We decided therefore to focus on the 504 errors that we could replicate. The client-to-VIP communication was evidently working fine because, after a short delay, the 504 error was sent to us without any problems, so I asked the engineers to focus on the communication between the load balancer and the servers.

 

Packet captures of the back end traffic confirmed the strange behavior. Many sessions were establishing without problem, while others worked but with a large time to first byte. Others still got as far as completing the TCP handshake, sending the HTTP request, then would get no response back from the server. We captured again, this time including the client-side communication, and we were able to confirm that these unresponsive sessions were the ones responsible for the 504 error generation. But why were the sessions going dead? Were the responses not getting back for some reason? Packet captures on the server showed that the behavior we had seen was accurate; the server was not responding. I called on the server hardware, virtualization and ecommerce engineers to do a deep dive on their systems to see if they could find a smoking gun.

 

Meanwhile, the load balancer engineers took captures of TCP sessions to the one back-end server which had the lower total session count. They were able to confirm that the TCP connection was established OK and the request was sent; then, after about 15 seconds, the web server would send back a TCP RST and kill the connection. This was different behavior from the other servers, so there were clearly two different problems going on. The ecommerce engineer looked at the logs on the server and commented that their software was reporting trouble connecting to the application tier, and the hypothesis was that when that connection failed, the server would generate a RST. But again, why? Packet captures of the communication to the app tier showed an SSL connection being initiated; then, as the client sent its certificate to the server, the connection would die. One of my network engineers, Paul, was the one who figured out what might be going on. "That sounds a bit like something I've seen when you have a problem with BGP route exchange," he said. "The TCP connection might come up, then as soon as the routes start being sent, it all breaks. When that happens, it usually means we have an MTU problem in the communication path which is causing the BGP update packets to be dropped."

 

Sure enough, once we started looking at MTU and comparing the ecommerce servers to one another, we discovered that the problem server had a much larger MTU than all the others. Presumably, when it sent the client certificate, it maxed out the packet size, which caused it to be dropped. We could figure out why later, but for now, tweaking the MTU to match the other servers resolved that issue and let us focus back on the 504 errors which the other engineers were looking at.
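The comparison that cracked this is simple enough to sketch. Assuming the MTU value has already been collected from each server (for example, from `ip link` output or /sys/class/net on each host), a few lines flag the odd one out; the hostnames below are invented for illustration:

```python
# Sketch: given the MTU collected from each server, flag any box whose
# value differs from the majority -- the "one of these things is not
# like the others" check that exposed the misconfigured server here.
from collections import Counter

def mtu_outliers(mtus):
    """mtus: {hostname: mtu}. Return the hosts that disagree with the majority value."""
    majority_mtu, _ = Counter(mtus.values()).most_common(1)[0]
    return {host: mtu for host, mtu in mtus.items() if mtu != majority_mtu}
```

For a fleet of nine servers at 1500 bytes and one at 9000, this returns just the 9000-byte host, which is the shape of the problem the team found.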

 

Thankfully, the engineers were working well together, and they had jointly come up with a theory. They explained that the web servers ran Apache and used something called prefork. The idea is that rather than waiting for a connection to come in before forking a process to handle its communication, Apache could create some processes ahead of time and use those for new connections, because they'd be ready. The configuration specifies how many processes should be pre-forked (hence the name), the maximum number of processes that could be forked, and how many spare processes to keep over and above the number of active, connected processes. They pointed out that completing a TCP handshake does not mean Apache is ready for the connection, because that's handled by the TCP/IP stack before being handed off to the process. They added that they actually used TCP offload, so that whole process was taking place in the NIC, not even on the server CPU itself.

 

So what if the session load meant that the Apache forking process could not keep up with the number of sessions coming inbound? TCP/IP would connect regardless, but only those sessions able to find a forked process could continue to be processed. The rest would wait in a queue for a free process, and if one could not be found, the load balancer would decide that the connection was dead and would issue a 504. When they checked the Apache configuration, however, not only was the number of preforked processes low, but the maximum was nowhere near where we would have expected it to be, and the number of 'spare' processes was set to only 5. The end result was that when there was a burst of traffic, we quickly hit the maximum number of processes on the server, so new connections were queued. Some connections got lucky and were attached to a process before timing out; others were not so lucky. The heavier the load, the worse the problem got. When there was a lull in traffic, the server caught up again, but when traffic hit hard again it had only 5 processes ready to go, and connections were delayed while waiting for new processes to be forked. I had to shake my head at how they must have figured this out.
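The knobs the engineers describe live in Apache's prefork MPM configuration. As a rough illustration (the numbers below are invented for the sketch, not the values from the story; in Apache 2.4 the cap is called MaxRequestWorkers, known as MaxClients in 2.2):

```apache
<IfModule mpm_prefork_module>
    # Processes forked at startup
    StartServers            20
    # Always keep at least this many idle children ready for a burst...
    MinSpareServers         25
    # ...but reap idle children beyond this count
    MaxSpareServers         75
    # Hard cap on concurrently forked processes (MaxClients in Apache 2.2)
    MaxRequestWorkers      400
</IfModule>
```

With a cushion of only 5 spare processes and a low maximum, a traffic burst exhausts the pool immediately, which matches the queuing behavior described above.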

 

Their plan of attack was to increase the maximum process count and the spare process count on one server at a time. We'd lose a few active sessions, but avoiding those 504 errors would be worth it. They started on the changes, and within 10 minutes we had confirmed that the errors had disappeared.

 

I reported back to Carol and to James that the issues had been resolved, and when I got off the phone with them, I asked the team to look at two final issues:

 

  1. Why did we not see any session RST problems when we tested the ecommerce site ourselves; and
  2. Why did PMTUD not automatically fix the MTU problem with the app tier connection?

 

It took another thirty minutes, but finally we had answers. The security engineer had been fairly quiet on the call so far, but he was able to answer the second question. There was a firewall between the web tier and the app tier, and the firewall had an MTU matching the other servers. However, it was also configured not to allow through, nor to generate, the ICMP messages indicating an MTU problem. We had shot ourselves in the foot by blocking the mechanism which would have detected the MTU issue and fixed it! For the RST issue, one of my engineers came up with the answer again. He pointed out that while we were using the VPN to connect to the office, our browsers had to use the web proxy to access the Internet, and thus our ecommerce site (another Security rule!). The proxy made all our sessions appear to come from a single source IP address, and through bad luck if nothing else, the load balancer had chosen one of the 9 working servers, then kept using that same server because it was configured with session persistence (sometimes known as 'sticky' sessions).
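That proxy effect is easy to picture with a little code. The sketch below is a toy model of source-IP persistence, not how any particular load balancer implements it: the back-end choice is a pure function of the client address, so every session funneled through the office proxy's single IP lands on the same server, and the one misbehaving box may never be seen.

```python
# Toy source-IP persistence: hash the client address to pick a back-end
# server. Every connection from the same IP -- such as an office web
# proxy -- maps to the same server, which is why the team's own tests
# never reached the one misbehaving machine.
import hashlib

def pick_server(client_ip, servers):
    """Deterministically map a client IP to one of the back-end servers."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

Calling pick_server with the same address repeatedly always returns the same server, while a spread of distinct client addresses distributes across the pool.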

 

I'm proud to say we managed to get all this done within an hour. Given some of the logical leaps necessary to figure this out, I think the whole team deserve a round of applause. For now though, it's back to turkey leftovers, and a hope that I can enjoy the rest of the day in peace.

 

 

>>> Continue to the conclusion of this story in Part 8

The story so far:

 

  1. It's Not Always The Network! Or is it? Part 1 -- by John Herbert (jgherbert)
  2. It's Not Always The Network! Or is it? Part 2 -- by John Herbert (jgherbert)
  3. It's Not Always The Network! Or is it? Part 3 -- by Tom Hollingsworth (networkingnerd)
  4. It's Not Always The Network! Or is it? Part 4 -- by Tom Hollingsworth (networkingnerd)
  5. It's Not Always The Network! Or is it? Part 5 -- by John Herbert (jgherbert)

 

Things always crop up when you least expect them, don't they? Here's the sixth installment, by Tom Hollingsworth (networkingnerd).

 

The View From Above: James, CEO

 

One of the perks of being CEO is that I get to eat well. This week was no exception, and on Tuesday night I found myself at an amazing French restaurant with the Board of Directors. The subject of our recent database issues came up, and the rest of the Board expressed how impressed they were with the CTO's organization, in particular the technical leadership and collaboration shown by Amanda. It's unusual that they get visibility of an individual in that way, so she has clearly made a big impact. Other IT managers have also approached me and told me how helpful she is; I think she has a great career ahead of her here. As dessert arrived and the topic of conversation moved on, I felt my smartwatch buzzing as a text message came in. I glanced down at my wrist and turned pale at the first lines of the message on the screen:

 

URGENT! We have a security breach...

 

I excused myself from the table and made a call to find out more. The news was not good. Apparently, we had been sent a message saying that our customer data had been obtained, and that it would be made available on the black market if we didn't pay them a pretty large sum of money. It made no sense; we have some of the best security tools out there, and we follow all those compliance programs to the letter. At least, I thought we did. How did this data get out? More to the point, would we be able to avoid paying the ransom? And even if we paid it, would the data be sold anyway? If this gets out, the damage to our reputation alone will cause us to lose new business, and I dread to think how many of our affected customers won't trust us with their data any more. The security team couldn't answer my questions, so I hung up and made another call, this time to Amanda.

 

 

The View From The Trenches: Amanda (Sr Network Manager)

 

I used to flinch every time I picked up phone calls from James. Now I can't help but wonder what problem he wants me to solve next. I must admit that I'm learning a lot more about the IT organization around here, and it's making my ship run a lot tighter. We're documenting more quickly and anticipating problems before they happen, and we have the SolarWinds tools to thank for a large portion of that. So I was pretty happy to answer a late evening call from James earlier this week, but this call was different. The moment he started speaking I knew something bad had happened, but I wasn't expecting to hear that our customer data had been stolen and was being ransomed. How far did this go? Did they just take customer data, or have they managed to extract the whole CRM database?

 

It's one thing to be fighting a board implementing bad ideas, but fighting hackers? This was huge! We were about to be in for a lot of bad press, and James was going to be spending a lot of time apologizing and hoping we didn't lose all our customers. James told me that I was part of the Rapid Response Team being set up by Vince, the Head of IT Security, and that I had the authority to do whatever I needed to do to help them find out how to get this fixed. James said he was willing to pay the ransom if the team was unable to track the breach, but he was worried that unless we found the source, he'd just be asked to pay again a week later. I grabbed my keys and drove to the office.

 

I had barely sat down at my desk when Vince ran into my office. He was panting as he fell into one of my chairs, and breathlessly explained the problem in more detail. The message from the hacker included an attachment - a 'sample' containing a lot of sensitive customer data, including credit card numbers and social security numbers. The hacker wanted thousands of dollars in exchange for not selling it on the black market, and there was a deadline of just two days. I asked Vince if he had verified the contents of the attachment. He nodded his head slowly. "There's no question about it. Somebody has access to our data."

 

I asked Vince when the last firewall audit had happened. Thankfully, Vince said that his team audited the firewalls about once a month to make sure all the rules were accurate. I smiled to myself that we finally had someone in IT who knew the importance of regular checkups. Vince told me that he kept things up to date just in case he had to pull together a PCI audit. I told him to put the firewalls on the back burner and think about how the data could have been exfiltrated. He told me he wasn't sure on that one. I asked if he had any kind of monitoring tool like the ones I used on the network. He told me he had a Security Incident and Event Management (SIEM) tool budgeted for next year. Isn't that always the way? I told him it was time we tried something out to get some data about this breach fast. We only had a couple of days before the hacker's deadline, so we needed to get some idea of what was going on, and quickly.

 

While the security engineers on the Rapid Response Team continued their own investigations, Vince and I downloaded the SolarWinds Log and Event Manager (LEM) trial and installed it on my SolarWinds server. It only took an hour to get up and running. We pointed it at our servers and other systems and had it start collecting log data. We decided to create some rules for basic things, like best practices, to help us sort through the mountain of data we had just started digesting. Vince and I worked to put in the important stuff, like our business policies about access rights and removable media, as well as telling the system to start looking for any strange file behavior.

 

As we let the system do its thing for a bit, I asked Vince if the hacker could have emailed the files out of the network. He smiled and told me he didn't think that was possible, because they had just finished installing Data Loss Prevention (DLP) systems a couple of months ago. It had caught quite a few people in accounting sending social security numbers in plain text emails, so Vince was sure that anything like that would have been caught quickly. I was impressed that Vince clearly knew what he was doing. He only took over as Head of IT Security about nine months back, and it seems like he has been transforming the team and putting in just the right processes and tools. His theory was that it was some kind of virus that was sending the data out over a covert channel. Being in networking, I often hear things being blamed on the latest virus of the week, so I reserved my judgement until we knew more. All we could do now was wait while LEM did its thing, and the other security engineers continued their efforts as well. By this time it was well after midnight, and I put on a large pot of coffee.

 

When morning came and people started to come into work, we looked at the results from the first run at the data. Vince noted a few systems which needed to be secured to fall completely within PCI compliance rules. There was nothing major found, though; just a couple of little configurations that were missed. As we scrolled down the list though, Vince found a potential smoking gun. LEM had identified a machine in sales that had some kind of unknown trojan. On the same screen, the software offered the option to isolate the machine until it could be fixed. We both agreed that it needed to be done, so we removed the network connectivity for the machine through the LEM interface until we could send a tech down to remove the virus in person. More and more people were coming online now, so perhaps one of those systems would provide another possible cause.

 

We kept pushing through the data; we were now 18 hours into the two-day deadline. I was looking over the list of things we needed to check on when a new event popped up on the screen. I scrolled up to the top and read through it. A policy violation had occurred in our removable device policy rule. It looked like someone had unplugged a removable USB drive from their computer, and the system was powered off right after that. I checked the ID on the machine: it was one of the sales admins. I asked Vince if they had a way of tracking violations of the USB device policy. He told me that there shouldn't have been any violations as they had set a group policy in AD to prevent USB drives from being usable. I asked him about this machine in particular. Vince knitted his eyebrows together as he thought about the machine. He told me he was sure that it was covered too, but we both decided to walk down and take a look at it anyway.

 

We booted up the machine, and everything looked fine as it did the usual POST and came up to the Windows login screen. Wait, though; the background for the login screen was wrong. We have a corporate image on our machines with the company logo as the wallpaper. It wasn't popular but it also prevented incidents with more colorful pictures ... like the one I was looking at right now. Wow. Somehow this user had figured out how to change their wallpaper. I wondered what else this could mean. Vince and I spent an hour combing through the system. There were lots of non-standard things we found; lots of changes that shouldn't have been possible with our group policies (including the USB device policy), and the browser history of the user was clean. Not just clean from a perspective of sites visited, but completely cleared. Vince and I started to think that this system's user was someone we wanted to chat with.

 

I called James and told him we had a couple of possibilities to check out. He asked us to get back to him quickly; he had notified the rest of the Board, and they were pushing to hear that we had a solution as quickly as possible. Vince and I returned to my office and I scanned the SIEM tool for any new events while Vince contacted one of his team to arrange to have the suspect computer removed and re-imaged. Five minutes in, another event popped up. The same suspect system with the group policy had triggered an event for the insertion of a USB drive. I printed out the event, and Vince and I hurried back to the sales office to find out who had turned the computer on. We found the user hard at work, typing away; until, that is, we walked up to his desk. A flurry of mouse clicks later, he was back at his desktop. Vince asked him if he had anything plugged into his computer that wasn't supposed to be there. The user, a young man called Josh, said that he didn't. Vince showed him the event printout showing a USB drive being plugged in to the computer, but Josh shook his head and said that he didn't know what that was all about.

 

Vince wasn't having any of it. He started asking the sales admin all about the unauthorized changes on the machine that violated the group policies in place on the network. The sales admin didn't have an answer. He started looking around and stammering a bit as he tried to explain it. Finally, Vince said that he'd had enough. It was obvious something was going on, and he wanted to get to the bottom of it. He told Josh to step away from the computer. Josh stood up and moved to the side, and Vince sat down at the computer, clicking around the system and looking for anything out of place. He glanced at the report from the SolarWinds SIEM tool, which showed that the drive was mounted in a specific folder location and not as a drive. As soon as he started clicking in the folder structure, Josh got visibly nervous. He kept inching closer to the chair and looked like he was about to grab the keyboard. When Vince clicked into the folder structure of the drive, his eyes got wide. Josh's head dropped, and he stared resolutely at the carpet.

 

The post-mortem after that was actually pretty easy. Josh was the hacker who had stolen the information from our database. He had stored a huge amount of customer records on the USB drive and was adding more every day. He must have hit on the idea to ask us to pay for the records as a ransom, and he might have even been planning on selling them even if we paid up, although we'll never know. Vince's team analyzed the hard drive and found the exploits Josh had used to elevate his privileges enough to reverse the group policies that prevented him from reading and copying the customer data. We later found those privilege escalations in the mountain of data the SIEM collected. If we'd only had this kind of visibility before, we might have avoided this whole situation.

 

James came down to deal with the issue personally. Josh was pretty much frog-marched into a conference room, with James following close behind. The door slammed shut, and the ensuing muffled shouting gave me some uncomfortable flashbacks to the day that my predecessor, Paul, was fired. Then Sam from Human Resources arrived with two of our attorneys from Legal in tow, and half an hour later Josh was being escorted from the building. I'm not privy to exactly what the attorneys had Josh sign, but apparently he won't be making any noise about what he did.

 

From my perspective, I've built a really good relationship with the security team now, and of course, they've asked to keep SolarWinds Log and Event Manager. LEM paid for itself many times over this week, and there's no question that at some point it will help us avoid another crisis. For now, though, James told Vince and me to take the rest of the week off. I'm not going to argue; I need some sleep!

 

 

>>> Continue reading this story in Part 7

ORIGSIN2014003_DellOtto-covSm.jpg

(image courtesy of Marvel)

 

...I learned from "Doctor Strange"

(This is the last installment of a 4-part series. You can find part 1 here, part 2 here and part 3 here)

 

Don't confuse a bad habit that works for a good habit

The Ancient One observes that Strange isn't, "…motivated by power or the need for acclaim. You simply have a fear of failure." He replies, "I guess that fear is what made me a great doctor." She calls him on this little bit of b.s., saying,

 

"Fear is what has held you back from true greatness. Arrogance and fear still keep you from learning the simplest and most significant lesson of all."

 

Strange asks, "Which is?"

The answer? "It's not about you."

 

After 30 years in IT, I've come to realize that our daily work is full of positive rewards for poor choices. We work long hours, come in early after an overnight change control, check systems on our days off, learn new skills for work on our own time, don't venture too far from a network connection, just in case, and so on. We do this because we are rewarded for giving 110 percent. We’re lionized (at least for a moment) when we manage to bring up the crashed system in record time; we receive bonuses and other incentives for closing the largest number of tickets, and so on.

 

But that doesn't make any of those behaviors good.

 

I'm not saying that sometimes putting in longer hours, or more effort, or rushing to help rescue a system or team is a bad thing. But our motivation for doing so – like fear of failure – should be identified for what it is and dealt with honestly.

 

RTFM before you try running commands

After being firmly warned about the perils of manipulating time, Strange grumps, "Why don't they put the warnings before the spell?" Later, he repeats this sentiment as the villain is hoisted on his own mystical petard.

 

Often, we find a potential solution and rush pell-mell into implementation without testing. Or, in the case of code found in the middle of a long forum thread, we stop reading before the end, never learning that it doesn't really address our issue and, in fact, breaks several other things. Or worse: someone decides to be a smart@ss and tells you the solution is to run rm -rf / as root. If you don't read down to the next post, you may never see the warning that this would erase all the files on your system.

 

This is the reason all IT pros should know the magical incantation, RTFM.

 

Being flawed doesn't mean you're broken.

Kaecilius, the villain of the movie, points out at one point that Kamar Taj is filled with broken souls to whom the Ancient One teaches "parlor tricks, and keeps the real power for herself.” While the second half of that sentiment is clearly not true, the first half has some merit. Look closely and you can see that each character you meet in the mystical fortress is flawed, either externally (in the case of Master Hamir, who is missing his left hand) or internally (as with Mordo, battling his inner demons). What is interesting is that, while some of the characters succumb to obstacles related to these flaws, none allow themselves to be defined by those flaws.

 

It is obvious to the point of cliché that none of us are perfect. Nor have any of us had perfect IT training, or career paths, or experiences. But those flaws, deficiencies, and missteps do not invalidate us as people, nor do they disqualify us as credible sources of IT expertise.

 

Artist Allie Jenson once said,

 

"I am proud of my flaws and mistakes. They are the building blocks of my strengths and beauty.”

 

In fact, the Japanese practice of Kintsugi is the art of taking flaws in an object and emphasizing them to create even greater beauty in the piece.

 

We need to remind ourselves that the ways in which we live with – and sometimes overcome – our flaws are often what makes us special.

 

The path to mastery is not easy, but simple

Sitting at the feet of the Ancient One, Strange despairs of learning the secrets of the magic she offers. "But even if my fingers were able to do that," he says, "How do I get from here..." (indicating where he's sitting) "...to there." (pointing to where she sits.) She asks, "How did you become a doctor?" He answers, "Study and practice. Years of it."

 

Over the course of my 30-year career in IT, I’ve had the privilege to work with an astounding number of brilliant minds. These talented engineers and designers have unselfishly passed along hints and secrets on a daily basis. For that, I am sincerely grateful.

 

Even so, none of what we do comes easily. It requires, as Doctor Strange observed, study and practice, and often years of it to truly develop mastery. And usually in IT, the thing we're trying to master is a moving target, morphing from one form to another as technology continues to evolve at a breakneck pace.

 

But despite that, the mastery we acquire is rarely as impossible as it feels on that first day when we attempt to write our first line of code, configure our first router, or install our first server.

 

 

Even if words aren’t spells, they have power and must be treated with care

In the moments before Strange exposes the secret of the Ancient One's long life, she warns him, "Choose your next words very carefully, Doctor Strange." Not heeding her warning, Strange barrels on. In doing so, he sows the seeds of distrust and anger that ultimately lead to his friend Mordo becoming a lifelong nemesis.

 

It's important to recognize that nothing that Strange said was wrong. Nor was he wrong in challenging the Ancient One's choices. But doing so publicly, and in anger, and using the words he did, created more problems than he could have ever predicted.

 

In IT, we place great value in the Truth. In fact, I’ve written about it a lot lately:

 

But there is a difference between being honest and being insulting; between being assertive and aggressive; between uncovering the truth and exposing faults purely for the sake of diminishing.

 

It's an undeniable reality that the world has become more crass. Dangerously so, in fact. Not just as IT professionals, but as good faith participants in humanity, we have the ability and responsibility to change that trend, if we can. It means that even when we understand the pure facts, that we, like Doctor Strange, also choose our words very carefully.

 

Never doubt, diminish, or dismiss your value or importance

Denying that magic exists, Doctor Strange exclaims, "We are made of matter and nothing more. We're just another tiny, momentary speck in an indifferent universe." This is the point at which the Ancient One opens Strange’s eyes to the infinitude of reality, and asks, "Who are you in this vast multiverse, Mr. Strange?" The question is not meant to diminish Strange, but to point out that there is, in fact, a place and role and opportunity for greatness for every living being.

 

Walk into the convention hall at Cisco Live!, Microsoft Ignite, VMworld, or CeBIT, and you begin to grasp the enormity of the IT community. In doing so, it's easy to believe that nothing we have to say or contribute is new or even meaningful in any way. We fall into the trap of being a technological Ecclesiastes, thinking there's nothing new under the sun.

 

The truth is that nothing could be further from the truth. It is our experiences, and our willingness to share them, that makes IT such a vibrant profession and community of individuals. Our struggles provide the motivation for solutions that otherwise would never be imagined. It is the intersection of our humanity with our abilities that create the compelling stories that inspire the next generation of IT professionals.

 

Did you find your own lesson when watching the movie? Discuss it with me in the comments below.

4.lunarranging

 

It takes a radio signal about 1.28 seconds to reach the Moon (about 239,000 miles away), and about 2.5 seconds for a round trip between our secret moon base and Earth. Therefore, this common SQL Server error message, number 833...

 

SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\SQL\database.mdf] in database [database]. The OS file handle is 0x0000000000000000. The offset of the latest long I/O is: 0x00000000000000

 

...implies that the round trip time is over 15 seconds. Using 7.5 seconds one-way (a minimum estimate; we really don't know how long it is taking), we see the underlying SAN disks are over 1,396,500 miles away, or about 5.8 times as far away as the Moon. No, I don't have any idea how they got there, either. But how else to explain this error? For all I know, this SAN could be on Mars!
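For the skeptical, the back-of-the-napkin math checks out. Here's a quick sketch using a rounded 186,000 miles per second for the speed of light (which lands a hair under the figure above, since I rounded the constant):

```python
# Sanity-check the "my SAN is past the Moon" math from error 833.
SPEED_OF_LIGHT = 186_000        # miles per second, rounded
MOON_DISTANCE = 239_000         # miles to the Moon, roughly

one_way_seconds = 15 / 2        # error 833 fires at a 15-second round trip
san_miles = one_way_seconds * SPEED_OF_LIGHT

print(f"{san_miles:,.0f} miles")                       # 1,395,000 miles
print(f"{san_miles / MOON_DISTANCE:.1f}x the Moon")    # 5.8x the Moon
```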

 

Now, I've seen this error message many times in my career. The traditional answers you find on the internet tell you to look at your queries and try to figure out which ones are causing the I/O bottleneck. In my experience, this guidance was wrong more than 95% of the time. In fact, this is the type of guidance that usually results in people wanting to just throw hardware at the problem. I've seen that error message appear with little to no workload being run against the instance.

 

In my experience the true answer was almost always "shared storage," or "the nature of the beast that is known as a SAN." It turns out that when several servers share the same storage arrays, you can end up falling victim to what is commonly called a "noisy neighbor": one workload, on one particular server, causing performance pain for a seemingly unrelated server elsewhere.

 

What's more frustrating is that sometimes the only hint of the issue is with the SQL Server error message. Often the conventional tools used to monitor the SAN don't necessarily show the problem, as they are focusing on the overall health of the SAN and not on the health of specific applications, database servers, or end-user experience.

 

And just when I thought I had seen it all when it comes to the error message above, along comes something new for me to learn.

 

Snapshot and Checkpoints

No, they aren't what's new. I've been involved with the virtualization of database servers for more than eight years now, and the concepts of snapshots and checkpoints are not recent revelations for me. I've used them from time to time when building personal VMs for demos and I've seen them used sparingly in production environments. Why the two names? To avoid confusion, of course. (Too late.)

 

The concept of a snapshot or checkpoint is simple: to create a copy of the virtual machine at a point in time. The reason for wanting this point in time copy is simple as well: recovery. You want to be able to quickly put the virtual machine back to the point in time created by the snapshot or checkpoint. Think of things like upgrades or service packs. Take a snapshot of the virtual machine, apply changes, sign off that everything looks good, and remove the snapshot. Brilliant!

 

How do they work?

For snapshots in VMware, the documentation is very clear:

When you create a snapshot, the system creates a delta disk file for that snapshot in the datastore and writes any changes to that delta disk.

So, that means the original file(s) used for the virtual machine become read-only, and this new delta file stores all of the changes. I liken this to the similar "copy-on-write" technology in database snapshots inside of SQL Server. In fact, this VMware KB article explains the process in the same way:

The child disk, which is created with a snapshot, is a sparse disk. Sparse disks employ the copy-on-write (COW) mechanism, in which the virtual disk contains no data in places, until copied there by a write.
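That copy-on-write behavior is easy to sketch in a few lines. This toy model (purely illustrative; it is not how VMware actually lays out delta files on disk) freezes the base disk at snapshot time and sends every write to a sparse delta map, with reads falling through to the base:

```python
class SnapshottedDisk:
    """Toy copy-on-write disk: base blocks are frozen, writes go to a delta."""

    def __init__(self, base_blocks):
        self.base = tuple(base_blocks)   # read-only once the snapshot exists
        self.delta = {}                  # sparse: holds only changed blocks

    def write(self, block_no, data):
        self.delta[block_no] = data      # copy-on-write: base never changes

    def read(self, block_no):
        # changed blocks come from the delta, everything else from the base
        return self.delta.get(block_no, self.base[block_no])

disk = SnapshottedDisk(["a", "b", "c"])
disk.write(1, "B")
print(disk.read(0), disk.read(1))   # a B
print(len(disk.delta), "of", len(disk.base), "blocks in the delta")
```

The delta grows with every block touched, which is exactly why a long-lived snapshot on a busy database server can end up nearly as large as the original disk, with read overhead to match.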

OK, now that we know how they work, let's talk about their performance impact.

 

Are they bad?

Not at first, no. But just like meat left out overnight they can become bad, yes. And the reason why should be very clear: the longer you have them, the more overhead you will have as the delta disk keeps track of all the changes. Snapshots and checkpoints are meant to be a temporary thing, not something you would keep around. In fact, VMware suggests that you keep a snapshot for no more than 72 hours, due to the performance impact. Here's a brief summary of other items from the "Best practices for virtual machine snapshots in the VMware environment" KB article:

 

  • Snapshots are not backups, and do not contain all the info needed to restore the VM. If you delete the original disk files, the snapshot is useless.
  • The delta files can grow to be the same size as the original files. Plan accordingly.
  • Up to 32 snapshots are supported (unless you run out of disk), but you are crazy to use more than 2-3 at any one time.
  • Rarely, if ever, should you use a snapshot on high-transaction virtual machines such as email and database servers.
  • Snapshots should only be left unattended for 24-72 hours, and don't feed them after midnight, ever.

 

OK, I made that last part up. You aren't supposed to feed them, ever, otherwise they become like your in-laws at Christmas and they will never leave.

 

So, snapshots and checkpoints can have an adverse effect on performance! I found out about it through this Spiceworks thread, and then from other articles on the internet that detailed this very same issue.

 

So this performance issue wasn't exactly an unknown, but it was new to me, since I hadn't come across issues related to snapshots or thought to check for them in production. And, from what I can tell, most people haven't had this experience either, hence the head-scratching when we see the effects of snapshots and checkpoints on our database servers.

 

Do I have one?

I don't know, you'll need to look for yourself. For VMware, you have three methods as detailed in this KB article:

 

1. Using vSphere

2. Using the virtual machine Snapshot Manager

3. Viewing the virtual machine configuration file

4. Viewing the virtual machine configuration file on the ESX host

 

Yes, that's four things. I didn't write the KB article. You can read it for yourself. Consider number 4 to be a bonus option or something. Or maybe they meant to combine the last two. Again, I didn't write the article, I'm just pointing you to what it says.

 

Now, for Hyper-V, we can look at the Hyper-V Manager GUI as well, which is essentially similar to using vSphere. But we could also use the Hyper-V Powershell cmdlets as listed here. In fact, this little piece of code is all you really need:

PS C:\> Get-VM "vmName" | Get-VMSnapshot

Also worth mentioning here is that Virtualization Manager tracks snapshots as well. You can find information about sprawl and snapshots here.

 

Summary

Snapshots and checkpoints are fine tools for you to use, but when you are done with them you should get rid of them, especially for a database server. Otherwise you can expect to see a lot of disk latency and high CPU as a result. And should you see such things but your server team reports back that everything looks normal, I hope this post will stick in your head enough for you to remember to go looking for any rogue snapshots that may exist.

With Christmas just under two weeks away most of the corporate world is in what I call "holiday mode", that period of time when work needs to get done but the urgency wanes as everyone is forced to balance work and holiday tasks. Toss in a few snow days that close or delay school and it's easy to see how work schedules can be hectic for a period of time well beyond the holiday season.

 

Of course that won't stop me from putting together the Actuator each week. So here's a bunch of links I found on the Intertubz that you may find interesting. Enjoy!

 

'Crime as a Service' a Top Cyber Threat for 2017

Just a reminder that things can, and will, get worse before they get better.

 

Microsoft to offer option of 16 years of Windows Server, SQL Server support through new Premium Assurance offer

Just what we wanted for Christmas, six more years of supporting Windows 2008 and SQL Server 2008!

 

Six maps that show the anatomy of America’s vast infrastructure

Because I like maps and I think you should too, here's one showing roads, railroads, and even the state of disrepair of our bridges. I'd like to see these same graphs over time, to get a sense if we are falling further behind on infrastructure upkeep.

 

Who needs traditional storage anymore?

Eventually we will reach a point where all the "nerd knobs" are taken away. We won't be tuning hardware. The traditional resource bottlenecks (memory, CPU, disk, network) will be out of our hands.

 

We’re set to reach 100% renewable energy — and it’s just the beginning

Say what you want about Google, but their efforts in this area are quite admirable.

 

How I detect fake news - O'Reilly Media

The downside to this is the amount of effort it takes to verify a story. Trusted resources are hard to come by these days. This is especially true when most "news" programming is essentially editorials and opinions. Gone are the days of merely reporting on an event, now we are subjected to an endless (and mindless) spouting of opinions AS facts, leading to mass confusion for everyone.

 

Using AWS Lambda to call and text you when your servers are down

Hey folks, I just wanted you to know that tools like this already exist. No need to reinvent the wheel here, you know.

 

A primer on blockchain

In case you were wondering what all the hype was about blockchain, here's an easily digestible infographic.

 

Total Cost of Ownership (TCO) Calculator

In case you needed to provide some data to your CFO on reasons to migrate to the cloud.

 

Last week in Orlando I delivered 4 sessions in 3 days at 2 events, but this was by far my favorite:

thwack - 1.jpg

It can be tough to get a good handle on government agencies’ increasingly complex database environments. Today, federal database administrators are in charge of everything ranging from on-premises solutions to cloud or hybrid systems. DBAs are like the central nervous system of the human body -- they are in charge of disseminating information throughout the entire agency.

 

That’s a big responsibility, and things are not going to get much easier anytime soon. The amount of data will skyrocket, and concerns surrounding security, efficiency and cost will continue. Fortunately, there are a few ways DBAs can reduce headaches and database management complexities.

 

1. Make sure that everything is on the same page, especially when it comes to application response times.

 

In order to streamline processes, it’s vitally important to ensure that all databases have a common set of goals, metrics and service-level agreements. Acceptable application response times will vary depending on unique needs.

 

Work with management to determine appropriate response times, and then implement the solutions that can deliver on that agreement. If applications aren’t responsive, or databases aren’t doing their jobs, then productivity and uptime could be significantly impacted, affecting the delivery of the agency’s mission.

 

2. Carefully document your processes and implement log and event management.

 

To help keep a close eye on all of the data that’s passing through a network and to ensure its security, establish a documentation system. Begin by documenting a consistent set of processes for database backup and restore, data encryption, detection of anomalies and potential security threats.

 

Log and event management tools can send alerts when suspicious activity is spotted in the log data. By doing so, you’ll be able to respond to them in a timely manner and automatically kill suspicious applications.

 

3. Reduce workload costs by planning ahead.

 

If you are considering moving to the cloud, there are a couple of things to keep in mind. First, carefully map out a strategy and establish guidelines. Be sure to deploy on a certified platform, and plan everything to ensure that the transition is seamless.

 

Second, consider moving to cloud solutions with lower licensing costs or to open source software, which is often less expensive. Remember that the goal of a DBA is not only to help provide colleagues with better, faster and more secure data access, it’s also to help save the agency money.

 

4. Keep things in perspective so you don’t go crazy.

 

No one said database administration was going to be easy. Government data is a tough business, and it’s only going to get tougher.

 

But, it can also be incredibly rewarding. Think of it: DBAs are the foundation of everything that happens in the agency. They control where the information goes, whether or not critical applications are working properly and, in effect, how effectively the agency completes its mission.

 

Yes, a DBA’s role is extremely complex. But making a few simple adjustments can reduce that complexity, ensuring that information keeps pumping and the agency’s vital operations stay healthy.

 

Find the full article on Government Computer News.

The story so far:

 

  1. It's Not Always The Network! Or is it? Part 1 -- by John Herbert (jgherbert)
  2. It's Not Always The Network! Or is it? Part 2 -- by John Herbert (jgherbert)
  3. It's Not Always The Network! Or is it? Part 3 -- by Tom Hollingsworth (networkingnerd)
  4. It's Not Always The Network! Or is it? Part 4 -- by Tom Hollingsworth (networkingnerd)

 

Easter is upon the team before they know it, and they're being pushed to make a major software change. Here's the fifth installment, by John Herbert (jgherbert).

 

The View From Above: James (CEO)

 

Earlier this week we pushed a major new release of our supply chain management (SCM) platform into production internally. The old version simply didn't have the ability to track and manage our inventory flows and vendor orders as efficiently as we wanted, and the consequence has been that we've missed completing a few large orders in their entirety because we were waiting for critical components to be delivered. Despite the importance of this upgrade to our reputation for on-time delivery (not to mention all the other cost savings and cashflow benefits we can achieve by managing our inventory on a near real-time basis), the CTO has been putting this off for months because the IT teams have been holding back on giving the OK. Finally, the Board of Directors had enough of the CTO's pushback. As a group we agreed that there had been plenty of time for testing, and the directive was issued that unless there were documented faults or errors in the system, IT should proceed with the new software deployment within the month.

 

We chose to deploy the software over the Easter weekend. That's usually a quieter time for our manufacturing facilities, as many of our customers close down for the week leading up to Easter. I heard grumbling from the employees about having to work on Easter, but there's no way around it. The software has to launch, and we have to do whatever we need to do to make that happen, even if that means missing the Easter Bunny.

 

The deployment appeared to go smoothly, and the CTO was pleased to report to the Board on Monday morning that the supply chain platform had been upgraded successfully over the weekend. He reported that testing had been carried out from every location, and every department had provided personnel to test their top 10 or so most common activities after the upgrade so that we would know immediately if a mission-critical problem had arisen. Thankfully, every test passed with flying colors, and the software upgrade was deemed a success. And so it was, until Tuesday morning when we started seeing some unexplained performance issues, and things seemed to be getting worse as the day progressed.

 

The CTO reported that he had put together a tiger team to start troubleshooting, and opened an ongoing outage bridge. This had the Board's eyes on it, and he couldn't fail now. I asked him to make sure Amanda was on that team; she has provided some good wins for us recently, and her insight might just make the difference. I certainly hope so.

 

The View From The Trenches: Amanda (Sr Network Manager)

 

With big network changes I've always had a rule for myself that just because the change window has finished successfully, it doesn't mean the change was a success, regardless of what testing we might have done. I tend to wait a period of time before officially calling the change a success, all the while crossing my fingers for no big issues to arise. Some might call that paranoia, and perhaps they are right, but it's a technique that has kept me out of trouble over time. This week has provided another case study for why my rule has a place when we make more complex changes.

 

Obviously I knew about the change over the Easter weekend; I had the pleasure of being in the office watching over the network while the changes took place. SolarWinds NPM made that pretty simple for me; no red means a quiet time, and since there were no specific reports of issues, I really had nothing to do. On Monday the network looked just fine as well (not that anybody was asking), but by Tuesday afternoon it was clear that there were problems with the new software, and the CTO pulled me into a war room where a group of us were tasked with finding the cause of the performance issues being reported with the new application.

 

There didn't seem to be a very clear pattern to the performance issues, and reports were coming in from across the company. On that basis we agreed to eliminate the wide area network (WAN) from our investigations, except at the common points, e.g. the WAN ingress to our main data center. The server team was convinced it had to be a network performance issue, but when I got them to do some ping tests from the application servers to various components of the application and the data center, responses were coming back in 1 or 2ms. NPM also still showed the network as clean and green, but experience has taught me not to dismiss any potential cause until we can disprove it by finding what the actual problem is, so I shared that information cautiously but left the door open for it to still be a network issue that simply wasn't showing in these tests.

 

One of the server team members suggested perhaps it was an MTU issue. A good idea, but when we issued some pings with large payloads to match the MTU of the server interface, everything worked fine. MTU was never really a likely cause--if we had MTU issues, you'd have expected the storage to fail early on--but there's no harm in quickly eliminating it, and that's what we were able to do. We double-checked interface counters looking for drops and errors in case we had missed something in the monitoring, but those were looking clean too. We looked at the storage arrays themselves as a possible cause, but checking SolarWinds Storage Resource Monitor we confirmed that there were no active alerts, no storage objects indicating performance issues like high latency, and no capacity issues, thanks to Mike using the capacity planning tool when he bought this new array!
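If you ever want to run that kind of MTU check yourself, the only fiddly part is the arithmetic: the ping payload has to leave room for the IP and ICMP headers. A quick sketch (the ping flags in the comments are the common Linux and Windows forms, and your environment may differ):

```python
# Largest unfragmented ICMP payload for a given interface MTU (IPv4).
IP_HEADER = 20     # bytes, IPv4 header without options
ICMP_HEADER = 8    # bytes, ICMP echo header

def max_ping_payload(mtu):
    return mtu - IP_HEADER - ICMP_HEADER

for mtu in (1500, 9000):
    size = max_ping_payload(mtu)
    print(f"MTU {mtu}: payload {size} bytes")
    # Linux:   ping -M do -s {size} <server>   (-M do forbids fragmentation)
    # Windows: ping -f -l {size} <server>
```

If a ping at that payload size succeeds with the don't-fragment bit set, the path really does carry full-MTU frames, and MTU can be crossed off the suspect list.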

 

We asked the supply chain software support expert about the software's dependencies. He identified the key dependencies as the servers the application ran on, the NFS mounts to the storage arrays, and the database servers. We didn't know about the database servers, so we pulled in a database admin and began grilling him. We discovered pretty quickly that he was out of his depth. The new software had required a shift from Microsoft SQL Server to an Oracle database. This was the first Oracle instance the DB team had ever stood up, and while they were very competent monitoring and administering SQL Server, the admin admitted somewhat sheepishly that he really wasn't that comfortable with Oracle yet, and had no idea how to see if it was the cause of our problems. This training and support issue is something we'll need to work on later, but what we needed right then and there was some expertise to help us look into Oracle performance. I was already heading to the SolarWinds website because I remembered that there was a database tool, and I was hopeful that it would do what we needed.

 

I checked the page for SolarWinds' Database Performance Analyzer (DPA), and it said: Response Time Analysis shows you exactly what needs fixing - whether you are a database expert or not. That sounded perfect given our lack of Oracle expertise, so I downloaded it and began the installation process. It wasn't long before I had DPA monitoring our Oracle database transactions (checking them every second!) and starting to populate data and statistics. Within an hour it became clear what the problem was; DPA identified that the main cause of the performance problems was on database updates, where entire tables were being locked rather than using a more granular lock, like row-level locking. Update queries were being forced to wait while the previous query executed and released the lock on the table, and the latency in response was having a knock-on effect on the entire application. We had not noticed this over the weekend because transaction loads were so low outside of normal business hours that the problem never raised its head. But why didn't this happen on Monday? On a hunch I dug into NPM and looked at the network throughput for the application servers. As I had suspected, the Monday after Easter showed the servers handling about half the traffic that hit them on Tuesday. At a guess, a lot of people took a four-day weekend, and when they returned to work on Tuesday, that tipped the scales on the locking/blocking issue.
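The difference between table-level and row-level locking that DPA surfaced can be sketched without a database at all. This toy lock manager (purely illustrative; Oracle's real lock manager is far more sophisticated, with the table and row names being made up for the example) counts how many of a batch of updates could run concurrently under each granularity:

```python
# Toy lock manager: how many of these updates could run at the same time?
updates = [("orders", 1), ("orders", 2), ("orders", 3), ("vendors", 7)]

def concurrent(updates, granularity):
    held, runnable = set(), 0
    for table, row in updates:
        # table-level locking uses one lock per table; row-level, one per row
        lock = table if granularity == "table" else (table, row)
        if lock not in held:         # lock is free: this update runs now
            held.add(lock)
            runnable += 1
        # else: blocked, waiting for the current holder to commit
    return runnable

print(concurrent(updates, "table"))  # 2 -- one update per table at a time
print(concurrent(updates, "row"))    # 4 -- all four touch different rows
```

Under table-level locks the three updates to the same table serialize behind one another, which is exactly the wait-and-release pattern that only showed up once Tuesday's full workload arrived.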

 

While we discussed this discovery, our supply chain software expert had been tapping away on his laptop. "You're not going to believe this," he said. "It turns out we are not the first people to find this problem. The vendor says that they posted a HotFix for the query code about a week after this release came out, but I just checked, and we definitely do not have that HotFix installed. I don't know how we missed that, but we can get it installed overnight while things are quiet, and maybe we'll get lucky." I checked my watch; I couldn't believe it was 7:30 PM already. We really couldn't get much more done that night anyway, so we agreed to meet at 9 AM and monitor the results of the application of the HotFix.

The next morning we met as planned, and watched nervously as the load ramped up as each time zone came online. By 1 PM we had hit a peak load exceeding Tuesday's peak, and not a single complaint had come in. SolarWinds DPA now indicated that the blocking issue had been resolved, and there were no other major alerts to deal with. Another bullet dodged, though this one was a little close for comfort. We prepared a presentation for the Board explaining the issues (though we tried not to throw the software expert under the bus for missing the HotFix), and presented a list of lessons learned / actions, which included:

 

  • Set up a proactive post-change war-room for major changes
  • Monitor results daily for at least one week for changes to key business applications
  • Provide urgent Oracle training for the database team (the accelerated schedule driven by the Board meant this did not happen in time)
  • Configure DPA to monitor our SQL Server installations too

 

We wanted to add another bullet saying "Don't be bullied by the Board of Directors into doing something when we know we aren't ready yet", but maybe that's a message best left for the Board to mull on for itself. Ok, we aren't perfect, but we can get better each time we make mistakes, so long as we're honest with ourselves about what went wrong.

 

 

>>> Continue reading this story in Part 6

I’m a little late, but I wanted to do a quick wrap-up of last week’s challenge.

Day 3: Search

Richard Phillips  Dec 5, 2016 12:28 PM

Reminded us that it’s not all about immediate gratification: “In a day when search means typing something into a browser and getting back an answer we need to remember that the search isn't just about getting an answer, it's about learning and gaining data that can be used now and in the future.”

 

Meanwhile, EBeach

waxed philosophical with the quote: “A man travels the world over in search of what he needs and returns home to find it – George A. Moore.”

 

And tomiannelli added some philosophical thoughts of his own, but closed with:

“Remember that sometimes not getting what you want is a wonderful stroke of luck.”

  ― Dalai Lama XIV

 

Day 4: Understand

mlotter pointed out that “A true mark of Successful person is the ability to listen to understand instead of listening with the intent to reply.”

 

Not to be outdone, mtgilmore1 invoked the first man on the moon with, "Mystery creates wonder and wonder is the basis of man's desire to understand." ― Neil Armstrong

 

And THWACK MVP bsciencefiction.tv countered one word with another when he said, "To me learning something new requires one of two two things acceptance or understanding." And followed it up with a clip of comedian Michael Jr. explaining the power of "why."

 

Michael Jr: Know Your Why - YouTube

 

Day 5: Accept

desr kept it short and sweet: "Accept who you are and what you are that is all that matters. If you accept yourself your true beauty will shine."

 

silverwolf started off with enthusiasm: “Working in IT, Studying in IT,  even waay back in elementary school...or even before that, I've always known I would be involved with IT somehow....one way or another, with computers...with technology! I Accepted that fact a looong time ago. I mean... it was all so COOL!  It STILL IS! It's an ADVENTURE!” and then brought a series of awesome Star Trek memes.

 

And in what I hope is a purposeful misspelling, wbrown said, "Sometimes we just have to except that everything has an acception"

 

Day 6: Believe

Several folks started out their entries with some type of definition of the word of the day, but bleggett managed to weave the word into their analysis of the definition itself (how meta): "Interestingly, the etymology of believe is not quite what I expected. Not unbelievable, but I believe it's credible."

Online Etymology Dictionary

 

bsciencefiction.tv also started off with a definition, but with a bit more detail: “Belief is objective. Unfortunately, today belief is too often accepted as subjective. By quick definition:

An objective perspective is one that is not influenced by emotions, opinions, or personal feelings - it is a perspective based in fact, in things quantifiable and measurable. A subjective perspective is one open to greater interpretation based on personal feeling, emotion, aesthetics, etc.”

 

sparda963 reflected on how belief informs his work: “I am not much of a believer honestly. I don't take it on faith that something is going to work unless I know it is going to work. Far to many people who I have worked with in IT over the years believe that just because they did something that it will work exactly as their believe it will. This often does not turn out to be the case.”

 

Day 7: Choose

Day 7 is notable because it marked the first post by another member of the SolarWinds staff – in this case Head Geek Destiny Bertucci. In response, Richard Phillips highlighted one of the biggest career divides that IT pros encounter: “Ahh, Decisions, Decisions. I've heard lots of complaints over the years "That boss couldn't do my job half as good as I do and he makes so much more money" But what you see in successful people and those that "climb the ladder." is their ability to make decisions, quickly and decisively. They own the fact that they won't always be the best or sometimes even the correct decision, but they own it and own the results. That's what keeps them moving and that's why people follow them.”

 

prowessa chose to use their entry to express their support: “I choose to go the write path and be a better Thwackster.”

 

And Michael Kent quoted one of the greatest sysadmins in history (if you judge by the length of his beard), Albus Dumbledore, who said “Dark times lie ahead of us, and there will be a time when we must choose between what is easy and what is right”

 

Day 8: Hear

Radioteacher related the word to his experience on the set of THWACKcamp this year: “I am in the white first year ThwackCamp shirt below. When sqlrockstar spoke on stage right I would stare at the back of Patrick's head knowing that from the cameras point of view it would look like I was looking at sqlrockstar. In some ways that it made it easier to focus on his voice, hearing what was said and reacting.”

 

Meanwhile, nedescon reflected on how hearing loss has affected their perception of the value of receiving information: “I think the most important thing to take away is there is a lot of noise that will get in the way if you let it. I truly am the only thing that I have any power or influence over. In other words, I can give my attention, but I can't always get another's.”

 

And steven.melnichuk@tbdssab.ca  succinctly pointed out the difference between hearing and listening: “hearing is one of the 5 senses of the human body...there is no skill involved. Listening is a skill and an art, and requires focus and commitment...and separates managers from leaders.”

 

Day 9: Observe

SolarWinds product marketing specialist Diego Fildes Torrijos submitted the lead essay for day 9, analyzing the way we observe the world around us (and how often we choose not to).

 

bsciencefiction.tv pointed out how the power of observation can serve us both personally and as monitoring enthusiasts: “The response to the "It's the [insert scapegoat here...but we know they are going to say 'network']" is a tool like Orion that allows you to observe your environment in near real time, coorelate data points and makee a logic deduction. Observer the world around you and make wise decisions. Observe the environment you are monitoring and make informed decisions.”

 

Then jamison.jennings offered a suggestion for the next SolarWinds product line: “In the future, when AI becomes more of a regular integral part of IT, SolarWinds should reserve OBSERVE for a name for their AI module. Observe will be the watcher for changes that happen to nodes on the network and learn and adapt to the ever changing network. New interface turned up over night during a maintenance window, Observe automagically starts monitoring. Drives removed and new ones added, no problem. Observe sees those "down" volumes and will go and look for new ones and add them in without relying on a scheduled network discovery.”

 

And Steve Hawkins used a quote from Andrew Carnegie to elaborate on how observation can inform our understanding: "As I grow older, I pay less attention to what men say. I just watch what they do." Sometimes the most important statement people make is how they react to a particular situation.

 

Keep the amazing comments coming (they’re worth 200 THWACK points each day) and tune in every day for the next essay challenge. Thank you to all the amazing contributors!

Starting Thursday, I'll be in Israel to meet some customers, attempt to eat my body weight in kosher shawarma, and speak at DevOpsDays Tel Aviv.

 

Since I'll be tweeting about it (@LeonAdato and @DevOpsDaysTLV) incessantly I figured I would give you all a heads up and let you know what I hope to achieve and hope to learn. You know, besides how much shwarma I can eat before it kills me. But what a way to go!

 

First, very much like my time at DevOpsDays Ohio, I hope to continue to have conversations about monitoring in a world of "cattle, not pets."

 

Second, I am looking forward to soaking up as much knowledge as I can as our industry continues the shift from on-premises to cloud. Seeing how companies big and small are adapting to the new reality of computing is both exciting to me as a veteran of IT and a source of great insight for where monitoring may be going in the future.

 

Finally, I am eager to see how the flavor of DevOps changes outside of the United States. You see, even within the U.S., there are nuances. In Austin, the crowd was almost entirely developers-who-do-ops. But in Ohio, it was 70% operations folks who were coming to grips with how they've also become developers. So I expect the event in Tel Aviv is going to teach me some more about this amazing, vibrant, and diverse community.

 

More to come on this after the event next week!

The story so far:

 

  1. It's Not Always The Network! Or is it? Part 1 -- by John Herbert (jgherbert)
  2. It's Not Always The Network! Or is it? Part 2 -- by John Herbert (jgherbert)
  3. It's Not Always The Network! Or is it? Part 3 -- by Tom Hollingsworth (networkingnerd)

 

The holidays are approaching, but that doesn't mean a break for the network team. Here's the fourth installment of the story, by Tom Hollingsworth (networkingnerd).

 

The View From Above: James (CEO)

 

I'm really starting to see a turnaround in IT. Ever since I put Amanda in charge of the network, I'm seeing faster responses to issues and happier people internally. Things aren't being put on the back burner until we yell loud enough to get them resolved. I just wish we could get the rest of the organization to understand that.

 

Just today, I got a call from someone claiming that the network was running slow again when they tried to access one of their applications. I'm starting to think that "the network is slow" is just code to get my attention after the unfortunate situation with Paul. I decided to try and do a little investigation of my own. I asked this app owner if this had always been a problem. It turns out that it started a week ago. I really don't want to push this off on Amanda, but a couple of my senior IT managers are on vacation and I don't have anyone else I can trust. But I know she's going to get to the bottom of it.

 

 

The View From The Trenches: Amanda (Sr Network Manager)

 

Well, that should have been expected. At least James was calm and polite. He even told me that he'd asked some questions about the problem and got some information for me. I might just make a good tech out of the CEO after all!

 

James told me that he needed my help because some of the other guys had vacation time they had to use. I know that we're on a strict change freeze right now, so I'm not sure who's getting adventurous. I hope I don't have to yell at someone else's junior admin. I decided I needed to do some work to get to the bottom of this. The app in question should be pretty responsive. I figured I'd start with the most basic of troubleshooting - a simple ping. Here's what I found out:

 

icmp_seq=0 time=359.377 ms

icmp_seq=1 time=255.485 ms

icmp_seq=2 time=256.968 ms

icmp_seq=3 time=253.409 ms

icmp_seq=4 time=254.238 ms

 

Those are terrible response times! It's like the server is on the other side of the world. I pinged other routers and devices inside the network to make sure the response times were within reason. A quick check of other servers confirmed that response times were in the single digits, not even close to the bad app. With response times that high, I was almost certain that something was wrong. Time to make a phone call.
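Amanda's baseline comparison is easy to script. Here's a minimal sketch, with made-up hostnames, RTT values, and a LAN-sanity threshold, that measures TCP connect round-trip times and flags any host whose latency is wildly out of line with its peers:

```python
import socket
import time

def connect_rtt(host, port=443, timeout=2.0):
    """Return the TCP connect round-trip time in milliseconds,
    or None if the host is unreachable within the timeout."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None

def flag_slow(rtts, threshold_ms=100.0):
    """Return hosts whose measured RTT exceeds a LAN-sanity threshold."""
    return [host for host, ms in rtts.items()
            if ms is not None and ms > threshold_ms]

# Hypothetical measurements shaped like Amanda's: one server way out of line.
sample = {"app-server": 255.5, "core-router": 1.2, "file-server": 3.8}
print(flag_slow(sample))  # -> ['app-server']
```

A TCP connect isn't identical to an ICMP ping, but it needs no elevated privileges and gives the same "this box is an order of magnitude slower than everything around it" signal.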

 

Brett answered when I called the server team. I remember we brought him on board about three months ago. He's a bit green, but I was told he's a quick learner. I hope someone taught him how to troubleshoot slow servers. Our conversation started off as well as expected. I told him what I found and that the ping time was abnormal. He said he'd check on it and call me back. I decided to go to lunch and then check in on him when I got finished. That should give him enough time to get a diagnosis. After all, it's not like the whole network was down this time, right?

 

I got back from lunch and checked in on Brett The New Guy. When I walked in, he was massaging his temples behind a row of monitors. When I asked what was up, he sighed heavily and replied, "I don't know for sure. I've been trying to get into the server ever since you called. I can communicate with vCenter, but trying to console into the server takes forever. It just keeps timing out."

 

I told Brett that the high ping time probably means that the session setup is taking forever. Any lost packets just make the problem worse. I started talking through things at Brett's desk. Could it be something simple? What about the other virtual machines on that host? Are they all having the same problem?

 

Brett shrugged his shoulders. His response, "I'm not sure? How do I find out where they are?"

 

I stepped around to his side of the desk and found a veritable mess. Due to the way the VM clusters were set up, there was no way of immediately telling which physical host contained which machines. They were just haphazardly thrown into resource pools named after comic book characters. It looked like this app server belonged to "XMansion" but there were a lot of other servers under "AsteroidM". I rolled my eyes at the fact that my network team had strict guidelines about naming things so we could find them at a glance, yet the server team could get away with this. I reminded myself that Brett wasn't to blame and kept digging.

 

It took us nearly an hour before we even found the server. In El Paso, TX. I didn't even know we had an office in El Paso. Brett was able to get his management client to connect to the server in El Paso and saw that it contained exactly one VM - The Problem App Server. We looked at what was going on and figured that it would work better if we moved it back to the home office where it belonged. I called James to let him know we fixed the problem and that he should check with the department head. James told me to close the ticket in the system since the problem was fixed.

 

I hung up Brett's phone. Brett spun his chair back to his wall of monitors and put a pair of headphones on his head. I could hear some electronic music blaring away at high volume. I tapped Brett on the shoulder and told him, "We're not done yet. We need to find out why that server was halfway across the country."

 

Brett stopped his music and we dug into the problem. I told Brett to take lots of notes along the way. As we unwound the issues, I could see the haphazard documentation and architecture of the server farm was going to be a bigger problem to solve down the road. This was just the one thing that pointed it all out to us.

 

So, how does a wayward VM wind up in the middle of Texas? It turns out that the app was one of the first ones ever virtualized. It had been running on an old server that was part of a resource pool called "SavageLand". That pool only had two members: the home server for the app and the other member of the high availability pair. That HA partner used to be here in the HQ, but when the satellite office in El Paso was opened, someone decided to send the HA server down there to get things up and running. Servers had been upgraded and moved around since then, but no one documented what had happened. The VMs just kept running. When something would happen to a physical server, HA allowed the machines to move and keep working.

 

The logs showed that last week, the home server for the app had a power failure. It rebooted about ten minutes later. HA decided to send the app server to the other HA partner in El Paso. The high latency was being caused by a traffic trombone. The network traffic was going to El Paso, but the resources the server needed to access were back here at the HQ. So the server had to send traffic over the link between the two offices, listen for the response, and then send it back over the link. Traffic kept bouncing back and forth between the two offices, which saturated the link. I was shocked that the link was even fast enough to support the failover traffic; according to Brett's training manuals, it barely met the minimum. We were both amused that failing the server over to the backup caused more problems than just waiting for the old server to come back up would have.
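A little back-of-the-envelope arithmetic shows why a trombone hurts so much. The numbers below are illustrative, not measurements from Amanda's network, but the shape of the math is the point: every request pays the WAN crossing several times.

```python
# Illustrative numbers only -- not measurements from the story's network.
wan_one_way_ms = 60.0   # HQ <-> El Paso, one direction
lan_rtt_ms = 1.0        # normal in-office round trip

# Normal path: client -> local server -> local resources -> client.
normal_rtt = lan_rtt_ms * 2

# Tromboned path: the request crosses the WAN to reach the server,
# the server crosses it again (out and back) to reach HQ resources,
# and the response crosses it once more: four WAN legs per request.
trombone_rtt = wan_one_way_ms * 4 + lan_rtt_ms

print(f"normal: {normal_rtt:.0f} ms, tromboned: {trombone_rtt:.0f} ms")
# prints "normal: 2 ms, tromboned: 241 ms"
```

Which puts the result right in the range of the ~255 ms pings: it was never a server problem at all, it was geography.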

 

Brett didn't know enough about the environment to know all of this. And he didn't know how to find the answers. I made a mental note to talk to James about this at the next department meeting after everyone was back from vacation. I hoped they had some kind of documentation for that whole mess. Because if they didn't, I was pretty sure I knew where I could find something to help them out.

 

 

>>> Continue reading this story in Part 5

Wow, can you believe it? 2016 is almost over, the holidays are here, and I didn't even get you anything! It's been a bit of a wild rollercoaster of a year through consolidation, commoditization, and collaboration!

 

I'm sure you have some favorite trends or notable things from 2016. Here are a few that have been recurring themes throughout the year.

 

 

  • Companies going private, such as SolarWinds (closed in February) and Dell EMC (closed in September)
  • Companies buying other companies and consolidating the industry, like Avago buying Broadcom (closed Q1), Brocade buying Ruckus (closed Q3), and Broadcom buying Brocade (initiated in October)
  • Companies divesting assets, like Dell selling off SonicWall and Quest, and Broadcom selling off Brocade's IP division

 

 

All right, that's at least a small snapshot of the rollercoaster. Only time will tell what impact those decisions will have on practitioners like you and me (I promise some of them will be GREAT and some, not so much!).

 

But what else, what else?! Some items I’ve very recently discussed include.

 

 

The net-net of all three is that we will continue to see better technology, with deeper investment and ultimately (potentially) lower costs!

 

On the subject of flash: if you haven't been tracking it, density profiles have been insane this year alone, and that trend is only continuing with further adoption and better price economics from technology like NVMe. I particularly love this image because it reflects the shrinking footprint of the data center alongside our inevitable need for more.

 

Moores Law of Storage.png

 

 

This is hardly everything that happened in 2016, but these are particular items that are close to my heart and, respectively, my infrastructure. I will offer hearty congratulations on this being the 16th official "Year of VDI," a title we continue to grant it even though it continues to fail to deliver on its promises.

 

Though with 2016 closing quickly on our heels, there are a few areas you'll want to watch in 2017!

 

  • Look for Flash Storage to get even cheaper, and even denser
  • Look to see even more competition in the Cloud space from Microsoft Azure, Amazon AWS and Google GCP
  • Look to Containers to become something you MIGHT actually use on a regular basis and more rationally than the very obscure use-cases promoted within organizations
  • Look to vendors to provide more of their applications and objects as containers (EMC did this with ESRS, their Secure Remote Support)
  • Obviously 2017 WILL be the Year of VDI… so be sure to bake a cake
  • And strangely, aside from pricing economics driving adoption of 10GigE+ and Wireless Wave 2, we'll see a lot more of the same as we saw this year, maybe even some retraction in hardware innovation
  • Oh and don’t forget, more automation, more DevOps, more “better, easier, smarter”

 

But enough about me and my predictions: what were some of your favorite and notable trends of 2016, and what are you looking forward to seeing in 2017?

 

And if I don't get a chance to say it later… Happy Holidays and a Happy New Year to y'all!

After the network perimeter is locked down, servers are patched, and password policies enforced, end-users themselves are the first line of defense in IT security. They are often the target for a variety of attack vectors, making them the first step of triage when a security incident is suspected. Security awareness training, which should be part of any serious IT security program, should be based in common sense, but what security professionals consider common sense isn't necessarily common sense for the average end-user.

 

In order to solve this problem and get everyone on the same page, end-users need the awareness, knowledge, and tools to recognize and prevent security threats from turning into security breaches. To that end, a good security awareness program should be guided by these three basic principles:

 

First, security awareness is a matter of culture.

 

Security awareness training should seek to change or create a culture of awareness in an organization. This means different things to different security professionals, but the basic idea is that everyone in the organization should have a common notion of what good security looks like. This doesn’t mean that end-users know how to spot suspicious malformed packets coming into a firewall, but it does mean that it’s part of company culture to be suspicious of email messages from unknown sources or even from known sources but with unusual text.

 

The concerns of the organization’s security professionals need to become part of the organization's culture. This isn’t a technical endeavor but a desire to create a heightened awareness of security concerns among end-users. They don’t need to know about multi-tenant data segmentation or versions of PHP, but they should have an underlying concern for a secure environment. This is definitely somewhat ambiguous and subjective, but this is awareness.

 

Second, security awareness training should empower end-users with knowledge.

 

After a culture of security awareness has been established, end-users need to know what to actually look for. A solid security awareness program will train end-users on what current attacks look like and what to do when facing one. This may be done simply with weekly email newsletters or required quarterly training sessions.

 

End-users need to actually learn why it’s not good to plug a USB stick found in the parking lot into their computer, and users need to get a good feel for what phishing emails look like. They should know that they can hover over a suspicious link and sometimes see the actual hidden URL, and they should know that even that can be faked.
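The link-hover lesson can even be demonstrated in training with a few lines of code. Here's a rough sketch, using a contrived example email snippet with hypothetical domains, that flags anchors whose visible text claims to be one URL while the actual href points somewhere else, a classic phishing tell:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkAuditor(HTMLParser):
    """Collect anchors whose visible text looks like a URL
    but whose href points at a different host."""
    def __init__(self):
        super().__init__()
        self._href = None
        self._text = []
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            # Does the visible text claim to be a URL?
            if text.startswith(("http://", "https://", "www.")):
                shown = urlparse(text if "://" in text
                                 else "http://" + text).hostname
                actual = urlparse(self._href).hostname
                if shown and actual and shown != actual:
                    self.suspicious.append((text, self._href))
            self._href = None

# Contrived phishing-style snippet: the text says "bank," the link says otherwise.
auditor = LinkAuditor()
auditor.feed('<a href="http://evil.example.net/login">https://mybank.example.com</a>')
print(auditor.suspicious)
```

This catches only the crudest trick, of course; as the paragraph above notes, even the hover text can be faked, which is exactly why training has to go beyond any single check.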

 

Ultimately, they need to know what threats look like. The culture of awareness makes them concerned, and knowledge gives them the ability to identify actual problems in the real world.

 

Third, security awareness training is concerned with changing behavior.

 

The whole point here is that end-users take action when there is suspicion of malicious activity. Security awareness training is useless if no one takes action and actually acts like the first line of defense they really are (or can be).

 

A good security awareness program starts with culture, empowers end-users with knowledge, and seeks to change behavior. This means making significant effort to provide end-users with clear directions for what to do when encountering a suspected security incident. Telling users to simply “create a ticket with the helpdesk” is just not enough. End-users need clear direction as to what they can actually do in the moment when they are dealing with an issue. This is where the whole “first line of defense” becomes a reality and not just a corporate platitude.

 

For example, what should end-users actually do (or not do) when they receive a suspected phishing email? The directions don’t need to be complicated, but they need to exist and be communicated clearly and regularly to the entire organization.

 

Security awareness training is the most cost-effective part of a security program in that it doesn’t require purchasing millions of dollars of appliances and software licenses. There is a significant time investment, but the return on investment is huge if done properly. A strong security awareness training program needs to be based in common sense, change culture, empower end-users with knowledge, and change behavior.

ST128_PINUP.jpg

(image courtesy of Marvel)

 

...I learned from "Doctor Strange"

(This is part 3 of a 4-part series. You can find part 1 here and part 2 here)

 

Withhold judgment and give respect when seeking answers

Standing outside the door to Kamar Taj, having just been saved from muggers, Strange is still glib and sarcastic about the nature of the environment he is in. Mordo stops him and says,

 

"I was in your place, once. I, too, was disrespectful. So might I offer you some advice? Forget everything that you think you know."

 

Recently, I was involved in a discussion about monitoring containers. I said,  "Maybe I'm being naive, but it seems like we already solved this problem. Only it was 2001 and we called it LPARs running on AIX." There was some nervous laughter, a few old-timers got the joke, and the rest of the group explained how containers were completely different, and all that old stuff wouldn't apply.

 

I wrote about this a year ago ("Respect Your Elders") and that sentiment still holds true. If you are not willing to give respect and credence to older ideas (if not older IT pros), then you are going to insult a lot of people, miss a lot of solutions, and spend a lot of extra time fixing old problems all over again.

 

Redundancy is your friend

In the movie, we discovered that the world is protected from mystical threats by three Sanctum Sanctorums, located in London, Hong Kong, and New York. When London falls, the world is still protected by the other two. Only after Hong Kong falls can the world be overwhelmed by hostile forces.

 

The message to us in IT is clear: failover systems, disaster recovery plans, high availability solutions, and the rest are all good things.

 

To say any more about this would be redundant.

 

Find a teacher and trust them to lead you

Stephen Strange travels to Kathmandu, to the mystical school of Kamar Taj, and meets the Ancient One. His mind is opened to the existence of magic in the world, and he begs to be accepted as a student. The Ancient One then guides Strange in his journey to master the mystical arts, monitoring his progress and helping him avoid pitfalls along the way. Later, she rebukes him by saying, "When you came here you begged me to teach you. Now I'm told you question every lesson and prefer to study on your own."

 

The correlating lesson for us in IT is that many of us tend to fall into the trap of solitary study. We find our education in the form of online blog posts, web-based tutorials, and PDFs. But there is something to be said for having a teacher, a mentor who understands you; where you started, where you'd like to go, how you learn best, and what your shortcomings are. If you are learning a single skill, self-directed learning is a great way to go. But when you are thinking about your career, it's worth taking the time to find a trusted advisor and stick with them. They will often see things in you that you cannot see in yourself.

 

Be comfortable with confusion

At one point in the story, Strange complains, "This doesn't make any sense!" The Ancient One replies, "Not everything does. Not everything has to." The point in the movie is that Strange has to let go of his need for things to make sense before he engages with them. Sometimes it needs to be enough to know that something simply is, regardless of how. Or that something works a particular way, irrespective of why.

 

"Yes, but now I know how it works," is what I say after I've burned hours de-constructing a perfectly working system. It's not that the education wasn't important, it's that it may not have been important at that moment. When our need for things to make sense impedes our ability to get on with our daily work, that's when we need to take a step back and remember that not everything has to make sense to us now, and that inevitably, some things in IT will never make sense to us.

 

When events pull you a certain direction, take a moment and listen

In the middle of a fight, Strange reaches for an axe hanging on the wall, only to have his semi-sentient cloak pull him toward a different wall. Despite repeated attempts to get the weapon, the cloak insistently pulls him away, until Strange finally realizes that the cloak is trying to tell him about an artifact that would restrain, rather than harm, his opponent. (For comic book geeks, those were a more down-to-earth version of the Crimson Bands of Cyttorak.)

 

Despite our best laid plans and deepest desires, sometimes life pushes us in a different direction. This isn't strictly relegated to our career plans. Sometimes you believe the best solution lies with a particular coding technique, or even a specific language. Or with your chosen hardware platform, a trusted vendor, or even a specific software package.

 

And yet, despite your rock-solid belief that this is the best and truest way to achieve your goal, you can't seem to get it done.

 

In those moments, it's useful to look around and see where events are pushing you. What is over there? Is it something useful?

 

Even if others label it useless, be proud of the knowledge you have

During surgery, the anesthesiologist quizzes Doctor Strange on his musical knowledge, asking him to identify Chuck Mangione's hit, "Feels So Good." Later on, in an aside that goes by too fast for many in the audience, Strange tells his colleague that he traveled to Kathmandu. She asks "Like the Bob Seger song?" He responds, "Beautiful Loser album, 1975, A-side, third cut? Yes. In Nepal."

 

No, having this knowledge didn't help our hero save the day, but it was still a tangible part of who he was. Strange is a gifted doctor, an unapologetically arrogant ass, a talented sorcerer... and an unashamed music geek.

 

We in IT have to remember that we are also whole people. We're not just storage engineers or SysAdmins or pen testers or white hat hackers. We have other aspects of our lives that are important to us, even if they aren't central or even relevant to the plot of our story. They provide richness and depth of character. We shouldn't lose sight of that, and we shouldn't ignore our need for hobbies, interests, and non-IT outlets in our life.

 

Did you find your own lesson when watching the movie? Discuss it with me in the comments below. And keep an eye out for part 4, coming next week.

It would be really easy to just post this link to Sam Harris’ TED talk and say “Discuss!” Sam Harris: Can we build AI without losing control over it?

 

But for you, busy people, let me distill some of Sam’s points and add a few of my own.

Sam does a brilliant job of pointing out that we’re not as worried about the impact of artificial intelligence as we should be.

 

“Death by science fiction is fun. Death by famine is not fun.”

 

If we knew we would all die in 50 years from a global famine, we'd do a heck of a lot to stop it. Sam is concerned that there's a risk to humans once artificial intelligence surpasses us, and it will; it's only a matter of time.

 

"Worrying about AI safety is like worrying about overpopulation on Mars."

 

So, we’re using a time frame as an excuse? That we shouldn’t worry our pretty little heads about it because it won’t occur in our lifetime? In half my lifetime, I’ve gone from having an Amstrad CPC 6128 running DOS to now carrying the Internet in my pocket. Also, I have kids and hopefully one day grandkids, so I’m a little worried for them.

 

Information processing is the source of intelligence. And we wouldn't consider for a moment the option that we'd ever stop improving our technology. We will continue to improve our intelligent machines until they are more intelligent than we are, and they will continue to improve themselves.

 

Elon Musk's OpenAI group released Universe this week, providing a way for machines to play games and use websites like humans do. That's not a big deal if you're not worried about the PC beating you at GTA V. It's a slightly bigger deal if you are a travel agent and the machines can now use comparison websites and book the cheapest fare without you. And while you'd hopefully have a moral compass screening what you'd do online, do the machines have one? Could they get cheeky and ship their enemies glitter, or something more sinister?

 

Robert "Uncle Bob" Martin (author of The Clean Coder and other books), sets out 10 rules for software developers that he calls "The Scribe's Oath". One of those rules is you will not write code that does harm. But the issue isn't that a human will write code that shuts down a city's water treatment plant. The issue is that we're writing code that constructs deep learning neural networks, allowing machines to make decisions by themselves. We're enabling them to become harmful on their own, if we're not able to code a sense of morals, ethics and values into them.

 

Then we get into the ethical debates. If there are only two outcomes for an incident with a self-driving car, one that preserves the life of the driver and one that preserves the life of a pedestrian or another driver, which one should the machine choose? Do we instil a human-like self-preservation/survival instinct?

 

Is this all the fault of (or a challenge for) the software developer? How does this apply to systems administrators & systems architects?

 

We've talked about autonomic computing before. If we are configuring scripted and self-healing systems, are we adding to the resilience of the machines, and will this ultimately be detrimental to us? How outlandish does that seem right now though - that we'd enable machines to be so self-preserving that they won't even die if we want them to? We've even laughed in these comments about whether the machines will let us pull the power plug on them. Death by science fiction is funny. But the machines can now detect when we are lying, because we built them to be able to do that. Oops.

 

Technosociologist Zeynep Tufekci says “We cannot outsource our responsibilities to machines. We must hold on ever tighter to human values and human ethics." Does this mean we need to draw a line about what machines will do and what they won’t do? Are we conscious in our AI developments about how much power and decision-making control we are enabling the machines to have, without the need for a human set of eyeballs and approval first? Are we building this kind of safety mechanism in? With AI developments scattered among different companies with no regulation, how do we know that advances in this technology are all ethically and morally good? Or are we just hoping they are, relying on the good nature of the human programmers?

 

Ethics in AI has come up a few times in the comments of my articles so far. Should we genuinely be more worried about this than we are? Let me know what you think.

 

-SCuffy

Updating your Active Directory schema is something that needs to be done from time to time, whether we like it or not. It is done either to support a new version of the domain controller OS or because an AD-integrated application such as Exchange, Skype for Business, or SCCM requires the update. Regardless of the reason, the mere mention of an Active Directory (AD) schema update can make administrators cringe. The dreaded fear of the schema update is mostly due to the fact that it cannot be undone. There is no uninstall button that allows you to reverse your changes. Things get even more complicated if you have AD-integrated applications or third-party applications that have also extended your schema.

 

Active Directory is like a beating heart

 

For those not sure what Active Directory is: it is a database of objects representing the users, computers, groups, etc. in your network, and it is also used for authentication and authorization. The schema is the component of Active Directory that defines all of those objects with classes and attributes. The schema differs between versions of Windows Server Domain Services; for instance, AD 2003, AD 2008, and AD 2012 each have a different schema. When you introduce a new domain controller running a newer OS version, you will need to update your schema.

 

 

I sometimes refer to AD as the heart of the network. The flow of the network, your enterprise objects, passes through this beating heart, and if it has a brief hiccup or is slowed down, it can affect the overall function of your network. Users not being able to log in to their computers can have major impacts on the business, and the lost productivity costs real dollars. A non-working heart can be almost paralyzing for some businesses.

 

Upgrade all things NOW!

 

At the mere mention of a schema update, most would tend to delay an upgrade until they felt it was “safe.” But with Microsoft now pushing new product releases every 18-24 months, a re-thinking of sorts has set in. In an effort to reduce the fear and increase upgrade adoption, Microsoft has made these schema updates a little less painful and sometimes almost transparent. With each new release, deploying and updating gets simpler and easier.

 

Windows Server 2012 made that process simpler still. The functions of adprep with /forestprep and /domainprep have now been wrapped up into the Active Directory Domain Services role installation, reducing the process to a few clicks of Next. You can still run the commands manually if you want to be old school.

 

 

Schema updates are required for almost every Exchange service pack or major CU update now. The same can be said for other Microsoft applications such as Skype for Business and SCCM. They have made it so easy that in some cases, such as installing a CU for Exchange 2013, the schema update is built into the application update itself. Provided the account you use to run the Exchange update has the appropriate permissions to update the AD schema, the update is easy and seamless.


I think the level of fear around schema updates has decreased somewhat in the past several years, with administrators having to do them more often and Microsoft making the update process ever easier. If you have third-party applications that extend your schema, though, it may not be as pain-free. As with any upgrade/update, you should always plan accordingly and test as much as possible, even the simple point-and-click ones.

I'm in Orlando this week for SQL Live as well as the Orlando SWUG meeting. With three sessions to deliver at SQL Live, working the booth there, and the session at the SWUG it is going to be a busy week. If you are in the Orlando area I hope you can stop by and say hello.

 

Anyway, here's a bunch of links I found on the Intertubz that you may find interesting. Enjoy!

 

Canadian Money May Contain Animal Fat, Bank Of Canada Confirms

Gives new meaning to "put your money where your mouth is", because it's so tasty!

 

Everything announced at AWS re:Invent 2016

Oh, yeah, AWS re:Invent was this past week, in case you didn't know. Here's the list of everything they announced. I like the idea of Snowmobile; wonderful marketing gimmick. And now I know why Amazon operates with such thin margins: so they can afford things like Snowmobile.

 

Apple is reportedly using drones to beat Google Maps

Well, if the drones use Apple Maps, this might take a while.

 

34 tips to boost iPhone & iPad battery life

After fighting with applications slowly consuming the memory on my iMac this week I figured I would share some tips on battery life for your iPhone. Yes, I am assuming you have an iPhone, because I know you don't have a Windows Phone, but many of the tips work for Android as well.

 

Five Things You Need to Know About the U.K.’s Mass Surveillance Law

In case you hadn't heard, the UK government has made it legal to do some questionable data collection.

 

NYPD plans to expand Smart car fleet to replace scooters

Not only are these cars adorable to look at, but I'm guessing they are a stepping stone to autonomous patrolling and data collection.

 

Big Data Poised to Get Much Bigger in 2017

All I want for Christmas is data, of course. Let's just hope it doesn't sit out and ROT all year.

 

Am I in the Christmas spirit yet? You be the judge:

jeep3.jpg

Challenge

 

In previous weeks, we talked about application-aware monitoring, perspective monitoring and agent/responder meshes to get a decentralized view of how our network is functioning.

 

With our traditional network monitoring system (NMS), we have a device-level and interface-level view. That's becoming less and less true as modern software breaks the mould of tradition, but it's still the core of its functionality. Now we have added perspective monitoring and possibly some agent/responder monitoring to the mix. How do we correlate these so that we have a single source of meaningful information?

 

Maybe We Don't?

 

Describing the use of the phrase "Single Pane of Glass" (SPoG) in product presentations as "overused" is an understatement. The idea of bringing everything back to a single view has been the holy grail of product interface design for some time. This makes sense, as long as all of that information is relevant to what we need at the time. With our traditional NMS, that SPoG is usually the dashboard that tells us whether the network is operating at baseline levels or not.

 

Perspective monitoring and agent/responder meshes can gather a lot more data on what's going on in the network as a whole. We have the option of feeding all that directly into the NMS, but is that where we're going to get the best perspective?

 

Data Visualization

 

We're living in a world of big data. The more we get, the less likely it becomes that we will be able to consume it in a meaningful way. Historically, we have searched for the relevant information in our network and filtered out what isn't immediately relevant. Big data is teaching us that it's all relevant, at least when taken as a whole.

 

Enter log aggregators and data visualization systems. Most of the information that we're getting from our decentralized tools can be captured in a way that feeds these systems. Instead of just feeding it into the NMS, we have the option of collecting all of this data into custom visuals. These can give us a single view not only of where the network is experiencing chronic problems, but of where we need to adjust our baselines.

 

Whether we're looking at Elastic Stack, Splunk, Tableau, or other tools, the potential to capture the gestalt of our network's data and present it usefully is worthwhile.
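
To make that concrete, here's a minimal sketch (the link names and field labels are invented) of how raw latency samples from distributed agents might be rolled up into per-link summaries that one of those visualizers could ingest:

```python
# Hypothetical sketch: roll up raw latency samples from distributed agents
# into per-link summaries suitable for feeding a visualization system.
from collections import defaultdict
from statistics import mean

def summarize(samples):
    """samples: iterable of (link, latency_ms) tuples."""
    by_link = defaultdict(list)
    for link, latency in samples:
        by_link[link].append(latency)
    summaries = []
    for link, values in sorted(by_link.items()):
        values.sort()
        p95 = values[int(0.95 * (len(values) - 1))]  # nearest-rank p95
        summaries.append({
            "link": link,
            "avg_ms": round(mean(values), 2),
            "p95_ms": p95,
            "samples": len(values),
        })
    return summaries

samples = [("nyc-lon", 82), ("nyc-lon", 95), ("nyc-lon", 310),
           ("nyc-sfo", 71), ("nyc-sfo", 69)]
for row in summarize(samples):
    print(row)
```

The same records could just as easily be emitted as JSON lines for a log aggregator to index, which is where the "feed both the NMS and the visualizer" idea comes in.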

 

Which?

 

What if there's something in all this that indicates unacceptable performance or a failure? Yes, that should raise alerts in our NMS.

 

This isn't an either/or thing. It's a complementary approach. There's no reason why the data from our various agents and probes can't feed both. Depending on the tool that's used, the information can even be forwarded directly from the visualizer, simplifying the collection process.

 

The Whisper in the Wires

 

Depending on what we’re looking for, there’s more than one tool for the job. Traditionally, we’re observing the network for metrics that fall outside of our baselines, particularly when those have catastrophic impact to operations. This is essential for timely response to immediate problems. Moving forward, we also want a bird’s eye view of how our applications and links are behaving, which may require a more flexible tool.

 

Has anyone else looked at implementing data visualization tools to complement their NMS dashboards?

The Rolling Stones once wrote a song about how time waits for no one, but the inverse is also true today. These days, no one waits for time; certainly not government personnel who depend on speedy networks to deliver mission-critical applications and data.

 

Fortunately, agency administrators can employ deep packet-level analysis to ensure the efficiency of their networks and applications. Packet-level analysis involves capturing and inspecting packets that flow between client and server devices. This inspection can provide useful information about overall network performance, including traffic and application response times, while fortifying network security.

 

Before we get into how this works, let’s take a minute to go back to the concept of time – specifically, network response time (NRT), also known as network path latency. NRT measures the amount of time required for a packet to travel across a network path from sender to receiver. When latencies occur, application performance can be adversely impacted.
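
As a rough illustration of the concept, here's a small Python sketch that measures one simple flavor of NRT: the time to complete a TCP handshake. It spins up a throwaway local listener so the example is self-contained; in practice you'd point the probe at a real host and port.

```python
# Minimal sketch of measuring network response time (NRT) as TCP connect
# time. The local listener below is a stand-in for a remote server.
import socket
import threading
import time

def probe(host, port, timeout=2.0):
    """Return the TCP connect (handshake) time in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0

# Throwaway listener standing in for a remote server.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=lambda: server.accept(), daemon=True).start()

nrt_ms = probe("127.0.0.1", port)
print(f"NRT to 127.0.0.1:{port} = {nrt_ms:.2f} ms")
```

A real packet-analysis tool derives this passively from captured SYN/SYN-ACK timestamps rather than active probing, but the quantity being measured is the same.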

 

Some applications are more prone to latency issues, and even lower bandwidth applications aren’t completely immune. End-users commonly think that these problems are the result of a “slow network,” but it could be the application itself, the network, or a combination of both.

 

Packet analysis can help identify whether the application or network is at fault. Managers can make this determination by calculating and analyzing both application and network response time. This allows them to attack the root of the problem.

 

They can also use analysis to calculate how much traffic is using their networks at any given time. This is critically important for two reasons: first, it allows administrators to better plan for spikes in traffic, and second, it can help them identify abnormal traffic and data usage patterns that may indicate potential security threats.

 

Additionally, administrators can identify which applications are generating the most traffic. Packets can be captured and analyzed to determine data volume and transactions, among other things. This can help managers identify applications and data usage that may be putting a strain on their networks.
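
As a toy illustration of that idea (the packet records here are invented; a real capture summary would come from a pcap library or a monitoring tool), tallying bytes by destination port is one crude way to see which applications dominate:

```python
# Hypothetical sketch: tally captured packet sizes by destination port
# to see which applications generate the most traffic.
from collections import Counter

# (dst_port, bytes) records, as a capture summary might provide them
packets = [(443, 1400), (443, 1400), (53, 80), (3306, 900), (443, 600)]

bytes_by_port = Counter()
for port, size in packets:
    bytes_by_port[port] += size

for port, total in bytes_by_port.most_common():
    print(f"port {port}: {total} bytes")
```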

 

The challenge is that packet-level analysis has traditionally been either too difficult or too expensive to manage. There's a powerful, free, open-source tool called Wireshark, but it's a bit difficult to wrangle for those who aren't familiar with it. Many proprietary tools are full-featured and easier to use, but expensive.

 

The good news is that some standard network monitoring tools now include packet analysis as another key feature. That makes sense, because packet analysis can play an important – and very precise – role in making sure that networks continue to run efficiently. As a result, federal IT administrators now have more options to reach deep into their packets and honor the words that Mick Jagger once sang: “Hours are like diamonds. Don’t let them waste.”

 

Find the full article on our partner DLT’s blog, TechnicallySpeaking.

This is the last of a 3-part series, which is itself a longer version of a talk I give at conferences and conventions.

You can find part 1 here, and you can find part 2 here.

Now that I'm wrapping it up, I would love to hear your thoughts, suggestions, and ideas in the comments below!

 

In the last two sections of this series, I made a case for WHY unplugging should be important to us as IT Professionals, and I began to dig into specific examples of HOW we can make unplugging work for us. What follows are some additional techniques you can adapt for your own use, as well as some ways to frame your time away so that you avoid the FUD that can come with trying something new and potentially different from what our colleagues are doing.

 

Perspective is Key

Along with planning, another key to successfully disconnecting is to develop a healthy perspective.

 

Try this for the next few days: Note how you are contacted during real emergencies (and how often those emergencies actually happen).

 

It's easy to fall into the trap of answering every call, jumping screens at the sound of a bell or tweet, checking our phone at two-minute intervals, and so on, when NOTHING is actually that important or urgent.

 

Develop an awareness of how often the things you check turn out to be nothing, or at least nothing important.

 

Change the way you think about notifications. Mentally re-label them interruptions and then see which matter. Pay attention to the interruptions. That's where you lose control of your life.

interruptions.jpg

 

If someone really needed you or needed to tell you something, they wouldn't do it in a random tweet. They wouldn't tag you in a photo. They probably wouldn't even send it as a group text. When people want you to know something, they use a very direct method and TELL you.

 

So once again, take a deep breath. Learn to reassure yourself that you aren't going to miss anything important. Honest.

 

Prioritization is Key

For people like me, going offline is pretty much an all or nothing deal. As I said earlier, if it has an on switch, it's off limits for me and my family.

 

But that doesn't have to be the case. You can choose levels of connectivity as long as they don't get the best of you.

 

A good example of this is your phone. Most now support an ultra, super-duper power saving mode, which has the unintended benefit of turning off everything except... you know... the phone part. With one swipe you can prioritize direct phone calls while eliminating all the distractions that smartphones represent. You can also manually set different applications to interrupt – I mean notify – you or not, so that you only receive the interruptions that matter.

 

As long as we're talking about prioritization, let's talk about getting work done. Despite your nagging suspicion to the contrary, your technology was not protecting you from the Honey Do list. It was just pushing the items on your list to the point where you had to work on them later in the day or week, and at a time when you are even less happy about it than you would have been otherwise.

 

Use your unplugged time to prioritize some of the IRL tasks that are dragging you down. I know it sounds counterintuitive, but it is actually easier to get back to work when you know the gutters are clean.

 

As challenging as it sounds, you might also need to prioritize who you get together with on your day off the grid. Don't purposely get involved with friends who spend their weekends gaming, live-tweeting, etc. There's nothing wrong with those things, of course, but you're not really offline if you keep telling your buddy, "Tweet this for me, okay?”

 

Yes, this may change who you associate with and when. But don't try to be offline when everyone else around you is online. That's like going on a diet and forcing your friends to eat vegan chili cheese dogs.

 

But What About...

Hopefully this has gotten you thinking about how to plan for a day away from the interwebz. But there's still that annoying issue of work. Despite claims of supporting work-life balance, we who have been in IT for more than 15 minutes understand that those claims go out the window when the order entry system goes down.

 

The answer lies partly with prioritization. If you've made your schedule clear (as suggested earlier) and the NOC still contacts you, you'll need to make a judgement call about how or if you respond.

 

Spoiler Alert: Always opt for keeping a steady paycheck.

 

Speaking of which, on-call is one of those harsh realities of IT life that mangle, if not outright destroy, work-life balance. It's hard to plan anything when a wayward email, text, or ticket forces you to go running for the nearest keyboard.

on-call.jpg

 

If you are one of those people who is on-call every day of the year around the clock, I have very little advice for going offline, and honestly, you have bigger fish to fry, because that kind of rat race gets old fast.

 

On the other hand, I have a ton of experience coordinating rotating on-call with offline. Now, I don't want you to think that I've negotiated this upfront on every job I've held. I have had managers who respected my religious schedule and worked around it, and others who looked me in the eye and said my religion was my problem to solve. Here's what I've learned from both experiences:

 

First, the solution will ultimately rest with your coworkers. Not with your manager and certainly not with HR. If you can work out an equitable solution with the team first, and then bring it to management as a done deal, you're likely home free.

 

Second, nobody in the history of IT has ever said they loved an on-call schedule, and everyone wants more options. YOU, dear reader, represent those options. In exchange for your desired offline time, you can offer to trade with coworkers and cover their shifts. You wouldn't believe how effective this is until you try it. In a few rare cases, I've had to sweeten the deal with two-for-one sales ("I'll take your Sunday and Monday for every Saturday of mine"), but usually just swapping one day for another is more than enough. Another trick is to take your coworker's entire on-call week in exchange for them covering the days you want offline during your own on-call rotation.

 

Yet another trick: My kids' school schedule is extremely non-standard. They have school on Sunday and don't get days off for Christmas, Thanksgiving, or most of the other major national holidays. So I can graciously offer to cover prime-time days like Thanksgiving in exchange for them taking my time off. In essence, I'm leveraging time when my family isn't going to be home, anyway.

 

The lesson here is that if you have that kind of flexibility, use it to your advantage.

 

But what about perception? If you unplug regularly, won't people notice and judge you?

 

First, don't overthink it. When people get wind of what you are doing, you're more likely to receive kudos than criticism, and more than a few wistful comments along the lines of, “I wish I could do that."

 

Second, if you followed my suggestions about communicating and prioritizing - the right people knew about your plans AND you remained flexible in the face of an actual crisis - then there really shouldn't be any question. In fact, you will have done more than most IT folks ever do when they walk out the doors.

 

So that leaves the issue of using your evenings and weekends to get ahead with technology, so you can be the miracle worker when Monday rolls around.

 

While I understand the truth of this comic:

11th_grade.png

 

I'll put a stake in the ground and say that few - if any - people saved their job, won the bonus, or got the promotion because they consistently used personal time to get work done. And for those few who did, I'd argue that long term it wasn't worth it for all the reasons discussed at the beginning of this essay.

 

It's also important to point out that managers, departments, or companies that require this level of work and commitment are usually dangerously toxic. If you find yourself in that situation, getting out will do your long-term happiness and career a favor, even if your bank account isn't happy in the short term.

 

To sum up: Learning to disconnect regularly and for a meaningful amount of time offers benefits to your physical health, your peace of mind, and even your career; and there are no insurmountable challenges in doing so, regardless of your business sector, years of experience, or discipline within IT.

 

The choice is yours. At the start of this series, I dared you to just sit and read this article without flipping over to check your phone, email, twitter feed, etc. Now, if you made it to the end of these essays without checking those blinking interrupti... I mean notifications, then you have my heartfelt gratitude as well as my sincere respect.

 

If you couldn’t make it this far, you might want to think about why that is, and whether you are okay with that outcome. Maybe this is an opportunity to grow, both as an IT professional and as a person.

 

That's it folks! I hope you have gained a few insights that you didn't already have, and that you'll take a shot at making it work for you. Let me know your thoughts in the comments below.

This week we kicked off the December Writing Challenge, and the response has been incredible. Not just in volume of people who have read it (over 800 views) or commented (60 people and counting - all of whom will get 200 THWACK points for each of their contributions!), but also in the quality of the responses. And that's what I wanted to share today.

 

First, if you are having trouble finding your way, the links to each post are:

 

So here are some of the things that leapt out at me over the last 3 days:

 

Day 0: Prepare

First, I have to admit this one was a sneaky trick on my part, since it came out on Nov 30 and caught many people unprepared (#SeeWhatIDidThere?). Nevertheless, a few of you refused to be left out.

 

KMSigma pointed out:

"IT is (typically) an interrupt-driven job.  Sure, you have general job duties, but most are superseded by the high priority email from the director, the alert that there is a failing drive in a server, the person standing in your cube asking for the TPS report, the CIO stating that they just bought the newest wiz-bang that they saw at the trade show and you need to implement it immediately.  Regardless of what is causing the interruptions, your "normal" daily duties are typically defined by the those same interruptions.

 

So, how can you plan for interruptions?  Short answer is that you can't, but you can attempt to mitigate them."

 

Meanwhile, sparda963 noticed the connection to the old Boy Scout motto, and said:

"Instead of keeping rope, emergency food, matches, water filter, and other endless supplies within reasonable reach I keep things like tools, utilities, scanners and the such around."

 

Finally (for this day) zero cool noticed (not incorrectly) that

"Preparing for a days work in IT is like preparing for trench warfare.  You need to be a tech warrior and have a good plan of attack on how to communicate with EUs and prioritize their requests (demands). "

 

Moving to Day One (Learn), some highlights included:

bsciencefiction.tv spoke for many when he said

"The ability to learn is in my opinion one of the greatest tools in the kit for today’s IT professional. It is the ability to adapt and change to a world that is nowhere near static.  It is the skill to not just master a task but understand the concept as well."

 

Many others pointed out that learning is an active process that we have to be engaged with, not passively consume. And also that, as rschroeder commented,

"The day I stop learning is the day I die."

 

There were so many other amazing insights into how, why, and what to learn that you really should check them out.

 

But that brings us to today's word: Act.

miseri captured the essence of what many others said in the quote

"I don't trust words, I trust actions."

 

tinmann0715 was even able to honor the thoughts of his high school principal (even if he wasn't able to appreciate them at the time), who shared the motto:

"If it is to be it is up to me!"

 

And bleggett continued what is becoming a burgeoning Word Challenge trend to put thoughts into haiku with:

"Alerts that tell us

Charts that show us what we seek

Think before you act."

 

All in all there were some incredible ideas and personal stories shared. I appreciate everyone taking time out of their busy lives to share a piece of themselves in this way.

 

In the coming weeks, the "lead" article will come from other Head Geeks as well as folks from across the spectrum of the SolarWinds corporate community - members of the video team, editorial staff, product management, and more will share their stories, feelings, and reactions to each day's prompt.

 

Until next week...

Well hey everybody, I hope the Thanksgiving holiday was kind to all of you. I had originally planned to discuss more DevOps with y’all this week; however, a more pressing matter came to mind in my sick and weakened state of stomach flu!

 

Lately we’ve been discussing ransomware, and more importantly, lately I’ve been seeing an even greater incidence of ransomware affecting individuals and businesses. Worse, when it hits a business it tends to cause a lot of collateral damage (akin to encrypting the finance share that only cursory access was allowed to, and such).

 

KnowBe4 has a pretty decent infographic on ransomware that I’m tossing in here, and I’m curious what y’all have been seeing in this regard.

Do you find this to be true: an increased incidence, a decrease, or roughly the same?

 

Ransomware-Threat-Survey.jpg

 

Some real hard-and-fast takeaways I’ve seen from those who aspire to mitigate ransomware attacks are to implement:

 

  • Stable and sturdy firewalls
  • Email filtering scanning file contents and blocking attachments
  • Comprehensive antivirus on the workstation
  • Protected Antivirus on the servers

 

Yet all too often I see all of this investment in trying to ‘stop’ it from happening, without a whole lot left for handling clean-up should it hit the environment. Basically: having some kind of backup/restore mechanism to restore files SHOULD you be infected.

 

Some of the top ways I’ve personally seen ransomware wreak havoc in an environment:

  • Using a work laptop on an untrusted wireless network
  • Phishing / Ransomware emails which have links instead of files and opening those links
  • Opening a “trusted” file off-net and then having it infect the environment when connected
  • Zero Day Malware through Java/JavaScript/Flash/Wordpress hacks (etc)

 

As IT practitioners, we have to do our daily jobs, keep the lights on for the business, focus on innovating the environment, and keep up with the needs of the business. Worst of all, when things go bad (and few things are as bad as ransomware attacking and targeting an environment), we have to deal with that on a massive scale! Maybe we’re lucky and we DO have backups, and we DO have file redirection so we can restore from a VSS job, and we can detect encryption in flight and stop things from taking effect. But that’s a lot of “maybe” from end to end in any business, plus all the applicable home devices that may be in play.
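
One heuristic behind "detect encryption in flight" is worth sketching: ransomware-encrypted files look like random bytes, so a sudden jump in Shannon entropy across a file share can be a red flag. This is only an illustrative sketch; the threshold and sample data are made up, and real products combine this with other signals.

```python
# Hedged sketch of an entropy-based "does this look encrypted?" check.
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte; ~8.0 for random/encrypted data, lower for text."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    # Threshold is illustrative; tune against real file populations.
    return shannon_entropy(data) > threshold

plain = b"quarterly finance report " * 200
random_like = bytes(range(256)) * 20    # stand-in for ciphertext

print(shannon_entropy(plain))           # low: repetitive text
print(looks_encrypted(random_like))     # True
```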

 

There was a time when viruses would break out in a network and require time and effort to clean up, but at worst it was a minor annoyance. Worms would break out, and so long as we stopped whatever the zero-day trigger was, we could stop them from recurring. And while APTs and the like are more targeted threats, they were less of a common occurrence and rarely occupied our days as a whole. But ransomware gave thieves a way to monetize their activities, which gives them an incentive to infiltrate and infect our networks. I’m sure you’ve seen that some ransomware now offers a help desk to assist victims with paying?

 

 

It’s definitely a crazy world we live in, one which leaves us only with more work to do on a daily basis, a constant effort to fend off and fight against. This is a threat which has been growing at a constant pace and is spreading to infect Windows, Mac, AND Linux.

 

What about your experiences, do you have any attack vectors for Ransomware you’d like to share, or other ways you were able to fend them off?  

Software Defined WAN is easily the most mature flavor of SDN. Consider how many large organizations have already deployed some sort of SD-WAN solution in recent years. It’s common to hear of organizations migrating dozens or even thousands of their sites to an entirely SD-WAN infrastructure, and this suggests that SD-WAN is no longer an interesting startup technology but part of the mainstream of networking.

 

The reason is clear to me. SD-WAN provides immediate benefits to a business’s bottom line, so from a business perspective, SD-WAN just makes sense. SD-WAN technology reduces complexity, improves performance, and greatly reduces the cost of an organization’s WAN infrastructure. The technology offers the ability to replace super-expensive private MPLS circuits with cheap broadband without sacrificing quality and reliability. Each vendor does this somewhat differently, but the benefits to the business are so palpable that the technology really is an easy sell.

 

The quality of the public internet has improved greatly over the last few years, so being able to tap into that resource while somehow retaining a high-quality link to branch offices and cloud applications is very tempting for cost-conscious CIOs. How can we leverage cheap internet connections like basic broadband, LTE, and cable, yet maintain a high-quality user experience?

 

Bye-bye private circuits.

 

This is the most compelling aspect of the technology. Ultimately, it boils down to getting rid of private circuits. MPLS links can cost thousands of dollars per month each, so if an SD-WAN solution can dramatically cut costs, provide fault tolerance, and retain a quality experience, the value lies in going with all public internet connections.

 

Vendors run their software on proprietary appliances that make intelligent path decisions and negotiate with remote end devices to provide a variety of benefits. Some offer the ability to aggregate dissimilar internet connections such as broadband and LTE, some tout the ability to provide granular QoS over the public internet, and some solutions offer the ability to fail over from one primary public connection to another public connection without negatively affecting very sensitive traffic such as voice or streaming video. Also, keep in mind that this is an overlay technology, which means your transport is completely independent of the ISP.
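Vendors implement those path decisions in proprietary ways, but the general idea can be sketched as a scoring function over measured link health. The snippet below is a hypothetical illustration, not any vendor's actual algorithm; the link names, metrics, and weights are all assumptions.

```python
# Illustrative sketch (NOT any vendor's real algorithm) of how an
# SD-WAN edge might score its links and steer latency-sensitive
# traffic such as voice onto the healthiest one.
from dataclasses import dataclass
from typing import List

@dataclass
class Link:
    name: str          # e.g., "broadband", "lte" (hypothetical labels)
    latency_ms: float  # measured round-trip latency
    loss_pct: float    # measured packet loss percentage
    jitter_ms: float   # measured jitter

def score(link: Link) -> float:
    # Lower is better: weight loss heavily because voice degrades
    # quickly with packet loss, then latency, then jitter.
    return link.loss_pct * 100 + link.latency_ms + link.jitter_ms * 2

def pick_path(links: List[Link], max_loss_pct: float = 1.0) -> Link:
    # Exclude links whose loss exceeds what voice can tolerate;
    # if none qualify, fall back to all links rather than drop traffic.
    usable = [l for l in links if l.loss_pct <= max_loss_pct] or links
    return min(usable, key=score)

links = [
    Link("broadband", latency_ms=18, loss_pct=0.1, jitter_ms=3),
    Link("lte",       latency_ms=45, loss_pct=0.5, jitter_ms=8),
]
print(pick_path(links).name)  # broadband scores better here
```

Re-running a loop like this every few seconds against fresh probe measurements is, in spirit, how an appliance can fail voice traffic over from one public link to another without a private circuit in sight.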

 

Sweet. No more three-year contracts with a monolithic service provider.

 

Most SD-WAN vendors offer some, if not all, of these features, and some are going a step further by offering their solution as a managed service. Think about it: if your company is already paying some large ISP thousands per month for Ethernet handoffs into their MPLS cloud, what’s the difference with an SD-WAN managed service handing off a combination of Ethernet, LTE, and other interfaces into their SD-WAN infrastructure?

 

Especially for small and medium-sized multi-site businesses, the initial cost of switching from managed MPLS to a dramatically cheaper managed SD-WAN provider is nothing compared to the savings from dropping private circuits over only a few years.
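To make that concrete, here is a back-of-the-envelope break-even sketch. Every dollar figure is a made-up assumption for illustration, not a quote from any provider; plug in your own numbers.

```python
# Hypothetical break-even math for an MPLS-to-SD-WAN migration.
# All figures below are illustrative assumptions.
sites = 20
mpls_per_site_monthly = 2000    # assumed managed MPLS cost per site
sdwan_per_site_monthly = 500    # assumed managed SD-WAN + broadband cost
switch_cost_per_site = 3000     # assumed one-time migration/hardware cost

monthly_savings = sites * (mpls_per_site_monthly - sdwan_per_site_monthly)
upfront = sites * switch_cost_per_site
breakeven_months = upfront / monthly_savings

print(f"Monthly savings: ${monthly_savings:,}")          # $30,000
print(f"Break-even after {breakeven_months:.0f} months")  # 2 months
```

Even with a generous allowance for migration costs, the recurring circuit savings dominate within the first year under these assumed figures, which is the whole business case in miniature.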

 

For organizations such as high-transaction financial firms that want to manage their entire WAN infrastructure themselves and require almost perfect, lossless connectivity, SD-WAN may be a harder sell, but for most businesses it’s a no-brainer.

 

Picture a retail company with many locations such as a clothing store, bank, or chain restaurant that needs simple connectivity to payment processing applications, files, and authentication servers. These types of networks would benefit tremendously from SD-WAN because new branch locations can be brought online very quickly, very easily, and much more inexpensively than when using traditional private circuits. Not only that, but organizations wouldn’t be locked into a particular ISP anymore.

 

This is mainstream technology now, and it’s something to consider seriously when thinking about designing your next WAN infrastructure. It’s cheaper, easier to deploy, and easier to switch ISPs. That’s the real value of SD-WAN and why even huge organizations are switching to this technology in droves.

DRSTROATH001_cov.jpg

(image courtesy of Marvel)

 

...I learned from "Doctor Strange"

(This is part 2 of a 4-part series. You can find part 1 here)

 

When fate robs you of your skills, you can always seek others

The catalyst for the whole story was an accident that damaged Strange's hands beyond repair, or at least beyond his ability to ever hold a scalpel again.

 

The corollary for IT pros happens when we lose a technology. Maybe the software vendor is bought out and the new owner stops developing the old tools. Maybe your company moves in a different direction. Or maybe the tool you know best simply becomes obsolete. Whatever the reason, we IT professionals have to be ready to, in the words of my colleague Thomas LaRock (sqlrockstar), learn to pivot.

 

The interesting thing is that, very much like Stephen Strange, most of the time when we are asked (or forced) to pivot, we find we are able to achieve results and move our career forward in ways we couldn't have imagined previously.

 

Leverage the tools you have to learn new tools

One of the smaller jokes in the movie comes when Wong the librarian asks, "How's your Sanskrit?" Strange glibly responds, "I'm fluent in Google Translate." (Side note: Google Translate supports Tamil, Telugu, Bengali, Gujarati, Kannada, and Sindhi, among other Indic languages, but Sanskrit is not yet on the list.)

 

The lesson for us in IT is that often you can leverage one tool (or the knowledge you gained from one tool) to learn another tool. Maybe the menuing system is similar. Maybe there are complementary feature sets. Or maybe knowing one solution gives you insight into the super-class of tools that both tools belong to. Or maybe it's as simple as having a subnet calculator or TFTP server that lets you get the simple jobs done faster.

 

There’s no substitute for hard work

It's important to note that Strange does, in fact, learn to read Sanskrit. He puts in the work so that he isn't reliant on the tool forever. In fact, Strange is rarely shown already knowing things. Most of the time, he's learning, adapting, and most frequently just struggling to keep up. But at the same time, the movie shows him putting in enormous amounts of work. He rips through books at a fearsome rate. He learns to project his astral form so that he can stretch out his sleeping hours and continue to read, absorb, and increase his base of knowledge. Obviously, he also has natural gifts, and tools, but he doesn't rest on either of those.

 

In IT, there really is no better way to succeed than to put in the work. Read the manual. Test the assumption. Write some sample code. Build a test network (even if you do it completely virtually). Join a forum (for example, http://www.thwack.com), and ask some questions.

 

Experience, creativity, and powerful tools help save the day

At the climax of the movie, Strange defeats the Dread Dormammu, lord of the dark dimension, in a most curious way: He creates a temporal loop that only he can break, locking Dormammu and himself into an endless repetition of the same moment in time. Faced with the prospect of his own personal Groundhog Day, Dormammu agrees to leave the Earth alone. The interesting thing is that, by all accounts, Strange isn't the strongest sorcerer in the world. Nor is he the most experienced. He has a spark of creativity and a few natural gifts, but that's about it.

 

Anyone in IT should be all too familiar with this narrative. A willingness to use the tools at hand, along with some personal sacrifice to get the job done, is often how the day is saved. In the movie, the tool at hand was the Eye of Agamotto. In real life, the small but powerful tool is often monitoring, which provides key insights and metrics that help us cut straight to the heart of the problem with little effort or time wasted.

 

Ask people who’ve stood in your shoes how they moved forward

In the course of his therapy, Stephen Strange is referred to the case of Jonathan Pangborn, a man who suffered an irreparable spinal cord injury, but who Strange finds one day playing basketball with his buddies. Telling Pangborn that he is trying to find his own way back from an impossible setback, Strange begs him to explain how he did it. This is what sets the hero's path toward the mystical stronghold in Kathmandu.

 

In IT, we run up against seemingly impossible situations all the time. Sometimes we muscle through and figure it out. Sometimes we just slap together a kludgy workaround. But sometimes we find someone who has had the exact same problem, and solved it! We need to remember that many in our ranks have stood where we stand and solved what we hope to solve. There’s no need to struggle to re-invent an already existing solution. But to benefit from others' experience, we have to ASK.

 

That's where being part of a community, such as Stack Exchange or THWACK©, can pay off. I’m not talking about registering an account and then asking questions only when you get stuck. I mean joining the community, really getting involved, reading articles, completing surveys, adding comments, answering questions, and, yes, asking your own as they come up.

 

Even broken things can help you find your way

On his way to the mystical school of Kamar-Taj, Doctor Strange is accosted by muggers and ordered to give up his watch. Even though he is rescued from what appears to be a brutal beating, his watch isn't so lucky. It's only later that we realize there's an inscription on the back that reads, "Only time will tell how much I love you," indicating that the watch is from Christine, one of the few people Strange has made a personal connection with.

 

While the joke, "Even a broken clock is right twice a day" comes to mind, the lesson I'm thinking of is a little deeper. In IT, we often overlook the broken things, whether it's code that doesn't compile, a software feature that doesn't work as advertised, or hardware that's burnt out, in favor of systems and solutions that run reliably. And that's not a bad choice, generally speaking.

 

But our broken things can still teach us a lot. I've rarely learned anything from a server that ran like clockwork for months on end. But I've learned a lot about pin-outs, soldering, testing, timing, memory registers, and more when I've tried to get an old boat anchor working again.

 

Sometimes that knowledge transferred. Sometimes it didn't. But even if not, the work grounded me in the reality of the craft of IT, and gave me a sense of accomplishment and direction.

 

Did you find your own lesson when watching the movie? Discuss it with me in the comments below. And keep an eye out for parts 3-4, coming in the following weeks.
