
Geek Speak


By Joe Kim, SolarWinds Chief Technology Officer

 

I recently presented at our Federal User Group meeting in Washington, DC, and hybrid IT was the hottest topic at the event. It reminded me of a blog post written last year by our former Chief Information Officer, Joel Dolisy, which I think is still relevant and worth a read.

 

 

It’s really no surprise that agencies are embracing hybrid clouds, given that federal IT must balance the need for greater agility, flexibility, and innovation with strict security controls. Hybrid clouds provide the perfect alternative because they allow agencies to become more nimble and efficient while still maintaining control.

 

In today’s IT, there’s no room for barriers. Things exist in many places; applications, for example, must be stateless, mobile, and easily scalable to accommodate periods of peak demand.

 

Hybrid clouds offer three specific benefits that on-premises or hosted solutions cannot offer.

 

Lockdown Security

 

The hybrid model can help alleviate agencies’ cloud security concerns. Agencies can opt to keep extremely sensitive data on-premises in private clouds, while using public clouds to run applications.

 

That said, some agencies have begun placing contractor-owned and operated cloud offerings directly onto their networks. The contractors are providing physical security and boundary protection based on agency requirements. This helps solve acquisition challenges and allows agencies to more readily adopt innovative new commercial technologies.

 

Better Disaster Recovery

 

Systems may simply go down due to events that are beyond anyone’s control – power outages, hurricanes, and other phenomena. These situations require disaster recovery programs, and hybrid clouds can play a part in their implementation.

 

Hybrid clouds make disaster recovery far easier to implement and manage, and far more cost-effective. First, there’s no intensive physical installation, because everything is software-defined. Second, the hybrid cloud model can be more financially beneficial to an organization, especially for agencies operating under tight budgets.

 

Greater Efficiency

 

With big data, processing can sometimes take weeks, which may as well be a lifetime. Integrating existing on-premises computers with off-site hosted cloud resources can shave that time down to minutes.

 

Hybrid architectures make this possible. An organization can spin up hundreds of extra processors as necessary, for a specified period of time. This can help ensure that applications remain fully accessible and functioning, and speed up the dissemination of critical information.

 

There are other types of savings that can be had, specifically in terms of space. Data center consolidation has been in full swing since 2010, when the Federal Data Center Consolidation Initiative (FDCCI) was first introduced. A hybrid approach can help agencies in this effort by allowing them to save an enormous amount of space currently dedicated to compute resources that can be virtualized.

 

There are many benefits to hybrid cloud deployments, but they still have to be monitored closely. I have written about the need for network monitoring, and the same rule of thumb applies to the cloud. Administrators must monitor the servers and applications that exist within both their on-premises and hosted environments, preferably with an agentless tool. While hybrid clouds do offer the best of both worlds, agencies will want to make sure the workloads they’re running within those worlds remain fully optimized. It’s also important to note that hybrid cloud deployment should be the goal, not just a transitional state.

 

Find the full article on our partner DLT’s blog, Technically Speaking.


 

Holy round one upset, Batman! Our fifth annual bracket battle has already proven to be the most unpredictable we have ever seen. Maybe it’s the wide range of characters spanning decades of pop culture, but the community seemed pretty torn on most match-ups in the “noteworthy” round.

 

Let’s take a look at how our sidekicks fared in this round:

 

Shutouts:

  • Robin vs. Groot: For many, Robin is the quintessential sidekick; however, he was no match for Groot, who gained fame in Guardians of the Galaxy. ecklerwr1 offered one possible explanation for this huge upset: “I think this is a little age related to be honest. Many younger users probably didn't watch all the batman and robin movies and cartoons.”

  • Goose vs. Chewbacca: No surprises in this match-up; the Wookiee warrior easily takes down the Top Gun wingman. ajmalloy: “Goose drew the worst possible first round opponent.”

  • Samwise vs. Tina Ruth Belcher: Easily one of the biggest shutouts this round! It appears that loyalty and sacrifice were more valued attributes in a sidekick than an obsession with zombies and working hard at the family restaurant. tallyrich: “Samwise - that's a sidekick that will do what it takes.”

  • Shaggy Rogers vs. Morty Smith: KMSigma explains how this was a loyalty vote for him (and everyone else, apparently): “Sorry man - loyalty wins.  I do love me some Morty and can't wait for the next season, but we have, what, 20 episodes of R&M and have had Norville "Shaggy" Rogers since 1969?  And originally voiced by Casey Kasem?  Sorry, but Summer's stuttering little brother doesn't compare for me.”

  • Willow Rosenberg vs. Dr. Emmett Brown: Great Scott! This was no contest. The time-traveling mad scientist easily beat out the vampire-slaying sidekick.

  • Rick Jones vs. Hermione Granger: Hermione used her magic to run away with the win. Maybe Rick would have had better luck if he had been a more loyal sidekick. tinmann0715: “Points lost from Rick because he was a sidekick to so many different characters. In my eyes a sidekick is to be loyal.”

  • Bob, Agent of Hydra vs. Bucky Barnes: Neither sidekick seemed very popular in this match-up; nevertheless, Bucky managed to win this round in a landslide.

  • Genie vs. Dr. Watson: Magic lamps and wishes were surprisingly no match for the mystery-solving sidekick, Dr. Watson.

  • Pinky vs. Barney Rubble: chippershredder summed up this match-up perfectly: “So Brain, What are we going to do today?” “Same thing we do every day, Pinky.  Use SolarWinds to take over the world!”

  • Barney Fife vs. Agent K: Unfortunately, the deputy sheriff of Mayberry was no match for the MIB.

  • Ford Prefect vs. Keenser: Ford Prefect FTW in this intergalactic shutout!

 

Nail-biters:

  • Jiminy Cricket vs. Wilson: This match-up came down to the wire, but in the end it was the Cast Away companion who came out ahead. For a match-up between a cricket and a volleyball, it was decidedly heated. jeremymayfield: “How can a bloody volleyball be beating the legend, an icon?  Its just not right people.   The ball didn't even make it through the entire movie, he washed away. Jiminy appears in many more cartoons over the years.”

  • Dwight Schrute vs. Gromit: Choosing between a goofy office sidekick and a life-saving dog was tough for everyone. In the end, Gromit came out ahead. silverwolf: “Gromit definitely! He saves everyone waaaayyyy tooo many times.”

  • Bullwinkle vs. Donkey: Another battle of the generations, but the voters who grew up with Shrek decided the winner in this match-up!

  • Luigi vs. Garth Algar: Garth is on a winning streak! First, he beats SpongeBob and now Luigi! Mamma Mia!

  • Tonto vs. Short Round: Without a doubt the closest match-up of this round. ecklerwr1 was all of us when he said, “Wow can't believe this is close... Tonto FTW!”

 

Were you surprised by any of the shutouts or nail-biters this round? Comment below!

 

It’s time to check out the updated bracket & start voting for the ‘Admirable’ round! We need your help & input as we get one step closer to crowning the ultimate sidekick!

 

Access the bracket and make your picks HERE>>

Troubleshooting efficiency and effectiveness are core to uncovering the root cause of incidents and bad events in any data center environment. As I discussed in my previous post about the troubleshooting radius and the IT seagull, troubleshooting efficacy is the key performance indicator in fixing it fast. But troubleshooting is an avenue that IT pros dare not walk too often, for fear of being blamed as incompetent or incorrect.

 

We still need to be right far more often than we are wrong. Our profession gives no quarter when things go wrong. The blame game, anyone? When I joined IT operations many years ago, one of my first mentors gave me some sage advice from his own IT journey. It’s similar to the three-envelope CEO story that many IT pros have heard before.

  1. When you run into your first major (if you can’t solve it, you’ll be fired) problem, open the first envelope. The first envelope’s message is easy: blame your predecessor.
  2. When you run into the second major problem, open the second envelope. Its message is simply: reorganize, i.e., change something, whether it’s your role or your team.
  3. When you run into the third major problem, open the third envelope. Its message is to prepare three envelopes for your successor, because you’ll be changing companies, willingly or unwillingly.

 

A lifetime of troubleshooting comes with its ups and downs. Looking back, it has provided many opportunities to change my career trajectory. For instance, troubleshooting the lack of a performance boost from a technology invented by the number one global software vendor almost cost me my job, but it also redefined me as a professional. I learned to stand up for myself professionally. As Agent Carter states, "Compromise where you can. And where you can’t, don’t. Even if everyone is telling you that something wrong is something right, even if the whole world is telling you to move. It is your duty to plant yourself like a tree, look them in the eye and say, no. You move." And I was right.

 

It’s interesting to look back and examine the events and associated time-series data to see how close I got to the root cause signal before being mired in the noise, or vice versa. This IT career, built on troubleshooting, is one I’m addicted to, whether it’s the change and the opportunity or all the gains through all the pains.

 

Share your career stories in the comment section below: how did a troubleshooting mishap, or a troubleshooting triumph, bring you shame or fame?


 

On Day Zero of being a DBA I inherited a homegrown monitoring system. It didn't do much, but it did what was needed. Over time we modified it to suit our needs. Eventually we got to the point where we integrated with OpsMgr to automate the collection and deployment of monitoring alerts and code to our database servers. It was awesome.

 

The experience of building and maintaining my own homegrown system, combined with working for software vendors, has taught me that every successful monitoring platform needs five essential components: identify, collect, share, integrate, and govern. Let's break down what each of those means.

 

Identify

A necessary first step is to identify the data and metrics you want to monitor and alert upon. I would start this process by looking at each metric and putting it into one of two classes: informational or actionable. Metrics classified as informational were the ones I wanted to track but didn't need to be alerted on. Actionable metrics are the ones I needed to be alerted on, because I had to perform some action in response. For more details on how to identify which metrics to collect, check out the Monitoring 101 guide, and look for the Monitoring 201 guide coming soon.
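To make that split concrete, here is a minimal sketch of routing metrics by class. The metric names, threshold semantics, and `route_metric` helper are hypothetical illustrations, not taken from any real monitoring tool:

```python
# Minimal sketch: routing metrics by class. The metric names and
# threshold semantics here are illustrative, not from any real tool.

METRIC_CLASSES = {
    "cpu_percent": "actionable",            # alert when it crosses a threshold
    "disk_free_gb": "actionable",
    "page_reads_per_sec": "informational",  # track for trending only
    "cache_hit_ratio": "informational",
}

def route_metric(name, value, threshold=None):
    """Return 'alert' for actionable metrics past their threshold, else 'store'."""
    cls = METRIC_CLASSES.get(name, "informational")  # unknown metrics default to informational
    if cls == "actionable" and threshold is not None and value >= threshold:
        return "alert"
    return "store"

print(route_metric("cpu_percent", 97, threshold=90))  # alert
print(route_metric("cache_hit_ratio", 0.82))          # store
```

The point of the two-way split is that only the "alert" path pages a human; everything else just lands in storage for trending.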

 

Collect

After you identify the metrics you want, you need to decide how you want to collect and store them for later use. This is where flexibility becomes important. Your collection mechanism needs to be able to consume data in varying formats. If you build a system that relies on data being in a perfect state, you will find yourself frustrated the first time some imperfect data is loaded. You will also find yourself spending far too much time playing the role of data janitor.
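One way to get that flexibility is to coerce what you can and quarantine the rest instead of failing the whole load. This is a hedged sketch with hypothetical field names, not any particular collector's format:

```python
# Sketch of a tolerant collector: coerce what we can, quarantine the rest
# for later inspection instead of failing the whole load. Field names are
# hypothetical.

def ingest(rows):
    """Accept an iterable of raw dicts; return (clean, quarantined) lists."""
    clean, quarantined = [], []
    for row in rows:
        try:
            clean.append({
                "host": str(row["host"]).strip().lower(),
                "metric": str(row["metric"]).strip(),
                # values may arrive as "97", "97.0", or 97 -- coerce uniformly
                "value": float(row["value"]),
            })
        except (KeyError, TypeError, ValueError):
            quarantined.append(row)  # keep the bad row; don't silently drop it
    return clean, quarantined

clean, bad = ingest([
    {"host": "DB01 ", "metric": "cpu", "value": "97"},
    {"host": "db02", "metric": "cpu"},  # missing value -> quarantined
])
```

Quarantining rather than discarding means the data janitor work happens on your schedule, not at 3 a.m. when a load fails.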

 

Share

Now that your data is being collected, you will want to share it with others, especially when you want to help provide some details about specific issues and incidents. As much as you might love the look of raw data and decimal points, chances are that other people will want to see something prettier. And there's a good chance they will want to be able to export data in a variety of formats, too. More than 80% of the time your end-users will be fine with the ability to export to CSV format.
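Since CSV export covers most consumers, a small sketch using Python's standard csv module shows how little is needed; the host/metric/value columns are purely illustrative:

```python
# Sketch: rendering collected metrics as CSV with Python's standard csv
# module. The column names are illustrative.
import csv
import io

def to_csv(metrics):
    """Render a list of metric dicts as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["host", "metric", "value"])
    writer.writeheader()
    writer.writerows(metrics)
    return buf.getvalue()

print(to_csv([{"host": "db01", "metric": "cpu", "value": 97.0}]))
```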

 

Integrate

With your system humming along, collecting data, you are going to find that other groups will want that data. It could also be the case that you need to feed your data into other monitoring systems. Designing a system that can integrate well with other systems requires a lot of flexibility. It's best that you think about this now, before you build anything, as opposed to trying to fit round pegs into square holes later. And it doesn't have to be perfect for every possible case; just focus on the major integration method used the world over that I already mentioned: CSV.
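The consuming side of that CSV integration can be equally small. This illustrative sketch parses another tool's hypothetical export back into metric records, tolerating non-numeric values rather than rejecting the file:

```python
# Sketch of the consuming side of CSV integration: parsing another tool's
# export into metric records. Column names are hypothetical.
import csv
import io

def from_csv(text):
    """Parse CSV text into dicts, coercing 'value' to float where possible."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            row["value"] = float(row["value"])
        except (KeyError, TypeError, ValueError):
            pass  # tolerate missing or non-numeric values rather than fail
        rows.append(row)
    return rows

records = from_csv("host,metric,value\ndb01,cpu,97\ndb02,cpu,n/a\n")
```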

 

Govern

This is the component that is overlooked most often. Once a system is up and running, very few people consider the task of data governance. It's important that you take the time to define what the metrics are and where they came from. Anyone consuming your data will need this information, as well. And if you change a collection, you need to communicate the changes and the possible impacts they may have for anyone downstream.
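A lightweight way to start on governance is a registry that records each metric's definition and source, plus a change log that downstream consumers can review. This is an illustrative sketch of the idea, not a prescribed design:

```python
# Illustrative sketch of lightweight governance: record each metric's
# definition and source, and log collection changes for downstream consumers.
import datetime

class MetricRegistry:
    def __init__(self):
        self.definitions = {}
        self.changelog = []

    def define(self, name, description, source):
        """Document what a metric means and where it comes from."""
        self.definitions[name] = {"description": description, "source": source}
        self._log(f"defined {name} (source: {source})")

    def change(self, name, note):
        """Record a collection change so downstream impact can be assessed."""
        self._log(f"changed {name}: {note}")

    def _log(self, message):
        self.changelog.append((datetime.date.today().isoformat(), message))

reg = MetricRegistry()
reg.define("cpu_percent", "Host CPU utilization, 60-second average", "agent poll")
reg.change("cpu_percent", "polling interval moved from 60s to 30s")
```

Even something this simple answers the two governance questions above: what is this metric, and what changed about how it is collected.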

 

When you put those five components together you have the foundation for a solid monitoring application. I'd even surmise that these five components would serve any application well, regardless of purpose.

sqlrockstar

The Actuator - March 22nd

Posted by sqlrockstar Employee Mar 21, 2017

March madness is upon us! I'm not just talking about the NCAA tournament here in the United States, I'm also talking about the Bracket Battle 2017 here on THWACK®. As a former basketball player and coach, I love March. Not only do we get treated to some of the best basketball games, but spring arrives to help us forget all about the winter months.

 

Anyway, here's a handful of links from the intertubz I thought you might find interesting. Enjoy!

 

Netflix is launching a simplified rating system to improve its suggestions

This sounds great until you realize you use your profile to watch movies with your kids, and that's why you keep seeing "Goosebumps" in your recommendations.

 

GitHub - bcicen/ctop: Top-like interface for container metrics

Not a post, but a link to a GitHub project called "ctop," which gives you a top-like interface for container metrics. Yes, I find this fascinating.

 

Star Trek Ransomware Boldly Encrypts

It was only a matter of time before the ransomware folks got around to using Star Trek in some cute attempt to make their crimes seem less horrible.

 

Password Rules Are BS

A nice reminder about the disservice we've done to ourselves with password rules.

 

The most detailed maps of the world will be for cars, not humans

The amount of data behind autonomous cars is staggering. Here's hoping we don't have to wait for them much longer.

 

Thieves are pickpocketing wallet apps in China

Yet another reason to not enjoy QR codes: they are now being used for theft. It won't be long before this crime becomes common here, I expect. This is also the time to remind you that QR codes kill kittens.

 

From a walk in Brussels a few years ago, I think something got lost in translation:

1450133_10202146351411147_773876245_n.jpg

By Joe Kim, SolarWinds Chief Technology Officer

 

We all know that network monitoring is absolutely essential for government IT pros to help ensure IT operations are running at optimal performance. That said, with so many tools available, it’s tempting to monitor everything. But be careful: monitoring everything can quickly turn into too much of a good thing.

 

Having an excessive number of monitoring tools and alerts can result in conflicting metrics and data, overly complex systems, and significant management challenges, all working together to undermine administrators’ ability to accurately identify true network problems.

 

Understanding why, and for whom, systems are monitored will help IT pros implement the tools that are actually needed and most useful for enhancing agency IT operations.

 

The Importance of Monitoring

 

Remember, monitoring is critical. The cost of downtime alone makes monitoring operational metrics a necessity. In fact, the value of monitoring is sometimes the driver for “over-monitoring.” Some IT pros may think, “The more tools I have, the more insight I get.”

 

The number and type of monitoring tools available have exploded, covering everything from bandwidth, security systems, servers, code management, and implementation metrics all the way up to high-level operational metrics.

 

Unfortunately, most of these tools work independently, and agencies will patch several tools together -- each providing different metrics -- to create a massive monitoring system. With this complex system, monitoring becomes a task in and of itself, taking up IT pros’ valuable time instead of providing a seamless foundation of accurate and actionable monitoring data.

 

Agencies must make smart decisions to remain nimble and keep pace, and that means avoiding mammoth, costly monitoring systems. Solutions that neatly aggregate an agency’s preferred metrics deliver better availability, security, and performance.

 

Find an ideal monitoring solution by evaluating the answers to two questions:

 

For whom am I monitoring? Are metrics more important to the operations engineer, the project manager, or agency management? There may be a wide array of monitoring needs, even within the engineering contingent. Determine in advance your monitoring “customer.”

 

What metrics do I really need? What is required to keep things running smoothly, without drowning in alerts and data? Too many alerts and too much data are a frighteningly common problem. Even worse, investing in a separate tool for each metric is costly and inefficient.

 

In a nutshell, agencies should identify the most valuable audience and metrics to avoid the need for multiple tools.

 

Focus on the Data

 

Remember, the point of monitoring is to inform operational decisions based on collected data. This should be the point that drives monitoring decisions, and the reason to consider investing in a comprehensive monitoring tool.

 

With an increasing demand for a more digital government, maintaining insight into the infrastructure and application levels of IT operations within the agency is critical. Focusing on the audience and the agency’s specific needs will help ensure a streamlined monitoring solution that helps drive mission success.

 

Find the full article on Government Computer News.


 

They’re usually not the ones in the spotlight, although some steal the show.

They’re not the captains; they don’t decide which way to go.

They’re the unsung heroes, the ones in the shadows.

They’ve got your back when you’re battling your foes.

Sidekick is their title; they don’t need to gloat.

One will be the winner after you cast your vote!

Bracket battle is back and bigger than ever.

By the time we’re done, we’ll crown the best sidekick once and forever!

 

Starting today, 33 of the most popular sidekicks will battle it out until only one remains and reigns supreme as the ultimate sidekick. We’ve handpicked a wide range of sidekicks from TV, movies, comics, and video games to make this one of the most diverse bracket battles yet. The starting categories are as follows:

  • What are we going to do tonight, Brain?
  • You can be my wingman any time.
  • Holy ____, Batman!
  • Let your conscience be your guide.

 

We picked the starting point and initial match ups; however, just like in bracket battles past, it will be up to the community to decide who they would want as their partner in crime.

 

 

Bracket battle rules:

Match up analysis:

  • For each sidekick, we’ve provided reference links to wiki pages—to access these, just click on their NAME on the bracket
  • A breakdown of each match up is available by clicking on the VOTE link
  • Anyone can view the bracket and match ups, but in order to vote or comment, you must have a THWACK® account and be logged in

 

Voting:

  • Again, you must be logged in to vote and trash talk
  • You may vote ONCE for each match up
  • Once you vote on a match, click the link to return to the bracket and vote on the next match up in the series
  • Each vote earns you 50 THWACK points! If you vote on every match up in the bracket battle, you can earn up to 1,550 points!

 

Campaigning:

  • Please feel free to campaign for your favorite sidekicks and debate the match ups via the comment section (also, feel free to post pictures of bracket predictions on social media)
  • To join the conversation on social media, use hashtag #SWBracketBattle
  • There is a PDF printable version of the bracket available, so you can track the progress of your favorite picks

 

Schedule:

  • Bracket Release is TODAY, March 20th
  • Voting for each round will begin at 10 a.m. CDT
  • Voting for each round will close at 11:59 p.m. CDT on the date listed on the bracket home page
  • Play-in battle opens TODAY, March 20th
  • Round 1 OPENS March 22nd
  • Round 2 OPENS March 27th
  • Round 3 OPENS March 30th
  • Round 4 OPENS April 3rd
  • Round 5 OPENS April 6th
  • Ultimate sidekick announced April 12th

 

If you have any other questions, please feel free to comment below and we’ll get back to you!

Which one of these sidekicks would you want as your copilot?

We’ll let the votes decide!

 

Access the bracket overview HERE>>

Leon Adato

Berlin404 - Hype Not Found

Posted by Leon Adato Expert Mar 17, 2017


When I head out to conventions, especially the bigger ones like Cisco Live, I always expect to find some darling technology that has captured imaginations and become the newest entry in every booth denizen's buzzword bingo list. And most of the time, my expectation is grounded in experience. From SDN to IoT, and on through cloud, containers, and Blah Blah as a Service (BaaS), each trend is heralded with great fanfare, touted with much gusto, and explained with significant confusion or equivocation.

 

Not this year.

 

Chalk it up to the influence of Berlin's tasty beer and solid work ethic if you want, but this year the crowd was clearly interested in "the work of the work," as I like to call it, or "less hat, more cattle," as my friends in Austin might phrase it.

 

Don't get me wrong. The sessions were engaging as ever. The vendor floor was packed. The attendees came early and stayed each day to the end. The DevNet area was bigger than ever before. It was, by every measure, a great conference.

 

More about DevNet: While there were a lot of younger faces, there was no shortage of folks who had clearly put their years in. Patrick was the first to notice it, and it's worth highlighting. Folks with deep skills in a technical area were taking the time to begin training on the "new thing": an up-and-coming skill set that doesn't match the techniques they use right now and may not even bear a resemblance to their current job. But they were there, session after session, soaking it in and enjoying it.

 

But as I commented to patrick.hubbard and ding, I hadn't yet found "it." And they both pointed out that sometimes the "it" is simply thousands of people spending time and money to come together and share knowledge, build connections, and enjoy the company of others who know what they know and do what they do.

 

Meanwhile, a steady stream of visitors came to the SolarWinds booth asking detailed questions and waiting for answers. Sometimes they had several questions. Often, they wanted to see demos on more than one product. They ooh'ed and ah'ed over our new showstoppers like NetPath and PerfStack (more on those in a minute), but stuck around to dig into IPAM, VNQM, LEM, and the rest.

 

After speaking to someone for a few minutes, visitors were less apt to say, "Can I have my T-shirt now?" and more likely to say, "I would also like to see how you can do ______." For a company that staffs its trade show booths with an "engineers-only" sensibility, it was deeply rewarding.

 

But there was no "trendy" thing people came asking about. There simply was no buzz at this show.

 

Unless - and I'm just throwing it out there - it was US.

 

You see, about a month before Cisco Live, SolarWinds was identified as the global market share leader for network management software (read about it here: http://www.solarwinds.com/company/press-releases/solarwinds-recognized-as-market-leader-in-network-management-software). Now that's a pretty big deal for us back at the office, but would it matter to in-the-trenches IT pros?

 

It mattered.

 

They came to the booth asking about it. To be honest, it was a little weird. Granted, a kind of weird I could get used to, but still weird.

 

So it turns out that Cisco Live didn't feature a buzz-worthy technology, but instead we found out that we got to be the belle of the ball.

 

PostScript: Next year, Cisco Live will be in Barcelona, Spain. Espero verte allí y hablar contigo en inglés y español. (I hope to see you there and speak with you in English and Spanish.)


As we come to the end of this series on infrastructure and application data analytics, I thought I'd share my favorite quotes, thoughts, and images from the past few weeks of posts leading up to the PerfStack release.

 

SomeClown leads the way in The One Where We Abstract a Thing

 

"Mean time to innocence (MTTI) is a somewhat tongue-in-cheek metric in IT shops these days, referring to the amount of time it takes an engineer to prove that the domain for which they have responsibility is not, in fact, the cause of whatever problem is being investigated. In order to quantify an assessment of innocence you need information, documentation that the problem is not yours, even if you cannot say with any certainty who does own the problem. To do this, you need a tool which can generate impersonal, authoritative proof you can stand on, and which other engineers will respect. This is certainly helped if a system-wide tool, trusted by all parties, is a major contributor to this documentation."

 

Karen:  Mean Time To Innocence! I'm so stealing that. I wrote a bit about this effect in my post Improving your Diagnostic and Troubleshooting Skills. When there's a major problem, the first thing most of us think is, "PLEASE DON'T LET IT BE ME!"  So I love this thought.

 

demitassenz wrote in PerfStack for Multi-dimensional Performance Troubleshooting

 

"My favorite part was adding multiple different performance counters from the different layers of infrastructure to a single screen. This is where I had the Excel flashback, only here the consolidation is done programmatically. No need for me to make sure the time series match up. I loved that the performance graphs were re-drawing in real-time as new counters were added. Even better was that the re-draw was fast enough that counters could be added on the off chance that they were relevant. When they are not relevant, they can simply be removed. The hours I wasted building Excel graphs translate into minutes of building a PerfStack workspace."

 

Karen:  OMG! I had completely forgotten my days of downloading CSVs or other outputs of tools and trying to correlate them in Excel. As a data professional, I'm happy that we now have a way to quickly and dynamically bring metrics together to make data tell the story it wants to tell.

 

cobrien wrote in NPM 12.1 Sneak Peek - Using Perfstack for Networks

 

"I was exploring some of the data the other day. It’s like the scientific method in real-time. Observe some data, come up with a hypothesis, drag on related data to prove or disprove your hypothesis, rinse, and repeat."

 

Karen:  Data + Science.  What's not to love?

 

SomeClown mentioned in Perfstack Changes the Game

 

"PerfStack can now create dashboards on the fly, filled with all of the pertinent pieces of data needed to remediate a problem. More than that, however, they can give another user that same dashboard, who can then add their own bits and bobs. You are effectively building up a grouping of monitoring inputs consisting of cross-platform data points, making troubleshooting across silos seamless in a way that it has never been before."

 

Karen: In my posts, I focused a lot on the importance of collaboration for troubleshooting. Here, Teren gets right to the point. We can collaboratively build analytics based on our own expertise to get right to the point of what we are trying to resolve.  And we have data to back it up.

 

aLTeReGo in a post demo-ing how it works, Drag & Drop Answers to Your Toughest IT Questions

 

"Sharing is caring. The most powerful PerfStack feature of all is the ability to collaborate with others within your IT organization; breaking down the silo walls and allowing teams to triage and troubleshoot problems across functional areas. Anything built in PerfStack is sharable. The only requirement is that the individual you're sharing with has the ability to login to the Orion web interface. Sharing is as simple as copying the URL in your browser and pasting it into email, IM, or even a help desk ticket."


Karen: Yes! I also wrote about how important collaboration is to getting problems solved fast.

 

demitassenz shared in Passing the Blame Like a Boss

 

"One thing to keep in mind is that collaborative troubleshooting is more productive than playing help desk ticket ping pong. It definitely helps the process to have experts across the disciplines working together in real time. It helps both with resolving the problem at hand and with future problems. Often each team can learn a little of the other team’s specialization to better understand the overall environment. Another underappreciated aspect is that it helps people to understand that the other teams are not complete idiots. To understand that each specialization has its own issues and complexity."

 

Karen: Help desk ticket ping pong. If you've ever suffered through this, especially when someone passes the ticket back to you right before the emergency "why haven't we fixed this yet" meeting with the CEO, you'll know the pain of it all.

 

SomeClown observed in More PerfStack - Screenshot Edition

 

"In a nutshell, what it allows you to do is to find all sorts of bits of information that you're already monitoring, and view it all in one place for easy consumption. Rather than going from this page to that, one IT discipline-domain to another, or ticket to ticket, PerfStack gives you more freedom to mix and match, to see only the bits pertinent to the problem at hand, whether those are in the VOIP systems, wireless, applications, or network. Who would have thought that would be useful, and why haven't we thought of that before?"

 

Karen: "Why haven't we thought of that before?" That last bit hit home for me. I remember working on a project for a client to do a data model about IT systems. This was at least 20 years ago. We were going to build an integrated IT management system so that admins could break through silo-based systems and approaches to solve a major SLA issue for our end-users. We did a lot of work until the project was deferred, when a legislative change meant that all resources needed to be redirected to meet those requirements. But I still remember how difficult it was going to be to pull all this data together. With PerfStack, we aren't building a new collection system. We are applying analytics on top of what we are already collecting with specialized tools.

 

DataChick's Thoughts

 

This next part is cheating a bit, because the quotes are from my own posts. But hey, I also like them and want to focus on them again.

 

datachick in Better Metrics. Better Data. Better Analytics. Better IT.

 

"As a data professional, I'm biased, but I believe that data is the key to successful collaboration in managing complex systems. We can't manage by "feelings," and we can't manage by looking at silo-ed data. With PerfStack, we have an analytics system, with data visualizations, to help us get to the cause faster, with less pain-and-blame. This makes us all look better to the business. They become more confident in us because, as one CEO told me, "You all look like you know what you are doing." That helped when we went to ask for more resources."

 

Karen: We should all look good to the CEO, right?

 

datachick ranted in 5 Anti-Patterns to IT Collaboration: Data Will Save You

 

"These anti-patterns don't just increase costs, decrease team function, increase risk, and decrease organizational confidence; they also lead to employee dissatisfaction and lower morale. That leads to higher turnover (see above) and more pressure on good employees. Having the right data, at the right time, in the right format, will allow you to get to the root cause of issues and collaborate with others faster, cheaper, and easier. Also, it will let you enjoy your 3:00 a.m.s better."

 

I enjoyed sharing my thoughts on these topics and reading other people's posts as well. It seems bloggers here shared the same underlying theme of collaboration and teamwork. That made this Canadian Data Chick happy. Go, everyone. Solve problems together.  Do IT better.  And don't let me catch you trying to do any of that without data to back you up. Be part of #TeamData.

There are many books, websites, and probably self-help videos devoted to teaching or explaining the art of troubleshooting. Most are specific to an industry, and further to a problem domain within that industry. Within each problem domain, within each industry, within each methodology, there are tools of the trade designed to help you solve whatever problem is vexing you at that moment. The specificity of all of this, however, can be abstracted out of these insular, domain-specific modalities to effect a greater understanding of the role of troubleshooting in general.

 

It goes without saying that you cannot find something that you do not know you are looking for, and yet this is what a lot of neophyte engineers instinctively try. “The phones are down” may seem like the problem you need to fix, but counterintuitively, that is only a symptom of the real problem. The real problem, the one causing the phones to be down, lies elsewhere. While you run around trying to figure out what’s up with the phones, what you should be asking is, “Why are the phones ‘down’?” and move from there. For example, are all the phones down? Some? Are there other symptoms? And what has changed recently, if anything? Once you’ve worked through some of this, which may take only seconds or minutes for a seasoned engineer, you’re more prepared to move on to the next steps.

 

Analyzing the problem, or problem statements, will help you form a hypothesis as to where the problem is likely to lie. Now, how can you begin testing your ideas to see if you are on the right track? Well, in the IT world that we all live in (I know, I said abstracted…), you’re going to need information. Information gathering can be a manual process, and in many cases must be, but having good tools at your disposal can certainly help the process along, especially when you are shooting in the dark, so to speak. Again, if you don’t know what you don’t know, an automated and impartial tool can help.

 

Tool impartiality is often overlooked as a step in the discovery phase of troubleshooting any problem. Plumbers have scopes to look inside of pipes that they cannot see; electricians have multi-meters to help them test connectivity, resistance, etc.; and you as an IT professional have tools like PerfStack. A tool like this happily gathers information from all of your systems, jumping to no conclusions, and can call out abnormalities in the steady state of a system. Where many engineers skip straight to the “trying to fix anything they suspect is the problem” phase, PerfStack simply presents what it sees in an impartial and authoritative manner. From its dashboards, an engineer can begin their search from a position of knowledge. Combine that with the wisdom that comes from experience, and you have a very strong team.

 

Mean time to innocence (MTTI) is a somewhat tongue-in-cheek metric in IT shops these days, referring to the amount of time it takes an engineer to prove that the domain for which they are responsible is not, in fact, the cause of whatever problem is being investigated. To quantify innocence, you need information: documentation that the problem is not yours, even if you cannot say with any certainty who does own the problem. For that, you need a tool that can generate impersonal, authoritative proof you can stand on, and that other engineers will respect, which is far easier if a system-wide tool, trusted by all parties, is a major contributor to this documentation.

 

A tool like PerfStack will certainly help in getting buy-off from the pointy-haired bosses as to what needs to happen to fix whatever needs fixing. Most organizations have a change control process (though likely an amended one during any kind of outage), and documentation is always a part of that. And all of this stuff, this paper trail from beginning to end, flows together nicely right into the final package that many organizations require for a post-mortem. Engineers and management can get through an after-the-fact incident meeting much quicker, and with likely consensus, with a clean and robust set of documents.

 

At the end of the day, troubleshooting is an art no matter what you do, where you do it, or in what industry you live. The methodologies are largely the same at a macro level, as is the need for quality tools. Can a great engineer find the root cause of a problem without a comprehensive tool like PerfStack? Sure. A cobbled-together band of point tools has always been a part of the engineer’s toolkit and likely always will be, at least until our new sentient robotic overlords obviate the need for that. But a full-scale, system-wide solution like PerfStack should also be a part of any well-stocked engineering team’s process. After all, it can help find those things you do not yet know you are looking for.

sqlrockstar

The Actuator - March 15th

Posted by sqlrockstar Employee Mar 15, 2017

The Ides of March are upon us. And with the Ides comes one of my least favorite things: Daylight Saving Time. I'm one of those "UTC forever" fanboys, because I've suffered through working with systems that fail to consider how to track, or properly convert, datetimes. On the other hand, I do recognize the trouble involved in trying to convert everyone to UTC. The link at the end is a nice thought experiment for anyone who has to work with datetimes.
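The "UTC forever" habit is easy to demonstrate. Here's a minimal Python sketch of "store UTC, convert only at the edge," using the spring-forward weekend itself (the zone name is just an example):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# Store timestamps in UTC; convert only for display.
event_utc = datetime(2017, 3, 12, 7, 30, tzinfo=timezone.utc)

# Converting to a DST-observing zone handles spring-forward for you:
# 07:30 UTC lands at 03:30 EDT, because the 02:00-03:00 local hour
# on March 12, 2017 never existed in America/New_York.
eastern = event_utc.astimezone(ZoneInfo("America/New_York"))
print(eastern.isoformat())  # 2017-03-12T03:30:00-04:00
```

Systems that store local wall-clock times have to reinvent exactly this logic, usually badly, which is where the suffering comes from.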

 

As always, here's a handful of links from the intertubz I thought you might find interesting. Enjoy!

 

Spammergate: The Fall of an Empire

One dozen people, 1.4 billion emails a day. Nice summary of how a handful of groups came together to take this spam empire offline.

 

The WikiLeaks CIA hacking documents include spy tools literally from sci-fi

Most of the articles about the CIA spy hacks last week were like this one: clickbait. The details in the WikiLeaks documents appear to be quite dated, leading Apple, Google, and Microsoft to declare that the vulnerabilities were patched long ago.

 

Artificial intelligence: Cooperation vs. aggression

A nice reminder that computers will do what we tell them to do. If we tell them to shoot people, we shouldn't be surprised when they shoot people.

 

DevOps has reached critical mass, CIOs need to get on board

I dislike the marketing term DevOps, but I *love* how it helps describe a modern software development lifecycle. It's hard to believe that there is any CIO out there that isn't subscribing to such methods.

 

Serverless is the new Multitenancy

A quick summary of the future of SaaS. Serverless architecture is going to allow cloud providers to scale further than they do now. I suspect that, as end-users, we won't have to worry about creating such functions; we will just click buttons and the plumbing will be handled for us.

 

30 Questions to Ask a Serverless Fanboy (or Fangirl)

Because if serverless gets brought up in a discussion you are having, you should be prepared to ask a few basic questions.

 

So You Want Continuous Time Zones

Someone had a bit of time on their hands, pun intended. This thought experiment was worth the time, and has a wonderful conclusion: "The sad, ultimate truth of modern timekeeping is this: it's not perfect, but it doesn't honestly get a whole lot better."

 

With the PerfStack™ launch this week, I am reminded how much I love working here at SolarWinds, and this image best describes why:

skills_love_money_rz.jpg

One of the common challenges in troubleshooting performance issues is that the multiple dimensions belong to different teams. Coordinating the troubleshooting across teams can bring its own challenges. I really like the PerfStack feature where the dashboard URL contains all of the information required to recreate the dashboard. The net result is that I can paste that one URL into my help desk ticket to include the evidence to hand off an issue to another team. Equally, when another team sends me a ticket, it can already have a dashboard to jump-start my troubleshooting.

 

I’ve seen help desk tickets bounce around from team to team within large organizations. As any network engineer will tell you, the network is always blamed first. To prove the issue isn’t the network, you craft some graphs showing that all the latency is in a VM. Then you paste a screenshot of the graphs into the help desk system and reassign the ticket to the virtualization team. Shortly afterward, the virtualization team says they are unable to see the issue and asks whether you can provide more details. This poor handoff between departments slows the whole process and makes it difficult to resolve the problem for the application end-users. It also makes every team feel like the other teams are idiots because they cannot see the obvious problems.

 

With PerfStack, you are able to hand the virtualization team a live graph showing the performance issue as being a VM problem. The virtualization team can take that URL and make changes to the dashboard. They might add VM-specific counters and also information from inside the operating system. The VM team may identify that the issue is happening within SQL Server. They hand it off to the DBAs, with the URL for an updated dashboard. The DBAs rebuild the indices (or something) and all the performance problems go away. The important thing is that the handoff between teams has far more actionable information. Each team can take the information from the previous team and adapt it to their own view of the world. The context of each team's information remains through the URLs in the ticket. This encapsulation into a URL was one of my favorite little features of the PerfStack demonstration.
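PerfStack's actual URL scheme is SolarWinds' own, but the general idea, encoding the dashboard state in a query string so that a link alone reconstructs the whole view, can be sketched in a few lines (hostnames and metric names here are invented for illustration):

```python
from urllib.parse import urlencode, parse_qs, urlparse

# Hypothetical dashboard state: which metrics are charted, and the time window.
state = {
    "metrics": "vm42.cpu_ready,sqlprod.page_life_expectancy",
    "from": "2017-03-08T14:00Z",
    "to": "2017-03-08T16:00Z",
}

# The sender's dashboard serializes its state into the link...
url = "https://orion.example.com/perfstack?" + urlencode(state)

# ...and the receiving team's browser rebuilds the exact same view
# from nothing but the URL pasted into the ticket.
parsed = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
assert parsed == state
```

The design win is that the ticket carries live, editable context rather than a stale screenshot: the next team can add counters and pass along a new URL the same way.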

 

One thing to keep in mind is that collaborative troubleshooting is more productive than playing help desk ticket ping pong. It definitely helps the process to have experts across the disciplines working together in real-time. It helps both with resolving the problem at hand and with future problems. Often each team can learn a little of the other team’s specialization to better understand the overall environment. Another under-appreciated aspect is that it helps people to understand that the other teams are not complete idiots, that each specialization has its own issues and complexity.

By Joe Kim, SolarWinds Chief Technology Officer

 

Analysts and other industry experts have defined application performance monitoring as software that incorporates analytics, end-user experience monitoring, and a few other components, but this is just a small and basic part of the APM story. True APM is rich and nuanced, incorporating different approaches and tools for one common goal: keeping applications healthy and running smoothly.

 

Two Approaches to APM

 

How you use APM will depend on your agency’s environment. For example, you may prefer an APM approach that allows you to go inside underperforming applications and make changes directly to the code. In other cases, you may simply need to assess the overall viability of applications to help ensure their continued functionality. There are two very different methodologies that address both of these needs.

 

To solve a slow application problem, you may wish to dig down into the code itself to discover how long it takes for each portion of that code to process a transaction. From this, you’ll be able to determine, in aggregate, the total amount of transaction processing time for that application.

 

For this, you can use application bytecode instrumentation monitoring (ABIM) and distributed tracing. ABIM allows you to insert instrumentation into specific parts of the code. Monitoring processing times gives you information to accurately pinpoint where the problem exists and rectify the issue. For more complex application infrastructures that are distributed in nature, you can use distributed tracing to tag and track processes that cross multiple stacks and platforms. It’s a very specific and focused approach to APM, almost akin to a surgeon extracting a tumor.
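Real ABIM tools inject timers at the bytecode level inside the runtime, but the underlying idea, wrapping each code path with a timer so transaction time can be attributed to its parts, can be sketched with a simple decorator (the function names are invented for illustration):

```python
import functools
import time

timings = {}  # accumulated wall-clock time per instrumented code path

def instrument(fn):
    """Wrap a function with a timer; bytecode instrumentation does
    roughly this automatically, without touching the source."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            timings[fn.__name__] = timings.get(fn.__name__, 0.0) + elapsed
    return wrapper

@instrument
def load_user(user_id):     # hypothetical slow step in a transaction
    time.sleep(0.05)

@instrument
def build_report(user_id):  # hypothetical fast step
    time.sleep(0.005)

load_user(7)
build_report(7)
slowest = max(timings, key=timings.get)  # pinpoints the costly step
```

Summing the per-step times gives the aggregate transaction time the paragraph above describes, and the largest entry is where the surgeon cuts.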

 

Another, more general – though no less effective – approach is application interface performance management (AIPM). If ABIM is the surgeon’s tool, AIPM is something that a general practitioner might use.

 

AIPM allows you to monitor response times, wait times, and queue length, and provides near real-time visibility into application performance. You can receive instant alerts and detailed analytics regarding the root cause of problems. Once issues are identified, you can respond to them quickly and help your agency avoid unnecessary and costly application downtime.
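A minimal sketch of the AIPM idea, watching response-time samples from the outside and raising an alert when a rolling average crosses a threshold (all numbers are invented):

```python
# Recent response-time samples for one application endpoint, in milliseconds.
samples = [110, 125, 118, 640, 840, 905]
THRESHOLD_MS = 500

def check(samples, threshold_ms, window=3):
    """Alert if the average of the last `window` samples exceeds
    the threshold; averaging smooths over one-off spikes."""
    recent = samples[-window:]
    avg = sum(recent) / len(recent)
    return avg > threshold_ms, avg

alerted, avg = check(samples, THRESHOLD_MS)
if alerted:
    print(f"ALERT: avg response time {avg:.0f} ms over last 3 samples")
```

Unlike the ABIM approach, nothing here looks inside the application; the general practitioner only takes its pulse and flags when something is off.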

 

Tools and Their Features

 

There are a number of different monitoring solutions on the market, and it can be hard to determine which technologies will best fit your agency’s unique needs. Most of them will do the basics – alerts, performance metrics, etc. — but there are certain specialized features you’ll also want to look out for:

 

Insight into all of your applications. Applications are the lifeblood of an agency, and you’ll need solutions that provide you with insight into all of them, preferably from a single dashboard or control point.

 

A glimpse into the health of your hardware. Hardware failure can cause application performance issues. You’ll need to be able to monitor server hardware components and track things like high CPU load and other issues to gain insight into how they may be impacting application performance.

 

The ability to customize for different types of applications. Different types of applications (for example, custom or home-grown apps) may have different monitoring requirements; you’ll need tools that are adaptable to the applications in your stack.

 

As you can see, APM is far more intricate than some may have you believe, and that’s a good thing. You have far more resources at your fingertips than you may have thought. With the right combination of approaches and tools, you’ll be able to tackle even the trickiest application performance issues.

 

Find the full article on our partner DLT’s blog, Technically Speaking.

kong.yang

The Troubleshooting Radius

Posted by kong.yang Employee Mar 10, 2017

Most of the time, IT pros gain troubleshooting experience via operational pains. In other words, something bad happens and we, as IT professionals, have to clean it up. Therefore, it is important for you to have a troubleshooting protocol in place that is specific to dependent services, applications, and a given environment. Within those parameters, the basic troubleshooting flow should look like this:

 

      1. Define the problem.
      2. Gather and analyze relevant information.
      3. Construct a hypothesis on the probable cause for the failure or incident.
      4. Devise a plan to resolve the problem based on that hypothesis.
      5. Implement the plan.
      6. Observe the results of the implementation.
      7. Repeat steps 2-6.
      8. Document the solution.
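The loop in steps 2 through 7 can be sketched as code; the helper callables are placeholders for whatever your environment actually provides:

```python
def troubleshoot(problem, gather, hypothesize, plan, apply, resolved,
                 max_iterations=10):
    """Skeleton of steps 2-7: gather evidence, form a hypothesis,
    act on it, observe, and repeat until the problem is resolved."""
    for _ in range(max_iterations):
        evidence = gather(problem)          # step 2
        hypothesis = hypothesize(evidence)  # step 3
        apply(plan(hypothesis))             # steps 4-5
        if resolved(problem):               # step 6
            return hypothesis               # step 8: document this
    return None                             # hypothesis pool exhausted

# Toy usage: a "problem" that resolves once the right fix is applied.
state = {"service_up": False}
result = troubleshoot(
    problem=state,
    gather=lambda p: dict(p),
    hypothesize=lambda ev: "service stopped",
    plan=lambda h: "start service",
    apply=lambda fix: state.update(service_up=True),
    resolved=lambda p: p["service_up"],
)
```

The bounded loop matters: if no hypothesis survives observation, you return to gathering more information rather than thrashing forever on the same fix.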

 

Steps 1 and 2 usually lead to a world of pain. First of all, you have to define the troubleshooting radius, the surface area of systems in the stack that you have to analyze to find the cause of the issue. Then, you must narrow that scope as quickly as possible to remediate the issue. Unfortunately, remediating in haste may not actually lead to uncovering the actual root cause of the issue. And if it doesn’t, you are going to wind up back at square one.

 

You want to get to the single point of truth with respect to the root cause as quickly as possible. To do so, it is helpful to combine a troubleshooting workflow with insights gleaned from tools that allow you to focus on a granular level. For example, start with the construct that touches everything, the network, since it connects all the subsystems. In other words, blame the network. Next, factor in the application stack metrics to further shrink the troubleshooting area. This includes infrastructure services, storage, virtualization, cloud service providers, web, etc. Finally, leverage a collaboration of time-series data and subject matter expertise to reduce the troubleshooting radius to zero and root cause the issue.
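One way to picture shrinking the troubleshooting radius: start with every subsystem in scope and intersect away whatever each round of metrics clears (the subsystem names and findings here are illustrative):

```python
# Everything the failing application touches starts in scope.
in_scope = {"network", "storage", "virtualization", "web", "database"}

# Each finding keeps only the subsystems still under suspicion.
findings = [
    {"network", "storage", "virtualization", "database"},  # web tier clean
    {"storage", "virtualization", "database"},             # network latency normal
    {"database"},                                          # VM and storage clean
]

for remaining in findings:
    in_scope &= remaining  # intersect: the radius shrinks each round

# Radius zero: a single point of truth remains.
assert in_scope == {"database"}
```

Each intersection is one pass of metrics plus subject matter expertise; when the set collapses to a single element, you have the root cause rather than a guess.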

 

If you think of the troubleshooting area as a circle, as the troubleshooting radius approaches zero, one gets closer to the root cause of the issue. If the radius is exactly zero, you’ll be left with a single point. And that point should be the single point of truth about the root cause of the incident.

 

Share examples of your troubleshooting experiences across stacks in the comments below.

Late last month, shockwaves were sent through the SAP customer base as a UK court ruled in favor of SAP and against the mega spirits supplier Diageo in an indirect licensing case. The court determined that Diageo had violated SAP’s licensing T&Cs by connecting a third-party app to their SAP ERP for a myriad of business process life cycles. In its claim, SAP is asking for £60m in unpaid fees. Yes, £60m! Pending appeal, the court will decide on the actual amount to be paid within the month. As a fellow SAP customer, my company is now in a hurry to audit all the systems that connect to our SAP ERP to verify compliance, regardless of the fact that we conduct a license “True Up” with SAP every year.

 

This case reminds me of a licensing change that Microsoft made for SQL Server back in 2011, aka “The Money Grab." Microsoft decided to change enterprise agreement licensing in late 2011 for SQL servers from per-processor to per-core. This left many companies, mine included, scrambling to reduce, consolidate, or eliminate SQL servers ahead of their enterprise agreement renewal with Microsoft, usually with only a couple of months’ notice.
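The budget impact of that kind of change is easy to sketch. With made-up list prices (these are illustrative, not actual Microsoft figures), moving one server from per-processor to per-core licensing looks like this:

```python
# Illustrative prices only; real enterprise agreement pricing varies.
PER_PROCESSOR_PRICE = 7_000  # old model: licensed per physical socket
PER_CORE_PRICE = 1_800       # new model: licensed per core

def old_cost(sockets):
    return sockets * PER_PROCESSOR_PRICE

def new_cost(sockets, cores_per_socket, core_minimum=4):
    # Per-core schemes typically bill a minimum core count per socket.
    billable = max(cores_per_socket, core_minimum) * sockets
    return billable * PER_CORE_PRICE

# A two-socket, eight-core-per-socket server:
before = old_cost(2)    # 2 sockets x 7,000 = 14,000
after = new_cost(2, 8)  # 16 cores x 1,800 = 28,800
```

With core counts per socket climbing every hardware generation, the same workload roughly doubled in license cost overnight, hence the scramble to consolidate before renewal.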

 

A common, and humorous, comparison that I often come across is that Lincoln’s historic Gettysburg Address clocks in at a shade over two minutes, yet the standard EULA for any software these days is more than three pages. Who has the time or patience to read that? Now ask yourself, how many software packages and applications do you have running across your enterprise? Do you, or someone else at your company, know the terms and conditions of the licensing for these software packages? Better yet, are they being regularly audited for compliance and/or usage reviewed to minimize spend? Don’t fear. There are many firms out there ready to provide their services when it comes to software license audits, but for a hefty sum.

 

It's difficult to predict the next “Money Grab” and who it will come from. I predict that as more companies go all-in with the cloud, it will come from there. Think about it: IaaS equals cheap space and cheap processing for hungry consumers.

 

How do you react when it is too late and the vendor is knocking on your door? How do you remain proactive, stay organized, and prevent sprawl? Do you have all your T&Cs on file?
