The SolarWinds crew, including sqlrockstar, chrispaap, and me, just returned stateside after a successful jaunt across the Atlantic to VMworld Europe in Barcelona, Spain. Thank you to all of the attendees who joined us at Tom’s speaking sessions and at our booth. Thank you to Barcelona for your hospitality!
Below are a few pictures of the SolarWinds team as we walked the walk and talked the talk of monitoring with discipline.
[Photos: The SolarWinds Family at VMworld Europe 2017 in Barcelona | The SolarWinds Family Team Dinner]
Our journey doesn’t stop with the end of the VMworld two-continent tour. We are about to ignite a full course of monitoring with discipline in Orlando. At Microsoft Ignite, visit us in Booth #1913 for the most 1337 swag as well as fantastic demos on monitoring hybrid IT with discipline.
Let us know in the comments if you'll be joining us in Orlando for Microsoft Ignite.
IT organizations are embracing hybrid IT because these services and technologies are critical to enabling the full potential of an application’s disruptive innovation. Although change is coming fast, the CIO’s mission remains the same: keep the app healthy and running smoothly. It’s time for application performance management to extend its strategy and practice to handle the modern application’s needs.
In the "Extend Your Modern APM Strategy" session, I will be joined by a panel of SolarWinds product experts: Jerry Schwartz, director of product marketing; Robert Mandeville, product marketing manager; product managers Steven Hunt, Chris Paap, and Chris O'Brien; and Dan Kuebrich, director of engineering. We will explore the five elements of a modern approach to APM, complete with product demonstrations, covering concepts from WPM to response-time analysis.
After attending this session, you will have a better understanding of what a comprehensive APM strategy entails and what technologies are available to support each of the five elements.
THWACKcamp is the premier virtual IT learning event connecting skilled IT professionals with industry experts and SolarWinds technical staff. Every year, thousands of attendees interact with each other on topics like network and systems monitoring. This year, THWACKcamp further expands its reach to consider emerging IT challenges like automation, hybrid IT, cloud-native APM, DevOps, security, and more. For 2017, we’re also including MSP operators for the first time.
THWACKcamp is 100% free and travel-free, and we'll be online with tips and tricks on how to use your SolarWinds products better, as well as best practices and expert recommendations on how to make IT more effective regardless of whose products you use. THWACKcamp comes to you, so it’s easy for everyone on your team to attend. With over 16 hours of training, educational content, and collaboration, you won’t want to miss this!
Check out our promo video and register now for THWACKcamp 2017! And don't forget to catch our session!
A big hearty THANK YOU to everyone who joined us at our SolarWinds booth, breakout and theater sessions, and Monitoring Morning! We were excited to converse with you in person about the challenges that practitioners face.
[Photos: SolarWinds Views from VMworld]
There were plenty of announcements at VMworld. The two that stood out for me were:
Finally, I was invited to Future:Net, a conference within a conference. It was really cool to talk shop about the latest academic research, as well as what problems the next generation of startups are trying to solve.
The SolarWinds “Virtualization Monitoring with Discipline” VMworld tour is about to start and we are bringing solutions and swag.
At VMworld® US in Las Vegas, the SolarWinds family is bringing a new shirt, new stickers and buttons, new socks, and a new morning event. And that’s not all we’re bringing to VMworld.
Monitoring With Discipline to Master Your Virtualized Universe
Performance Tuning and Monitoring for Virtualized Database Servers
SQL Server® on vSphere®: A Panel with Some of the World’s Most Renowned Experts
Another first is that SolarWinds will be on the Solutions Expo floor at VMworld Europe in Barcelona. In the lead-up to the event, we’ll be hosting a pre-VMworld Europe webcast to talk shop about Virtualization Manager and its value for empowering troubleshooting in the highly virtualized domain of hybrid IT.
Performance Tuning and Monitoring for Virtualized Database Servers
I’ll update this section with details as they become available.
Let me know in the comment section if you will be attending VMworld US or VMworld Europe. If you can’t make it to one of these events, let me know how we at SolarWinds can better address your virtualization pain points.
Logs are insights into events, incidents, and errors recorded over time on monitored systems, with the operative word being monitored. That’s because logging may need to be enabled on systems left at their defaults, or in an environment you’ve inherited that was never configured for logging. For the most part, logs are retained to maintain compliance and governance standards. Beyond this, logs play a vital role in troubleshooting.
For VMware® ESXi and Microsoft® Hyper-V® nodes, logs represent quintessential troubleshooting insights across that node’s stack, and can be combined with alerts to trigger automated responses to events or incidents. The logging process focuses on which logs to aggregate, how to tail and search those logs, and what analysis needs to look like with the appropriate reactions to that analysis. And most importantly, logging needs to be easy.
Configuring system logs for VMware and Microsoft is a straightforward process. For VMware, one can use the esxcli command or host profiles. For Microsoft, look in the Event Viewer under Application and Services Logs -> Microsoft -> Windows, and specifically the Hyper-V-VMMS (Hyper-V Virtual Machine Management service) event logs. The challenge is efficiently and effectively handling the logging process as the number of nodes and VMs in your virtual environment increases in scale. That scale can introduce multi-level logging complexities, creating troubleshooting nightmares instead of troubleshooting silver bullets. You can certainly follow the Papertrail if you want the easy log management button at any scale.
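Once logs are forwarded to a central location, the tail-and-search part of the logging process can be scripted. Here is a minimal Python sketch, assuming plain-text syslog output; the file name and log lines below are hypothetical, purely for illustration:

```python
import re
from collections import deque
from pathlib import Path

def tail(path, n=10):
    """Return the last n lines of a log file."""
    with open(path) as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]

def search(path, pattern):
    """Return lines matching a case-insensitive regex, e.g. r'error|failed'."""
    rx = re.compile(pattern, re.IGNORECASE)
    with open(path) as f:
        return [line.rstrip("\n") for line in f if rx.search(line)]

# Demo against a synthetic vmkernel-style log (hypothetical content).
Path("vmkernel.log").write_text(
    "2017-09-01T10:00:01Z vmkernel: device naa.01 online\n"
    "2017-09-01T10:00:02Z vmkernel: WARNING: ScsiDeviceIO failed\n"
    "2017-09-01T10:00:03Z vmkernel: device naa.02 online\n"
)
print(tail("vmkernel.log", 2))
print(search("vmkernel.log", r"warning|failed"))
```

At scale, of course, a hosted service handles the aggregation, retention, and indexing for you; the point of the sketch is only that tailing and searching should stay this easy no matter how many nodes feed the pipeline.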
The question becomes, would your organization be comfortable with, and actually approve of, cloud-hosted log management, even with encrypted logging, where the storage is Amazon® S3 buckets? Let me know in the comment section below.
Continuous integration. Continuous delivery. Cloud. Containers. Microservices. Serverless. IoT. Buzzworthy tech constructs and concepts are signaling a change for IT professionals. As IT pros adapt and evolve, the application remains the center of the change storm. More importantly, the end goal for IT remains essentially the same as it always has been: keep the revenue-impacting applications performing as optimally as possible. Fundamental principles remain constant below the surface of anything new, disruptive, and innovative. This applies to IT titles and responsibilities as well.
Take, for example, the role of site reliability engineer (SRE), which was Ben Treynor’s 2003 creation at Google. He describes it as what happens when you ask a software engineer to perform an operations function. Google lists it as a discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. Even before the term SRE was coined, there were IT professionals building out massively distributed, fault-tolerant, large-scale systems. They just weren’t called SREs. Fast forward to 2008, and another title started to gain momentum: DevOps engineer, a.k.a. the continuous integration/continuous delivery engineer. Regardless of their titles, core competencies remain fundamentally similar.
Speaking of IT titles: how do you identify yourself with respect to your professional title? I've been a lab monitor, a systems engineer, a member of technical staff, a senior consultant, a practice leader, and now a Head Geek™. Does your title bring you value? Let me know in the comment section below.
On the surface, application performance management (APM) is simply defined as the process of maintaining acceptable user experience with respect to any given application by "keeping applications healthy and running smoothly." The confusion comes when you factor in all the interdependencies and nuances of what constitutes an application, as well as what “good enough” is.
APM epitomizes the nature vs. nurture debate. In this case, nurture is the environment: the infrastructure and networking services, as well as composite application services. Nature, on the other hand, is the code-level elements formed by the application’s DNA. The complexity of nature and nurture also plays a huge role in APM, because one can nurture an application using a multitude of solutions, platforms, and services. Similarly, the nature of the application can be coded using a variety of programming languages and runtime services. Regardless of nature or nurture, APM strives to maintain good application performance.
And therein lies the million-dollar APM question: What is good performance? And similarly, what is good enough in terms of performance? Since every data center environment is unique, “good” can vary from organization to organization, even within the same vertical industry. The key to successful APM is to have proper baselines, trend reporting, and tracing to help ensure that quality of service (QoS) is always met without paying a premium in time and resources while continuously optimizing an application that may be the equivalent of a differential equation.
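To make the baseline idea concrete, here is a toy sketch in Python. The response times, tolerance, and the notion of "good" as a deviation band are all hypothetical illustrations, not a prescribed methodology:

```python
# Flag response times that drift beyond a tolerance band around a baseline
# computed from recent "normal" history. All numbers are hypothetical.
from statistics import mean, stdev

history = [120, 118, 125, 121, 119, 123, 122, 120]  # ms, normal traffic
baseline, spread = mean(history), stdev(history)

def within_qos(sample_ms, tolerance=3.0):
    """'Good' here means within `tolerance` standard deviations of baseline."""
    return abs(sample_ms - baseline) <= tolerance * spread

print(within_qos(124))  # near baseline: True
print(within_qos(310))  # a clear violation: False
```

Real APM tooling layers trending and tracing on top of this, but the core question stays the same: how far from the baseline is still "good enough" for your organization?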
Let me know in the comment section what good looks like with respect to the applications that you’re responsible for.
If I learned anything from Tetris, it’s that errors pile up and accomplishments disappear.
– Andrew Clay Shafer (@littleidea on Twitter).
In IT, we make our money and keep our jobs by being right. And we have to be right more often than not, because the one time we are wrong might cost us our job. This kind of pressure can lead to a defensive, siloed mentality. Where might equals right, the IT working environment becomes conducive to hostility and sniping.
I’ve witnessed firsthand the destructive nature of a dysfunctional IT organization. Instead of working as a cohesive team, members would swoop in to fix issues only after a colleague had made a mistake. It was the ultimate representation of trying to rise to the top over the corpses of colleagues. Where did it all go wrong? Unfortunately, that IT organization incentivized team members to outdo one another for the sake of excellent performance reviews and to get ahead in the organization. It was a form of constant hazing. There were no mentors to help guide young IT professionals to differentiate between right and wrong.
Ultimately, it starts and ends with leadership and leaders. If leaders allow it, bad behaviors will remain pervasive in the organization’s culture. Likewise, leaders can nip such troubling behavior in the bud if they are fair, firm, and consistent. That IT team’s individual contributors were eventually re-organized and re-assigned once their leaders were dismissed and replaced.
Rewards and recognition come and go. Sometimes they’re well-deserved, and other times we don’t get the credit we’re due. Errors, failures, and mistakes do happen. Don’t dwell on them. Continue to learn and move forward. A career in IT is a journey, and a long one at that. Mentees do have fond memories of mentors taking the time to help them become professionals. Lastly, remember that kindness is not weakness, but rather an unparalleled kind of strength.
I was fortunate enough to be in the audience for my friend Dave McCrory's presentation at Interop during the Future of Data Summit. Dave is currently the CTO of Basho, and he famously coined the term "data gravity" in 2010. Data gravity, or as friends have come to call it, McCrory's Law, simply states that data is attracted to data. Data now has such critical mass that processing is moving to it, rather than data moving to processing.
Furthermore, Dave introduced the notion of data agglomeration, where data will migrate to and stick with the services that provide the best advantages. Examples of this concept include car dealerships and furniture stores clustering in the same vicinity, just as major cities of the world tend to sit near large bodies of water. In terms of cloud services, this is why companies that incorporate weather readings are leveraging IBM Watson: IBM bought The Weather Company and all of its IoT sensors, which have produced and continue to produce massive amounts of data.
I can't do justice to the quality of Dave's content and its context in our current hybrid IT world. His presentation was definitely worth the price of admission to Interop. Do you think data has gravity? Do you think data agglomeration will lead to multi-cloud service providers within an organization that is seeking competitive advantages? Share your thoughts in the comment section below.
I have lots of conversations with colleagues and acquaintances in my professional community about career paths. The question that inevitably comes up is whether they should continue down their certification path with specific vendors like VMware, Microsoft, Cisco, and Oracle, or pursue new learning paths like AWS and Docker/Kubernetes.
Unfortunately, no single answer fits every individual, because we each possess different experiences, areas of expertise, and professional connections. But that doesn't mean you can't fortify your career. Below are tips that I have curated from my professional connections and personal experiences.
As organizations embrace digital transformation, there are three questions that every organization will ask IT professionals:
How you respond to these questions will determine your future within that organization and in the industry.
So what do you think of my curated tips? What would you add or subtract from the list? Also, how about those three organizational questions? Are you being asked those very questions by your organization? Let me know in the comment section.
A final note: I will be at Interop ITX in the next few weeks to discuss this among all the tech-specific conversations. If you will be attending, drop me a line in the comment section and let’s meet up.
Raise your hand if you have witnessed rogue or shadow IT firsthand. This is when biz, dev, or marketing goes directly to cloud service providers for infrastructure services instead of going through your IT organization. Let's call this Rogue Wars.
Recently, I was talking to a friend in the industry about just such a situation. He was frustrated with non-IT teams, especially marketing and web operations, procuring services from other people’s servers. These rogue operators were accessing public cloud service providers to obtain infrastructure services for their mobile and web app development teams. My friend's biggest complaint was that his team was still responsible for supporting all aspects of ops, including performance optimization, troubleshooting, and remediation, even though they had zero purview into, or access to, the rogue IT services.
They were challenged by the cloud’s promise of simplified self-service. The fact that it's readily available, agile, and scalable was killing them softly with complexities that their IT processes were ill-prepared for. For example, the non-IT teams did not follow proper protocol to retire the self-service virtual machines (VMs) and infrastructure resources that formed the application stack. That meant the organization was paying for resources that no longer did any work for it. Tickets were also being opened for slow application performance, but the IT teams had zero visibility into the public cloud resources. For this reason, they could only let the developers know that the issue was not within the purview of internal IT. Unfortunately, they were still handed the responsibility of resolving the performance issue.
This is how the easy button of cloud services is making IT organizations feel the complex burn. Please share your stories of rogue/shadow IT in the comments below. How did you overcome it, or are you still cleaning up the mess?
SolarWinds recently released the 2017 IT Trends Report: Portrait of a Hybrid IT Organization, which highlights the current trends in IT from the perspective of IT professionals. The full details of the report, as well as recommendations for hybrid IT success, can be found at it-trends.solarwinds.com.
The findings are based on a survey fielded in December 2016. It yielded responses from 205 IT practitioners, managers, and directors in the U.S. and Canada from public and private-sector small, mid-size, and enterprise companies that leverage cloud-based services for at least some of their IT infrastructure. The results of the survey illustrate what a modern hybrid IT organization looks like, and shows cost benefits of the cloud, as well as the struggle to balance shifting job and skill dynamics.
The following are some key takeaways from the 2017 IT Trends Report:
Cloud and hybrid IT are a reality for many organizations today. They have created a new era of work that is more global, interconnected, and flexible than ever. At the same time, the benefits of hybrid IT introduce greater complexity and technology abstraction. IT professionals are tasked with devising new and creative methods to monitor and manage these services, as well as prepare their organizations and themselves for continued technology advancements.
Are these consistent with your organizational directives and environment? Share your thoughts in the comment section below.
Troubleshooting efficiency and effectiveness are core to uncovering the root cause of incidents and bad events in any data center environment. In my previous post about the troubleshooting radius and the IT seagull, I noted that troubleshooting efficacy is the key performance indicator for fixing it fast. But troubleshooting is an avenue that IT pros dare not walk too often, for fear of being blamed as incompetent or incorrect.
We still need to be right a lot more than we are wrong. Our profession does not give quarter when things go wrong. The blame game, anyone? When I joined IT operations many years ago, one of my first mentors gave me some sage advice from his own IT journey. It’s similar to the three-envelope CEO story that many IT pros have heard before.
A lifetime of troubleshooting comes with its ups and downs. Looking back, it has provided many an opportunity to change my career trajectory. For instance, troubleshooting the lack of performance boost from a technology invented by the number one global software vendor almost cost me my job; but it also re-defined me as a professional. I learned to stand up for myself professionally. As Agent Carter states, "Compromise where you can. And where you can’t, don’t. Even if everyone is telling you that something wrong is something right, even if the whole world is telling you to move. It is your duty to plant yourself like a tree, look them in the eye and say, no. You move." And I was right.
It’s interesting to look back and examine the events and associated time-series data to see how close I got to the root cause signal before getting mired in the noise, or vice versa. Troubleshooting the root cause of this IT career is something I’m addicted to, whether it’s the change and the opportunity, or all the gains through all the pains.
Share your career stories, and how troubleshooting mishaps or gold brought you shame or fame, in the comment section below.
Most of the time, IT pros gain troubleshooting experience via operational pains. In other words, something bad happens and we, as IT professionals, have to clean it up. Therefore, it is important for you to have a troubleshooting protocol in place that is specific to dependent services, applications, and a given environment. Within those parameters, the basic troubleshooting flow should look like this:
Steps 1 and 2 usually lead to a world of pain. First, you have to define the troubleshooting radius: the surface area of systems in the stack that you have to analyze to find the cause of the issue. Then, you must narrow that scope as quickly as possible to remediate the issue. Unfortunately, remediating in haste may not actually uncover the true root cause of the issue. And if it doesn’t, you are going to wind up back at square one.
You want to get to the single point of truth with respect to the root cause as quickly as possible. To do so, it is helpful to combine a troubleshooting workflow with insights gleaned from tools that allow you to focus on a granular level. For example, start with the construct that touches everything, the network, since it connects all the subsystems. In other words, blame the network. Next, factor in the application stack metrics to further shrink the troubleshooting area. This includes infrastructure services, storage, virtualization, cloud service providers, web, etc. Finally, leverage a collaboration of time-series data and subject matter expertise to reduce the troubleshooting radius to zero and root cause the issue.
If you think of the troubleshooting area as a circle, as the troubleshooting radius approaches zero, one gets closer to the root cause of the issue. If the radius is exactly zero, you’ll be left with a single point. And that point should be the single point of truth about the root cause of the incident.
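The narrowing workflow above can be sketched in a few lines of Python. This is a toy illustration of the idea, not a real diagnostic tool; the subsystem names, metric values, and the use of Pearson correlation as the ranking signal are all hypothetical choices:

```python
# Shrink the troubleshooting radius by ranking subsystems on how strongly
# their time-series metrics correlate with the incident signal.
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Incident signal: e.g., app response time spiking over six intervals.
incident = [0, 0, 1, 3, 7, 9]

# Candidate subsystems with their metric series (hypothetical values).
subsystems = {
    "network_latency_ms": [10, 11, 10, 12, 11, 10],  # flat: likely innocent
    "storage_iops_queue": [1, 1, 4, 9, 20, 25],      # tracks the spike
    "cpu_ready_pct":      [2, 3, 2, 3, 2, 3],
}

# Rank candidates; the radius "approaches zero" as the list narrows to one.
ranked = sorted(subsystems,
                key=lambda k: abs(pearson(subsystems[k], incident)),
                reverse=True)
print(ranked[0])  # the remaining single point: the likely root cause
```

In practice, subject matter expertise still has to confirm that the strongest correlation really is cause and not coincidence; the data just gets you to that single point of truth faster.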
Share examples of your troubleshooting experiences across stacks in the comments below.
What’s an IT seagull? An IT seagull is a person who swoops into projects, takes a dump on everything, then flies off, leaving the rest of the team to clean up the mess. We all know an IT seagull. They tend to be in management positions. Heck, we might even be guilty of being IT seagulls ourselves every once in a while. So how does one prevent IT seagulls from wreaking havoc on mission-critical projects?
First, stick to the data – specifically, time series data that can be correlated to root cause issues. The key to troubleshooting is to quickly surface the single point of truth to eliminate the IT blame game. This effectively nullifies IT seagulls with data that clearly shows and correlates cause and effect.
Second, collaboration tends to deter IT seagulls. Being able to share one expert’s point of view with another subject matter expert, giving them specific insights into the problem, is powerful when you are trying to quickly remediate issues across multiple stacks, because it allows decisive action to take place.
Third, by focusing on the connected context provided by time series data that cuts across the layers of the entire stack, teams can eliminate the IT seagull’s negative potential, even as they are busy dropping gifts that keep on giving from the skies.
Do you know any IT seagulls? Share your stories and how you overcame them in the comments below.