
Geek Speak


Back in 1989, I received a copy of a program that promised to be groundbreaking. It was called "Forest-and-the-Trees." The software, which came on a 3.5" floppy, proposed to scan my computer's hard drive (a whopping 80MB!), detect patterns and trends in the data, and present them in such a way that I could have insight into my business, something impossible to imagine just a few short years before.

 

For the sake of this discussion, it doesn't matter that the software was less than impressive (or at least, it was less than impressive with the negligible amount of "business" data I had on my computer). What matters is that our desire to use all the computing power at our disposal to find connections we might not otherwise see is deeply ingrained, and goes as far back as computers themselves.

 

We at SolarWinds have been spilling a lot of digital ink talking about “AppStack” (links links links).

 

As a member of the SolarWinds team, I think it's worth the time and space to discuss because it satisfies that same deep-seated desire to have the computer find the connections, and show us how everything fits together in ways we may not otherwise see.

 

On a personal level, I love it for the sheer cool factor and for the fact that it's effectively "free." SolarWinds AppStack isn't as much a product as it is a philosophy and an overriding architectural goal. That's right, you can't run out and buy a box of AppStack. It's not available in stores at any price. Operators are not standing by to take your call. You "get" a bit of AppStack in every SolarWinds product you buy, and the insight grows with each solution you add to the pile (thus the "stack" of apps).

 

You can get a glimpse of the power of AppStack in SolarWinds Lab episode 25 (jump to the 11:35 mark for just AppStack, but the whole episode is worth watching).

 

But what about us network guys? Based on at least half of its name, you'd think there wouldn't be much for the average router jockey to care about, right?

 

There are a few reasons this attitude is wrong:


Breaking the silo mentality

We all have our niche, which I recently wrote about on my personal blog. In that post, I discussed how important it is to choose a specialty to avoid the special kind of hell called "IT Generalist." But IT professionals can no longer afford to get caught up in the "that's not my area" mentality. Sure, you have three routers and two core switches under your desk. But that doesn't mean you can't know or care what is running on your wires. AppStack lets you quickly familiarize yourself with how all the parts fit together, and in turn you never have to attend a DevOps meeting…ever!


IoT

Monitor! All! The! Things! We know it's coming. Wearable devices, warehouse geo-tracking, real-time shipment data, and more. The Internet of Things is going to create pressure not just on applications, but on the networks that all that data rides on. Having a tool like AppStack will allow you to discern the pressures being placed on the wire from those being placed on the application infrastructure.

 

Which leads us to...


MTTI

Standing for "mean time to innocence," this is the speed with which you can prove it's NOT the network's fault. AppStack allows you to show the entire connection chain of an application, from the customer-facing web server to the back-end database, and even further back to the storage array, pinpointing the source of the problem. SolarWinds Lab Episode 25 provides another great example of what I'm talking about; it's that good (jump to the 19:40 mark). In the case shown in the video, what are the odds that the network would have been blamed for slow data, instead of the real culprit: an overworked RAID5 array connected via iSCSI to a demanding SQL server?

 

Back in 1989, the idea of software correlating insight from multiple systems was an enticing vision. It’s nice to know that if you stick around long enough, you get to see some of those visions become reality.

Perusing through my virtualization blog feeds, a post from Amy Manley (@wyrdgirl) on her blog site, Virtual Chick, entitled "Solarwinds AppStack with SRM Flavor and More!" caught my attention. Amy is a fellow VMware vExpert and a certified virtualization unicorn. She was also a delegate at Virtualization Field Day 4, so imagine how awesome it was to find out that she wrote about SolarWinds AppStack, especially after our launch announcement.

 

Amy walks through what AppStack consists of: Server & Application Monitor, Storage Resource Monitor, Virtualization Manager, and Web Performance Monitor. She also covers what I think will be the most useful view for IT admins: the AppStack view. And then she summarizes AppStack perfectly by stating that it is "trying to make it easier to troubleshoot with all the information in one place" and "maps out the underlying dependencies and infrastructure to help bring about swift resolution." AppStack definitely puts the application front and center, with all the contextual connections into the stack layers.

 

The part that I'm most ecstatic about is that our AppStack demo at VFD4 was memorable and worthy enough for Amy to write about and share with her network.

There is much talk in the IT profession about automation. “Automate all the things” is written in some shape or fashion across a variety of blogs and social media platforms. I even briefly mentioned it in my last Geek Speak post about configuration management. You have read that already... Right?

 


 

I get the movement. Why do everything manually, wasting your time on tedious, trivial tasks when you could be working on the newest design for your data center or something better? And even though I could probably consider myself a new age networking professional, there’s still one task I enjoy doing the old-fashioned way: network discovery.

 

Call me crazy, but the task of learning a network for the first time is, in my opinion, best done manually. There are so many nuances that could be lost if this process is done automatically. Dissecting a network one device at a time, port by port, truly allows you to intimately understand the complexities of that network. Here are some tips and tricks that I have learned along the way, and that I have seen other networking professionals mention, for discovering a network for the very first time:

 

  • Start from the Core switch and work your way out (if you don't know where the Core switch is, start with your default gateway and branch out)
  • Use information like CDP/LLDP, ARP, MAC address tables, and routing tables to help you navigate (if you ever want to script some of that note-taking, see the short sketch after this list)
  • NEVER completely trust switch port descriptions or network diagrams. They are rarely kept up to date or updated regularly.
  • Draw out the network as you go using pencil and paper. You will continuously edit this diagram, so using pen will hamper you, and trying to input it into a program like Microsoft Visio while continuously making changes will make you scream.
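
Even if you keep the walk itself manual, scripting part of the note-taking can save some typing. Below is a minimal, hedged sketch using the Netmiko library to pull CDP neighbor details from a single device; the device type, hostname, and credentials are placeholder assumptions, not anything referenced in this post.

```python
# Minimal sketch: pull CDP neighbor details from one switch with Netmiko.
# The device_type, host, and credentials are placeholders for illustration;
# this only automates the note-taking, not the judgment.
from netmiko import ConnectHandler

device = {
    "device_type": "cisco_ios",            # assumption: a Cisco IOS device
    "host": "core-sw1.example.local",      # hypothetical core switch
    "username": "netadmin",
    "password": "changeme",
}

with ConnectHandler(**device) as conn:
    local_name = conn.find_prompt().strip("#>")
    output = conn.send_command("show cdp neighbors detail")

print(f"--- CDP neighbors seen from {local_name} ---")
for line in output.splitlines():
    line = line.strip()
    # Keep just the fields useful for sketching the topology by hand.
    if line.startswith(("Device ID:", "IP address:", "Interface:", "Platform:")):
        print(line)
```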


What about you all? Do you prefer automated network discovery or would you rather do it manually? Have any tips for either method? I look forward to hearing from you all.

An IT debate has been brewing for years now: old-school IT management, built on experience and practice, versus new-school IT management, built on policy-based management and analytics engines that leverage vendor knowledge bases and learning algorithms.

 

This debate is reminiscent of politics. IT is viewed as a slow and inefficient bureaucracy by developers and end-users; and yet, IT needs to perform due diligence and impart rigor and process for compliance, governance, and security. Policies are derived from best practices for well-understood problems and solutions, while experience hardens IT pros toward resolving issues quickly. Oh, how IT weaves a twisted tale, since best practices come from IT experience and form the foundation of the policy and analytics engines. In the as-a-Service IT world, strategies are being built around automation and orchestration, which facilitate auto-scaling and self-healing. And automation and orchestration imply heavy doses of policy-based management actions.

 

Disruptive innovation is one of the primary drivers for the adoption of new-school IT management. As the hardware stack converges toward the commodity limit, businesses view the software application as the layer where differentiation will be realized. This differentiation will come in the form of:

  1. Optimal data center utility;
  2. Optimal availability and scalability for applications;
  3. Maximum utility of IT professionals with minimal human error;
  4. Minimal app development and release times.

 

The real value of an IT pro shows during times of need, when things break and systems misbehave, and in how quickly the IT pro can perform root-cause analysis and bring things back to working order. The counterbalance to this is that there are specific breaks that always follow specific, well-known steps to remediate. These are the opportunities to automate and self-heal. There are also many tasks that may appear repetitive and self-identify as automation candidates, but they require connected context across the stack and application. Otherwise, it's a whole lot of wasted actions and time.

 

Where are you on this spectrum? Is your organization looking at fully automating IT management, hybridizing IT management between some automation & orchestration with some manual intervention, or completely manual? And are you being represented in the policy based IT management world?

I like cake. Most people do, I suppose. And that’s why I use cake as an analogy in my classes and presentations whenever I talk about users, data, and performance.

 

See, there are many, many layers of cake between users and their data. Some layers are thicker than others, some more delicious, and some more important. But together the layers make the cake complete. When done correctly, these layers form something that you want to consume more and more, just like a good cake should.

 

You’ve been hearing a lot about AppStack this month. Truthfully whenever I hear “AppStack” I just think of cake. It makes it easier for me to understand all the layers, and I think the analogy works for others as well. Why is that? It is because AppStack means different things to different people. Just look at the previous posts and you’ll get a sense for the differing points of view about what is important.

 

Today I want to talk about AppStack from the database administrator's point of view. For many people a database is a black box: queries go in, data comes out. It's a mystery as to what happens inside that magical box. And it's not just the database that's a mystery, either. Admit it; you have no idea what a DBA does for work all day long. Besides a lot of Twitter and Facebook activity, we are often busy providing evidence as to why the database itself is not the bottleneck in whatever business process is being seen as "acting slow".

 

Let’s break this down into some delicious layers of cake:

 

Storage layer

The backbone of any infrastructure: your data is stored somewhere (on-premises or in the cloud) on something (traditional hard drives or flash drives). Data is read from, or written to, these storage devices. Often these devices are shared among many systems, making it difficult to know if your data requests are the cause, or victim, of performance issues. All of this disk activity ultimately needs to travel through the network, too.

 

Server layer

It is no longer uncommon for database servers to be virtualized. Unfortunately, virtualized database servers are often improperly configured, leading to performance issues and concerns. Improper configuration choices at the host and guest layers are often the root cause for what is perceived to be a database issue. Knowing where to look (database, guest, host, storage) first is important for any administrator.

 

Database Layer

The database engine itself is often blamed as the source of performance issues, but the reality is that the true bottleneck exists elsewhere. In many cases, issues directly related to the database engine are caused by configuration options selected upon installation of the engine. I’ve lost track of the number of times I’ve tracked down the root cause of a database issue to be the fact that it’s so easy to click “Next, Next, Next” during an install.
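
As an aside, and purely as a hedged illustration rather than anything from the original post, here is a small Python sketch of how you might spot-check a few SQL Server settings that commonly keep their "Next, Next, Next" defaults. The connection string, server name, and the particular settings listed are placeholder assumptions.

```python
# Sketch: spot-check a few SQL Server settings that often keep install defaults.
# The connection string, server name, and "interesting" settings are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sql01;Trusted_Connection=yes;"
)
cursor = conn.cursor()

settings = ("max server memory (MB)",
            "max degree of parallelism",
            "cost threshold for parallelism")
cursor.execute(
    "SELECT name, CAST(value_in_use AS int) FROM sys.configurations "
    "WHERE name IN (?, ?, ?)",
    settings,
)
for name, value in cursor.fetchall():
    print(f"{name}: {value}")
    if name == "max server memory (MB)" and value == 2147483647:
        print("  -> still at the unlimited install default; probably worth capping")
```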

 

User Experience

The icing on our cake is, of course, the end-user. But let’s think about what is under that icing. The end-user is interacting with an application (may or may not be local to their PC), they send a request through the network, the application takes the request and it may (or may not) need to interact with a database, and the database may (or may not) need to interact with storage, and once the data is retrieved/written, back up the chain we go.

 

The end-user knows nothing of the series of tubes known as the internet. And what if, at the end of all this, the end-user needs to wait for a report to be rendered? They can be sitting there, watching the hourglass turn, silently muttering under their breath about how slow the database is behaving, when in fact the issue is a piece of reporting software that takes minutes to draw a pretty graph.

 

And that’s why AppStack is so important to someone like me. I get the opportunity to visually show someone where the bottleneck truly lies. This saves time and reduces frustration. And in the end it makes for a more stable infrastructure that the business is able to leverage.

In my LinkedIn profile, I write that I'm a fan of "elaborate IT metaphors," yet, in a very literal way, I've never actually written a list of my favorites.

 

Listing out my favorite IT metaphors, sayings, aphorisms and such is risky. Too much pithiness, and I risk not being taken seriously. Too much cynicism and no one wants to talk to you.

 

And yet I must take that risk, because if you’re a practitioner of the IT arts as I am, then you’re used to engaging in these sorts of thoughtful/silly/humorous reflections.

 

Or maybe you don't but will find humor and value in them nonetheless. Enjoy and if you like, add your own!

 

  • Dark side of the moon: Waiting for a host or device to reply to pings after a reload/reboot. In a sentence: "Server's gone dark side of the moon, standby." Origin/Notes: NASA, obviously.
  • Eat our own dogfood: Applying the same policies/tech/experience to IT that apply to users. In a sentence: "That dogfood tastes pretty nasty and we've been dishing it out to our users for years." Origin/Notes: Not sure, but heard on TWiT.
  • DNS is like a phonebook: Computers speak in numbers, humans speak in words. In a sentence: "Look, it works like a phonebook, alright? Do you remember those things?" Origin/Notes: My own metaphor to explain DNS problems.
  • Fat finger: A stupid mistake in perhaps an otherwise solid plan (e.g., an IP address keyed incorrectly). In a sentence: "Jeff totally fat fingered it." Origin/Notes: Former boss/Homer Simpson?
  • Go FQDN or Go Home: Admonishment to correct the lazy IT tendency to code/build with IP addresses rather than FQDNs. In a sentence: "There are to be no IP addresses listed on the support page. Go FQDN or Go Home, son." Origin/Notes: My own.
  • Garbage in, Garbage Out: You get out of a system that which you put in. In a sentence: "I can't make beautiful pivots with a GIGOed set." Origin/Notes: Unknown, but s/he was brilliant.
  • Debutante in the Datacenter: A server or service that is high-profile/important but prone to failure and drama without constant attention. In a sentence: "Hyperion is doing its debutante thing again." Origin/Notes: Heard it from someone somewhere in time.
  • Cadillac Solution: A high-priced solution to a problem that may require only ingenuity/diligence. In a sentence: "Don't come to me with a Cadillac solution when a Kia will do." Origin/Notes: My own, but really…Cadillac…I'm so old.
  • The Sun Never Sets on Infrastructure: A reference to the 24/7 nature of infrastructure stack demand, by way of the British Empire. In a sentence: "And the sun never sets on our XenApp farm, so schedule your maintenance." Origin/Notes: I used this metaphor extensively in my last job.
  • Infrastructure is roads / Applications are cars / Users are drivers: A reference to the classic split in IT departments. In a sentence: See here. Origin/Notes: Former colleague.
  • Two Houses Both Alike in Dignity: Another reference to the AppDev and Infrastructure divide in IT. Origin/Notes: My own liberal abuse of Shakespeare's opening line in R&J.
  • Child Partition/Parent Partition: A reference to me and my son in light of hypervisor technology. In a sentence: "Child partition is totally using up all my spare compute cycles." Origin/Notes: My own.
  • Code is poetry: There is something more to technology than just making things work; be an artisan technologist. In a sentence: "Just look at that script, this guy's a poet!" Origin/Notes: Google, but adapted by me for scripting and configs.
  • Going full Fibonacci: The joy and euphoria inherent in a well-designed subnetting or routing plan wherein octets and route summaries are harmonized and everything just fits. In a sentence: "He went full Fibonacci to save memory on the router." Origin/Notes: My own abuse of the famed Fibonacci Sequence, which honestly has nothing to do with IP subnetting and more to do with Dan Brown. Also applies to MAC address pools, because encoding your MAC address pools is fun.
  • Dystopian IT: Dysfunctional IT departments. In a sentence: "I thought I was going to work in the future, not some dystopian nightmare IT group!" Origin/Notes: Not sure.
  • When I was a Child I thought as a Child: How I defend poor technical decisions that haunt me years later. Origin/Notes: A (perhaps blasphemous) homage to St. Paul.
  • There are three ____ and the greatest of these is ____: Another St. Paul reference. In a sentence: "And then there were three storage systems: file, block and object, and the greatest of these is file." Origin/Notes: Useful in IT purchasing decisions.
  • IT White Whale: Highly technical problems I haven't solved yet and obsess over. In a sentence: "I've been hunting this white whale for what seems like forever." Origin/Notes: Borrowed from Herman Melville's Moby Dick.
  • Servers are cattle, not pets: A pithy and memorable phrase to remind systems guys not to get attached to physical or virtual servers; view them as cattle that are branded at birth, worked hard throughout life, then slaughtered without pomp or circumstance. In a sentence: "No more Star Wars references as server names, ok? It's a cow, not your pet Labrador!" Origin/Notes: The guys who built Chef.
  • Drawer of Tears: The drawer in your desk/lab where failed ideas, represented by working parts, go. My drawer of tears is filled with Raspberry Pis, Surface RTs, etc. In a sentence: "Yeah, I tried that once, ended up in the drawer of tears." Origin/Notes: My own.

“Cyber terrorism could also become more attractive as the real and virtual worlds become more closely coupled, with automobiles, appliances, and other devices attached to the Internet.”  -- Dorothy Denning

 

In my recent posts I have laid bare the landscape of cyber security, some of the risks, and some of the solutions and possible solutions to the current, untenable state of our networks. One of the biggest risks, to the consumer and to everyone really, I haven't yet brought up. The so-called Internet of Things, or the Internet of Everything, is going to bring a level of risk an order of magnitude greater than anything we have seen to this point.

 

The Internet of Things is a marketing term for sure, but what it represents conceptually is very real: the interconnection of everything from light bulbs, to refrigerators, to door locks. All of these things and many others, all connected together via our home and public networks, all potential risks. When everything is an entry point to the network, and every device we own is connected, imagine the chaos of a large scale hack, virus, or distributed denial of service attack.

 

We’re not there yet, of course. These things are just beginning to become connected. Devices like the Nest thermostat, the Philips Hue light bulbs, and some of the remote door locks and security systems accessible from our phones are just the opening act, the beginning of a future where literally everything in our lives is connected. How does our standard model of perimeter security look now? Where exactly is the perimeter we're defending?

 

I don’t have many of the answers, and I doubt a lot of other people do either. That’s not to say that there aren’t a lot of smarter people out there working on the problem, it’s just a very hard nut to crack. The security landscape is moving so quickly, the things we think of today are all but ineffectual tomorrow. All I know is that the more things we connect, and the more we depend on those things, the bigger the imperative is that we figure something out soon.

A couple of years ago at Cisco Live, I was chatting with another admin about the future of monitoring. It was relatively quiet in the expo hall and we'd shooed away a couple of Cisco PMs to play with the UCS configurator demo. Mike (not his real name) was primarily a network administrator, but was increasingly involved with very non-network engineering, like compute (in the guise of UCS) and storage performance, at least in the context of connectivity to VM clusters. He kept saying "App.. Stack," with pejorative emphasis on "App." I thought at the time, and have only become more convinced since, that perhaps his discomfort with the term actually stems from its historical incompleteness.

 

Stack (of Application Components, Related Technologies, Infrastructure, and Services) SACRTIS?

 

You might be thinking, "AppStack. Great, yet another thing I have to wrap my head around. All it needs is a trademark." In past posts of this series, Joel Dolisy's A Lap around AppStack described related SolarWinds product evolution, Lawrence Garvin's A Look at Systems' Role in the AppStack discussed systems considerations, and, most recently, Kong Yang's AppStack Addresses the Dynamic Changes in the Virtualization and Storage Layers both broke out the role of virtualization and storage and set the record for the longest article title. I expect that Leon Adato and Thomas LaRock will have more to say in upcoming posts as well, especially on the subject of databases. So, SolarWinds has been on a bit of a tear about AppStack recently, and you might think they have a new product, perhaps with AppStack in the title.

 

Nope. You’d be wrong. You and Mike already know what it is and it’s not necessary to reinvent anything. I think it’s simply an effort to draw attention to the inescapable and perhaps even unpleasant truth that we can’t think of just “our part” of IT, “our part” of the network, or even “our part” of the help desk. To win at IT, or maybe even just to win back our weekends, we have to keep the extensive tendrils of all elements of applications in the front of our minds AT ALL TIMES.

 

Way back in history, between the discovery of fire and invention of the wheel, applications were simple affairs. They sat on a mainframe, micro, or PC, with local storage. Yes, remote access was developed from green screen, to fat client, and ultimately to Web. But in most cases, the app below a minimal veneer of presentation infrastructure was unchanged. It’s that App-at-the-top context that, even with shared storage and server virtualization, belies a modern restatement of the meaning of AppStack. It’s A to Z, user to spindle (or EFD flash controller), and everything in between. And, with the complexity of today’s applications, “everything” includes quite a pile of… let’s just say bits.

 

VPN, in My AppStack? More Likely Than You Think

 

With a broader definition of AppStack, I thought back to Mike's solution for operating a hybrid-cloud hosted virtual classroom system. I white-boarded out the components we discussed, and then projected all the components we didn't discuss. HTTP services for the presentation and data tiers: check. Storage and virtualization: check. Core, distribution, WAN, and firewall: also check. But what about the VPN and firewall to the AWS VPC? Traditionally, that's infrastructure, but in the context of AppStack, it's a required component of his service.

 

What about security certificate expiration for the Web portal? Or, the integration of durable subscription endpoint services? Or, Exchange servers, which transmitted student notification emails and relayed student questions? None of those components seem like part of a traditional application, but are critical to Mike’s uptime. He should not only be monitoring all of these elements, but ideally connecting them via context(s) so that specialist admins across his teams may troubleshoot quickly. And therein lies the greater initiative we should lead as trusted administrators—education.
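
Certificate expiration, for instance, is easy to fold into that broader monitoring. Here is a minimal sketch using only the Python standard library; the portal hostname is a hypothetical placeholder, not Mike's actual system.

```python
# Sketch: check how many days remain on a web portal's TLS certificate.
# The hostname is a hypothetical placeholder; any HTTPS endpoint in the
# connectivity chain could be checked the same way.
import socket
import ssl
from datetime import datetime, timezone

host = "portal.example.com"

context = ssl.create_default_context()
with socket.create_connection((host, 443), timeout=5) as sock:
    with context.wrap_socket(sock, server_hostname=host) as tls:
        cert = tls.getpeercert()

expires_ts = ssl.cert_time_to_seconds(cert["notAfter"])
days_left = int((expires_ts - datetime.now(timezone.utc).timestamp()) // 86400)
print(f"{host}: certificate expires in {days_left} days")
if days_left < 30:
    print("  -> renew soon, before the help desk queue fills up")
```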

 

Don’t be afraid to open up the definition of AppStack in your environment, human and silicon alike. Encourage your team, from desktop support to the LUN master, to sit in the user's chair. List out every element of the connectivity chain you can think of. Go to the whiteboard and discover or infer some more. Lastly, identify linking context. I think you'll find that with the inclusion of previously unmonitored services, the proactive AppStack approach can keep users happy and the help desk quiet. OK, quieter.


At VMware Partner Exchange 2015 in the Cisco booth with Lauren Friedman (@lauren) and Mike Columbus (@mec829) for an Engineers Unplugged session.


Engineers Unplugged is a Cisco production that has had many wonderful engineers enlighten and delight the audience with technical knowledge and a unicorn flair. At VMware PEX 2015, I was given an opportunity to join the ranks of "I Drew the Unicorn" on Engineers Unplugged.

 

My fellow white-boarding engineer was Mike Columbus, a Cisco Consulting Systems Engineer. Our topic was simple: the elusive IT Management Unicorn. What exactly is the IT Management Unicorn? Well, it's one part monitoring, two parts troubleshooting, and three parts reporting, with a horn for more bacon. Trust me, it will be EPIC.

 

Next, SolarWinds Head Geeks Patrick Hubbard and Leon Adato are teaming up with Lauren on something fantastic for Cisco Live 2015. It has so much software-defined networking goodness. The SolarWinds Head Geeks and the Cisco Engineers Unplugged crew will make magic when you see us at Cisco Live 2015. Register for Cisco Live 2015, and follow Cisco Live news on Twitter with #CLUS.

 

Finally, two more reasons to attend Cisco Live 2015:

“One hundred twenty countries currently have or are developing offensive cyber attack capabilities, which is now viewed as the fifth dimension of warfare after space, sea, land and air…” - Jamie Shea, NATO Director of Policy Planning

 

In cyber warfare, as in any kind of warfare, there are two types of players: those on offense and those on defense. Many countries have offensive capabilities, though none rival those of the United States. All we have to do to confirm this anecdotally is to review some of the past years’ headlines of hacking attacks against foreign nations, ostensibly by the NSA or NSA-affiliated entities. I would also encourage you to view the 2014 Data Breach Investigations Report compiled by Verizon for an extensive list of attack vectors from the past year.

 

While offensive capabilities are, if not easy to develop and execute, at least fairly ubiquitous, defense is another matter entirely. As a nation we are largely able to defend key strategic military assets, but not all of them. We are able to defend our financial sector somewhat, though not as successfully as with our military assets. Most private enterprises that fall into the “other” category are either undefended today, indefensible due to lack of knowledge, staffing, or willpower, or have already been compromised. We are not good at defense.

 

Two approaches to this are needed. The first is for entities like the NSA and private corporations to have the ability to share information (not a forced mandate) and react using the resources of each, while still maintaining privacy. Most large enterprises immediately call in the government once they’re aware of a hack, so why can’t the government work with them proactively to mitigate attacks in the early stages?

 

The second approach, taken by only the largest of companies, is to monitor everything on their networks at full wire speed (no small task), and then to feed that real-time information into a big-data engine, running real-time analytics with something like SAP HANA, where the sheer volume of information can be analyzed in real time. This generates alerts based on real-time anomaly detection in a much more sophisticated way than any IDS/IPS ever could, but it's still missing one piece: the ability to remediate in real time. This is one of the use cases for SDN—something that I may explore in another post.
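
The analytics engine itself is far beyond a blog snippet, but the core idea, flagging traffic that deviates sharply from its own recent baseline, can be sketched in a few lines. The sample values and the three-sigma threshold below are purely illustrative assumptions, not a description of any particular product.

```python
# Sketch: flag per-minute traffic counts that deviate sharply from a rolling
# baseline. The sample values and the 3-sigma threshold are illustrative only;
# a real deployment would feed wire-speed telemetry into a proper engine.
from collections import deque
from statistics import mean, pstdev

window = deque(maxlen=60)              # last 60 one-minute samples

def check_sample(bytes_per_minute: float) -> None:
    if len(window) >= 10:              # wait for a minimal baseline
        mu, sigma = mean(window), pstdev(window)
        if sigma > 0 and abs(bytes_per_minute - mu) > 3 * sigma:
            print(f"ALERT: {bytes_per_minute:.0f} B/min vs baseline {mu:.0f} B/min")
    window.append(bytes_per_minute)

# Hypothetical feed: steady traffic, then a sudden exfiltration-sized spike.
for sample in [1200, 1150, 1300, 1250, 1180, 1220, 1275, 1190, 1240, 1210, 9800]:
    check_sample(sample)
```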

 

What approaches are you taking today?  What approach would you take if you had an unlimited budget to work with? What other suggestions do you have beyond the things we’re doing today?

On Friday the 13th, Kaspersky, a Russian anti-malware and research firm, released a report documenting a significant campaign to infiltrate banks worldwide and steal hard cash. Somewhere between $300 million and $1 billion is estimated to have been pilfered.

 

Attackers entered the banks' systems using phishing lures and then compromised systems that had known vulnerabilities. The actual monetary theft came from observing the banks' processes and compromising users who had the right access to the banks' financial applications.

 

This was an extensive and coordinated operation as the cybercriminals moved electronic money through SWIFT (an international interbank transfer program), and cash through reprogrammed ATMs – essentially turning your local ATM into a Las Vegas jackpot. Clearly creating so many fraudulent receiver accounts and spewing cash required an extensive money mule network.

 

Given that the actual theft involved deep understanding of the target banks' audit and reconciliation procedures, and actual understanding of banking software, this was a well-researched and staged attack – the essence of an Advanced Persistent Threat (APT). So if a sophisticated, regulated organization like a bank is vulnerable, are there any lessons for the rest of us?

 

Here are a few takeaways we can all apply in our own organizations.

 

1. Staff security awareness

 

Your staff is your front line infantry in the battle against cybercrime.  Even small organizations can put together a meaningful security awareness program with open source tools.

 

2. Backup and Patch

 

If you have a good backup program, you can be more aggressive about patching.  Depending on your organization size and how mission critical your systems are, backup – test – patch is a tried and true method for avoiding infections that do not use 0-days.

 

3. Monitor

 

Use your logging, log management, and patch management systems to find:

  • Systems that aren’t patched
  • The same user ID logged into a critical system simultaneously, especially from different IP addresses (see the sketch after this list)
  • Critical application anomalies: a high rate of transactions, more logins than usual, or "low and slow" behavior (e.g., short bursts of application activity over a long period of time)
  • Suspicious software installations
  • System file changes
  • Suspicious outbound network traffic through the firewall (e.g., FTP, SMTP, and other protocols that facilitate file or data transfer)
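
As a hedged example of the second bullet above, here is a small sketch that scans an exported authentication log for the same user ID active from more than one source IP within a short window. The CSV format (timestamp, user, source IP) and the file name are assumptions; adapt the parsing to whatever your log management system actually exports.

```python
# Sketch: find user IDs active from more than one source IP within a short
# window. The CSV format (timestamp, user, source IP) and file name are
# assumptions; adapt the parsing to whatever your log management exports.
import csv
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
logins = defaultdict(list)                       # user -> [(timestamp, source_ip)]

with open("auth_log.csv", newline="") as f:      # hypothetical export file
    for ts, user, src_ip in csv.reader(f):
        logins[user].append((datetime.fromisoformat(ts), src_ip))

for user, events in logins.items():
    events.sort()
    for i, (ts, ip) in enumerate(events):
        nearby_ips = {other_ip for other_ts, other_ip in events[i:]
                      if other_ts - ts <= WINDOW}
        if len(nearby_ips) > 1:
            print(f"ALERT: {user} active from {sorted(nearby_ips)} within {WINDOW}")
            break
```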

 

For more information see:

Kaspersky report of Carbanak APT

https://blog.kaspersky.com/billion-dollar-apt-carbanak/

 

Free Security Awareness

http://mindfulsecurity.com/

http://www.securingthehuman.org/blog/category/community-resources

http://www.dhs.gov/stopthinkconnect

"Do you think the guys running Azure or AWS care if a server gets rebooted in the middle of the day?" I asked the Help Desk analyst when he protested my decision to reboot a VM just before lunch.

 

"Well, uhh. No. But we're not Azure," He replied.

 

"No we're not. But we're closer today than we have ever been before. Also, I don't like working evenings." I responded as I restarted the VM.

 

The help desk guy was startled, with more than a little fear in his voice, but I reassured him I'd take the blame if his queue was flooded with upset user calls.

 

Such are the battles one has to fight in IT Environments that are stuck in what I call the Old Ways of IT. If you're in IT, you know the Old Ways because you grew up with them like I did, or because you're still stuck in them and you know of no other way.

 

The Old Ways of doing IT go something like this:

 

  • User 1 & User 2 call to complain that Feature A is broken
  • Help desk guy dutifully notes feature A is busted, escalates to Server Guy
  • Server Guy notices Feature A is broken on Server A tied to IP Address 192.168.200.35, which is how User 1 & User 2 access Feature A
  • Server Guy throws up his hands, says he can't fix Server A without a Reboot on Evening 1
  • Help Desk guy tells the user nothing can be done until Evening 1
  • User 1 & User 2 hang up, disappointed
  • Server Guy fixes problem that evening by rebooting Server A

 

I don't know about you, but working in environments stuck in the Old Ways of IT really sucks. Do you like working evenings & weekends? I sure don't. My evenings & weekends are dedicated to rearing the Child Partition and hanging out with the Family Cluster, not fixing broken old servers tied to RFC 1918 IP addresses.

 

As the VM rebooted, my help desk guy braced himself for a flood of calls. I was tempted to get all paternalistic with him, but I sat there, silent. 90 seconds went by, the VM came back online. The queue didn't fill up; the help desk guy looked at me a bit startled. "What?!? How did you...but you rebooted...I don't understand."

 

That's when I went to the whiteboard in our little work area. I wanted to impart The New Way of Doing IT upon him and his team while the benefits of the New Way were fresh in their mind.

 

"Last week, I pushed out a group policy that updated the url of Feature A on Service 1. Instead of our users accessing Service 1 via IP Address 192.168.200.35, they now access the load-balanced FQDN of that service. Beneath the FQDN are our four servers and their old IP addresses," I continued, drawing little arrows to the servers.

 

"Because the load balancer is hosting the name, we can reboot servers beneath it at will," the help desk guy said, a smile spreading across his face. "The load balancer maintains the user's session...wow." he continued.

 

"Exactly. Now  you know why I always nag you to use FQDN rather than IP address. I never want to hear you give out an IP address over the phone again, ok?"

 

"Ok," he said, a big smile on his face.

 

I returned to automating & building out The Stack, getting it closer to Azure or AWS.

 

The help desk guy went back to his queue, but with something of a bounce in his step. He must have realized, the same way I realized it some years back, that the New Way of IT offered so much more than the Old Way. Instead of spending the next 90 minutes putting out fires with users, he could invest in himself and his career and study up a bit more on load balancers. Instead of rebooting the VM that evening (as I would have had him do), he could spend that evening doing whatever he liked.

 

As cliche as it sounds, the new way of IT is about working smarter, not harder, and I think my help desk guy finally understood it that day.

 

A week or two later, I caught my converted help desk guy correcting one of his colleagues. "No, we never hand out the IP address, only the FQDN."

 

Excellent.

You sit down at your desk. It's 9:10AM and your coffee is still warm. There is a smell of bacon in the air.

 

Suddenly your phone rings. The trading system is down. The time for quick thinking is now.

 

Where would you begin to troubleshoot this scenario?

 

A lot of people will reach for the biggest hammer they can find: a tool that will trace all activity as it hits the database instance. For SQL Server, that tool is typically SQL Profiler.

 

The trouble here is this: you are in a reactive mode right now. You have no idea as to the root cause of the issue. Therefore, you will configure your trace to capture as many details as possible. This is your reaction to make certain that when the time comes you are prepared to do as thorough a forensics job as possible in the hope that you can fix the issue in the shortest amount of time.

 

And this method of performance monitoring and troubleshooting is the least efficient way to get the job done.

 

When it comes to performance monitoring and troubleshooting you have two options: tracing or polling.

 

Tracing will track details and capture events as they happen. In an ironic twist this method can interfere with the performance of the queries you are trying to measure. Examples of tools that use the tracing method for SQL Server are Extended Events and SQL Profiler.

 

Polling, however, is also known by another name: sampling. A tool that utilizes polling will gather performance data at regular intervals. This is considered a lightweight option compared to tracing. Examples of tools that use this method are Performance Monitor (by default it samples once per second) and third-party applications like Database Performance Analyzer that query dynamic management objects (system views known as DMVs in SQL Server, and x$ and v$ objects in Oracle).
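
To make the polling idea concrete, here is a minimal sketch that samples a SQL Server DMV on a fixed interval and reports which wait types accumulated the most wait time between samples. The connection string, the 15-second interval, and the choice of sys.dm_os_wait_stats are illustrative assumptions, not a prescription.

```python
# Sketch: poll sys.dm_os_wait_stats at a fixed interval and report which wait
# types accumulated the most wait time between samples (sampling, not tracing).
# Connection string and interval are placeholders; stop with Ctrl+C.
import time
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sql01;Trusted_Connection=yes;"
)
QUERY = "SELECT wait_type, wait_time_ms FROM sys.dm_os_wait_stats"

def snapshot():
    return dict(conn.cursor().execute(QUERY).fetchall())

previous = snapshot()
while True:
    time.sleep(15)                               # poll every 15 seconds
    current = snapshot()
    deltas = {w: current[w] - previous.get(w, 0) for w in current}
    top = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:5]
    print("Top waits over the last interval:")
    for wait_type, ms in top:
        print(f"  {wait_type}: {ms} ms")
    previous = current
```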

 

See, here's the secret about performance monitoring and troubleshooting that most people don't understand: when it comes to gathering performance metrics, it's not so much what you gather as how you gather it.


Knowing what to measure is an easy task. It really is. You can find lots of information on the series of tubes known as the internet that will list out all the metrics an administrator would want. Database size, free disk space, CPU utilization, page life expectancy, buffer cache hit ratio, etc. The list of available metrics seems endless and often overwhelming. Some of that information is even useful; a lot of it can be just noise, depending on the problem you are trying to solve.


So which method is right for you?


Both, actually.


Let me explain.


Think of a surgeon who needs to operate on a patient. There's a good chance that before the surgeon cuts into healthy skin, they will take an X-ray of the area. Once they examine the X-ray, they know more about what they need to do when they operate.

Polling tools are similar to X-rays. They help you understand more about what areas you need to investigate further.  Then, when you need to take that deeper dive, that's where you are likely to use a tracing tool in order to return only the necessary information needed to solve the problem, and only for the shortest possible duration.

 

I find that many junior administrators (and developers with novice database performance troubleshooting skills) tend to rely on tracing tools for even the most routine tasks that can be done with a few simple queries against a DMV or two. I do my best to educate when I can, but it often is an uphill battle. I've lost track of the number of times I've been thrown under the bus by someone saying that they can't fix an issue because I won't let them run Profiler against a production box as a first attempt at figuring out what's going on. Rather than make people choose between one tool or the other, I do my best to explain how they work well together.

 

I never use a tracing tool as a first option for performance monitoring and troubleshooting. I rely on polling to help me understand where to go next. Sometimes that next step requires a trace, but oftentimes I'm able to make positive performance improvements without ever needing to run a trace. Then again, I'm lucky that I have some really good tools to use for monitoring database servers, even ones that are running on VMware, Amazon RDS, or Microsoft Azure.

 

There's a lot for anyone to learn as an administrator, and it can be overwhelming for anyone, new or experienced.

 

So it wouldn't hurt to double check how you are currently monitoring right now, to make certain you are grabbing the right things and at the right frequency.

I’ll be honest, when I initially saw the words configuration management, I only thought of managing device configurations. You know, things like keeping backup copies of configurations in case a device bit the dust. However, the longer I've been in the IT field, the more I've learned how short-sighted I was about what configuration management truly means. Hopefully, by the end of this post, you will either nod and agree or thank me for opening your eyes to an aspect of IT that is typically misunderstood or severely neglected.

 

There are several components of configuration management that you, as an IT professional, should be aware of:

 

  • Device hardware and software inventory
  • Software management
  • Configuration backup, viewing, archiving, and comparison
  • Detection and alerting of changes to configuration, hardware, or software
  • Configuration change management

 

Let’s briefly go over some of these and why they are so integral to maintaining a healthy network.

 

Most (hopefully all) IT teams keep an inventory of the hardware and software that they support. This is imperative for things like service contract renewals and support calls. But how you keep track of this information usually raises questions. Are you manually keeping track of it using Excel spreadsheets or something similar? I would agree that it works, but in a world so hellbent on automation, why risk human error? What if you forget to add a device and it goes unnoticed? Wouldn't it be easier to have software that automatically performs an inventory of all your devices?

 

One of my favorite components of configuration management is configuration backup and the ability to view those backups as well as compare them to previous backups. If your Core switch were to fail today, right now, are you prepared to replace it? I’m not talking about calling your vendor’s support to have them ship out a replacement. I’m talking about rebuilding that new shiny piece of hardware to its predecessor’s last working state. If you have backups, that process is made easy. Grab the latest backup and slap it on the new device when it arrives. This will drastically cut down the recovery time in a failure scenario. Need to know what’s changed between the current configuration and 6 months ago for audit purposes? Having those backups and a mechanism for comparing them goes a long way.
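
If you keep those backups as plain text, the comparison piece needs nothing exotic. Here is a sketch using Python's standard difflib; the backup file names are placeholders for whatever your backup tooling produces.

```python
# Sketch: compare two saved device configuration backups and print a unified
# diff. The file names are placeholders for whatever your backup tool produces.
import difflib

with open("core-sw1_2014-08-15.cfg") as old, open("core-sw1_2015-02-15.cfg") as new:
    old_lines, new_lines = old.readlines(), new.readlines()

diff = list(difflib.unified_diff(
    old_lines, new_lines,
    fromfile="6 months ago", tofile="current",
))
print("".join(diff) if diff else "No configuration changes found.")
```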

 

There are a number of ways to know when an intruder’s been in your network. One of those methods is through the detection and alerting of changes made to your devices. If you don’t have something in place that can detect these changes in real-time, you’ll be in the dark in more ways than one. How about if a co-worker made an “innocent” change before going on vacation that starts to rear its ugly head? Being able to easily generate real-time alerts or reports will help pinpoint the changes and get your system purring like a kitten once again.

 

In conclusion, configuration management is not just about keeping backups of your devices on hand. It involves keeping inventories of those devices as well as being able to view, archive, and compare their configurations. It also includes being able to easily detect and alert on changes made to your devices for events like catching network intruders. Are you practicing good configuration management techniques?

Whether it be at work or in my personal life, I like to plan ahead and be readily prepared. Specifically, when it comes to allocating storage, you definitely need to plan your allocation strategically. This is where thin provisioning comes in: organizations can adopt this strategy to avoid costly problems and increase storage efficiency.


Thin provisioning is the practice of efficiently optimizing available space in a storage area network (SAN). It allocates disk storage space among multiple users based on what each user requires at a given time.


Days before Thin Provisioning:           

Traditionally, admins allocated additional storage beyond their current need, anticipating future growth. In turn, admins would be left with a significant amount of unused space, directly resulting in a loss on the capital spent purchasing disks and storage arrays.


Applications require storage to function properly. In traditional provisioning, a Logical Unit Number (LUN) is created and each application is assigned to a LUN. Creating a LUN with the traditional method meant allocating a portion of empty physical storage space from the array. For the application to operate, space is then allocated to the application. At first, the application will not occupy the whole storage space allocated; gradually, however, the storage space will be utilized.


How Thin Provisioning Works:

In thin provisioning, a LUN is created from a common pool of storage. A LUN in thin provisioning will be larger than the actual physical storage assigned to it. For example, if an application needs 50GB of storage to start working, 50GB of virtual storage is assigned to the application so it can become operational. The application uses the LUN in the normal way. Initially, the assigned LUN will only have a portion of the actual needed storage (say 15GB), and the rest, 35GB, will be virtual storage. As actual storage utilization grows, additional storage is automatically taken from the common pool of physical storage. The user can then add more physical storage (as required) without disturbing the application or altering the LUN. This helps admins eliminate the initial physical disk capacity that would otherwise go unused.
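
To make that allocation behavior concrete, here is a toy simulation in Python. The 50GB virtual size, 15GB of initial writes, and 250GB pool mirror the numbers used in this post and are illustrative only; real arrays handle this in firmware, not application code.

```python
# Sketch: toy model of thin provisioning. A LUN advertises a virtual size, but
# physical space is drawn from the shared pool only as data is actually written.
# The 50GB / 15GB / 250GB figures mirror the example above; illustrative only.

class ThinPool:
    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.committed_gb = 0

    def draw(self, gb):
        if self.committed_gb + gb > self.physical_gb:
            raise RuntimeError("Pool exhausted: add physical disks before writing more")
        self.committed_gb += gb

class ThinLUN:
    def __init__(self, pool, virtual_gb):
        self.pool, self.virtual_gb, self.used_gb = pool, virtual_gb, 0

    def write(self, gb):
        self.pool.draw(gb)          # physical space is allocated only on write
        self.used_gb += gb

pool = ThinPool(physical_gb=250)    # shared physical pool
lun = ThinLUN(pool, virtual_gb=50)  # application immediately "sees" 50GB
lun.write(15)                       # only 15GB physically committed so far
print(f"LUN: {lun.used_gb}GB used of {lun.virtual_gb}GB virtual; "
      f"pool committed {pool.committed_gb}GB of {pool.physical_gb}GB physical")
```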


A use case:

Consider an organization that has 3 servers running different applications—a database, a file processing system, and email. All these applications need storage space to work and the organization has to consider storage space for future growth.


With traditional provisioning, say each application needs 1 TB to operate. Out of that 1 TB, only 250 GB (25%) will be used initially, and the rest will be utilized gradually. With the whole 3 TB already allocated to the existing 3 applications, what happens if you need a new server/application in the organization? In this case, you will need more storage, and unfortunately it won't be cheap: you will need to find budget.


Now let’s see how thin provisioning can help with the aforementioned situation. In this scenario, each server/application is provided with virtual storage of 1 TB, but the actual storage space provided is just 250 GB. Space from the pool is only allocated when needed. When a new server/application is added, you can assign it 250 GB from the physical storage while it still sees a total of 1 TB of virtual storage. The organization can add the new server/application without purchasing additional storage, and the physical storage can be grown as a whole when needed.


Thin provisioning has 2 advantages in this use case:

  • Adding a new server/application is no longer an issue.
  • Avoid provisioning a total of 3 TB while setting up the servers. The organization can start with 1 TB and add more storage space as and when needed without disturbing the setup. 


When to use thin provisioning:

Which type of provisioning to use is more a question of use case than of technology. Thin provisioning is beneficial in the following situations:

  • Where the amount of resources used is much smaller than allocated.
  • When the administrator is not sure about the size of the database that needs to be allocated.
  • When more servers/applications often get added to the network.
  • When the organization wants to reduce the initial investment.
  • When the DB administrator wants to get maximum utilization from their storage.


Thin provisioning is not the silver bullet in the virtualization world. It too has its limitations. For example:

  • In regard to performance, this becomes a major factor: thin-provisioned storage becomes fragmented very easily, in turn decreasing performance.
  • Storage over-allocation: thin provisioning can result in over-allocating the actual storage, and a write operation can then bring a terrible failure (which cannot be repaired) on one or several drives.


Even though thin provisioning has drawbacks, all of these can be overcome with continuous storage monitoring. Now what you need to do is transform your 'fat' volumes into thin ones. But issues can arise while doing so. Have you experienced any issues while moving your storage? If so, how did you resolve them?
