
Geek Speak


I like cake. Most people do, I suppose. And that’s why I use cake as an analogy in my classes and presentations whenever I talk about users, data, and performance.

 

See, there are many, many layers of cake between users and their data. Some layers are thicker than others, some more delicious, and some more important. But together the layers make the cake complete. When done correctly, these layers form something that you want to consume more and more, just like a good cake should.

 

You’ve been hearing a lot about AppStack this month. Truthfully, whenever I hear “AppStack,” I just think of cake. It makes it easier for me to understand all the layers, and I think the analogy works for others as well. Why is that? Because AppStack means different things to different people. Just look at the previous posts and you’ll get a sense of the differing points of view about what is important.

 

Today I want to talk about AppStack from the database administrator’s point of view. For many people a database is a black box: queries go in, data comes out. It’s a mystery what happens inside that magical box. And it’s not just the database that’s a mystery. Admit it: you have no idea what a DBA does for work all day long. Besides a lot of Twitter and Facebook activity, we are often busy providing evidence that the database itself is not the bottleneck in whatever business process is perceived to be “acting slow.”

 

Let’s break this down into some delicious layers of cake:

 

Storage layer

The backbone of any infrastructure: your data is stored somewhere (on the ground or in the cloud) on something (traditional hard drives or flash drives). Data is read from, or written to, these storage devices. Often these devices are shared among many systems, making it difficult to know whether your data requests are the cause, or the victim, of performance issues. All of this disk activity ultimately needs to travel through the network, too.

 

Server layer

It is no longer uncommon for database servers to be virtualized. Unfortunately, virtualized database servers are often improperly configured, leading to performance issues and concerns. Improper configuration choices at the host and guest layers are often the root cause for what is perceived to be a database issue. Knowing where to look (database, guest, host, storage) first is important for any administrator.

 

Database Layer

The database engine itself is often blamed as the source of performance issues, but the reality is that the true bottleneck exists elsewhere. In many cases, issues directly related to the database engine are caused by configuration options selected when the engine was installed. I’ve lost count of the number of times I’ve traced the root cause of a database issue back to the fact that it’s so easy to click “Next, Next, Next” during an install.
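
If you’re curious which of those install-time defaults are still in place on your own instance, they’re easy to poll. Here’s a minimal sketch, assuming Python with pyodbc against a SQL Server instance; the connection string and the list of “suspect defaults” are illustrative assumptions, not an authoritative checklist:

```python
# Sketch: flag SQL Server settings left at their install-time defaults.
# Assumes pyodbc is installed; the server name is a hypothetical placeholder.
import pyodbc

# Defaults that "Next, Next, Next" installs commonly leave in place.
SUSPECT_DEFAULTS = {
    "cost threshold for parallelism": 5,          # default is often too low
    "max degree of parallelism": 0,               # 0 = use all schedulers
    "max server memory (MB)": 2147483647,         # unlimited; starves the OS
}

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=myserver;Trusted_Connection=yes")  # hypothetical
cursor = conn.cursor()
cursor.execute("SELECT name, CAST(value_in_use AS int) FROM sys.configurations")

for name, value in cursor.fetchall():
    if name in SUSPECT_DEFAULTS and value == SUSPECT_DEFAULTS[name]:
        print(f"'{name}' is still at its install default ({value}) -- review it")
```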

 

User Experience

The icing on our cake is, of course, the end-user. But let’s think about what is under that icing. The end-user interacts with an application (which may or may not be local to their PC); the request goes through the network; the application may (or may not) need to interact with a database; the database may (or may not) need to interact with storage; and once the data is retrieved or written, back up the chain we go.

 

The end-user knows nothing of the series of tubes known as the internet. And what if, at the end of all this, the end-user needs to wait for a report to be rendered? They can be sitting there, watching the hourglass turn, silently muttering under their breath about how slow the database is behaving, when in fact the issue is a piece of reporting software that takes minutes to draw a pretty graph.

 

And that’s why AppStack is so important to someone like me. I get the opportunity to visually show someone where the bottleneck truly lies. This saves time and reduces frustration. And in the end, it makes for a more stable infrastructure that the business can leverage.

In my LinkedIn profile, I write that I’m a fan of “elaborate IT metaphors,” yet I’ve never actually written down a list of my favorites.

 

Listing out my favorite IT metaphors, sayings, aphorisms and such is risky. Too much pithiness, and I risk not being taken seriously. Too much cynicism and no one wants to talk to you.

 

And yet I must take that risk, because if you’re a practitioner of the IT arts as I am, then you’re used to engaging in these sorts of thoughtful/silly/humorous reflections.

 

Or maybe you don't but will find humor and value in them nonetheless. Enjoy and if you like, add your own!

 

  • Dark side of the moon. Meaning: waiting for a host or device to reply to pings after a reload/reboot. In a sentence: "Server's gone dark side of the moon, standby." Origin/Notes: NASA, obviously.
  • Eat our own dogfood. Meaning: applying the same policies/tech/experience to IT that apply to our users. In a sentence: "That dogfood tastes pretty nasty, and we've been dishing it out to our users for years." Origin/Notes: not sure, but heard on TWiT.
  • DNS is like a phonebook. Meaning: computers speak in numbers, humans speak in words. In a sentence: "Look, it works like a phonebook, alright? Do you remember those things?" Origin/Notes: my own metaphor to explain DNS problems.
  • Fat finger. Meaning: a stupid mistake in a perhaps otherwise solid plan (e.g., an IP address keyed incorrectly). In a sentence: "Jeff totally fat fingered it." Origin/Notes: former boss/Homer Simpson?
  • Go FQDN or Go Home. Meaning: admonishment to correct the lazy IT tendency to code/build with IP addresses rather than FQDNs. In a sentence: "There are to be no IP addresses listed on the support page. Go FQDN or Go Home, son." Origin/Notes: my own.
  • Garbage In, Garbage Out. Meaning: you get out of a system that which you put in. In a sentence: "I can't make beautiful pivots with a GIGOed set." Origin/Notes: unknown, but s/he was brilliant.
  • Debutante in the Datacenter. Meaning: a server or service that is high-profile/important but prone to failure and drama without constant attention. In a sentence: "Hyperion is doing its debutante thing again." Origin/Notes: heard it from someone, somewhere in time.
  • Cadillac Solution. Meaning: a high-priced solution to a problem that may require only ingenuity/diligence. In a sentence: "Don't come to me with a Cadillac solution when a Kia will do." Origin/Notes: my own, but really… Cadillac… I'm so old.
  • The Sun Never Sets on Infrastructure. Meaning: a reference to the 24/7 nature of infrastructure-stack demand, by way of the British Empire. In a sentence: "And the sun never sets on our XenApp farm, so schedule your maintenance." Origin/Notes: I used this metaphor extensively in my last job.
  • Infrastructure is roads / applications are cars / users are drivers. Meaning: a reference to the classic split in IT departments. In a sentence: see here. Origin/Notes: former colleague.
  • Two Houses, Both Alike in Dignity. Meaning: another reference to the AppDev and Infrastructure divide in IT. Origin/Notes: my own liberal abuse of Shakespeare's opening line in R&J.
  • Child Partition/Parent Partition. Meaning: a reference to me and my son, in light of hypervisor technology. In a sentence: "Child partition is totally using up all my spare compute cycles." Origin/Notes: my own.
  • Code is poetry. Meaning: there is something more to technology than just making things work; be an artisan technologist. In a sentence: "Just look at that script, this guy's a poet!" Origin/Notes: Google, but adapted by me for scripting and configs.
  • Going full Fibonacci. Meaning: the joy and euphoria inherent in a well-designed subnetting or routing plan wherein octets and route summaries are harmonized and everything just fits. In a sentence: "He went full Fibonacci to save memory on the router." Origin/Notes: my own abuse of the famed Fibonacci sequence, which honestly has nothing to do with IP subnetting and more to do with Dan Brown. Also applies to MAC address pools, because encoding your MAC address pools is fun.
  • Dystopian IT. Meaning: dysfunctional IT departments. In a sentence: "I thought I was going to work in the future, not some dystopian nightmare IT group!" Origin/Notes: not sure.
  • When I was a Child, I thought as a Child. Meaning: how I defend poor technical decisions that haunt me years later. Origin/Notes: a (perhaps blasphemous) homage to St. Paul.
  • There are three ____, and the greatest of these is ____. Meaning: another St. Paul reference. In a sentence: "And then there were three storage systems: file, block, and object, and the greatest of these is file." Origin/Notes: useful in IT purchasing decisions.
  • IT White Whale. Meaning: highly technical problems I haven't solved yet and obsess over. In a sentence: "I've been hunting this white whale for what seems like forever." Origin/Notes: borrowed from Herman Melville's Moby-Dick.
  • Servers are cattle, not pets. Meaning: a pithy and memorable phrase to remind systems guys not to get attached to physical or virtual servers; view them as cattle that are branded at birth, worked hard throughout life, then slaughtered without pomp or circumstance. In a sentence: "No more Star Wars references as server names, ok? It's a cow, not your pet labrador!" Origin/Notes: the guys who built Chef.
  • Drawer of Tears. Meaning: the drawer in your desk/lab where failed ideas, represented by working parts, go; my drawer of tears is filled with Raspberry Pis, Surface RTs, etc. In a sentence: "Yeah, I tried that once, it ended up in the drawer of tears." Origin/Notes: my own.

“Cyber terrorism could also become more attractive as the real and virtual worlds become more closely coupled, with automobiles, appliances, and other devices attached to the Internet.”  -- Dorothy Denning

 

In my recent posts I have laid bare the landscape of cyber security, some of the risks, and some of the solutions, actual and possible, to the current untenable state of our networks. One of the biggest risks to the consumer, to everyone really, I haven’t yet brought up. The so-called Internet of Things, or the Internet of Everything, is going to bring a level of risk an order of magnitude greater than anything we have seen to this point.

 

The Internet of Things is a marketing term for sure, but what it represents conceptually is very real: the interconnection of everything from light bulbs, to refrigerators, to door locks. All of these things and many others, all connected together via our home and public networks, all potential risks. When everything is an entry point to the network, and every device we own is connected, imagine the chaos of a large scale hack, virus, or distributed denial of service attack.

 

We’re not there yet, of course. These things are just beginning to become connected. Devices like the Nest thermostat, the Philips Hue light bulbs, and some of the remote door locks and security systems accessible from our phones are just the opening act, the beginning of a future where literally everything in our lives is connected. How does our standard model of perimeter security look now? Where exactly is the perimeter we’re defending?

 

I don’t have many of the answers, and I doubt a lot of other people do either. That’s not to say there aren’t a lot of smarter people out there working on the problem; it’s just a very hard nut to crack. The security landscape is moving so quickly that the things we think of today are all but ineffectual tomorrow. All I know is that the more things we connect, and the more we depend on those things, the bigger the imperative that we figure something out soon.

A couple of years ago at Cisco Live, I was chatting with another admin about the future of monitoring. It was relatively quiet in the expo hall and we’d shooed away a couple of Cisco PMs to play with the UCS configurator demo. Mike (not his real name) was primarily a network administrator, but was increasingly involved with very non-network engineering, like compute (in the guise of UCS) and storage performance, at least in the context of connectivity to VM clusters. He kept saying “App.. Stack,” with pejorative emphasis on “App.” I thought at the time, and have only become more convinced since, that perhaps his discomfort with the term actually stems from its historical incompleteness.

 

Stack (of Application Components, Related Technologies, Infrastructure, and Services) SACRTIS?

 

You might be thinking, “AppStack. Great, yet another thing I have to wrap my head around. All it needs is a trademark.”  In past posts of this series, Joel Dolisy's A Lap around AppStack described related SolarWinds product evolution, Lawrence Garvin's A Look at Systems’ Role in the AppStack discussed systems considerations, and, most recently, Kong Yang's AppStack Addresses the Dynamic Changes in the Virtualization and Storage Layers both broke out the role of virtualization and storage and set the record for longest article title. I expect that Leon Adato and Thomas LaRock will have more to say in upcoming posts as well, especially on the subject of databases. So, SolarWinds has been on a bit of a tear about AppStack recently, and you might think they have a new product, perhaps with AppStack in the title.

 

Nope. You’d be wrong. You and Mike already know what it is and it’s not necessary to reinvent anything. I think it’s simply an effort to draw attention to the inescapable and perhaps even unpleasant truth that we can’t think of just “our part” of IT, “our part” of the network, or even “our part” of the help desk. To win at IT, or maybe even just to win back our weekends, we have to keep the extensive tendrils of all elements of applications in the front of our minds AT ALL TIMES.

 

Way back in history, between the discovery of fire and invention of the wheel, applications were simple affairs. They sat on a mainframe, micro, or PC, with local storage. Yes, remote access was developed from green screen, to fat client, and ultimately to Web. But in most cases, the app below a minimal veneer of presentation infrastructure was unchanged. It’s that App-at-the-top context that, even with shared storage and server virtualization, belies a modern restatement of the meaning of AppStack. It’s A to Z, user to spindle (or EFD flash controller), and everything in between. And, with the complexity of today’s applications, “everything” includes quite a pile of… let’s just say bits.

 

VPN, in My AppStack? More Likely Than You Think

 

With a broader definition of AppStack, I thought back to Mike’s solution for operating a hybrid-cloud hosted virtual classroom system. I white-boarded out the components we discussed, and then projected all the components we didn’t discuss. HTTP services for the presentation and data tiers? Check. Storage and virtualization? Check. Core, distribution, WAN, and firewall? Also check. But what about the VPN and firewall to the AWS VPC? Traditionally, that’s infrastructure, but in the context of AppStack, that’s a required component of his service.

 

What about security certificate expiration for the Web portal? Or, the integration of durable subscription endpoint services? Or, Exchange servers, which transmitted student notification emails and relayed student questions? None of those components seem like part of a traditional application, but are critical to Mike’s uptime. He should not only be monitoring all of these elements, but ideally connecting them via context(s) so that specialist admins across his teams may troubleshoot quickly. And therein lies the greater initiative we should lead as trusted administrators—education.

 

Don’t be afraid to open up the definition of AppStack in your environment, human and silicon alike. Encourage your team, from desktop support to the LUN master, to sit in the user’s chair. List out every element of the connectivity chain you can think of. Go to the whiteboard and discover, or infer, some more. Lastly, identify the linking context. I think you’ll find that with the inclusion of previously unmonitored services, the proactive AppStack approach can keep users happy and the help desk quiet. OK, quieter.


At VMware Partner Exchange 2015 in the Cisco booth with Lauren Friedman (@lauren) and Mike Columbus (@mec829) for an Engineers Unplugged session.


Engineers Unplugged is a Cisco production that has had many wonderful engineers enlighten and delight the audience with technical knowledge and a unicorn flair. At VMware PEX 2015, I was given an opportunity to join the ranks of the “I Drew the Unicorn” club on Engineers Unplugged.

 

My fellow white-boarding engineer was Mike Columbus, a Cisco Consulting Systems Engineer. Our topic was simple: the elusive IT Management Unicorn. What exactly is the IT Management Unicorn? Well, it’s one part monitoring, two parts troubleshooting, and three parts reporting, with a horn for more bacon. Trust me, it will be EPIC.

 

Next, SolarWinds Head Geeks Patrick Hubbard and Leon Adato are teaming up with Lauren on something fantastic for Cisco Live 2015. It has so much software-defined networking goodness. The SolarWinds Head Geeks and the Cisco Engineers Unplugged crew will make magic when you see us at Cisco Live 2015. Register for Cisco Live 2015, and follow Cisco Live news on Twitter with #CLUS.

 

Finally, two more reasons to attend Cisco Live 2015:

“One hundred twenty countries currently have or are developing offensive cyber attack capabilities, which is now viewed as the fifth dimension of warfare after space, sea, land and air…” - Jamie Shea, NATO Director of Policy Planning

 

In cyber warfare, as in any kind of warfare, there are two types of players: those on offense and those on defense. Many countries have offensive capabilities, though none rival those of the United States. All we have to do to confirm this anecdotally is to review some of the past years’ headlines of hacking attacks against foreign nations, ostensibly by the NSA or NSA-affiliated entities. I would also encourage you to view the 2014 Data Breach Investigations Report compiled by Verizon for an extensive list of attack vectors from the past year.

 

While offensive capabilities are, if not easy to develop and execute, at least fairly ubiquitous, defense is another matter entirely. As a nation we are largely able to defend key strategic military assets, but not all of them. We are able to defend our financial sector somewhat, though not as successfully as with our military assets. Most private enterprises that fall into the “other” category are either undefended today, indefensible due to lack of knowledge, staffing, or willpower, or have already been compromised. We are not good at defense.

 

Two approaches to this are needed. The first is for entities like the NSA and private corporations to have the ability to share information (not a forced mandate) and react using the resources of each, while still maintaining privacy. Most large enterprises immediately call in the government once they’re aware of a hack, so why can’t the government work with them proactively to mitigate attacks in the early stages?

 

The second approach, taken by only the largest of companies, is to monitor everything on their networks at full wire speed (no small task), and then to feed that real-time information into a big-data engine. Run real-time analytics using something like SAP HANA, where the sheer volume of information can be analyzed as it arrives. This generates alerts based on real-time anomaly detection in a much more sophisticated way than any IDS/IPS ever could, but it’s still missing one piece: the ability to remediate in real time. This is one of the use cases for SDN, something that I may explore in another post.
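
To make the idea concrete, here’s a toy sketch of the kernel of that approach: compare each new measurement against a rolling baseline and alert on large deviations. A real deployment does this at wire speed across millions of events; the window size and threshold below are assumptions for illustration:

```python
# Toy sketch of rolling-baseline anomaly detection. A real system does this
# at wire speed in a big-data engine; these parameters are illustrative.
from collections import deque
from statistics import mean, stdev

WINDOW, THRESHOLD = 60, 3.0          # 60 samples of history, 3-sigma alert
history = deque(maxlen=WINDOW)

def observe(events_per_second: float) -> None:
    if len(history) >= 10:           # need some baseline before judging
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(events_per_second - mu) / sigma > THRESHOLD:
            print(f"ALERT: {events_per_second:.0f} ev/s vs baseline {mu:.0f}")
    history.append(events_per_second)
```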

 

What approaches are you taking today?  What approach would you take if you had an unlimited budget to work with? What other suggestions do you have beyond the things we’re doing today?

On Friday the 13th, Kaspersky, a Russian anti-malware and research firm, released a report documenting a significant campaign to infiltrate banks worldwide and steal hard cash.  Somewhere between $300 million and $1 billion is estimated to have been pilfered.

 

Attackers entered the banks’ systems using phishing lures and then compromised systems that had known vulnerabilities.  The actual monetary theft came from observing the banks’ processes and compromising users who had the right access to the banks’ financial applications.

 

This was an extensive and coordinated operation: the cybercriminals moved electronic money through SWIFT (the international interbank messaging network) and cash through reprogrammed ATMs, essentially turning your local ATM into a Las Vegas jackpot. Clearly, creating so many fraudulent receiver accounts and spewing out that much cash required an extensive money-mule network.

 

Given that the actual theft involved a deep understanding of the target banks’ audit and reconciliation procedures, plus an actual understanding of banking software, this was a well-researched and staged attack: the essence of an Advanced Persistent Threat (APT). So if a sophisticated, regulated organization like a bank is vulnerable, are there any lessons for the rest of us?

 

Here are a few takeaways we can all apply in our own organizations.

 

1. Staff security awareness

 

Your staff is your front line infantry in the battle against cybercrime.  Even small organizations can put together a meaningful security awareness program with open source tools.

 

2. Backup and Patch

 

If you have a good backup program, you can be more aggressive about patching.  Depending on your organization’s size and how mission-critical your systems are, backup – test – patch is a tried and true method for avoiding infections that do not use 0-days.

 

3. Monitor

 

Use your logging, log management, and patch management systems to find (a minimal sketch of the concurrent-login check follows this list):

  • Systems that aren’t patched
  • The same user ID logged into a critical system simultaneously from more than one place (especially from different IP addresses)
  • Critical application anomalies: a high rate of transactions, more logins than usual, or “low and slow” activity (e.g., short bursts of application activity spread over a long period of time)
  • Suspicious software installations
  • System file changes
  • Suspicious outbound network traffic through the firewall (e.g., FTP, SMTP, and other protocols that facilitate file or data transfer)
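
Here’s what that concurrent-login check might look like in practice. This is a minimal sketch; it assumes login events have already been parsed out of your log management system into (timestamp, user, source IP) tuples, and the five-minute window is an arbitrary assumption:

```python
# Minimal sketch of the "same user ID, different IPs, same time" check.
# The event layout and the window are hypothetical assumptions.
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)

def find_concurrent_logins(events):
    by_user = defaultdict(list)
    for ts, user, ip in sorted(events):          # sort by timestamp
        by_user[user].append((ts, ip))
    for user, logins in by_user.items():
        for (t1, ip1), (t2, ip2) in zip(logins, logins[1:]):
            if ip1 != ip2 and t2 - t1 <= WINDOW:
                yield user, ip1, ip2, t2

events = [
    (datetime(2015, 2, 20, 9, 1), "jsmith", "10.1.1.7"),
    (datetime(2015, 2, 20, 9, 3), "jsmith", "203.0.113.50"),
]
for user, ip1, ip2, when in find_concurrent_logins(events):
    print(f"ALERT: {user} active from {ip1} and {ip2} at {when}")
```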

 

For more information see:

Kaspersky report of Carbanak APT

https://blog.kaspersky.com/billion-dollar-apt-carbanak/

 

Free Security Awareness

http://mindfulsecurity.com/

http://www.securingthehuman.org/blog/category/community-resources

http://www.dhs.gov/stopthinkconnect

"Do you think the guys running Azure or AWS care if a server gets rebooted in the middle of the day?" I asked the Help Desk analyst when he protested my decision to reboot a VM just before lunch.

 

"Well, uhh. No. But we're not Azure," He replied.

 

"No we're not. But we're closer today than we have ever been before. Also, I don't like working evenings." I responded as I restarted the VM.

 

The help desk guy was startled, with more than a little fear in his voice, but I reassured him I'd take the blame if his queue was flooded with upset user calls.

 

Such are the battles one has to fight in IT environments that are stuck in what I call the Old Ways of IT. If you're in IT, you know the Old Ways, either because you grew up with them like I did, or because you're still stuck in them and know of no other way.

 

The Old Ways of doing IT go something like this:

 

  • User 1 & User 2 call to complain that Feature A is broken
  • Help desk guy dutifully notes feature A is busted, escalates to Server Guy
  • Server Guy notices Feature A is broken on Server A tied to IP Address 192.168.200.35, which is how User 1 & User 2 access Feature A
  • Server Guy throws up his hands, says he can't fix Server A without a Reboot on Evening 1
  • Help Desk guy tells the user nothing can be done until Evening 1
  • User1 & User 2 hang up, disappointed
  • Server Guy fixes problem that evening by rebooting Server A

 

I don't know about you, but working in environments stuck in the Old Ways of IT really sucks. Do you like working evenings & weekends? I sure don't. My evenings & weekends are dedicated to rearing the Child Partition and hanging out with the Family Cluster, not fixing broken old servers tied to RFC-1918 IP addresses.

 

As the VM rebooted, my help desk guy braced himself for a flood of calls. I was tempted to get all paternalistic with him, but I sat there, silent. Ninety seconds went by, and the VM came back online. The queue didn't fill up; the help desk guy looked at me, a bit startled. "What?!? How did you...but you rebooted...I don't understand."

 

That's when I went to the whiteboard in our little work area. I wanted to impart The New Way of Doing IT upon him and his team while the benefits of the New Way were fresh in their mind.

 

"Last week, I pushed out a group policy that updated the url of Feature A on Service 1. Instead of our users accessing Service 1 via IP Address 192.168.200.35, they now access the load-balanced FQDN of that service. Beneath the FQDN are our four servers and their old IP addresses," I continued, drawing little arrows to the servers.

 

"Because the load balancer is hosting the name, we can reboot servers beneath it at will," the help desk guy said, a smile spreading across his face. "The load balancer maintains the user's session...wow." he continued.

 

"Exactly. Now  you know why I always nag you to use FQDN rather than IP address. I never want to hear you give out an IP address over the phone again, ok?"

 

"Ok," he said, a big smile on his face.

 

I returned to automating & building out The Stack, getting it closer to Azure or AWS.

 

The help desk guy went back to his queue, but with something of a bounce in his step. He must have realized, the same way I realized it some years back, that the New Way of IT offered so much more than the Old Way. Instead of spending the next 90 minutes putting out fires with users, he could invest in himself and his career and study up a bit more on load balancers. Instead of rebooting the VM that evening (as I would have had him do), he could spend that evening doing whatever he liked.

 

As cliché as it sounds, the New Way of IT is about working smarter, not harder, and I think my help desk guy finally understood that day.

 

A week or two later, I caught my converted help desk guy correcting one of his colleagues. "No, we never hand out the IP address, only the FQDN."

 

Excellent.

You sit down at your desk. It's 9:10 AM and your coffee is still warm. There is a smell of bacon in the air.

 

Suddenly your phone rings. The trading system is down. The time for quick thinking is now.

 

Where would you begin to troubleshoot this scenario?

 

A lot of people will reach for the biggest hammer they can find: a tool that will trace all activity as it hits the database instance. For SQL Server, that tool is typically SQL Profiler.

 

The trouble here is this: you are in a reactive mode right now. You have no idea as to the root cause of the issue. Therefore, you will configure your trace to capture as many details as possible. This is your reaction to make certain that when the time comes you are prepared to do as thorough a forensics job as possible in the hope that you can fix the issue in the shortest amount of time.

 

And this method of performance monitoring and troubleshooting is the least efficient way to get the job done.

 

When it comes to performance monitoring and troubleshooting you have two options: tracing or polling.

 

Tracing tracks details and captures events as they happen. In an ironic twist, this method can interfere with the performance of the very queries you are trying to measure. Examples of tools that use the tracing method for SQL Server are Extended Events and SQL Profiler.

 

Polling, however, is also known by another name: sampling. A tool that utilizes polling gathers performance data at regular intervals. This is considered a lightweight alternative to tracing. Examples of tools that use this method are Performance Monitor (by default, it samples once per second) and third-party applications like Database Performance Analyzer that query dynamic management objects (system views known as DMVs in SQL Server, and x$ and v$ objects in Oracle).
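
As an illustration of the polling approach, here’s a minimal sketch that samples SQL Server’s sys.dm_os_wait_stats DMV on an interval and reports which wait types grew the most between samples. It assumes Python with pyodbc; the DSN and the 15-second interval are placeholders, not recommendations:

```python
# Sketch of polling: sample a DMV on an interval and diff the counters.
# The DSN is a hypothetical placeholder; pyodbc is assumed installed.
import time
import pyodbc

QUERY = """SELECT wait_type, wait_time_ms
           FROM sys.dm_os_wait_stats
           WHERE wait_time_ms > 0"""

conn = pyodbc.connect("DSN=Production")          # hypothetical DSN

def snapshot():
    return dict(conn.cursor().execute(QUERY).fetchall())

previous = snapshot()
while True:
    time.sleep(15)                               # lightweight: one query/15s
    current = snapshot()
    deltas = {w: current[w] - previous.get(w, 0) for w in current}
    top = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:5]
    print("Top waits this interval:", top)
    previous = current
```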

 

See, here's the secret about performance monitoring and troubleshooting that most people don't understand: when it comes to gathering performance metrics, it's not so much what you gather as how you gather it.


Knowing what to measure is an easy task. It really is. You can find lots of information on the series of tubes known as the internet that will list out all the metrics an administrator would want. Database size, free disk space, CPU utilization, page life expectancy, buffer cache hit ratio, etc. The list of available metrics seems endless and often overwhelming. Some of that information is even useful; a lot of it can be just noise, depending on the problem you are trying to solve.


So which method is right for you?


Both, actually.


Let me explain.


Think of a surgeon who needs to operate on a patient. There's a good chance that before the surgeon cuts into healthy skin, they will take an X-ray of the area. Once they examine the X-ray, they know more about what they need to do when they operate.

Polling tools are similar to X-rays. They help you understand which areas you need to investigate further. Then, when you need to take that deeper dive, you are likely to use a tracing tool, configured to return only the information needed to solve the problem, and only for the shortest possible duration.

 

I find that many junior administrators (and developers with novice database performance troubleshooting skills) tend to rely on tracing tools for even the most routine tasks that can be done with a few simple queries against a DMV or two. I do my best to educate when I can, but it often is an uphill battle. I've lost track of the number of times I've been thrown under the bus by someone saying they can't fix an issue because I won't let them run Profiler against a production box as a first attempt at figuring out what’s going on. Rather than make people choose between one tool and the other, I do my best to explain how they work well together.

 

I never use a tracing tool as a first option for performance monitoring and troubleshooting. I rely on polling to help me understand where to go next. Sometimes that next step requires a trace, but often I'm able to make positive performance improvements without ever needing to run one. Then again, I'm lucky that I have some really good tools for monitoring database servers, even ones running on VMware, Amazon RDS, or Microsoft Azure.

 

There's a lot to learn as an administrator, and it can be overwhelming for anyone, new or experienced.

 

So it wouldn't hurt to double-check how you are monitoring right now, to make certain you are grabbing the right things at the right frequency.

I’ll be honest, when I initially saw the words configuration management, I only thought of managing device configurations. You know, things like keeping backup copies of configurations in case a device kicks the bucket. However, the longer I’ve been in the IT field, the more I’ve learned how short-sighted I was about what configuration management truly means. Hopefully, by the end of this post, you will either nod in agreement or thank me for opening your eyes to an aspect of IT that is typically misunderstood or severely neglected.

 

There are several components of configuration management that you, as an IT professional, should be aware of:

 

  • Device hardware and software inventory
  • Software management
  • Configuration backup, viewing, archiving, and comparison
  • Detection and alerting of changes to configuration, hardware, or software
  • Configuration change management

 

Let’s briefly go over some of these and why they are so integral to maintaining a healthy network.

 

Most (hopefully all) IT teams keep an inventory of the hardware and software they support. This is imperative for things like service contract renewals and support calls. But how you keep track of this information is another question. Are you manually tracking it in Excel spreadsheets or something similar? I agree that it works, but in a world so hellbent on automation, why risk human error? What if you forget to add a device and it goes unnoticed? Wouldn’t it be easier to have software that automatically inventories all your devices?

 

One of my favorite components of configuration management is configuration backup, along with the ability to view those backups and compare them to previous ones. If your core switch were to fail today, right now, are you prepared to replace it? I’m not talking about calling your vendor’s support to have them ship out a replacement. I’m talking about rebuilding that shiny new piece of hardware to its predecessor’s last working state. If you have backups, that process is easy: grab the latest backup and slap it on the new device when it arrives. This will drastically cut down the recovery time in a failure scenario. Need to know what’s changed between the current configuration and six months ago for audit purposes? Having those backups, and a mechanism for comparing them, goes a long way.
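
That comparison mechanism doesn’t have to be fancy to be useful. Here’s a minimal sketch that diffs two archived configs with Python’s standard difflib; the file names are placeholders for whatever your backup archive produces:

```python
# Sketch: compare two archived device configs, e.g. current vs. six months
# ago for an audit. File names are hypothetical placeholders.
import difflib

with open("core-sw1_2014-08-20.cfg") as old, open("core-sw1_2015-02-20.cfg") as new:
    diff = difflib.unified_diff(
        old.readlines(), new.readlines(),
        fromfile="6 months ago", tofile="current")
    print("".join(diff))
```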

 

There are a number of ways to know when an intruder has been in your network. One of them is detecting and alerting on changes made to your devices. If you don’t have something in place that can detect these changes in real time, you’ll be in the dark in more ways than one. And what if a co-worker made an “innocent” change before going on vacation, and it starts to rear its ugly head? Being able to easily generate real-time alerts or reports will help pinpoint the changes and get your system purring like a kitten once again.

 

In conclusion, configuration management is not just about keeping backups of your devices on hand. It involves keeping inventories of those devices as well as being able to view, archive, and compare their configurations. It also includes being able to easily detect and alert on changes made to your devices for events like catching network intruders. Are you practicing good configuration management techniques?

Whether it be at work or in my personal life, I like to plan ahead and be prepared. Specifically, when it comes to allocating storage, you definitely need to plan your allocation strategically. This is where thin provisioning comes in: organizations can adopt this strategy to avoid costly problems and increase storage efficiency.


Thin provisioning is the practice of efficiently optimizing available space in storage area networks (SANs). It allocates disk storage space among multiple users based on what each user requires at a given time.


Days before Thin Provisioning:           

Traditionally, admins allocated additional storage beyond their current need, anticipating future growth. In turn, admins would have a significant amount of unused space, directly resulting in a loss on the capital spent purchasing disks and storage arrays.


Applications require storage to function properly. In traditional provisioning, a Logical Unit Number (LUN) is created and each application is assigned to a LUN. Creating a LUN the traditional way meant allocating a portion of empty physical storage space from the array. For the application to operate, that space is then allocated to the application. At first, the application does not occupy the whole allocated space; the space is utilized gradually.


How Thin Provisioning Works:

In thin provisioning, a LUN is created from a common pool of storage. A LUN in thin provisioning will be larger than the actual physical storage assigned to it. For example, if an application needs 50 GB of storage to start working, 50 GB of virtual storage is assigned to the application so that it can become operational. The application uses the LUN as it normally would. Initially, the assigned LUN will be backed by only a portion of the needed storage (say, 15 GB); the remaining 35 GB is virtual. As actual utilization grows, additional storage is automatically taken from the common pool of physical storage. The admin can then add more physical storage (based on requirements) without disturbing the application or altering the LUN. This helps admins eliminate the initial physical disk capacity that goes unused.
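
A toy model makes the mechanics easier to see. The sketch below (plain Python, sizes in GB mirroring the example above) promises a 50 GB virtual LUN backed by a 15 GB pool and allocates physical extents only on first write. It illustrates the concept; it is not how any particular array implements it:

```python
# Toy model of a thin-provisioned LUN: the virtual size is promised up front,
# but physical extents come out of the shared pool only when blocks are
# first written. Sizes in GB; numbers mirror the example above.
class Pool:
    def __init__(self, physical_gb):
        self.free = physical_gb
    def take(self, gb):
        if self.free < gb:
            raise RuntimeError("pool exhausted -- add physical disks")
        self.free -= gb

class ThinLUN:
    def __init__(self, virtual_gb, pool):
        self.virtual_gb, self.pool = virtual_gb, pool
        self.allocated = set()                 # extents actually backed
    def write(self, extent_gb):
        if extent_gb not in self.allocated:    # first touch allocates
            self.pool.take(1)
            self.allocated.add(extent_gb)

pool = Pool(physical_gb=15)                    # 15 GB real storage to start
lun = ThinLUN(virtual_gb=50, pool=pool)        # the app sees 50 GB
for block in range(10):                        # the app writes its first 10 GB
    lun.write(block)
print(f"Promised {lun.virtual_gb} GB, backed {len(lun.allocated)} GB, "
      f"{pool.free} GB left in pool")
```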


A use case:

Consider an organization that has 3 servers running different applications—a database, a file processing system, and email. All these applications need storage space to work and the organization has to consider storage space for future growth.


With traditional provisioning, say each application needs 1 TB to operate. Out of that 1 TB, only 250 GB (25%) will be used initially; the rest will be utilized gradually. With the whole 3 TB already allocated to the three existing applications, what happens if you need a new server/application in the organization? You will need more storage, and unfortunately, it won’t be cheap; you will need to go hunting for budget.


Now let’s see how thin provisioning helps in this situation. Here, each server/application is provided with 1 TB of virtual storage, but the actual storage space provided is just 250 GB. Space from the pool is only allocated when needed. When a new server/application is added, you can assign it 250 GB from the physical storage pool, while the server/application still sees a total of 1 TB of virtual storage. The organization can add the new server/application without purchasing additional storage, and the physical storage pool can be grown as a whole when needed.
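
The arithmetic of the use case is worth spelling out as a quick sanity check, using the numbers from the scenario above:

```python
# The use case as arithmetic: three apps, 1 TB promised each, 25% used at first.
apps, promised_tb, used_fraction = 3, 1.0, 0.25

traditional = apps * promised_tb                  # buy it all up front
thin = apps * promised_tb * used_fraction         # back only what's written

print(f"Traditional provisioning: {traditional:.2f} TB purchased day one")
print(f"Thin provisioning:        {thin:.2f} TB physical, grow as needed")
```

Roughly 750 GB is actually in use on day one, so a 1 TB starting pool covers all three applications with headroom to grow.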


Thin provisioning has two advantages in this use case:

  • Adding a new server/application is no longer an issue.
  • Avoid provisioning a total of 3 TB while setting up the servers. The organization can start with 1 TB and add more storage space as and when needed without disturbing the setup. 


When to use thin provisioning:

Whether to use this type of provisioning depends more on the use case than on the technology. Thin provisioning is beneficial in the following situations:

  • Where the amount of resources used is much smaller than allocated.
  • When the administrator is not sure about the size of the database that needs to be allocated.
  • Situations when more servers/ applications often get added to the network.
  • When the organization wants to reduce the initial investment. 
  • When the DB administrator wants to get maximum utilization from their storage.


Thin provisioning is not a silver bullet in the virtualization world. It too has its limitations. For example:

  • Performance can become a major factor: thin-provisioned storage becomes fragmented very easily, in turn decreasing performance.
  • Storage over-allocation: thin provisioning makes it possible to promise more storage than physically exists. If the pool runs out, a write operation can cause a serious failure (one that cannot be repaired) on one or several drives.


Even though thin provisioning has drawbacks, all of them can be managed with continuous storage monitoring. Now what you need to do is transform your ‘fat’ volumes into thin ones. But issues can arise while doing so. Have you experienced any issues while moving your storage? If so, how did you resolve them?

In the first installment of the AppStack series, Joel Dolisy took you for A Lap around AppStack, providing a high-level overview of the concept. Lawrence Garvin then connected the dots from a systems perspective in A Look at Systems’ Role in the AppStack. As Lawrence concluded his piece, he stated, “The complexity of the systems monitoring space is continuing to grow. Virtualization and shared storage was just the first step.” So, let’s take a look at how those layers, and the application itself, affect the AppStack.

 

Virtualization Management

Recent announcements from VMware (vSphere 6), Microsoft (Windows 10), and cloud service providers like Amazon Web Services highlight the advances made to accelerate rapid provisioning, dynamic resource scaling, and continuous application & services delivery. These capabilities extend IT consumption of anything-as-a-Service (XaaS) from on-premises to off-premises, from private cloud to public cloud, from physical to virtualization to cloud and back again.


Storage Management

Policy-based storage, aka software-defined storage, is the latest trend; it abstracts storage constructs from the underlying storage hardware. The objective is to port the advantages inherent to virtualization over to storage, for actions involving storage capacity, performance, and utilization, in order to meet Quality-of-Service (QoS) service level agreements (SLAs).


The Application is What Matters

Constantly changing variables in each layer make it more complex to manage the entire environment. The bottlenecks and trouble spots can be either virtual or physical constructs. And the only thing that matters is application delivery and consumption. IT management is needed to monitor, troubleshoot, and report on this complex and quickly changing environment. To adapt to that speed of IT and business, SolarWinds AppStack provides the context to connect these layers and quickly provide a single point of truth on any given application, as SolarWinds CTO/CIO Joel Dolisy pointed out. And as fellow Head Geek Lawrence Garvin noted, monitoring is converging toward consolidated monitoring and comprehensive awareness of the end-user experience.


Indispensable to Software-Defined IT Professionals

All of the above make AppStack indispensable to software-defined IT professionals who make their living in multiple clouds, driving multiple container vehicles, and engineering the automation & orchestration of self-healing and auto-scaling policies in their ecosystems.

 

For the next installment of this series, Patrick Hubbard will share his insights and experience on the AppStack concept.

IT professionals are admittedly a prideful bunch. It comes with the territory when you have to constantly defend yourself, your decisions, and your infrastructure against people who don’t truly understand what you do. This is especially true for network administrators. “It’s always the network.” Ever heard that one before? Heck, there’s even a blog out there with that expression, created by someone I respect, Colby Glass. My point is, as IT professionals, we have to be prepared at a moment’s notice to provide evidence that an issue is not related to the devices we manage. That’s why it’s imperative that we know our network inside and out.

 

With that being said, it should be no surprise to you that when I started my career in networking in 2010, I thought NMS platforms were pretty amazing. Pop some IP addresses in and you’re set.¹ The NMS goes about its duty, monitoring the kingdom and alerting you when things go awry. I could log in and verify things for myself if I wanted to be certain. I could even dig in at the interface level and give you traffic statistics like discards and errors, utilization, etc. I had instant credibility at my fingertips. I could prove the network was in great shape at a moment’s notice. Want to know if that interface to your server was congested yesterday evening at 7 PM? It sure wasn’t, and I have the proof! Can’t get much better than that, right?

 

Until…

 

I saw NetFlow for the first time. NetFlow has a way of really opening your eyes. “How did I ever think I knew my network so well?” I thought. I’d had no visibility into the traffic patterns flowing through my network. Sure, I could fire up a packet capture pretty easily, but that approach is reactive and time-consuming, depending on your setup. What if that interface really WAS congested yesterday evening at 7 PM? I’d have no data to reference, because I wasn’t running a packet capture at that exact time or for that particular traffic flow. It’s helpful to tell someone the interface was congested, but how about taking it a step further with what was congesting it? What misbehaving application caused that link to be 90% utilized when traffic should have been relatively light at that time of day? The important thing to realize is that I’m not just an advocate for NetFlow, I’m also a user!² Here’s a quick recap of an instance where NetFlow saved my team and me.

 

I recently encountered a situation where having NetFlow data was instrumental. One day at work, we received multiple calls, e-mails, and tickets about slow networks at our remote offices. They seemed to be related, but we weren't sure at first. The slowness complaints were sporadic in nature, which made us scratch our heads even more. After looking at our instance of NPM, we definitely saw high interface utilization at some, but not all, of our remote sites. We couldn't think of any application or traffic pattern that would cause this. Was our network under attack? We thought it might be prudent to involve the security team in case it really was an attack, but before we sounded the alarm, we decided to check our NetFlow data first. What we saw next really baffled us.

 

Large amounts of traffic (think GBs per hour) were coming from our Symantec Endpoint Protection (SEP) servers to clients at the remote offices over TCP port 8014. For those of you who have worked with Symantec before, you probably already know that this is the port the SEP manager uses to manage its clients (e.g., virus definition updates). At some point, communication between the manager and most of its clients (especially in remote offices) had failed, and the virus definitions on the clients became outdated. After a period of time, the clients would no longer request the incremental definition update; they wanted the whole enchilada. That’s okay if it’s a few clients and the download succeeds the first time. That wasn’t the case in our situation. There were hundreds of clients all trying to download this 400+ MB file from one server over relatively small WAN links (avg. 10 Mb/s). The result was constantly failing downloads, which triggered the process to start over again ad infinitum. As a quick workaround, we applied QoS to the traffic based on the port number until the issue with the clients was resolved. With this information at our disposal, we went to the security team to show them that their A/V system was not healthy. Armed with the information we gave them, they quickly identified several issues with the SEP manager and its clients and eventually resolved them, including standing up a redundant SEP manager. Without NetFlow data, we would have had to set up SPAN ports on our switches and wait a while before analyzing packet captures to determine what caused the congestion. With NetFlow, we could immediately look at specific times in the past to determine what was traversing our network when our users were complaining.
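
For readers who haven’t worked with flow data, the kind of analysis that cracked this case is simple aggregation. Here’s a minimal sketch that totals bytes by destination port from flow records; the record layout and the numbers are made up to mirror the incident:

```python
# Sketch of what the flow view gives you: aggregate flow records by port to
# see who is eating the WAN link. Records are hypothetical (src, dst,
# dst_port, bytes) tuples exported by your collector.
from collections import Counter

flows = [
    ("10.0.5.20", "10.8.3.101", 8014, 420_000_000),   # SEP manager -> client
    ("10.0.5.20", "10.8.3.102", 8014, 415_000_000),
    ("10.0.5.9",  "10.8.3.50",  443,    2_000_000),
]

by_port = Counter()
for src, dst, port, nbytes in flows:
    by_port[port] += nbytes

for port, total in by_port.most_common(3):
    print(f"TCP/{port}: {total / 1e6:,.0f} MB")
```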

 

That’s just one problem NetFlow has solved for us. What if that port were TCP/6667 and the traffic was coming from your CFO’s computer? Do you really think your CFO is on #packetpushers (irc.freenode.net) trying to learn more about networking? No, it’s more likely a command-and-control botnet obtaining its next instructions on how to make your life worse. From a security perspective, NetFlow is just one more tool in the never-ending fight against malware. So what are you waiting for? Get with the flow… with NetFlow!

 

1. Of course it's never quite that easy. You'll have to configure SNMP on all of your devices that you want to manage and/or monitor.

2. Hair Club for Men marketing

Most networks by now are slowly making the transition from IPv4 addresses to IPv6. This new availability and abundance of global IPv6 addresses will enable businesses to easily provide services to their customers and internal users. However, there are a few things to make note of when you have IPv6 running in your network.

  1. Unknown IPv6-enabled devices: Many current operating systems not only support, but also enable, IPv6 by default. Devices like firewalls & IDS equipment may not be configured to recognize IPv6 traffic on the network. Unfortunately, the attacker community can leverage this gap to infiltrate and attack both IPv4 & IPv6 networks. Unauthorized clients using IPv6 auto-configuration can configure their own global address if they are able to find a global prefix.
  2. Multicast: A multicast address is used for one-to-many communication, with delivery to multiple interfaces. In IPv6, multicast replaces broadcast for network discovery functions like dynamic auto-configuration of devices and DHCP services. The IPv6 address range FF00::/8 is reserved for multicast, and by combining this with a scope, unauthorized users can easily reach hosts or application servers if they want.
  3. Stateless Address Auto-configuration (SLAAC): SLAAC allows network devices to automatically create valid IPv6 addresses. It permits hot-plugging of network devices, i.e., no need for manual configuration (see the sketch after this list). Devices can simply connect to the network, operate stealthily, and go unnoticed for a very long time.
  4. Devices not supporting IPv6: Some network security devices like firewalls, filters, NIDS, etc. may not support IPv6 or may not be configured to work with IPv6. As a result, IPv6-enabled hosts can access the Internet with no firewall protection or network access controls. In turn, malicious tools can be used to detect IPv6-capable hosts, take control of IPv6 auto-configuration, and tunnel IPv6 traffic in and out of IPv4 networks undetected.
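
To see why SLAAC makes hosts so self-sufficient, it helps to look at how an address is actually built. This sketch derives the EUI-64 interface identifier from a MAC address, the classic SLAAC method; the prefix and MAC below are example values:

```python
# Sketch of how SLAAC builds an address without any server: the EUI-64
# interface ID is derived from the MAC, so a device can self-assign a
# valid global address the moment it learns a prefix. Values are examples.
def eui64_interface_id(mac: str) -> str:
    octets = [int(b, 16) for b in mac.split(":")]
    octets[0] ^= 0x02                     # flip the universal/local bit
    full = octets[:3] + [0xFF, 0xFE] + octets[3:]   # stuff FF:FE in the middle
    groups = [f"{full[i] << 8 | full[i + 1]:x}" for i in range(0, 8, 2)]
    return ":".join(groups)

prefix = "2001:db8:1:1"                   # learned from a router advertisement
mac = "00:1a:2b:3c:4d:5e"
print(f"{prefix}:{eui64_interface_id(mac)}")
# -> 2001:db8:1:1:21a:2bff:fe3c:4d5e  (no DHCP server ever involved)
```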

 

Mitigate Risks from IPv6 in Your Network


Here are a few important tips to help you stay in control of your network, while maintaining optimum use of your IPv6 address space:

  • Deploy network controls to be both IPv4 and IPv6 aware
  • Network admins should have the same level of monitoring for both protocols
  • Define and implement baseline security controls for IPv6 environments to meet the same or better security as IPv4 environments
  • To actively manage these risks, organizations are encouraged to adopt a comprehensive IP management strategy

 

It’s important to know your IPv6 management needs and be aware of what is required to efficiently and securely manage your IPv6 address space. IPv6 addresses are more complex, longer, and harder to remember. Also, IPv6 addresses are rarely assigned statically: hosts typically configure themselves with SLAAC or lease addresses via DHCPv6. Furthermore, the coexistence of IPv4 and IPv6 addresses in the network increases the complexity of managing the entire IP space.

 

IPv6 is much more complex, so spreadsheets simply won’t work for it: the address boundaries are harder to work with and the addresses themselves are longer. Further, dynamic assignment of addresses makes it very difficult to manually update spreadsheets and maintain up-to-date information.

 

In short, IPv6 networks need a comprehensive & automated IP address management solution. Automated software allows you to effectively track all IPv6 addresses in the network, manage IPv6 network boundaries, and track dynamic IPv6 assignments, while helping you ensure that the presence of IPv6 in the network does not become a security threat.

I started a thread on Twitter this week asking, "Who are some awesome women in monitoring?" One of the common reactions (expressed privately and respectfully, I'm happy to say) has been asking me why I started the discussion in the first place. I thought that question deserved a response.

Because, I'm a feminist. Yes, Virginia, Orthodox Jewish middle-aged white guys can be feminists, too. I think that anything that can be done to promote and encourage women getting into STEM professions should be done. Full stop.

"But why 'women in monitoring'?" I'm then asked. "Why not 'awesome women in I.T.'?"

Running a close second to my first response is that I'm a monitoring enthusiast. I think monitoring (especially monitoring done right) is awesome, a lot of fun, and provides a huge value to organizations of all sizes.

I also think it's an under-appreciated discipline within I.T. Monitoring today reminds me of InfoSec, storage, or virtualization about a decade ago: it was a set of skills, but few people claimed it as their sole role within a company.

I want to see monitoring recognized as a career path, the same as being a Voice engineer or a data analytics specialist.

Of course, this all ties back to my role as Head Geek. Part of the job of a Head Geek is to promote the amazing: amazing solutions, amazing trends, amazing companies, and amazing groups, as they relate to monitoring.

One reason this is explicitly part of my job is to build an environment where the people who are quietly doing the work, but not identifying as part of "the group," feel more comfortable doing so. The more "the group" gains visibility, the more the people who WANT to be part of it will gravitate toward it, rather than falling into it by happenstance.

Which brings me back to the point about "amazing women in monitoring". This isn't a zero-sum competition. Looking for amazing women doesn't somehow imply women are MORE amazing than x (men, minorities, nuns, hamsters, etc).

This is about doing my part to start a conversation where achievements can be recognized for their own merit.

I know that's a pretty big soapbox to balance on a series of Twitter posts, but I figure it's gotta start somewhere.

So, if you know of any exceptional women in monitoring, share their thwack ID or Twitter handle below to help me give them a shout-out.
