
Geek Speak


tldr;

SolarWinds and Vyopta now integrate, so you can monitor live data from your video infrastructure and jump straight to the access switch interface for any problem call in any conference room with Polycom or Cisco endpoints.

 

Key Features:

  • Simple API-level integration
  • Single click from any Cisco or Polycom endpoint to NPM interface details page
  • Live call stats, video device status, camera/display connection data, and registration info.

 

Eliminate the Video Conference Blind-Spot

Do you ever enter a never-ending blame game with your A/V team about why video conferences fail? Are you responsible for the video infrastructure in your environment? Perhaps even if you don’t want to be? Tired of those codecs and video infrastructure being a black box in terms of actual call statistics and quality metrics? Want to bridge the visibility gap between your voice/video and the rest of your network infrastructure? Well, perfect - because Vyopta’s real-time video monitoring now integrates with SolarWinds Network Performance Monitor.
With this integration you are now able to monitor, alert, and diagnose end-to-end video call issues through Vyopta AND identify whether it is a network problem or a video device problem. Furthermore, with one-click access to NPM from every video endpoint, you can diagnose and fix the issue if it is a network problem. On the Vyopta side, call quality and hardware statistics are pulled directly from your endpoints and bridges via API. Whether you are using Cisco, Polycom, Acano, Pexip, or Vidyo in whichever flavor, your data is combined and normalized in real time. Based on this broad dataset, you are able to assess end-to-end call quality per call or determine whether an issue may be systemic within your video environment. Perhaps it’s as simple as the screen or camera being disconnected on the endpoint. Maybe the user dialed the wrong number. In Vyopta, you can get alerted for and diagnose the following issues at a glance:

  • Camera/Display disconnect
  • Endpoint becomes unregistered (unable to make calls)
  • Endpoint down
  • Bad call quality from gateway, bridge, or endpoint (packet loss or jitter)
  • High Packet Loss

 

[Screenshot: Vyopta dashboard]

 

With Vyopta’s built-in dashboards you can also quickly evaluate the health of your bridging infrastructure. Perhaps one of your MCUs is at capacity, or you have a spike in traversal calls:

 

[Screenshot: real-time capacity dashboard]

 

If the issue isn’t with an endpoint or bridge, you can click on the helpful SolarWinds link next to the endpoint to take you right to the connected access-layer switch interface in NPM:

 

[Screenshot: SolarWinds link from a Vyopta endpoint to NPM]



Once in NPM, you can determine if there is a common interface-level issue (VLAN, duplex, etc.) or start to drive upstream into the infrastructure. Enhance your situational awareness with NetFlow data or perhaps proactive UDP IP SLA transactions in VNQM. Did a recent config change bork DSCP tagging? NCM has you covered.

 


 

So next time users start rumbling that those darn vidcons “don’t work” or the CEO’s call drops in the middle of a board meeting, know that your video infrastructure doesn’t have to be a black-box. With Vyopta and SolarWinds integration, it’s easy to troubleshoot. No more chasing phantom issues - isolate the root cause of video conference issues in just a few clicks.

There is hardly a government IT pro who has not seen sluggish applications create unhappy users.

 

Because the database is at the heart of every application, when there’s a performance issue, there’s a good chance the database is somehow involved. With database optimization methods -- such as identifying database performance issues that impact end-user response times, isolating root cause, showing historical performance trends and correlating metrics with response time and performance -- IT managers can speed application performance for their users.

 

Start with these four database optimization tips:

 

Tip #1: Get visibility into the entire application stack.

The days of discrete monitoring tools are over. Today’s government IT pros must have visibility across the entire application stack, or the application delivery chain comprising the application and all the backend IT that supports it -- software, middleware, extended infrastructure and especially the database. Visibility across the application stack will help identify performance bottlenecks and improve the end-user experience.

 

Tip #2: See beyond traditional infrastructure dashboards.

Many traditional monitoring tools provide a dashboard focused on health and status, typically featuring many charts and data, which can be hard to interpret. In addition, many don’t provide enough information to easily diagnose a problem -- particularly a performance problem.

 

Tools with wait-time analysis capabilities can help IT pros eliminate guesswork. They help identify how an application request is executed step-by-step and will show which processes and resources the application is waiting on. This type of tool provides a far more actionable view into performance than traditional infrastructure dashboards.

 

Tip #3: Reference historical baselines.

Database performance is dynamic. It is critical to be able to compare abnormal performance with expected performance. By establishing historical baselines of application and database performance that look at how applications performed at the same time on the same day last week, and the week before that, etc., it is easier to identify a slight variation before it becomes a larger problem. And, if a variation is identified, it’s much easier to track the code, resource, or configuration change that could be the root cause and solve the problem quickly.
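
If your monitoring tool exposes its history, the comparison itself is straightforward. Here is a minimal sketch in Python (the metric store and its shape are hypothetical, not any particular product's API) that checks a current response-time sample against the values recorded at the same hour on the same weekday in prior weeks:

    from datetime import timedelta
    from statistics import mean, stdev

    def baseline_for(history, now, weeks=4):
        # Gather the values recorded at this same hour, on this same weekday,
        # over the previous `weeks` weeks. `history` maps hour-aligned
        # datetimes to response-time samples (a hypothetical store).
        samples = []
        for w in range(1, weeks + 1):
            key = (now - timedelta(weeks=w)).replace(minute=0, second=0, microsecond=0)
            if key in history:
                samples.append(history[key])
        return samples

    def is_abnormal(current, history, now, sigmas=3.0):
        # Flag the current value when it sits more than `sigmas` standard
        # deviations away from the same-time-in-prior-weeks baseline.
        samples = baseline_for(history, now)
        if len(samples) < 2:
            return False          # not enough history to judge yet
        mu, sd = mean(samples), stdev(samples)
        return sd > 0 and abs(current - mu) > sigmas * sd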

 

Tip #4: Align the team.

Today’s complex applications are supported by an entire stack of technologies. And yet, most IT operations teams are organized in silos, with each person or group supporting a different part of the stack. Unfortunately, technology-centric silos encourage finger-pointing.

 

A far more effective approach shares a unified view of application performance with the entire team. In fact, a unified view based on wait-time analysis will ensure that everyone can focus on solving application problems quickly.

 

Remember, every department, group or function within an agency relies on a database in some way or another. Optimizing database performance will help make users happier across the board.

 

Find the full article on Government Computer News.

Following my review of SolarWinds Virtualization Manager 6.3, the fair folks at SolarWinds gave me the opportunity to put my hands on their next planned release, namely VMAN 6.4. While there is no official release date yet, I would bet on an announcement within Q4 2016. The version I tested is 6.4 Beta 2. So what’s new with this release?

 

From a UI perspective, VMAN 6.4 is very similar to its predecessor. Like with VMAN 6.3, you install the appliance and either install VIM (the Virtual Infrastructure Monitor component) on a standalone Windows Server, or integrate with an existing Orion deployment if you already use other SolarWinds products. You’d almost think that no changes have happened until you head over to the “Virtualisation Summary” page. The new killer feature of VMAN 6.4 is called “Recommendations,” and while it seems like a minor UI improvement, there’s much more to it than meets the eye.

 

While in VMAN 6.3 you are presented with a list of items requiring your attention (over/under-provisioned VMs, idle VMs, orphan VMDK files, snapshots, etc. – see my previous review), in VMAN 6.4 all of these items are aggregated in the “Recommendations” view.

 

Two types of recommendations exist: Active and Predicted. Active recommendations are immediate recommendations that correlate with issues currently showing up in your environment. If you are experiencing memory pressure on a given host, an active recommendation would propose moving one or more VMs to another host to balance the load. Predicted recommendations, on the other hand, focus on proactively identifying potential issues before they become a concern, based on usage history in your environment.

 

The “Recommendations” feature is very pleasant to use and introduces a few elements that are quite important from a virtualisation administrator’s perspective:

 

  • First of all, administrators have the possibility to apply a recommendation immediately or schedule it for a later time (out of business hours, change windows, etc.)
  • Secondly, an option is offered to either power down a VM to apply the recommendation or to attempt to apply the recommendation without any power operations. This feature comes in handy if you need to migrate VMs, as you may run into cases where a power off/power on is required, while in other cases a vMotion/live migration will suffice
  • Last but not least, the “Recommendations” module will check whether the problem still exists before actually applying a recommendation. This makes particular sense for active recommendations, which may no longer be relevant by the time you decide to apply them (for example, if you schedule a recommendation but the issue is no longer reported by the scheduled time)

 

A nice and welcome touch in the UI is a visual aid that shows up when hovering your mouse over a proposed recommendation. You will see a simple, readable graphical view/simulation of the before and after status of any given object (cluster, datastore, etc.) should you decide to apply the recommendation.

 

Max’s take

 

The “Recommendations” function, while apparently modest from a UI perspective, is in fact an important improvement that goes beyond the capacity reclamation and VM sprawl controls included in VMAN 6.3. Administrators are now presented with actionable recommendations that are relevant not only in the context of immediate operational issues, but also as countermeasures to prevent future bottlenecks and capacity issues.

 

A few side notes: if you plan to test the beta version, reach out to the SolarWinds engineers. The new “Recommendations” function is still being fine-tuned and you may not be able to see it if you integrate the beta with your current VIM or Orion environment. Once you install VMAN 6.4, you should let it run for approximately a week in order to get accurate recommendations.

Flash storage can be really, really fast. Crazy fast. So fast that some have openly asked if they really need to worry about storage performance anymore. After all, once you can throw a million IOPS at the problem, your bottleneck has moved somewhere else!

 

So do you really need to worry about storage performance once you go all-flash?

 

Oh yes, you definitely do!

 

All-Flash Storage Can Be Surprisingly Slow

 

First, most all-flash storage solutions aren't delivering that kind of killer performance. In fact, most all-flash storage arrays can push "only" tens of thousands of IOPS, not the millions you might expect! For starters, those million-IOPS storage devices are internal PCIe cards, not SSDs or storage arrays. So we need to revise our IOPS expectations downwards to the "hundred thousand or so" that an SSD can deliver. Then it gets worse.

 

Part of this is a common architectural problem found in all-flash storage arrays which I like to call the "pretend SSDs are hard disks" syndrome. If you're a vendor of storage systems, it's pretty tempting to do exactly what so many of us techies have done with our personal computers: Yank out the hard disk drives and replace them with SSDs. And this works, to a point. But "storage systems" are complex machines, and most have been carefully balanced for the (mediocre) performance characteristics of hard disk drives. Sticking some SSDs in just over-taxes the rest of the system, from the controller CPUs to the I/O channels.

 

But even storage arrays designed for SSDs aren't as fast as internal drives. The definition of an array includes external attachment, typically over a shared network, as well as redundancy and data management features. All of this gets in the way of absolute performance. Let's consider the network: Although a 10 Gb Ethernet or 8 Gb Fibre Channel link sounds like it would be faster than a 6 Gb SAS connection, this isn't always the case. Storage networks include switches (and sometimes even routers), and these add latency that slows absolute performance relative to internal devices. The same is true of the copy-on-write filesystems protecting the data inside most modern storage arrays.

 

And maximum performance can really tax the CPU found in a storage array controller. Would you rather pay for a many-core CPU so you'll get maximum performance or for a bit more capacity? Most storage arrays, even specialized all-flash devices, under-provision processing power to keep cost reasonable, so they can't keep up with the storage media.

 

Noisy Neighbors

 

Now that we've reset our expectations for absolute performance, let's consider what else is slurping up our IOPS. In most environments, storage systems are shared between multiple servers and applications. That's kind of the point of shared networked storage, after all. Traditionally, storage administrators have carefully managed this sharing because maximum performance was naturally quite limited. With all-flash arrays, there is a temptation to "punt" and let the array figure out how to allocate performance. But this is a very risky choice!

 

Just because an array can sustain tens or even hundreds of thousands of I/O operations per second doesn't mean your applications won't "notice" if some "noisy neighbor" application is gobbling up all that performance. Indeed, performance can get pretty bad since each application can have as much performance as it can handle! You can find applications starved of performance and trudging along at disk speeds...

 

This is why performance profiling and quality of service (QoS) controls are so important in shared storage systems, even all-flash. As an administrator, you must profile the applications and determine a reasonable amount of performance to allocate to each. Then you must configure the storage system to enforce these limits, assuming you bought one with that capability!

 

Note that some storage QoS implementations are absolute, while others are relative. In other words, some arrays require a hard IOPS limit to be set per LUN or share, while others simply throttle performance once things start "looking hot". If you can't tolerate uneven performance, you'll have to look at setting hard limits.
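
For a sense of what a hard per-LUN limit means mechanically, it is essentially a token bucket: the LUN earns I/O "tokens" at a fixed rate and anything beyond that has to wait. A minimal sketch in Python (purely illustrative, not how any particular array implements its QoS):

    import time

    class IopsLimiter:
        # A hard per-LUN IOPS cap modeled as a token bucket: tokens refill at
        # the configured rate, and each I/O consumes one token.
        def __init__(self, iops_limit):
            self.rate = float(iops_limit)
            self.tokens = float(iops_limit)   # allow a burst of up to one second's worth
            self.last = time.monotonic()

        def admit(self):
            # Block until the next I/O may be issued, then consume one token.
            while True:
                now = time.monotonic()
                self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1.0:
                    self.tokens -= 1.0
                    return
                time.sleep((1.0 - self.tokens) / self.rate)

    # Example: cap a hypothetical LUN at 5,000 IOPS
    limiter = IopsLimiter(5000)
    # call limiter.admit() before issuing each I/O against that LUN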

 

Tiered Flash

 

If you really need maximum performance, tiered storage is the only way to go. If you can profile your applications and segment their data, you can tier storage, reserving maximum-performance flash for just a few hotspots.

 

Today's hybrid storage arrays allow data to be "pinned" into flash or cache. This delivers maximum performance but can "waste" precious flash capacity if you're not careful. You can also create higher-performance LUNs or shares in all-flash storage arrays using RAID-10 rather than parity or turning off other features.

 

But if you want maximum performance, you'll have to move the data off the network. It's pretty straightforward to install an NVMe SSD in a server directly, especially the modern servers with disk-like NVMe slots or M.2 connectors. These deliver remarkable performance but offer virtually no data protection. So doing this with production applications puts data at risk and requires a long, hard look at the application.

 

You can also get data locality by employing a storage caching software product. There are a few available out there (SanDisk FlashSoft, Infinio, VMware vFRC, etc) and these can help mitigate the risks of local data by ensuring that writes are preserved outside the server. But each has its own performance quirks, so none is a "silver bullet" for performance problems.

 

Stephen's Stance

 

Hopefully I've given you some things to think about when it comes to storage performance. Just going "all-flash" isn't going to solve all storage performance problems!

 

I am Stephen Foskett and I love storage. You can find more writing like this at blog.fosketts.net, connect with me as @SFoskett on Twitter, and check out my Tech Field Day events.


 

This week I will be in Atlanta for Microsoft Ignite, splitting time between the Microsoft and SolarWinds booths in the exhibit hall. I have the privilege of delivering a session in the Community Theater on Tuesday, the 27th of September, from 2:50-3:10PM EDT. The title of the talk is "Performance Tuning Essentials for the Cloud DBA," and it's a story that's been on my mind for the past year or so.

 

First, "cloud DBA" is a phrase I borrowed from Rimma Nehme who mentioned the term during her PASS Summit keynote in 2014. Dr. Nehme was reinforcing an idea that I have been advocating for years, and that is for database administrators to stop thinking of themselves as DBAs and start thinking of themselves as data professionals. To a DBA, it shouldn't matter where the data resides, either down the hall or in the cloud. And for those of us that are accidental DBAs, or accidental whatevers, we know that there will soon be accidental cloud DBAs. And those accidental cloud DBAs will need help.

 

And that help begins with this 20-minute session at Ignite tomorrow.

 

During that session, you are going to hear me talk about the rise of hybrid IT, the changing face of IT, and how we won't recognize things in five years. An accidental cloud DBA will be overwhelmed at first, but we will help provide the structure they need for a solid foundation to perform well in their new role. And I will share some tips and tricks with you to help all cloud DBAs to be efficient and effective.

 

So if you are at Microsoft Ignite this week, stop by to chat with me in the booth, or after my session Tuesday. I'd be happy to talk cloud, data, databases, and technology in general.

Leon Adato

Called to Account

Posted by Leon Adato Expert Sep 26, 2016


As IT professionals who have a special interest in monitoring (because why ELSE would you be here on THWACK, except for maybe the snuggies and socks), we understand why monitoring - and more importantly, automation in response to monitoring events - is intrinsically useful and valuable to an organization. We understand that automation creates efficiencies, saves effort, creates consistency, and most of all, saves money in all its forms - labor, downtime, lost opportunity, and actual cost.

 

To us, the benefits of automation are intuitively obvious.

 

And that's a problem. When I was in university and a professor used the phrase "it's intuitively obvious" that meant one thing: run like hell out of the class because:

A) the professor didn't understand it themselves, and

B) it was going to be on the exam

 

The idea that things that we see as intuitive are the same things we find either difficult or unnecessary to explain was driven home to me the other day when I solicited examples of the measurable business value of monitoring automation.

 

I said,

"I’m putting together some material that digs into the benefit of automated responses in monitoring and alerting. I’m looking for quotable anecdotes of environments where this has been done. What’s most important are the numbers, including the reduction in tickets, the improvement in response, etc.

 

Example:

A certain company “enjoyed” an average of 327 disk full alerts each and every month, which required roughly 20 minutes of staff time to investigate. After implementing a simple vbscript to clear the TEMP directory before opening a ticket, the number of alerts dropped to 57 per month. Now each of those tickets required one hour to resolve, but that’s because they were REAL events that required action, versus the spurious (and largely ignored) quantity before automation was introduced.

 

If you have any stories like this, I’d love to hear it. Feel free to pass this along to other colleagues as you see fit."

I received back some incredible examples of automation - actual scripts and recipes, but that's not what I wanted. I wanted anecdotes about the results. So I tried again.

 

What I am looking for is more like examples that speak to how monitoring can justify its own existence to the bean counters who don’t care if technology actually works as long as it’s cheap.

I got back more examples of monitoring automation. Some even more jaw-droppingly awesome than before. But it's still not what I wanted. I started cornering people one-on-one to see if maybe email was the wrong medium.

 

What I discovered upon intense interroga... discussion was that IT pros are very familiar with how automation is accomplished. We remember those details, down to individual lines of code in some cases. But pressed for information that describes how that automation affected the business (saved money, reduced actionable tickets, averted specific downtimes which would have cost X dollars per minute), all I got was, "Well, I know it helped. The users said it was a huge benefit to them. But I don't have those numbers."

 

That was the answer in almost every case.

 

I find this interesting for a couple of reasons. First, because you'd think we would not only be curious, we would positively bask in the glory that our automation was saving the company money. We are, after all, IT professionals. The job practically comes with a cape and tights.

 

Second, we're usually the first ones to shout "Prove it!" when someone makes a claim about a particular fact, event, effect, or even opinion. Did someone say Star Trek IV was the highest grossing movie of the franchise? Show me the BoxOfficeMojo.com numbers for that!* At a dinner party and someone says sharks are a major cause of death in the summer? You're right there to list out 25 things that are actually more likely to kill you than sharks.**

 

But despite our fascination with facts and figures when it comes to ephemera and trivia, we seem to have a blind spot with business. Which is a shame.

 

It's a shame because being able to prove the impact that monitoring and alerting has on the business is the best way to get more of it. More staff, more servers, more software, more time, and most importantly, more buy-in.

 

Imagine providing your CEO with data on how one little alert saved $250 each and every time it triggered, and then opening up the ticket logs to show that the alert triggered 327 times last month. That's a savings of $81,750 in one month alone!!

 

Put those kinds of numbers against a handful of automated responses, and you could feel like Scrooge McDuck diving into his pool of money every time you opened the ticket system!

 

So prove me wrong. In the comments below, give me some examples of the VALUE and business impact that monitoring has had.

 

More than just giving me grist for the mill (which, I'll be honest, I'm totally going to use in an upcoming eBook, totally giving credit where credit is due, and THWACK points!), what we'll all gain is insight into the formula that works for you. Hopefully we can adapt it to our own environments.

 

* In actuality, Star Trek IV ranked 4th, unless you adjust for inflation, in which case it was 3rd. First place is held in both categories by the original motion picture. The more you know.

**Sharks kill about 5 people annually***. Falling out of bed kills 450. Heck, bee stings claim 53 lives each year. So go ahead and dive in. The water's fine and Bruce the shark probably will leave you alone.

*** Unless you are watching "Sharknado". Then the number is closer to 16 people.

It's been an interesting few days. Brian Krebs had his website taken down by the largest Distributed Denial of Service (DDoS) attack ever seen. It massed some 665 Gbps of traffic that assaulted Akamai like a storm that couldn't be stopped. Researchers have been working to find out how this attack was pulled off, especially considering that it was already more than twice the size of the largest DDoS attacks Akamai had ever seen. A news article late Friday said that the attack likely started from IoT devices repurposed for packet flooding.

Most of the recent DDoS attacks have come from stressing tools or other exploits in UDP-based services like DNS or NTP. For the attack vectors to shift to IoT means that nefarious groups have realized the potential of these devices. They sit in the network, communicating with cloud servers to relay data to apps on smartphones or tablets. These thermostats, clocks, cameras, and other various technology devices don't consume much bandwidth in normal operations. But just like any other device, they are capable of flooding the network under the right conditions. Multiply that by the number of smart devices being deployed today and you can see the potential for destruction.

What can IT professionals do? How do these devices, often consumer focused, fit into your plans? How can you keep them from destroying your network, or worse yet destroying someone else's in an unwitting attack?

Thankfully, tools already exist to help you out. Rather than hoping that device manufacturers are going to wake up and give you extra controls in an enterprise, you can proactively start monitoring those devices today. These IoT things still need IP addresses to communicate with the world. By setting your monitoring systems to sweep periodically for them, you can find them as they are brought onto the network. With tools like those from SolarWinds, you can also trend those devices to find out what their normal traffic load is and what happens when they start bursting well beyond what they should be sending. By knowing what things should be doing, you can immediately be alerted to things that aren't normal.
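
To make the "sweep periodically" idea concrete, here is a minimal sketch in Python (the subnet and inventory are made up, and it shells out to the ordinary system ping rather than anything SolarWinds-specific) that finds addresses which respond but aren't yet in your inventory:

    import subprocess

    KNOWN_DEVICES = {"192.168.1.1", "192.168.1.10"}   # hypothetical device inventory

    def sweep(prefix="192.168.1.", start=1, end=254):
        # Ping each address once and return the set of addresses that answered.
        alive = set()
        for host in range(start, end + 1):
            ip = f"{prefix}{host}"
            # -c 1: single probe, -W 1: one-second timeout (Linux ping syntax)
            result = subprocess.run(["ping", "-c", "1", "-W", "1", ip],
                                    capture_output=True)
            if result.returncode == 0:
                alive.add(ip)
        return alive

    for ip in sorted(sweep() - KNOWN_DEVICES):
        print(f"Unrecognized device answering at {ip} - investigate and baseline it")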

These tools can also help you plan your network so that you can take devices offline or rate limit them to prevent huge traffic spikes from ever becoming an issue. You can then wait for the manufacturer to patch them or even create policies for their use that prevent them from causing harm. The evidence from a series of traces of bad acting devices in your network can be a great way to convince management that you need to change the way things work with regard to IoT devices. Or even to ensure that you have an IoT policy in place to begin with.

All that's required for the bad guys to use your network for their evil schemes is for networking professionals to do nothing. Make sure you know what's going on in your systems so that complacency doesn't catch you by surprise.


Have you ever had the experience where you start looking into something, and every time you turn a corner you realize you have uncovered yet another thing you need to dig into in order to fully understand the problem? In my workplace we refer to this as chasing squirrels; being diverted from the straight path and taking a large number of detours along the way.

Managing infrastructure seems that way sometimes; it seems that no matter how much time you put into the monitoring and alerting systems, there's always something else to do. I've looked at some of these issues in my last two posts (Zen And The Art Of Infrastructure Monitoring and Zen And The Ongoing Art Of Infrastructure Monitoring), and in this post I'm chasing yet another squirrel: the mythical baseline.

 

BASELINING ALL THE THINGS

 

If we can have an Internet of Things, I think we can also have a Baseline of Things, can't we? What is it we look for when we monitor our devices and services? Well, for example:

 

  • Thresholds vs Capacities: e.g. a link exceeds 90% utilization, or Average RAM utilization exceeds 80%. Monitoring tools can detect this and raise an alert.
  • Events: something specific occurs that the end system deems worthy of an alert. You may or may not agree with the end system.

 

These things are almost RESTful inasmuch as they are kind of stateless: absolute values are at play, and it's possible to trigger a capacity threshold alert, for example, without any significant understanding of the element's utilization history. There are two other kinds of things I might look at:

 

  • Forecasting: Detecting that the utilization trend over time will lead to a threshold event unless something is done before then. This requires historical data and faith in the curve-fitting abilities of your capacity management tool.
  • Something Changed: By way of example, if I use IP SLA to monitor ping times between two data centers, what happens if the latency suddenly doubles? The absolute value may not be high enough to trigger timeouts, or to pass a maximum allowable value threshold, but the fact that the latency doubled is a problem. Identifying this requires historical data, again, and the statistical smarts to determine when a change is abnormal compared to the usual jitter and busy hour curves.

 

This last item - Something Changed - is of interest to me because it offers up valuable information to take into account when a troubleshooting scenario occurs. For example, if I monitor the path traffic takes from my HQ site to, say, an Office 365 site over the Internet, and a major Internet path change takes place, then when I get a call saying that performance has gone down the pan, I have something to compare to. How many of us have been on a troubleshooting call where you trace the path between points A and B, but it's hard to know if that's the problem because nobody knows what path it normally takes when things are seemingly going well? Without having some kind of baseline, some idea of what NASA would call 'nominal', it's very hard to know if what you see when a problem occurs is actually a problem or not, and it's possible to spend hours chasing squirrels when the evidence was right there from the get-go.

 

Many monitoring systems I see are not configured to alert based on a change in behavior that's still within thresholds, but it's something I like to have when possible. As with so much of infrastructure monitoring, triggering alerts like this can be plagued with statistical nightmares to figure out the difference between a system that's idle overnight seeing its utilization increase when users connect in the morning, and a system that usually handles 300 connections per second at peak suddenly seeing 600 cps instead. Nonetheless, it's a goal to strive for, and even if you are only able to look at the historical data in order to confirm that the network path has not changed, having that data to hand is valuable.
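
One simple way to approximate a "something changed" alert without heavyweight statistics is to compare a short recent window against a longer rolling baseline and only fire on a sustained shift. A minimal sketch in Python, using hypothetical inter-data-center latency samples:

    from statistics import mean

    def changed(samples, window=60, recent=5, factor=1.5):
        # Compare the average of the last `recent` samples against the average
        # of the preceding `window` samples; requiring a sustained recent
        # average (rather than one spike) avoids alerting on ordinary jitter.
        if len(samples) < window + recent:
            return False                      # not enough history yet
        baseline = mean(samples[-(window + recent):-recent])
        latest = mean(samples[-recent:])
        return baseline > 0 and latest > factor * baseline

    # Example: ping times in milliseconds; latency roughly doubles at the end,
    # still well under any absolute timeout, but clearly "something changed".
    history = [20.0] * 60 + [41.0, 42.5, 40.8, 43.1, 39.9]
    print(changed(history))   # True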

 

KILLING THE DEAD THINGS

 

Moving in a different but related direction, knowing if what you're monitoring is actually active would be nice, don't you think? My experience is that virtualization, while fantastic, is also an automated way to ensure that the company has a graveyard of abandoned VMs that nobody remembered to decommission once they were no longer needed. This happens with non-virtualized servers too of course, but they are easier to spot because the entire server does one thing and if it stops doing so, the server goes quiet. Virtual machines are trickier because one abandoned VM in ten deployed on a server can't be detected simply by checking network throughput or similar.

 

Knowing what's active helps minimize the number of squirrels that distract us in a troubleshooting scenario, so it's important to be able to tidy up and only monitor things that matter. In the case of VMs, SolarWinds' Virtualization Manager has a Sprawl Management dashboard which helps identify VMs which have been powered down for over 30 days, as well as those which appear to be idle (and presumably no longer needed). In addition, if there are VMs running at 100% CPU, for example (and most likely triggering alerts), those are identified as under-provisioned, so there's a chance to clean up those alerts in one place (referred to as VM right-sizing). Similarly for network ports, SolarWinds' User Device Tracker can identify unused ports so they can be shut down to ensure that they do not become the source of a problem. This also allows better capacity planning, because unused ports are identified as such and can then be ignored when looking at port utilization on a switch.

 

PULLING THE THINGS TOGETHER

 

Looking at the list of things I want my monitoring and alerting systems to do, it seems that maybe no one system will ever provide everything I need in order to get that holistic view of the network that I'd like. Still, one thing SolarWinds has going for it is that Orion provides a common platform for a number of specialized tools, and the more SolarWinds uses data from multiple modules to generate intelligent alerting and diagnosis, the more powerful it can be as a tool for managing a broad infrastructure. Having a list of specific element managers is great for the engineering and operations teams responsible for those products, but having a more unified view is crucial to provide better service management and alerting.

 

Do you feel that Orion helps you to look across the silos and get better visibility? Has that ever saved your bacon, so to speak?

We just finished THWACKcamp last week and now we turn around and head to Atlanta for Microsoft Ignite next week. If you are at Ignite, please stop by our booth to say hello! I'd love to chat with you about data, databases, or tech in general. While at VMworld I found myself getting involved in a lot of storage discussions; I suspect that next week I will be in a lot of cloud discussions. Can't wait!

 

Here's a bunch of stuff I thought you might find interesting, enjoy!

 

What Carrie Underwood’s success teaches us about IBM’s Watson failure

Until I read this I didn't even know IBM was offering Watson as a service. No idea. None. In fact, I'm not sure I know what IBM even offers as a company anymore.

 

Microsoft surpasses IBM's Watson in speech recognition

Years ago I worked in the QA department for speech recognition, and Microsoft invested heavily in our software. So I'm not surprised that they continue to make advances in this area; they've been investing in it for decades.

 

The Internet knows what you did last summer

If you want to retain your privacy, don't use computers. Everything is tracked in some way, no matter what. Data is power, and every company wants access to yours.

 

The Rivers of the Mississippi Watershed

Because I love data visualizations and so should you.

 

Ford charts cautious path toward self-driving, shared vehicles

Another week, another article about self-driving cars. You're welcome.

 

Someone Is Learning How to Take Down the Internet

This could have been titled "Someone is learning how to make clickbait titles and you won't believe what happens next". Do not read this one unless you have your tinfoil hat on.

 

DevOps and the Infrastructure Dumpster Fire

Yeah, DevOps is a lot like a dumpster fire.

 

This one time, at THWACKcamp, I crashed the set to take a selfie with my favorite dead celebrity Rob Boss:

[Photo: selfie on the THWACKcamp set with Rob Boss]

arjantim

To the cloud and beyond

Posted by arjantim Sep 20, 2016

Over the last couple of weeks we talked about the cloud and the software-defined data center, and how it all fits into your IT infrastructure. To be honest, I understand that a lot of you have doubts when talking about and discussing cloud and SDDC. I know the buzzword lingo is strong, and it seems the marketing teams come up with new lingo every day. But all in all, I still believe the cloud (and with it the SDDC) is a tool you can't just reject by dismissing it as a marketing term.

 

One of the things mentioned was that the cloud is just someone else’s computer, and that is very true, but saying that overlooks some basic facts. We have had a lot of trouble in our data centers, and departments sometimes needed to wait months before their application could be used or the change they asked for was made.

 

Saying your own data center can do the same things the AWS/Azure/Google/IBM/etc. data centers can do is wishful thinking at best. Do you get your own CPUs out of the Intel factory? Do you own the Microsoft kernel? I could go on with much more you will probably never see in your DC. And don’t get me wrong, I know some of you work in some pretty amazing DCs.

 

Let’s see if we can put it all together and come to a conclusion most of us can share. First, I think it is of the utmost importance to have your environment running at a high maturity level. Often I see the business running to the public cloud and complaining that their internal IT lacks the resources and money to perform at the same level as a public cloud. But throwing all your problems over the fence into the public cloud won’t fix them. No, it will probably make things even worse.

 

You’ll have to make sure you’re in charge of your environment before thinking of going public, if you want to have the winning strategy. For me, the hybrid cloud or the SDDC is the only true cloud for most of my customers, at least for the next couple of years. But most of them need to get their environments to the next level, and there is only one way to do that.

 

Know thy environment….

 

We’ve seen it with outsourcing, and in some cases we are already seeing it in Public Cloud, we want to go in but we also need the opportunity to go out. Let’s start with going in:

 

Before we can move certain workloads to the cloud, we need to know our environment from top to bottom. There is no environment where nothing goes wrong, but environments where monitoring, alerting, remediation, and troubleshooting are done at every level of the infrastructure, and where money is invested to keep the environment healthy, normally have a much smoother walk towards the next generation of IT environments.

[Diagram: the DART framework]

The DART framework can be used to reach the maturity level needed for the step towards SDDC/hybrid cloud.

 

We also talked about SOAP (Security, Optimization, Automation, and Reporting) to make sure we get to the next level of infrastructure, and it is of as much importance as the DART framework. If you want to create that level of IT environment, you need to be in charge of all these bullet points. If you are able to create a stable environment on all these points, you’re able to move the right workloads to environments outside your own.


 

I’ve been asked to take a look at SolarWinds Server and Application Monitor (SAM) 6.3 and say something about it. For me it is just one of the tools you need in place to secure, optimize, and automate your environment, and to show and tell your leadership what you’re doing and what is needed.

 

I’ll dive into SAM 6.3 a bit deeper once I’ve had the time to evaluate the product a little further. Thanks for hanging in there, and for giving all those awesome comments. There are many great things about SolarWinds:

 

  1. They have a tool for all the things needed to get to the next generation datacenter
  2. They know having a great community helps them to become even better

 

So SolarWinds, congrats on that, and keep the good stuff coming. To the community, thanks for being there and helping us all get better at what we do.

If you’re not prepared for the future of networking, you’re already behind.

 

That may sound harsh, but it’s true. Given the speed at which technology evolves compared to the rate most of us typically evolve in terms of our skillsets, there’s no time to waste in preparing ourselves to manage and monitor the networks of tomorrow. Yes, this is a bit of a daunting proposition considering the fact that some of us are still trying to catch up with today’s essentials of network monitoring and management, but the reality is that they’re not really mutually exclusive, are they?

 

In part one of this series, I outlined how the networks of today have evolved, and what today’s new essentials of network monitoring and management are as a consequence.

 

Before delving into what the next generation of network monitoring and management will look like, it’s important to first explore what the next generation of networking will look like.

 

On the Horizon

 

Above all else, one thing is for certain: We networking professionals should expect tomorrow’s technology to create more complex networks resulting in even more complex problems to solve.

 

Networks growing in all directions

 

Regardless of your agency’s position, the explosion of IoT, BYOD, BYOA and BYO-everything is upon us. With this trend still in its infancy, the future of connected devices and applications will be not only about the quantity of connected devices, but also about the quality of their connections and the network bandwidth they consume.

 

Agencies are using, or at least planning to use, IoT devices, and this explosion of devices that consume or produce data will, not might, create a potentially disruptive explosion in bandwidth consumption, security concerns and monitoring and management requirements.

 

IPv6 eventually takes the stage…or sooner (as in now!)

 

Recently, ARIN was unable to fulfill a request for IPv4 addresses because the request was greater than the contiguous blocks available. IPv6 is a reality today. There is an inevitable and quickly approaching moment when switching over will no longer be an option, but a requirement.

 

SDN and NFV will become the mainstream

 

Software defined networking (SDN) and network function virtualization (NFV) are expected to become mainstream in the next five to seven years; okay, maybe a bit longer for our public sector friends. With SDN and virtualization creating new opportunities for hybrid infrastructure, a serious look at adoption of these technologies is becoming more and more important.

 

So long WAN Optimization, Hello ISPs

 

Bandwidth increases are outpacing CPU and custom hardware’s ability to perform deep inspection and optimization, and ISPs are helping to circumvent the cost and complexities associated with WAN accelerators. WAN optimization will only see the light of tomorrow in unique use cases where the rewards outweigh the risks.

 

Farewell L4 Firewalling

 

Firewalls incapable of performing deep packet analysis and understanding the nature of the traffic at the Layer 7 (L7), or the application layer, will not satisfy the level of granularity and flexibility that most network administrators should offer their users. On this front, change is clearly inevitable for us network professionals, whether it means added network complexity and adapting to new infrastructures or simply letting withering technologies go.

 

Preparing to Manage the Networks of Tomorrow 

 

So, what can we do to prepare to monitor and manage the networks of tomorrow? Consider the following:

 

Understand the “who, what, why and where” of IoT, BYOD and BYOA

 

Connected devices cannot be ignored. According to 451 Research, mobile Internet of Things (IoT) and Machine-to-Machine (M2M) connections will increase to 908 million in just five years. This staggering statistic should prompt you to start creating a plan of action on how you will manage these devices.

 

Your strategy can either aim to manage these devices within the network or set an organizational policy to regulate traffic altogether. Curbing all of tomorrow’s BYOD/BYOA is nearly impossible. As such, you will have to understand your network device traffic in incremental metrics in order to optimize and secure them. Even more so, you will need to understand network segments that aren’t even in your direct control, like the tablets, phablets and Fitbits, to properly isolate issues.

 

Know the ins and outs of the new mainstream

 

As stated earlier, SDN, NFV and IPv6 will become the new mainstream. We can start preparing for these technologies’ future takeovers by taking a hybrid approach to our infrastructures today. This will put us ahead of the game.

 

Start comparison shopping now

 

Evaluating virtualized network options and other on-the-horizon technologies will help you nail down your agency’s particular requirements. Sometimes, knowing a vendor has or works with technology you don’t need right now but might later can and should influence your decisions.

 

Brick in, brick out

 

Taking on new technologies can feel overwhelming. Look for ways that potential new additions will not just enhance, but replace the old guard. If you don’t do this, then the new technology will indeed simply seem to increase workload and do little else. This is also a great measuring stick to identify new technologies whose time may not yet have truly come for your organization.

 

To conclude this series, my opening statement from part one merits repeating: learn from the past, live in the present and prepare for the future. The evolution of networking waits for no one. Don’t be left behind.

 

Find the full article on Federal Technology Insider.

September has now become a string of IT industry events. From VMware VMworld to SolarWinds THWACKcamp to Oracle OpenWorld to Microsoft Ignite, it seems like an endless procession of speaking sessions, in-booth demo presentations, and conversations with IT professionals in those communities. That last aspect is my favorite part of industry events. The work we do needs to have meaning, and the people interaction is my fuel for that meaningful fire. Technologies, people, and processes will always change. Similarly, the desire to learn, evolve, and move forward remains the constant for successful integration into any new paradigm. Be constant in your evolution.

 

Here's a brief recap of occurrences at recent events that I had the great privilege of attending and participating in:

 

VMworld 2016

  • SolarWinds booth staff at VMworld 2016
  • sqlrockstar and kong.yang at the VMware vExpert party at the Mob Museum
  • chrispaap before he rocked the booth with his Scaling Out Your Virtual Infrastructure session

 

 

THWACKcamp 2016

  • Head Geeks with their Executive Leader jennebarbour
  • Radioteacher, kong.yang, and DanielleH, photo bombed by KMSigma
  • sqlrockstar - it's make-up time #ChallengeAccepted w/ hcavender. Peace out brother :-)


Coming soon to an IT event near you: IT Pro Day, Microsoft Ignite in HAWT-lanta, Chicago SWUG, and AWS re:Invent in Las Vegas. Stay thirsty for IT knowledge and truths, my friends! Let me know if you'll be at any of these events; I'm always happy to connect with THWACK community members and converse the IT day away.

[Photo: Nanna's ring]

We just spent two days wrestling with this year’s THWACKcamp theme, and I think we’ve all come away much richer for the discussions held, the information shared, and the knowledge imparted.

 

And as I sit here in the airport lounge, tired but exhilarated and energized, an old post appeared on my FB feed. It was written and posted by a friend of mine a few years ago, but came back up through serendipity, today of all days.

 

It tells the story of why this amazing woman who's been my friend since 7th grade chose the sciences as her life's path. And it starts with a ring - one which was given 50 years late, but given nevertheless. And why the ring was actually secondary after all.

 

We lived near each other, played flute one chair apart in band, shared an interest in all things geeky including comic books and D&D, and when she got her license in high school we carpooled together because we both suffered from needing to be “in place” FAR too early in the morning. Many a sleepy morning was spent sitting outside the band room, where I would test her on whatever chemistry quiz was upcoming. Even then her aspiration to be a biologist was clear and firm, and she was driven to get every answer not only correct, but DOWN. Down pat. To this day whenever I call, I ask her the atomic weight of germanium (72.64, if you are curious).

 

She graduated a year before me, was accepted to the college of her choice, and from there easily attained all of her goals.

 

No small credit for this goes to the two women in her life: “Nanna” – her grandmother, who you can read about below; and her mother, a gifted chemical engineer with a long and illustrious career at BP, who was herself also inspired by Nanna.

 

When we talk about the energy behind the “challenge accepted” theme, this story really drives home a powerful set of lessons:

 

  1. The lesson that the challenges accepted by others have paved the way for us. They are a very real and tangible gift.
  2. The reminder that our willingness to face challenges today has the potential to impact far more than we realize: more than our day; more than our yearly bonus; more even than our career.

 

In simply getting up, facing the day, and proclaiming (whether in a bold roar to the heavens or a determined whisper to ourselves) “Challenge Accepted” we have the opportunity to light the way for generations to come.

 

************************

Nanna's Ring

It is a simple steel band. No engravings, nothing remarkable. It has always been on her right hand pinkie finger since she got it.

 

It was May in the summer of 1988. I had graduated with my bachelor's degree in biology and was getting ready to start the Master's program at the University of Dayton. Nanna needed to get to Ada, Ohio, and I needed to drop some stuff down at Dayton. We made a girls' weekend trip around Ohio.

 

We talked about traveling to college and how in her day it was all back roads; the interstate system that I was driving had not come about. I was speeding (young and in a hurry to go do things) and Nanna said, "Go faster." I made it from Eastside Cleveland to Dayton in record time. Dropped off the stuff with my friends and back on the road!

 

Made it to Ohio Northern in time to grab dinner in the dorm hall cafeteria, wander around a little bit until Nanna's knees had enough of that, then we settled into the dorm room for the night. I got the top bunk, she took the bottom. We talked and giggled like freshmen girls spending their first night at college.

 

The next morning we got dressed up, grabbed breakfast, then made our way to the lecture hall. The room was packed with kids my age and professors. The ceremony began. It was the honor society for engineers and the soon-to-graduate engineers were being honored. One by one, the new engineers were called to the front. Last of all the head of the society called out, "Jane Cedarquist!" Nanna smiled and, with a little more spring in her achy knees, went to the front of the hall.

 

"About 50 years ago, this young lady graduated with a degree in engineering. She was the first lady to do so from our college so we honor her today - an honor overdue." She got a standing ovation and a number of the young engineers that stood with her gave her hugs and shook her hands. After the quiet returned, the engineering students gave their pledge and received their rings of steel and placed them on their right hand pinkie fingers.

 

Jane Cedarquist went out into the world as an engineer and managed to survive the trials and tribulations of being a woman in a man's world. Eventually she met Dick Harris and they married and had two kids. She stayed home because that's how things worked. Eventually her kids grew up and had kids of their own. She never got back to engineering, though some of the landscape projects and quilts she made had the obvious stamp of an engineer's handiwork. She traveled around the world and marveled at the wonders, both man-made and God-given. She loved jewelry and always had her earrings, necklace, bracelets, and rings on her person. All of them had meaning and value. Some pieces would come and go, but after that day in 1988 she was never seen without that ring of steel.

 

It is a simple steel band. No engravings, nothing remarkable. It has always been on her right hand pinkie finger since she got it - until now.

 

(Leon's footnote: Jane Cedarquist Harris passed away, and passed her ring to my friend. She wears it - on a chain, since she understands the gravity of the pledge her Nanna made - carrying the legacy forward both professionally and symbolically).


In my previous posts, I shared my tips on being an Accidental DBA - what things you should focus on first and how to prioritize your tasks.  Today at 1PM CDT, Thomas LaRock, HeadGeek and Kevin Sparenberg, Product Manager, will be talking about what Accidental DBAs should know about all the stuff that goes on inside the Black Box of a database.  I'm going to share with you some of the other things that Accidental DBAs need to think about inside the tables and columns of a database.

 

I'm sure you're thinking "But Karen, why should I care about database design if my job is keeping databases up and running?"  Accidental DBAs need to worry about database design because bad design has significant impacts on database performance, data quality, and availability. Even though an operational DBA didn't build it, they get the 3 AM alert for it.

 

Tricks

People use tricks for all kinds of reasons: they don't fully understand the relational model or databases, they haven't been properly trained, they don't know a feature already exists, or they think they are smarter than the people who build database engines. All but the last one are easily fixed.  Tricky things are support nightmares, especially at 3 AM, because all your normal troubleshooting techniques are going to fail.  They impact the ability to integrate with other databases, and they are often so fragile no one wants to touch the design or the code that made all these tricks work. In my experience, my 3 AM brain doesn't want to see any tricks.

 


 

Tricky Things

Over my career I've been amazed by the variety and volume of tricky things I've seen done in database designs.  Here I'm going to list just 3 examples, but if you've seen others, I'd love to hear about them in the comments. Some days I think we need to create a Ted Codd Award for the worst database design tricks.  But that's another post...

 

Building a Database Engine Inside Your Database

 

You've seen these wonders…a graph database built in a single table.  A key-value pair (or entity attribute value) database in a couple of tables. Or my favourite, a relational database engine within a relational database engine.  Now, doing these sorts of things for specific reasons might be a good idea.  But embracing these designs as your whole database design is a real problem.  More about that below.

 

Wrong Data Types

 

One of the goals of physical database design is to allocate just the right amount of space for data. Too little and you lose data (or customers), too much and performance suffers.  But some designers take this too far and reach for the smallest one possible, like INTEGER for a ZIPCode.  Ignoring that some postal codes have letters, this is a bad idea because ZIPCodes have leading zeros.  When you store 01234 as an INTEGER, you are storing 1234.  That means you need to do text manipulation to find data via postal code and you need to "fix" the data to display it.
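
The leading-zero problem is easy to demonstrate; a quick sketch in Python shows the same loss you would get from an INTEGER column in any database engine:

    zip_as_text = "01234"          # a ZIP code with a leading zero
    zip_as_int = int(zip_as_text)  # what an INTEGER column effectively stores

    print(zip_as_int)              # 1234 - the leading zero is gone
    print(str(zip_as_int))         # "1234", so a naive display shows four digits

    # The "fix" is exactly the text manipulation the paragraph warns about:
    print(str(zip_as_int).zfill(5))   # "01234", rebuilt at display/query time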

 

Making Your Application Do the Hard Parts

It's common to see solutions architected to do all the data integrity and consistency checks in the application code instead of in the database.  Referential integrity (foreign key constraints), check constraints, and other database features are ignored and instead hundreds of thousands of lines of code are used to ensure these data quality features. This inevitably leads to data quality problems.  However, the worst thing is that these often lead to performance issues, too, and most developers have no idea why.

Why Do We Care?

 

While most of the sample tricks above are the responsibility of the database designer, the Accidental DBA should care because:

 

  • DBAs are on-call, not the designers
  • If there are Accidental DBAs, it's likely there are Accidental Database Designers
  • While recovery is job number one, all the other jobs involve actually getting the right data to business users
  • Making bad data move around faster isn't actually helping the business
  • Making bad data move around slower never helps the business
  • Keeping your bosses out of jail is still in your job description, even if they didn't write it down

 

But the most important reason why production DBAs should care about this is that relational database engines are optimized to work a specific way - with relational database structures.  When you build that fancy Key-Value structure for all your data, the database optimizer is clueless how to handle all the different types of data. All your query tuning tricks won't help, because all the queries will be the same.  All your data values will have to be indexed in the same index, for the most part.  Your table sizes will be enormous and full table scans will be very common.  This means you, as the DBA, will be getting a lot of 3 AM calls. I hope you are ready.

 

With applications trying to do data integrity checks, they are going to miss some. A database engine is optimized to do integrity checks quickly and completely. Your developers may not.  This means the data is going to be mangled, with end users losing confidence in the systems. The system may even harm customers or lead to conflicting financial results.  Downstream systems won't be able to accept bad data.  You will be getting a lot of 3 AM phone calls as integration fails.

 

Incorrect data types will lead to running out of space for bigger values, slower performance as text manipulation must happen to process the data, and less confidence in data quality.  You will be getting a lot of 3 AM and 3 PM phone calls from self-serve end users.

 

In other words, doing tricky things with your database is tricky. And often makes things much worse than you anticipate.

 

At THWACKcamp today, sqlrockstar Thomas and Kevin will be covering the mechanics of databases and how to think about troubleshooting all those 3 AM alerts.  While you are attending, I'd like you to also think about how design issues might have contributed to that phone call.  Database design and database configurations are both important.  A great DBA, accidental or not, understands how all these choices impact performance and data integrity.

 

Some tricks are a proper response to unique design needs. But when I see many of them, or overuse of tricks, I know that there will be lots and lots of alerts happening in some poor DBA's future.  You should take steps to ensure a good design lets you get more sleep.  Let the database engine do what it is meant to do.

TODAY IS THWACKCAMP! Have you registered yet? I will be in Austin this week for the event and doing some live cut-ins as well. Come join over 5,000 IT professionals for two days of quality content and prizes!

 

As exciting as THWACKcamp might be, I didn't let it distract me from putting together this week's Actuator. Enjoy!

 

Ransomware: The race you don’t want to lose

Another post about ransomware which means it's time for me to take more backups of all my data. You should do the same.

 

How to Stream Every NFL Game Live, Without Cable

I had a friend once who wanted to watch a football match while he was in Germany, and he used an Azure VM from a US East datacenter in order to access the stream. Funny how those of us in tech know how to get around silly rules blocking content in other countries, like Netflix in Canada, but it still seems to be a big secret for most.

 

Why lawyers will love the iPhone 7 and new Apple Watch

Everything you need to know about the recent Apple event last week. Even my kids are tuned in to how Apple likes to find ways to get people to spend $159 on something like an AirPod that will easily get lost or damaged, forcing you to spend more money.

 

Delta: Data Center Outage Cost Us $150M

Glad we have someone trying to put a price tag on this but the question that remains is: How much would it have cost Delta to architect these systems in a way that the power failure didn't need to trigger a reboot? If the answer is "not as much", then Delta needs to get to work, because these upgrades will be incremental and take time.

 

Samsung Galaxy Note 7: FAA warns plane passengers not to use the phone

Since we are talking about airlines, let's talk about how the next time you fly your phone (or the phone of the passenger next to you) may explode. Can't the FAA and TSA find a way to prevent these phones from being allowed on board?

 

Discipline: The Key to Going From Scripter to Developer

Wonderful write up describing the transition we all have as sys admins. We go from scripting to application development as our careers progress. In my case, I was spending more time managing my scripts and homegrown monitoring/tracking system than I was being a DBA. That's when I started buying tools instead of building them.

 

What if Star Trek’s crew members worked in an IT department?

Because Star Trek turned 50 years old last week, I felt the need to share at least one post celebrating the series that has influenced so many people for so many years.

 

Presented without comment:

[Image: DBA.jpg]
