Better Metrics. Better Data. Better Analytics. Better IT.

datachick over 7 years ago 4 minute read time

A few years ago I was working on a project as a project manager and architect when a developer came up to me and said, "You need to denormalize these tables…" and he handed me a list of about 10 tables that he wanted collapsed into one big table. When I asked him why, he explained that his query was taking four minutes to run because the database was "overnormalized." Our database was small: our largest table had only 40,000 rows. His query was pulling from a lot of tables, but it was only pulling back data on one transaction. I couldn't even think of a way to write a query to do that and force it to take four minutes. I still can't.

I asked him to show me the data he had on the duration of his query against the database. He explained that he didn't have data; he had just timed his application from button push to results showing up on the screen. He believed that because there could be nothing wrong with his code, it just *had* to be the database that was causing his problem.

I ran his query against the database, and the result set came back in just a few milliseconds. No change to the database was going to make his four-minute query run faster. I told him to go find whatever was going wrong between the database and the application. It wasn't my problem.

He eventually discovered that the issue was a complex one involving duplicate IP addresses and other network configuration issues in the development lab.
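
The measurement that settled this argument, timing the query at the database separately from the end-to-end request, can be sketched roughly as follows. This is a minimal illustration, not the original project's code: the in-memory SQLite database, the `transactions` table, and the `simulated_request` function with its artificial delay are all assumptions made up for the example.

```python
import sqlite3
import time


def timed(fn):
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Hypothetical stand-in for the real database: a small in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO transactions (id, amount) VALUES (?, ?)",
    [(i, i * 1.5) for i in range(40_000)],
)


def run_query():
    # The database layer only: fetch a single transaction.
    return conn.execute(
        "SELECT id, amount FROM transactions WHERE id = ?", (12_345,)
    ).fetchone()


def simulated_request():
    # Hypothetical end-to-end path: the query plus everything around it.
    # The sleep stands in for network or application delay (an assumption).
    row, db_seconds = timed(run_query)
    time.sleep(0.05)
    return row, db_seconds


(row, db_seconds), total_seconds = timed(simulated_request)
outside_db_seconds = total_seconds - db_seconds

print(f"database:        {db_seconds * 1000:.2f} ms")
print(f"everything else: {outside_db_seconds * 1000:.2f} ms")
```

Splitting the two numbers is the point: if nearly all of the elapsed time falls outside the query, changing the schema is unlikely to help.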

Looking back on that interaction, I realize that this is how most of us in IT work: someone brings us a problem ("the system is slow"), and we look into our tools and our data and give a yes-or-no answer about whether we caused it. If we can't find a problem, we close the ticket or send the problem over to another IT group. If we are in the database group, we send it over to the network or storage guys. If they get the report, they send it over to us. These sorts of silo-based responses take longer to resolve issues and often lead to a lot of chasing down and re-blaming. That costs time and money because we aren't responding as a team, just as a loose collection of groups.

Why does this happen?

The main reason we do this is that we typically don't have insight into anyone else's systems' data and metrics. And even if we did, we wouldn't understand it. Add to that the fact that most teams have their own set of specialized tools that the rest of us don't have access to. I had no access to network monitoring tools, nor permissions to run any. It wasn't my job.

We are typically measured and rewarded based on working within our own groups, be it systems, storage, or networks, not on troubleshooting issues with other parts of infrastructure. It's like we build giant walls around our "stuff" and hope that someone else knows how to navigate around them. This "not my problem" response to complex systems issues doesn't help anyone.

What if it didn't have to be that way?

Another contributing factor is the intense complexity of the architecture of modern application systems. There are more options, more metadata, more metrics, more interfaces, and more layers than ever before. In the past, we attempted to build one giant tool to manage them all. What if we could still use specialty tools to monitor and manage all our components *and* pull the graph of resources and their data into one place, so that we could analyze and diagnose issues in a common, shareable way?

True collaboration requires data that is:

Integrated
Visualized
Correlated
Traceable across teams and groups
Understandable

That's exactly what SolarWinds' PerfStack does. PerfStack builds upon the Orion Platform to help IT pros on cross-platform teams troubleshoot problems in one place, using a common interface, so they can figure out where a bottleneck is, what is causing it, and get on with fixing it.

From <https://thwack.solarwinds.com/community/solarwinds-community/product-blog/blog>

PerfStack combines metrics you choose from across tools like the Network Performance Monitor and Server & Application Monitor release candidates from the Orion Platform into one easy-to-consume data visualization, matching them up by time. You can see in the figure above how easy it is to spot a correlated data point that is likely the cause of performance less spectacular than your work normally delivers. PerfStack allows you to highlight exactly the data you want to see, ignore the parts that aren't relevant, and get right to the outliers.
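
To make "matching them up by time" concrete, here is a small sketch of the general idea: align two metric series on shared timestamps, then check how strongly they move together. This is not PerfStack's implementation or API. The function names and the sample numbers are invented for illustration, and Pearson correlation is simply one common way to quantify the relationship.

```python
from math import sqrt


def align_by_time(series_a, series_b):
    """Pair values from two {timestamp: value} series on shared timestamps."""
    shared = sorted(set(series_a) & set(series_b))
    return [(t, series_a[t], series_b[t]) for t in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up samples keyed by minute offset (assumptions, not real measurements).
query_ms = {0: 12, 1: 11, 2: 13, 3: 240, 4: 250, 5: 12}
packet_loss_pct = {0: 0.0, 1: 0.1, 2: 0.0, 3: 8.5, 4: 9.0, 5: 0.1, 6: 0.0}

aligned = align_by_time(query_ms, packet_loss_pct)
r = pearson([a for _, a, _ in aligned], [b for _, _, b in aligned])

print(f"{len(aligned)} shared samples, correlation = {r:.2f}")
```

A high correlation between a database metric and a network metric doesn't prove cause, but it tells both teams where to look first.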

As a data professional, I'm biased, but I believe that data is the key to successful collaboration in managing complex systems. We can't manage by "feelings," and we can't manage by looking at siloed data. With PerfStack, we have an analytics system, with data visualizations, to help us get to the cause faster, with less pain-and-blame. This makes us all look better to the business. They become more confident in us because, as one CEO told me, "you all look like you know what you are doing." That helped when we went to ask for more resources.

Do you have a story?

Later in this series, I'll be writing about the nature of collaboration and how you can benefit from shared data and analytics in delivering better and more confidence-instilling results to your organization. Meanwhile, do you have any stories of being sent on a chase to find the cause of a problem? Do you have any great stories of bizarre causes you've found for a systems issue?

Top Comments

rschroeder over 7 years ago +2

A top IBM server hardware guru came to me, once upon a time, with a complaint that the network was slow. He said "I can ping from one of my IBM interfaces to another address on that same box, and the…
vinay.by over 7 years ago +1

Eagerly waiting to try PerfStack ....

datachick over 7 years ago

I like the way you think.
datachick over 7 years ago

Excellent point. Have you been reading the yet to be published third post in this series?
designerfx over 7 years ago

Same. I'm also happy that it also marks the EOS for Server 2008R2, because apparently I need to use that as leverage to upgrade.
tallyrich over 7 years ago

Perfstack is very promising - I'm hoping to get the team excited about this.
byrona over 7 years ago

Great Story!
One of the challenges is getting people to bring us problems instead of solutions. The guy in your story brought you a solution, not a problem. The issue with this is he was not equipped with enough data to be proposing a solution, before you jump to a solution you need to fully understand the problem you are trying to solve. Once he had enough info about the lab he was able to implement a much better solution that would actually solve his problem.