Answers in Data: So Much Data...So Little Time...Data Time Zones

Level 12

As you've been following along, I started this series with Data Is Power. In my original graphic I listed QUESTIONS, DATA, ANSWERS as the information pipeline we need to keep systems humming.  I provided a list of questions I might ask in Question Everything.  A few of you jumped in with some more great questions.

One comment provides the perfect segue to this week's post:

There is a mass of information, we try and teach our clients they do need to know everything - otherwise will just be swamped and noise. Only the essentials, this saves a lot of unnecessary data - hulattp

I think the key word here is know.  As a data evangelist I'm a bit biased towards making data-driven decisions.  That means collecting data even before I need it. Once an incident is underway we may not be able to understand what happened without the right data.  And that's the hard part: how do you know what data to collect before you know you will need it?

Types of Data Collections

  • Inventory data: The catalog of what resources our systems are using.  Which servers? Databases? SANs? Other Storage? Cloud resources? How long have they been around?
  • Log data: The who, when, where, how, why, to what, by what, from what data.
  • External systems conditions: What else was going on? Is it year end? Are there power outages? Was a new deployment happening? Patches? New users? All kinds of things can be happening outside our systems. What is the weather doing right now (really!)?
  • Internal conditions: What was/is resource utilization at a point in time? What is it normally? What is our baseline for those things?  What about for today/this month/this date/this day of week? What systems have the most pain points?
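The baseline question in the last bullet can be made concrete with a small sketch. This is only an illustration, assuming utilization samples arrive as (timestamp, value) pairs; the function names `build_baseline` and `is_unusual`, the three-standard-deviation tolerance, and the sample numbers are all hypothetical and not taken from any particular monitoring tool.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def build_baseline(samples):
    """Group (timestamp, value) samples by weekday; return {weekday: (mean, stdev)}."""
    by_weekday = defaultdict(list)
    for ts, value in samples:
        by_weekday[ts.weekday()].append(value)
    return {day: (mean(vals), pstdev(vals)) for day, vals in by_weekday.items()}


def is_unusual(baseline, ts, value, tolerance=3.0):
    """True when value is more than `tolerance` standard deviations from the weekday mean."""
    if ts.weekday() not in baseline:
        return False  # no history for this weekday, so nothing to compare against
    avg, spread = baseline[ts.weekday()]
    if spread == 0:
        return value != avg
    return abs(value - avg) > tolerance * spread


# Hypothetical CPU utilization (%) history: four Mondays at 9 AM
history = [
    (datetime(2024, 1, 1, 9), 40.0),
    (datetime(2024, 1, 8, 9), 42.0),
    (datetime(2024, 1, 15, 9), 38.0),
    (datetime(2024, 1, 22, 9), 40.0),
]
baseline = build_baseline(history)
print(is_unusual(baseline, datetime(2024, 1, 29, 9), 95.0))
print(is_unusual(baseline, datetime(2024, 1, 29, 9), 41.0))
```

The point of the sketch is that "what is it normally?" can only be answered from data collected before the incident; without the history there is no baseline to compare against.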

That's a lot of data.  Too much data for someone to know.  But having that raw data lets us answer some of the questions that we collected in Question Everything.

When we are diagnosing an issue (and batting away our Pointy-Haired Boss asking "how is it going?"), having that data is going to help.  Having historical data is going to help even more.  If production files are missing, we can replace them with a backup.  But if an automated process is deleting those files, we haven't fixed the problem. We've just entered into a whack-a-mole game with a computer.  And I know who is going to win that one.

So we need to find ways to make that data work for us.  But it can't do that if we aren't collecting it. And it can't do that if we rely only on data about the system right now.

Data Timezones

The task we are doing also impacts the timeliness of the data we need.  There's a huge difference in what data we need depending on whether we are doing operational or remediation work. We don't just sit down and start poring over all the data.  We need to use it to solve the problems (questions) we have right now. I think of these as time zones in the data.

Activity | Data Time Zone
Operations (plate spinning) | Now & Future
Diagnostics (firefighting, restoration) | Recent and now
Strategic (process and governance) | Recent and past

  • Keeping the plates spinning: Our normal job, running around keeping everything going like clockwork. Keeping the plates spinning so they don't break. In these cases, we want data that is looking forward.  We are looking for green across all our dashboards.  We want to know if a resource is having issues (disk space, timeouts, CPU utilization, etc.). We aren't looking back at what happened last week.  We won't actually have future data, but we can start predicting where problems are likely to pop up so that we can prioritize those activities.
  • Firefighting: Ideally, we want to know there's a fire long before the spark, but we don't always have that luxury.  We want to look at current data and recent data so that we know where to start saving systems (and sometimes even people).  We aren't here to redesign the building code or architectural practices.  We need to put out the fire and save production data. We need to get those system plates spinning again. In database management, this might be rolling back changes, rebooting servers or restoring data.  It's fixing the problem and making the data safe. We get systems back up and running. We need data to confirm we've done that. Maybe we put in place some tactical changes to mitigate more 3 AM calls.  But we have to get up and do more plate spinning in another hour.
  • Strategic responses: We can't be firefighting all day, every day.  Keeping those plates spinning means having time to make strategic responses. Changing how, when, where, why, and who does things.  Making improvements and keeping things going. This is where we really start mashing up the trends of the data collections.  What is causing us pain and therefore user pain? What is costing the company money? What is costing your manager money?
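One way to picture these time zones is as filters over the same pile of timestamped data. The sketch below is an assumption-heavy illustration: the post names the zones but not their lengths, so `NOW_WINDOW`, `RECENT_WINDOW`, and the function `in_time_zone` are hypothetical. It also covers only data we already have; the "future" half of operations would need forecasting on top of it.

```python
from datetime import datetime, timedelta

# Hypothetical window sizes; the post does not define them.
NOW_WINDOW = timedelta(minutes=15)   # assumed meaning of "now"
RECENT_WINDOW = timedelta(days=1)    # assumed meaning of "recent"


def in_time_zone(ts, activity, now):
    """Return True if a record stamped `ts` is relevant to the given activity."""
    age = now - ts
    if activity == "operations":     # now (and, via forecasting, future)
        return timedelta(0) <= age <= NOW_WINDOW
    if activity == "diagnostics":    # recent and now
        return timedelta(0) <= age <= RECENT_WINDOW
    if activity == "strategic":      # recent and past
        return age > NOW_WINDOW
    raise ValueError(f"unknown activity: {activity}")


now = datetime(2024, 3, 1, 12, 0)
samples = [
    now - timedelta(minutes=5),
    now - timedelta(hours=6),
    now - timedelta(days=30),
]
counts = {
    activity: sum(in_time_zone(ts, activity, now) for ts in samples)
    for activity in ("operations", "diagnostics", "strategic")
}
print(counts)
```

The same three records answer different questions depending on who is asking, which is why the data has to be kept even after it stops being "now".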

Questions for You

What other data time zone perspectives are there? Is there an International Date Line for data timezones?  What about a Daylight Saving Time scheme for these time zones? Do these time zones vary by job title?

Next week I'll talk about how these data collections and data timezones impact how we use the data and how we consume it. In other words, how we take raw data and make it powerful.

15 Comments
sqlrockstar
Level 17

Good topic for discussion here. I've usually only thought about time zones in the standard way, and only as a result of having to deal with people that cannot think in UTC.

But for the years I was a production DBA in financial services...the concept of time zones certainly rings true. Each department operated at slightly different times of the day, and usually it meant 18 hours worth of operations to complete one "cycle". You find that you need to close the books on one day before you can open the books for business the next day.

I'd say that the time zones vary by job title, no question there. More importantly the time zones aren't known to the other groups. It would be valuable for the "recent and past" groups to understand that there is another group that is more focused on "now and future". I'd like to think that bit of knowledge may help break down some communication barriers.

cahunt
Level 17

That is a nice breakdown of our data time zones. This is not something I consider regular shop room chat, in fact specifically I don't think I have ever heard it put like this.

Specific Dates or time frames, sure. 30 days, 60 days log retention; even as long as 180 days for some things, logs, stats, events. Understanding the definition of each time frame is the beginning to being able to consider what the proper exact time frame should be. Knowing that after each instance passes your operational data becomes your strategic data adds depth to logs, traps and other events. It's more than just shedding light on why something went down, but an in depth look for those trends that continually cost you money.

One thing I have put together is a notification and tracking DB/List. Each outage is tracked; time frame, issue, impacted support service, depth of issue (# affected), resolution, who fixed, and automated emails to other support groups to keep them aware of the issue and progress. Down time calculated is then used and graphed for each fiscal year.

We use this to show trends and progress with level of service; all going back to the mythical 5 9's. ... and work has delayed my graphing and visual tracking of issue types on more of a service provider issue level. I track the user service out, but also the support service, and our root cause or specific entity related type data for more granular information. It is nice to be able to tell that your internet outage was related to either your service provider hardware that failed, or a hardware related issue with a Distribution Switch, or even misconfiguration on your Border Edge. Adding that granularity gives you more power with your Diagnostic and Strategic Activities.
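As a rough illustration of the downtime calculation described above, tracked outages can be turned into an availability percentage and compared against the five-nines target. This is a sketch only: the `availability` function, the calendar-year reporting period, and the outage record are hypothetical, and it assumes outages do not overlap.

```python
from datetime import datetime


def availability(outages, period_start, period_end):
    """Percent of the period not covered by outages (assumes outages do not overlap)."""
    period = (period_end - period_start).total_seconds()
    down = 0.0
    for start, end in outages:
        # Clip each outage to the reporting period
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (1 - down / period)


# Hypothetical reporting year and a single five-minute outage
year_start = datetime(2023, 1, 1)
year_end = datetime(2024, 1, 1)
outages = [(datetime(2023, 3, 14, 2, 0), datetime(2023, 3, 14, 2, 5))]

pct = availability(outages, year_start, year_end)
print(f"{pct:.5f}%  five nines met: {pct >= 99.999}")
```

For scale, five nines over a 365-day year leaves roughly 5.26 minutes of total downtime, so a single one-hour outage is enough to miss it.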

byrona
Level 21

I think you have the timezones captured (past, present, future).  I am curious what type of data resolution should be required for each of those zones?  It's always a difficult balance to keep your data manageable but still have the resolution you need.  I am thinking that the resolution requirements are different depending on the different timezones.  Now that I say this I also think that "past" may need to be broken out into short term past and long term past... and possibly the same thing with "future".

As the person that manages our monitoring systems I find myself in constant battles against people that want all data at the highest possible resolution stored forever... I call them virtual hoarders and they are a vicious breed.  Having the data sets for the different timezones is important and the resolution of that data can make or break you.
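One common compromise between resolution and manageability is to keep full resolution for recent data and roll older data up into coarser buckets. The sketch below shows the rollup step only, under the assumption that samples are (timestamp, value) pairs; the `roll_up` function and the hourly bucket size are hypothetical, not a feature of any specific monitoring product.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def roll_up(samples, bucket):
    """Average (timestamp, value) samples into fixed-size time buckets."""
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    for ts, value in samples:
        index = (ts - epoch) // bucket          # whole buckets since the epoch
        buckets[epoch + index * bucket].append(value)
    return sorted((start, mean(vals)) for start, vals in buckets.items())


# Hypothetical per-minute readings rolled up to hourly averages
readings = [
    (datetime(2024, 3, 1, 12, 0), 10.0),
    (datetime(2024, 3, 1, 12, 1), 20.0),
    (datetime(2024, 3, 1, 12, 59), 30.0),
    (datetime(2024, 3, 1, 13, 0), 50.0),
]
hourly = roll_up(readings, timedelta(hours=1))
print(hourly)
```

Averaging hides short spikes, so if peaks matter for the older time zones, storing the bucket maximum alongside the mean is a reasonable variation.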

bluefunelemental
Level 15

Very interesting perspective-

I think my implementation of severity, while based on Win events, might have an inherent data timezone aspect.

Critical alerts go out immediately and are about now now now.

Error events indicate something non critical HAS happened and are summarized by shift or daily.

Warning events are predictive of future issues based on current and past metrics- summarized daily and weekly.

Informational events are for operational or strategic data such as when a new node, application, or database instance is seen so as to trigger an audit.

Between Orion's object based and FCAPS, ITIL event management, ITU RecM3000, and now data timeline I have to say it's a little overwhelming.

kevinrak
Level 9

Very well put. I already have some changes planned for our data collection based on reading this thread. As a small shop which just moved out of the 100% firefighting role, we have a long way to go. Just two years ago, we wouldn't have even known that one of our branches lost connection or a server went down until a user called to complain. Now we have automated emails which notify us within seconds. The system we're running, however, doesn't keep any historical data -- only what is happening now. If I can reconfigure this to graph network utilization over the past week, then that will likely help us hone in on problems much faster. *ponders*

datachick
Level 12

All of these discussions are the same ones I'm having with end users: how much data do we need to keep, how do you use it for decision making, how much data do you need to make a reliable decision, how do you make those decisions...we need to manage our own data the same way.

datachick
Level 12

I agree that we probably need finer slices of timezones.  And I think it varies by the type of data, too.  If someone were trying to track a trend in unauthorized login attempts, I'm betting they'd want to see years of data, perhaps.  If someone wants to know the impact of various hardware configurations, they might only want to go back as far as the first installation of the current standard hardware set.

datachick
Level 12

Good point about alerts. I wrote this week briefly about Alert Burn Out.  And one of the issues related to that is establishing the criticality of alerts.  That sort of thing is usually tied to an escalation process as well. In other words, when should your boss be notified that you haven't responded to a certain type of alert? And when should her boss get alerted?

datachick
Level 12

Ah, firefighting.  I used to have a boss that LOVED being a firefighter, to the point that he seemed to invent fires everywhere he went.  Then he'd fan the flames, get a raging inferno going and swoop in to put it out.  Then he would repeat the process on some other project.

I do know that there are some roles that are all about fighting fires.  But too many of us are stuck there, with little time or resources to do some fire prevention. 

sevier.toby
Level 10

Interesting topic. 

jay.perry
Level 11

I gained a lot of knowledge after reading this.

jay.perry
Level 11

Good post

esther
Level 12

Great post.

jkump
Level 15

Thanks for the information.

Jfrazier
Level 18

"Ah, firefighting.  I used to have a boss that LOVED being a firefighter, to the point that he seemed to invent fires everywhere he went.  Then he'd fan the flames, get a raging inferno going and swoop in to put it out.  Then he would repeat the process on some other project."

In the firefighting world we call that an arsonist....and sadly, yes there are some firefighters that are also arsonists.

About the Author
Data Evangelist Sr. Project Manager and Architect at InfoAdvisors. I'm a consultant, frequent speaker, trainer, blogger. I love all things data. I'm a Microsoft MVP. I work with all kinds of databases in the relational and post-relational world. I'm a NASA 2016 Datanaut! I want you to love your data, too.