Day 2 Keynote: Career Power-up: Unlock Your Inner Data Scientist
Data scientists are an increasingly valuable (and well-compensated) asset to optimizing complex IT operations. However, most organizations can’t take full advantage of them, because they either aren’t large enough to justify dedicated headcount, or it seems too complex. But in reality, you may already have a strong researcher with years or even decades of experience—you!
In this session, we’ll speak to industry experts, technical gurus, and community experts who’ve made the transition to Data scientist—at least part-time. They’ll discuss advancements in monitoring and visualization technology, new ways to think about the mountains of IT data they’re collecting, and how to delight management with true insight. Remove complexity, make experienced operations staff more proactive, streamline troubleshooting, and solve IT’s most stubborn issues—with science!
Hello and welcome back to day two of SolarWinds THWACKcamp. It's great to have you with us, and hopefully you were able to attend the full day of sessions yesterday. Your comments have been amazing. Especially, it's so great to see so many of you saying not just, "Wow, I didn't know that SolarWinds did that," but discussing with each other that indeed, the challenges that you're facing in 2017 are, for many of you, profoundly different than they were just a couple of years ago. Veteran CatOS config experts are talking about how they're learning to use distributed tracing to do APM. But also, cloud experts are discussing the challenges of discovering nests of out-of-date firewall policies that are preventing access to their cloud APIs.
You could say that it's just another day on THWACK, with the type of cross-community conversations that we regularly have, but it's way more fun at THWACKcamp, where we get to participate with you live in real time. One theme seems to resonate more than ever this year-- the theme about the value of treating monitoring as a discipline. You're saying it in a couple of ways. First, taking a disciplined approach to monitoring, you're able to make smarter decisions about investments and your IT budget. And second, it's leading to improved communication with your management, and in some cases, about elevating your voice with management.
Which brings me to, and just hear me out, data science. As IT professionals, I think sometimes we forget that we are data people. We're not just the conduit through which data flows or the repository where it rests; we potentially have more insight, more empirical, data-based insight into the operations of our companies than almost anyone else. While we may not be training machine-learning models today on business data sitting in the databases that we manage, we are ourselves huge generators of IT data-- IT operations data. That data is both potentially valuable, and moreover, it's a great sandbox to learn at least the tools of a discipline that is rapidly defining all areas of technology, including IT operations.
That tool is data science, or at least, taking a more scientific approach to analyzing data. It's easier than you might think to get started. I could relate stories that I've heard from you over the last year at SWUGs, Cisco Live and DevOpsDays. Another option would be to interview our Head Geek, Thomas LaRock, a dyed-in-the-wool but unrepentant DBA. But he just got his Microsoft Professional Program for Data Science certification. Instead, we thought it would be more interesting to ask external industry experts for their take on the value of data science.
Over the last couple of days, I had a chance to sit down with thought leaders like Karen Lopez, Coté, Stephen Foskett--but also hands-on experts, like Phoummala Schmitt and Mindy Marks. As it turns out, more and more IT professionals are beginning to think like scientists about the huge amount of IT data that they generate. And interestingly, there seems to be a correlation between those who think like data scientists and those with accelerated IT careers.
Karen Lopez. You, I'm pretty sure, have some opinions on the only slightly overloaded term, data science.
Yeah, I sure do. One of the things is, there used to be this saying that if you call yourself-- if what you're practicing has the word science in it, then it's not really science. Computer scientist, political scientist-- but you know, a biologist is just called a biologist, right? So data science is kind of that way, because in data science, we don't practice--
[Laughs] Yeah, and we don't practice, you know, the scientific method as much. We don't put together an idea and then go about collecting data to prove it and do controlled studies.
We don't follow the scientific method. But you're right, it's both an overloaded term, but it's an important term. I'm thinking it's more data and math together than it is science together-- between stats and algorithms and machine learning and all that stuff.
Do you think that the scary word there is 'math'?
In IT, we have a tendency to think about relationships and systems and operations, and the idea that maybe math is something that happens in a spreadsheet, but it's not something we want to do.
Right, we might do averages, and then we'll debate which type of average math we're going to do. I think there's part of that. The other thing is, in IT professions and career roles, we've kind of run out of terms, because we can't pull the engineering term, we can't--
And so science was the next one up. Even though data science has been around for a long time, it just became the sexy role in the last few years.
We do have, in SolarWinds for example, we actually have a data science team. There are these roles where people are-- they're not just glorified analysts. These are people that are mostly math-focused, that actually can come up with insights that you otherwise wouldn't see.
I think that is the distinction, because it's not just someone with data visualizations on top of a spreadsheet. There actually is a practice of data science with applying certain models, and training those models and being able to do those things. We in IT, we've always sort of been the shoemaker's children of not using the things that we're saying that we want the business to use--to make use of their data, to get more profit or sell more widgets. And yet, we don't even do that much with our own IT data.
Do you find if you take a more thoughtful approach to the way that you are doing reporting-- more insight than just lots of data--when you communicate to management, that they take you more seriously? Or you maybe have some influence that you wouldn't?
I think so. I mean, I'm a bit biased, but I tell everyone who comes to me and wants me to make a decision as a project manager, or as a manager, you know, bring data. Bring your opinion; bring your experience. But if I'm not really clear on which direction we should go, I'm going to go with the person who brought data, not only about what they want to do, but what the other options are.
So, find some medium between being a PhD mathematician and being a business-operations expert-- there's something in between there.
Yes. And I think that's important, because you can bring this data, but if the data actually means something else and you thought it meant one thing, then all the analysis, and all the models you put on top of it, don't work. In one example, in the mid-80s, about eight million children disappeared out of the US on one single day. And it was the day that the US started requiring social security numbers for dependents on tax returns. Eight million dependents disappeared on a single day. If you didn't know that about the data, you might be drawing some inference about something else that happened.
We generate lots of data in IT. And I think that inside of all of us, there is an inner data scientist that's trying to get out.
I love that. So where do you start? I mean, does it start with tools or math or just think time?
Of course, start everywhere, but I think the real key is understanding the data. There are these stats that 20% of a data scientist's time is just finding the data and understanding it. And then another 60% of their time is cleaning it up, wrangling it, trying to figure out what's wrong with it, excluding parts of it that aren't of the right quality. That's 80%.
Is data janitorial?
Yeah, basically. Basically.
But then you end up with a data set that's a lot easier to analyze. And ideally, it's a one-time investment and you can put some automation around that, and in that process, at least get some insight into what is spurious, what doesn't matter in your organization.
There's that. There's also the fact that if your organization has been practicing bad data practices--
That never happens.
[Laughs] Like not documenting what it meant, and why it was collected, and what its provenance was, and all of those things, then you just keep repeating that cycle and another scientist or another end user-- who're also starting to do some data science and everything-- you end up just doing that 80% work over and over and maybe coming to different result sets out of it. So that's a real problem. If your organization has already been collecting the data, they've amassed it in one location, or one sort of logical location for it, and they understand what it means, and all the anomalies that happen when you bought this company, or you sold off this business, or you stopped this line of business and got a new one. If you have all that then it's so much easier to move forward. I think that's what we're going to see. With the influence on data science, we'll see the influence on data quality and data collection that we might not have seen in the past.
Or even cost of resource utilization, right? I'm trying to get a manager, one day, to hear me say that I want to do less, better. With a comma in the middle of it, right? But if you can actually say, as a part of the data clean-up process, actually discover that, you know, we're collecting a lot of stuff that we don't need, or we're persisting it longer than we ever do. That gives you an opportunity to go back to management and say, you know, we can decrease what we're collecting, we don't have to save it as long, and that's going to decrease our cost to maintain it.
Right, so that is sort of the data hoarding part of it. That is a problem, especially when you're hoarding different versions of the same data, because that just causes the sourcing and cleanup to go longer. I think we'll also see with data science that people will want to go back to older data. Maybe it doesn't have to be online. Maybe it's something just like a data archive or something like that. So, really cold data. I've kind of learned some lessons recently that keeping data around might be okay, but it shouldn't impact your production and your regular operations.
I mean, I would separate out data science as we talk about it into two separate things. One of them is, you might call it advanced analytics, to use another term. I think that's more about the business side of the house, figuring out with all of this data that I have, how can I make more money or fulfill the mission I have? How can I tune my business better to run it? That's where you get just a pile of business related data and you don't really need infrastructure metrics and things in there. But that is worth considering because I think ultimately, you want to feed into that at some point. If you want to enter the Latvian market to sell your widgets, you're probably going to need a bunch of data about how things are operating there. As you're experimenting with new ways of doing things, you're going to use data science to kind of analyze and try to find patterns and things like that. If I've only got one rep covering all of the Baltic States and I keep sending them out to there, maybe I can come up with some pattern to do repping better, or whatever. So, I think a lot of the things you see about data science is all about that--really analyzing and having an analytical approach to how you run a business. Now, over in IT land, I think data science might be more-- it's sort of like applying that if I were to-- I'm trying to think of a different word than 'analyze'-- but if I were to analyze this pile of data that I have, might it help me predict an error happening, or find a--what do dogs have--a hot spot that I can go and cure, or whatever? I need to go diagnose problems that I have, and everyone always wants to do predictive analytics. That kind of data science, I guess you could apply a lot of science to it, but it's probably more coming up and keeping up with various things you would search for to identify that, like--commonly, when a network is going bad, it has this behavior, and commonly when a storage array is going bad, it has this behavior. 
And just looking to apply that. The only case where there might be more exploratory data science or analytics is if, like, you're writing your own applications and you have no idea how they're going to behave in production, and so you need to come up with some models and ways of analyzing and diagnosing what the issues are. I don't know. I mean, I think something like data science is always a little more gussied up, to be a lot more highfalutin than it is. It always starts with, like, let's figure out what problem you have and see if this tool of data science can be applied to solve that problem.
It doesn't need to just have ML or AI, like that's--
Yeah, yeah, AI and machine learning and stuff like that.
Oh yeah, and PhD math.
Sure. If you get yourself in a situation where you do need that kind of stuff then sure. But, you know, you could spend a week trying to figure out how to make pivot tables and that'll probably move the needle a lot.
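As a rough illustration of the pivot-table idea, here is a minimal Python sketch that counts help desk tickets by month and category, the way a spreadsheet pivot table would. The ticket records, field layout, and category names are all invented for illustration; real data would come from your own ticketing export.

```python
from collections import Counter

# Hypothetical help desk tickets as (month, category). Invented sample data.
tickets = [
    ("2017-08", "password"),
    ("2017-08", "password"),
    ("2017-08", "badge"),
    ("2017-09", "password"),
    ("2017-09", "email"),
    ("2017-09", "badge"),
    ("2017-09", "badge"),
]

def pivot_counts(rows):
    """Count rows per (month, category) pair, like a pivot table with
    months as rows and categories as columns."""
    counts = Counter(rows)
    months = sorted({m for m, _ in rows})
    categories = sorted({c for _, c in rows})
    # Counter returns 0 for pairs that never occur, so every cell is filled.
    table = {m: {c: counts[(m, c)] for c in categories} for m in months}
    return table, categories

table, categories = pivot_counts(tickets)

# Print a small text grid: one line per month, one column per category.
print("month    " + " ".join(f"{c:>9}" for c in categories))
for month, row in table.items():
    print(f"{month:<9}" + " ".join(f"{row[c]:>9}" for c in categories))
```

The same grouping is a few clicks in a spreadsheet; the point is only that counting by two dimensions is often enough to show where the time is going.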
Not all the way to cube query, just start with a pivot table.
Oh yeah, cube query. Yeah.
Cube query's good, right?
To that point, what does business want right now? Another buzzword, of course, 'digital transformation.' What they want is for IT to help actually transform businesses, right?
And if you're in IT, you feel that in the form of, "I want you to move faster. Move faster, or we'll replace you," is what a lot of--what we're beginning to hear a lot too. Isn't part of--certainly with DevOps, and to your point a minute ago, when you don't know what the data is going to be-- when the data is changing and there's a lot of change-- that taking a more scientific approach to looking at the numbers instead of saying, "Oh, I know how to monitor these systems. They're static; they offer the same set of APIs, the same basic approaches." The data science is how you can kind of mitigate some of the anxiety when you start to move faster and you're changing in response to the business.
Yeah, yeah. I think if you were to over-rotate on the second word, the science, right? If I remember my basic American education, science is all around the scientific method of, like-- you're faced with some unknown problem thing, come up with a theory about it. How would you test that theory? Test your hypothesis? And then analyze what the results were and then start the cycle over again. So in that sense, if the approach you're taking is, I don't know how I should increase sales in Latvia, and at the same time run my data center effectively in Latvia. So these two different areas. What are theories I would have about how to do that? How would I experiment with it? And then analyze the result and see if I was right? In both cases, there is a certain amount of science to it. And having people focused on that loop that you're going through is an effective way to explore that. I mean, I think that's where the speed thing comes up. You probably want to do these experiments on a weekly basis. Because if it takes you 6 or 12 months to test out a theory, who cares? Your competitors have nailed Latvia and locked it up. That's why people want to move rapidly, so that they can do more and more experimentation to explore out all the sciences.
I think that's a great point, especially when you consider how multi-variable a lot of those experiments are in IT. You never get--in a pure scientific method, you can control all of the inputs. The idea being to make them repeatable. And how often in IT is your environment static?
Or it even gets six months to test? You're actually running multiple experiments that may impact each other at the same time.
Oh, yeah, yeah. You've got a Heidegger's cat-in-a-box, Schrodinger thing or something.
One last question. It is certainly valuable to stop and take a minute and consider how you look at your data. But of course, if you're responding to help desk tickets all day long, if you are behind on implementations and rollouts and understaffed, like a lot of people in IT are, how do you make the time? Assuming that you say, okay, a more scientific approach is valuable. How do you cut some time out? Because you can't go to management and say, "Hey, I'm going to set up an R and D lab and do some data science."
Right. Right. They're going to say, "I'm already not back filling positions, so no." How do you do that? Is that something that someone thinks about something on the weekend maybe, or it's something they start reading about or they follow blogs? What's the best way to start?
Yeah. I mean, I think, given a realistic-- the scenario you've painted out-- the first thing you need to do is make an argument in terms of money to your bosses. At the end of the day, you need time. The problem you started with is the only problem to solve because you need time to figure this out. You need to say something. You need to prove--‘prove’ is the wrong word-- but you need to proffer the theory that if I spend time optimizing this help desk ticket thing, it will save us this amount of money or gain us this amount of money. And you present that to whoever is controlling how you spend your time at work and they either say yes or no. Similar to a scientific method, you need to give a report to them if your theory is right or not. So, you buy yourself the time to do it, right? And if you can't do that and you can't find a new job then you just suffer or you just live with it. So, there's that. Then, when you do the actual work, yeah--I mean, I think there probably is enough basic-- I wouldn't even call it science-- but just the way you would analyze something-- and just come up with a theory of, like, we have way too many help desk tickets and maybe if we automatically searched for a password, we could send an email that said, "Here's how to automatically reset your password." And if this decreases the amount of time we have to spend by 5 to 10%--that seemed to work-- let's go back to the bosses and say, "I'd like to do this on something else." And then you're like, "Hey, here's another one. Their badge doesn't work on the building. What's the theory of how we might automate that process?" And on and on and on. But again, it all starts with buying the time to actually do that kind of thing. Which you just basically have to know how to align things in PowerPoint and pitch, related to the sort of business result that you would want as a result of it.
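One hedged way to put a number behind that kind of theory before pitching it: the Python sketch below estimates what share of help desk handling time goes to tickets mentioning a keyword such as "password". The ticket subjects, the minutes, and the `share_of_time` helper are all hypothetical, and a plain keyword match is only a crude stand-in for real ticket categorization.

```python
# Hypothetical tickets as (subject, minutes spent handling). Invented data.
tickets = [
    ("Forgot my password", 10),
    ("Laptop will not boot", 45),
    ("Password expired again", 10),
    ("Badge does not open the door", 20),
    ("Need new monitor", 15),
]

def share_of_time(rows, keyword):
    """Return the fraction of total handling time spent on tickets whose
    subject contains the keyword (case-insensitive)."""
    total = sum(minutes for _, minutes in rows)
    if total == 0:
        return 0.0
    matched = sum(minutes for subject, minutes in rows
                  if keyword.lower() in subject.lower())
    return matched / total

password_share = share_of_time(tickets, "password")
print(f"Share of handling time on password tickets: {password_share:.0%}")
```

Running the same measurement again after the automated email goes out gives the before-and-after comparison to report back, whichever way the result turns out.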
To me, data science is taking a look at the data that the company's collecting, or many companies across the board are collecting, and how do we use that to create business innovations, or how do we save money for the business? So, not just looking at it as a pain point in infrastructure of how I back it up, but how do I really use it to analyze and save money and innovate? There's so many things that data science can mean, but that's what it means to me.
Right, and a lot of times, you know, we think that we don't know anything about data science, or hey, we just have systems that we monitor and manage. And the reality is we sit on huge piles of data and in some cases, we kind of almost are big data unto ourselves with IT.
Do you find that a lot of times, IT has insight-- can ask those questions that the business actually cares about in ways that maybe people who are more siloed in different parts of the business wouldn't?
Yeah, I absolutely think so. If you're touching it every day to some degree, you're actually taking a look at what's being put there. You know your business; you know what you're doing. We're in media, and media files are large and they grow. They're huge. Video files--
They never stop growing.
Yeah, never, I mean, we're talking, you go to a shoot and they're bringing back two terabytes of data within a week, and how do I deal with that? But really, how do I deal with it? Take it further; start thinking about, what is this data? Why am I storing it? And am I really taking a look at-- is there value that I can bring up that no one is seeing? And understanding your business, and what your business is doing, is probably the most important part for the IT personnel who is taking a look at the data. You have to understand the business before you can really take a look at data. For it to mean something to you; for it to bring value to you.
What's interesting is that you call out value, and that's one of those things where there's a lot of hype around big data. At least, the buzzwords are big data. Sort of a Hadoop all the things, right? That just becomes one more thing that IT has to manage.
Yes. Do you find that being an expert on actual data value is something that really is unique to IT? Where you can kind of balance the cost of 'we're going to save everything' or 'we're going to MapReduce everything' or 'we're going to calculate everything.' Instead, you can say, the questions that the business really needs to ask are X.
Yeah. So for instance, I can maybe use an example in my industry right now, in media. I had a problem, right? In infrastructure. Big data, right? Big data infrastructure means I need more SANs, I need more backups, I need more money. Really, I think it is a workflow process. How is this data being put there, why is it growing so fast? Instead of just putting more hardware and backing it up, why are we not asking the people who are putting it there, why are you putting it there? Is there a workflow in place? Why aren't we deleting it? Is this data important? Do I need to move it to an archive? I've gotten to some situations, even just recently, where it's completely full and I've put it on my users to take care of the problem. Not so much as my problem in IT, even though it is a problem in general, but it's a business problem, a workflow problem-- getting all of them together, to communicate across board. I have different brands, right? Different publications, different brands. And they all have different amounts of data that they're putting there. And they're all different areas. You've got luxury, you've got beauty, etc. So all different types of industries of information. I think what we've done really well also, just from working with the users and doing a workflow process, is that all that data that's going across the boards, they can bring value to each other. If you're using something for an ad that's in a luxury, new beauty design, you could probably marry it with somebody in the interior design world and kind of create business by selling ads. I think that has to go a little bit back to more someone really understanding that portion of the business. I think you can bring up those questions and say, "Hey, look, we can cross pollinate across all these brands." And they could probably bring value to the business by communicating what they're putting there.
But it's so funny, because you talk about, like, E-comm ad sales, user cross-media identification. Those are not the kinds of things that a lot of people in IT think that they would know anything about. And the reality is, domain-specific knowledge is maybe one of the key assets that IT has.
Yeah, I think that it definitely is specific. But I think a lot of times people don't think that the IT person that's doing the infrastructure is someone who can come out into the business world and really take a look at the process and bring value. I think when you pull that person, typically, that's sitting in their office or their closet and they're writing code or they're working on systems or they're in the data center or they're rack mounting or they're up in the ceiling working on an access point, or they're communicating for DevOps projects or they're talking about websites, E-comm, talking about fulfillment-- we're involved in the business every day. IT really should be brought in, I think, from the start. Sometimes we're like an afterthought. They don't really think--I think a lot of the times, "Hey, what about IT? Why do I not have an IT person on this panel?" Because we're crafting a solution, we're building a business, and why do I not have an IT person telling me, "Hey, these things will work and these won't." Maybe we could shift you in this direction. Maybe you could leverage this department over here that could probably help you. Because we see all the functions. I have to manage the applications, I have it on my budgets, I have to do maintenance across the board. I think getting IT involved in those big projects would probably save companies major money on an afterthought of something they've already picked.
Okay, let's wrap up with this question. I think some of our viewers have had the same thing happen. You sort of have to earn the right to be viewed as a magician, right?
You do that by gleaning data. Like, really look at your IT data as a scientist and realizing that you kind of know what the questions are that the business would ask. Give us an example of a report or a business insight that you were able to give the business that really helped them move forward and they might not have expected it would come out of IT.
Well, there's a couple different examples I can think of. I'll go back to retail. So I obviously worked for City Furniture in retail for 10 years as a senior IT analyst. One of the projects that I found, that was really interesting by using data, was by installing these little boxes-- I don't remember the company--that would collect statistics and data about how fast somebody was going, how long their truck was sitting idle, and on all of these things, are data that are being collected on a box that's being put and stored someplace. So getting involved and bringing up questions to, "Hey look, this truck has been sitting for 45 minutes. If you turned it off, it would probably save 45 minutes of fuel." It may sound very small at that moment, but then you add that up by a fleet of trucks that are going out 24 hours a day. You're talking 24 trucks that are on the road all day, 365 days a year, it starts to add value. Then, I think your business leaders will be like, "That is a great question, and why aren't we thinking about that?"
Right, you come into the business, you're like, "Hey, I can affect the bottom line, and I've already done the math for you and here it is." They have an Excel spreadsheet that they can drill into and actually see that that's real data. They just automatically ask the question, "Oh, so how did you come to get this data?" And you get to remind them, "No, we're IT. This is what we do."
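To show how quickly the idle-truck numbers add up, here is a back-of-the-envelope Python sketch. The 45 idle minutes, 24 trucks, and 365 days come from the conversation. One such idle stop per truck per day, the fuel burned per idle hour, and the fuel price are placeholder assumptions, not figures from the source.

```python
def annual_idle_cost(idle_minutes_per_day, trucks, days_per_year,
                     gallons_per_idle_hour, price_per_gallon):
    """Estimate total idle hours and idle fuel cost per year for a fleet."""
    idle_hours = idle_minutes_per_day / 60 * trucks * days_per_year
    gallons = idle_hours * gallons_per_idle_hour
    return idle_hours, gallons * price_per_gallon

# From the conversation: 45 idle minutes, 24 trucks, 365 days a year.
# Assumed placeholders (not from the source): one such stop per truck per
# day, 0.8 gallons burned per idle hour, and $3.00 per gallon.
hours, cost = annual_idle_cost(45, 24, 365, 0.8, 3.00)
print(f"Idle hours per year: {hours:.0f}")
print(f"Estimated idle fuel cost per year: ${cost:,.2f}")
```

Under those assumptions a single 45-minute stop becomes 6,570 idle hours a year across the fleet. Swapping in the burn rate and fuel price from your own telematics data turns the estimate into something a manager can check.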
We do. That's what we look at. Another great example I would think of is something that maybe brought more of a staffing benefit. So sales teams, right? What do you deal with all day? You deal with--you want to make a sale. And then you have to have a document, right? A contract that somebody signs. And that contract was coming back. For us, we had six different brands and these contracts were going in six different systems, so how do you make it easier for accounting? It's not so much like thinking about the data as the raw data sitting; I'm going to analyze it. It's almost like, how am I getting this data into my systems and making the systems better? We're collecting all these contracts, they're all coming in, and we're sending them to six different locations with six different people doing the jobs. How do we see that from an IT person, I think is data science as well. I see a problem; we have contracts that are coming in all over the place. How do we make the process better and give them tools?
So is there a trick to logging out of the console, logging your brain out of that part of your job in IT, of administration, and then taking a moment to be more scientific, to be more analytical?
I think it happens every day, naturally.
It runs itself in the background.
I think it naturally happens as you see problems arise and you see them reoccurring over time. You hear people complaining about their hard points in their departments. You hear people coming to you for simple things, the simplest thing. They walk into your office and you think it's very simple but it's a problem. It's a problem that needs to be fixed. And it's not so much an IT issue; it's something that's occurring that people need to deal with. And they need to fix it within their departments.
So, data science, it's a good thing.
Phoummala Schmitt, you know more about data science, I think, maybe than you think. It's a pretty ‘buzzwordy’ word, right?
It's a very intimidating word.
What makes it intimidating?
Data science. Just the word itself, it just seems so formal. But really, it's just data.
Stodgy. You have to have a special degree.
Outside of the normal realm of IT. Okay, so you were telling me a story earlier. This is part of an email migration to 2016, right?
2016, we don't need real servers. It's all fine, just flat.
That's a sour subject we're talking right now.
[Laughs] But you said that you have a user with 37 thousand...
37,000 folders, okay. That's going to have some effects on that migration.
Yes, it does.
For example, what are the extra complexities in having that many folders?
Well, when you're migrating a user mailbox, one version to the next or even just from one server to the other server, Exchange has to enumerate all the folders. It basically has to go through and says, okay, you've got this folder, you've got this folder. When you have 37,000 folders, that takes time. A long time. And sometimes you have to increase certain thresholds in your configuration files to allow just enough time to go through the folders. And then we had to make certain accommodations to enumerate this one mailbox.
Right, because you have scripts that you can run on thousands of mailboxes and it's fine, and then here this one chokes. The first question is, why does this user have 37,000 mailboxes?
We don't know why. But they apparently needed 37,000 folders. When we were trying to figure out why it was failing, data told us that there was a failure and it failed to enumerate folders. After digging in to some more data, that's where that 37,000 number came up.
So, you're looking at things like Folder Create Time. Was it some person that had this mailbox for 10 years and they had been lovingly creating each of these mailboxes by hand? Or were they coming in bursts, that maybe were programmatically generated? Or maybe some automated scripting that was running that you didn't know about?
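The hand-made versus programmatic question can be sketched in a few lines of Python: group folder creation times by day and flag any day with an unusually large count. The timestamps, the `burst_days` helper, and the threshold of 50 are invented for illustration. The sketch assumes the creation times have already been exported from the mail system and makes no claim about how Exchange exposes them.

```python
from collections import Counter
from datetime import datetime

def burst_days(created_times, threshold):
    """Return {date: count} for days on which more than `threshold`
    folders were created."""
    per_day = Counter(ts.date() for ts in created_times)
    return {day: n for day, n in per_day.items() if n > threshold}

# Invented sample: a few folders made by hand over the years...
by_hand = [
    datetime(2005, 3, 1, 9, 0),
    datetime(2009, 7, 14, 16, 30),
    datetime(2014, 11, 2, 11, 15),
]
# ...plus 500 folders created one second apart on a single night,
# the pattern a script might leave behind.
scripted = [datetime(2016, 5, 20, 2, i // 60, i % 60) for i in range(500)]

bursts = burst_days(by_hand + scripted, threshold=50)
for day, count in sorted(bursts.items()):
    print(f"{day}: {count} folders created")
```

A mailbox with no burst days points to years of manual filing. One or two days holding most of the folders suggests a script or a client tool worth tracking down.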
It's a combination of everything. But this person was with the company for over 30 years.
Okay, well that would be it.
Yeah, so we just said, well, it's one folder for every year [laughs].
In that case, you would be using a scientific look at the data to just make a gross decision about whether you would maybe tell somebody, look, you've programmatically generated-- a developer, for example-- lots and lots of folders, and we're just not going to carry them forward. Or someone who was a valued, tenured employee who you might need to make sure that you invest enough time in assuring that they have a seamless migration experience because they're a really valuable part of the company.
Well, as any email administrator, just because they have a lot of folders doesn't mean they don't get migrated. [Laughter] Everybody gets migrated. So as the administrator, we have to find ways to accommodate that. So that's us modifying configuration files, re-looking at our data and seeing what we need to do to just modify the systems. She may not be the only person.
Right. In fact, we have several users that have many thousands and thousands of folders. So that takes us finding that data and saying, "What do we need to do with it?" And that's where...
Email generates just a little bit of data, right?
Just a tiny bit.
No huge transit logs. So how do you make sense out of that? Like log data, for example. It's almost like static, right?
Um, no. To me, it's white noise. I'm going to be honest with you. Because we get so much data, and especially with Exchange 2016, there is so much logging by default. In one day, you could generate up to 10 gigs of logs. That's a lot.
I mean, it's a lot, and it becomes white noise. You're just tuned out. You basically have to tell yourself, "What do I want from this data? What do I feel is valuable from it?" From there, we're just going to extract what we need.
Then how do you find the time to start? You're generating 10 gigs per day, and you realize that there is something really valuable, especially in a migration to a new platform. Maybe there's novel issues, whether they're performance or actually failures that you've never seen before. How do you take the time to step back, and instead of looking at it with your, kind of, 'I need to fix this or just streamline it or move onto the next task IT' hat. How do you look at it more as a scientist and say, there is goodness within all of this white noise? How do I start to find it?
I mean, there are tools out there that can analyze your logs for you and then turn it into a nice, really pretty chart and graph. Not everyone has the resources and capabilities to get those tools. So PowerShell becomes your best friend.
PowerShell is becoming a good friend of mine, yeah.
PowerShell becomes everybody's best friend, because you can take the data from those logs and do something with it, if you're looking for something specific. Like I said, not everyone has the luxury of having some type of monitoring tool or any type of BI tool that can turn that data into good, valuable information for you. A lot of shops are mom and pop shops. I mean, if they're running Exchange on-premises, they're probably large organizations, because almost everybody is in the cloud now. But it does cost money for tools, so you know, that's where PowerShell comes in, and you run your scripts. You're only going to run them for what you really want to get out.
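The "extract only what you need" idea can be sketched in a few lines. The speakers use PowerShell. The example below uses Python purely for illustration, and it assumes a made-up three-column CSV layout (timestamp, event, recipient) rather than the real Exchange log schema. The function name `count_events` and the `FAIL` event label are likewise hypothetical.

```python
import csv
from collections import Counter

def count_events(lines, event="FAIL", event_column=1, key_column=2):
    """Count how often `event` occurs per value of `key_column`.

    `lines` can be any iterable of CSV text lines, including an open file,
    so a multi-gigabyte log is read one line at a time instead of being
    loaded into memory. The column positions are assumptions.
    """
    counts = Counter()
    for row in csv.reader(lines):
        if len(row) <= max(event_column, key_column):
            continue  # skip short or malformed lines
        if row[event_column] == event:
            counts[row[key_column]] += 1
    return counts

# Hypothetical sample in the assumed layout.
sample = [
    "2017-10-18T09:00:00,DELIVER,alice@example.com",
    "2017-10-18T09:00:01,FAIL,bob@example.com",
    "2017-10-18T09:00:02,FAIL,bob@example.com",
    "not a log line",
]
for recipient, n in count_events(sample).most_common():
    print(recipient, n)
```

To run it against a real file, pass the open file object, for example `with open(path, newline="") as f: count_events(f)`, where `path` is whatever log you are interested in.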
Right. So one more question: What's the biggest insight, sort of, surprise, that you were able to give management, backed by actual numbers and math that really affected whether investment in the business or the way that they were operating IT that really made a difference based on a scientific approach to looking at the data?
Um, I guess server performance. You're gathering your logs, you're gathering data from server performance, you know, and you're analyzing it. We found out that putting everybody on one server all at once isn't the best thing to do sometimes. I found that certain settings the business thought were requirements were having a negative impact on the environment. We'd have to go back and say, "The decision you made for this particular setting, what's the business justification for it?" Because it's doing X, Y, Z to the server. You know, it's increased load by 20 or 30% CPU. This is something we didn't account for.
So, then management says, "Oh, we're able to more effectively use our resources and we don't have to buy as much hardware."
Ah, you know. Well, yes. But the impact is going to be slower performance for our end users. What would you rather want? [Patrick laughs] Slow email coming to your Outlook, or do you want acceptable performance? Because if you want acceptable performance, then we'd have to adjust those settings that you felt in the beginning were proper business requirements.
Oh, management wants both. Great performance and low cost.
It's a catch-22. It really is, because they just make it happen and you know-- IT, we shoot ourselves in the foot all the time. We always try to find ways to make things happen. And sometimes we have to say, "You know what? This isn't the best thing to do." Right now, we can make those modifications and changes but a year from now, it's going to end up hurting us. So why don't we do what's best practice and what's best for the system and also the end users. And try to come up with, I guess, a medium point in your requirements. A lot of my time now is doing that because they want something and I know it's not good for the system. So we have to use the data that we have and say, "Here's what we can do for you." And everybody's happy.
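A claim like "this setting increased load by 20 or 30% CPU" is easier to take to management when the arithmetic is explicit. As a minimal sketch, assuming CPU utilization samples were collected before and after the setting changed, the comparison might look like the following in Python. The function name and the sample values are hypothetical, and the sketch reads the figure as a relative change in average CPU rather than a change in percentage points.

```python
from statistics import mean

def percent_change(before, after):
    """Relative change in the average, in percent, from `before` to `after`."""
    base = mean(before)
    if base == 0:
        raise ValueError("baseline average is zero; relative change is undefined")
    return (mean(after) - base) / base * 100.0

# Hypothetical CPU utilization samples (percent busy).
cpu_before = [40, 42, 38, 40]  # before the setting was enabled
cpu_after = [50, 52, 48, 50]   # after the setting was enabled

print(f"CPU load changed by {percent_change(cpu_before, cpu_after):.1f}%")
```

Comparing averages like this hides peaks and time-of-day effects, so it is only a starting point for the conversation about the business justification.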
What does data science actually mean to you?
I think it means a couple of things. And I think this is one of those things for IT people to maybe get a little bit scared of, because data science brings to mind this whole world of, you know, artificial intelligence and machine learning and algorithms and, you know, actual science. But from an IT guy perspective, it could also mean simply better understanding your environment through metrics and automated measurement. And improving how you're looking at things and how you're reporting things. So to me, I'm just going to throw all that stuff aside-- you know, the actual mathematics and data science, and think more in terms of, how can this help me to understand the environment better?
And communicate better about it to business, for example.
Yeah, exactly. That's more of the sysadmin take on data science. Of course, I guess there's another angle too, which is considering the value of the business data that you're storing and things like that. But I feel like, for most of us, most businesses are not really data-science businesses. Most businesses are doing whatever it is that they do. And the angle that they should take, I think, on data science is how can this help me better do whatever it is that we do, not how can I transform the business with my machine learning hoodoo?
Right. So if you look at the data, let's say, the base data that we start with or have ultimate root level control over, which is the metrics data that we're generating. I think we tend to be pack rats, right? We tend to collect a lot of it. Why do you think, in IT, we have a habit of persisting everything in terms of monitoring metrics forever?
Yeah. Well, because maybe it's going to be valuable someday. Maybe I need to know these things. The good news is that if you have that data, maybe you can actually do something with it. Now, the problem, I guess, is that most of us just have way too much data and not enough science, not enough analysis of that data. We just have data points. Yeah, you can look at some graphs and you can make some guesses based on the graphs. But for most of us, that's basically as far as it goes. I've got some metrics, I've recorded them in a database with timestamps, and huh, what's that?
Where does this actually start with? Does it start with the data? Is this, I've got to go learn maths? Or it's tools? Where do people start to actually do hard analytics on data?
I guess it would be more practical to focus on tools, because it's probably not realistic to think that we're going to be able to come up to speed on the magic and wizardry of mathematics in time to actually get anything of value out of it. Again, from my perspective, if I was a sysadmin, or an IT manager, I would be focused on, you know, is there some tool that can look at these logs, or these system metrics and extract some kind of useful information out of them? Whether it's retrospective information like, you know, this is how we're growing and this is how our needs are changing, or whether it's more proactive-- Watch out! We're-about-to-hit-a-wall kind of things.
If you go to management and say, "Hey, I want to take a data sciences approach to analytics." They are going to say, "You don't have time for that, and it's not budgeted." Where do you start? How do you carve out time to start really considering the data of IT in a more scientific way, as more of an analyst as opposed to just, I'm fixing tickets that are coming off the help desk?
I think that's really the key to it. If you're managing systems in a reactive mode where it's all, you know, I've got this task, I've got to accomplish this task, how do I get this thing done-- then you're really never going to get ahead of it in any way. If you start making time for yourself to try to get ahead of things, I think that's good. Whether you call it data science or data analytics or whatever is really, I guess, a political decision. I would say though that it is a very good idea for IT people to start trying to get ahead of themselves, ahead of their daily job, and try to figure out where is this stuff all going? How can I make connections between the day-to-day management tasks that I have and bigger picture items? Whether it's IT big picture items, like, when does the backup happen? And what are the busy load times on the database and stuff? Or whether it's actual business things.
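The proactive, about-to-hit-a-wall kind of insight does not require much mathematics. As a rough sketch, assuming you already store a usage metric with timestamps, a straight-line fit can estimate how long until a capacity limit is reached. The Python below is an illustration only: `days_until_full` is a hypothetical name, the numbers are made up, and a linear trend is a simplifying assumption that real growth may not follow.

```python
def days_until_full(samples, capacity):
    """Estimate days from the latest sample until usage reaches `capacity`.

    `samples` is a list of (day, used) pairs. A least-squares line is fitted
    through them. Returns None when usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = [day for day, _ in samples]
    ys = [used for _, used in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all samples share the same day")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # no growth, so no projected wall
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - max(xs))

# Hypothetical disk usage in GB, sampled once a day, on a 200 GB volume.
usage = [(0, 100), (1, 110), (2, 120), (3, 130)]
print(days_until_full(usage, capacity=200))
```

With steady growth of 10 GB per day from 100 GB, the fitted line reaches 200 GB on day 10, which is seven days after the last sample.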
More insight than aggregation.
So, what's really more useful, a PhD in math or years of hard won experience in operations in the way that the business actually works?
I think that if you have years of experience, you're going to come up intuitively with some judgements on what you're seeing. But the problem is that we rely, I think, too much on our historic experience and we tend to overlook new things that way. I think, indeed, that it's better to have a deep understanding of the business and it's better to have years of experience, but don't fall into the trap of thinking that because this is how things have always been that that's how things always are. It's very easy for us to just reject new findings in favor of, you know, what we've always seen. That's why I think that we need to maybe start inviting more of that mathematical analysis to the table.
Taking a scientific approach kind of keeps you off the rails for decision bias.
Yeah, yeah. It's science.
It's interesting that technology leaders not only don't dismiss data science as just more buzzwords, but they agree that it's a useful approach to finding new value in the data that they're already collecting. And it's not a surprise that a common theme from these conversations is that data is good and analyzing it scientifically is better. But then again, it's not a surprise that people like bacon. Bacon makes everything significantly better too. The challenge with IT data science isn't that we don't know that the data is good; the challenge is the same as it is for everything in IT-- Finding time and resources to experiment. Yes, there's an initial investment or at least some overtime to get started.
But hopefully, you also heard our experts say that the results are worth it. Improved operations performance and accelerated digital transformation is pretty abstract and not much of a motivator, but surprising management with unexpected insight that contradicts conclusion bias and backing that up by numbers that leadership understands-- that, dear THWACK friends, is science. Dig into your IT data. Let others know what you discover. You'll likely find that there's a data scientist, or at least a data hobbyist, already inside you, ready to help accelerate your IT career.
And that brings us to the end of today's keynote. Shortly, we'll begin our second day of tracks. Two quick notes. First, make sure you check out the session calendar on this page for details on today's sessions, and as always, you will need to be signed in to participate in THWACKcamp. If you're not logged in, the stream you're watching now will end after this keynote. But you're going to want to be signed in, because geeky prizes and giveaways are back this year, which we'll be doing live between sessions. And of course, all attendees get swag bags this year, so check out the THWACKcamp page for all those details. You can also earn points attending sessions and for each survey that you fill out-- so let us know how we're doing.
Again, it's great to have you with us today. We'll be back with our first session in just a few minutes.