Optimizing the Data Lifecycle – A Monitoring with Discipline Series
Data is the new gold to be mined, analyzed, controlled, and wielded to create disruption. The value of data-driven decisions is driving the next generation of services. Data can be utilized to frame any story. Be careful that you are not framed by your data.
Industry experts Stephen Foskett and Karen Lopez join Head Geeks Kong Yang and Thomas LaRock to discuss the challenges of the data-driven era and share best practices to optimize the consumption of that data.
Howdy folks, welcome to Optimizing the Data Lifecycle, a Monitoring with Discipline series. I'm your host, Kong Yang, SolarWinds Head Geek. And I'm honored to introduce this distinguished panel today to talk to you about optimizing the data lifecycle. We have Karen Lopez, @datachick on Twitter.
Stephen Foskett, Chief Organizer of Tech Field Day and Gestalt IT. And Thomas LaRock, @SQLrockstar on Twitter and my fellow Head Geek. Welcome, everybody.
So, let's jump right into this talk. The data lifecycle and optimizing it. There's so many definitions of data lifecycle because everybody's appending data to everything. Data analytics, big data, data monitoring, data management. What exactly is the data lifecycle, Karen?
That's a good question. I was going to ask you that, as well. [Laughing] So, traditionally it's been create, read, update, and delete. And then we've refined that over the years because we don't get to delete data anymore. And it's really taking on kind of a new trend of not just data in one application, where we think about that CRUD matrix. But data, as it becomes collected or acquired and makes its way through all the systems integrations, enterprise service buses, wherever it goes and then goes onto screens that are now multiple sizes and data visualizations, all that good stuff. So it really is the path of the data in addition to how it's treated along the way.
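As an illustration of the create, read, update, and delete cycle described above, and of the point that data rarely gets truly deleted anymore, here is a minimal sketch using Python's built-in sqlite3 module. The customer table, its columns, and the deleted_at marker are hypothetical names chosen for the example, not anything the panel specified.

```python
import sqlite3

def demo_crud():
    """Walk one row through create, read, update, and a 'soft' delete."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; deleted_at marks a row as retired instead of removing it.
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT)"
    )
    # Create
    conn.execute("INSERT INTO customer (name) VALUES (?)", ("Ada",))
    # Read
    name = conn.execute("SELECT name FROM customer WHERE id = 1").fetchone()[0]
    # Update
    conn.execute("UPDATE customer SET name = ? WHERE id = 1", ("Ada L.",))
    # "Delete": flag the row rather than physically removing it
    conn.execute("UPDATE customer SET deleted_at = datetime('now') WHERE id = 1")
    active = conn.execute(
        "SELECT COUNT(*) FROM customer WHERE deleted_at IS NULL"
    ).fetchone()[0]
    total = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
    conn.close()
    return name, active, total

result = demo_crud()
```

The row disappears from the "active" view but still exists in the table, which is one common way applications end up keeping data forever.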
Awesome. How about you, Stephen?
I'm with her. Yeah, she—it's totally true. That's absolutely it. And from my perspective, as more of a storage admin, more of a storage person, one of the things that became very true very quickly to me was that managing storage and managing data are two very different things. And that the storage lifecycle and storage management and so on is a very different discipline than the data aspect of it. Because as anyone who's seen a Hollywood movie knows, you can't just say, "I have the data." There's a difference. You can move data, you can copy data, you can transmit it without necessarily handling the physical storage media. So, I think it's important for people to realize that aspect, that difference between storage and data.
For me, a data lifecycle, of course, I think we all believe it tries to describe something that is a cradle to grave. But as we know, there is no grave for data. It lives forever. It persists. There's only so much information in the universe, right? Information is neither gained nor lost. It just all seems to be collecting on this one point in the universe called Earth right now.
That's very Zen.
Yes, we are surrounded by it and immersed in it more and more with each passing day. So, it's things like the curation of data, the collection, the custodian. All of those words that you use to describe every part of the process as data flows in and out of your work environment.
Okay so, we have very interesting and unique perspectives in here, because what you mentioned, Tom, is a platform for said data. You know, Stephen, you've kind of bridged the two. You've talked about the underlying infrastructure services that tie the zeros and ones. And Karen, you talked about data from the perspective of a DBA, right? You're managing those relationships, those columns, those rows, and so forth. As we know, technology is changing. The application is changing, right? The application lifecycle is changing. We lived in a world that was physical and then it went virtual. And within that virtualization and instantiation, you had apps that lived for months to years. But took a while to deploy. It might be a few weeks to a few months to deploy onto that. And then you go to the cloud instantiations. It may take you a few hours just to provision it with a credit card and whatnot. And it might live for a few hours to a few weeks, right? Then you have services and containers, spinning out micro milliseconds. All of that is generating data. And so the data is coming at, by definition, big data, right? High volume, a lot of variety, high velocity. There's the fourth aspect, depending on whom you may subscribe to. The veracity of data, as well. What are the challenges that you guys see not only yourselves, but customers running into?
Big data specifically?
Around data in general. And it could cover the whole gamut. It could be big data. It could be data analytics.
Well, I think, from a big data perspective it's really interesting because again, from my perspective in IT infrastructure how I define big data, what they mean by that is essentially, data you can't handle. Or data I can't handle. And to me, the difference between you know, big and not big is whether it can be— whether it can live on conventional old school data center systems, right? Can it be on one server? Can it be on a finite SAN that I buy one day and use for three years, you know? This is anything that fits into that box to me is not big data. And is not cloud and is not anything. It's just traditional, conventional stuff. Whereas anything that can't has got to be big. And then I have to change everything I do because it just can't function. Whether it's for capacity reasons or performance reasons or application reasons. If it can't live on conventional systems then I need to change everything about how I approach that data problem.
So that's an interesting point of view. And I see the big data question as all of what you said, but also things like, we're answering data questions now that we just literally couldn't before. So, you know we talk about the Vs for big data, but the real thing is that none of our conventional systems or processes or error handling were used to dealing with just streaming data, coming in on a fire hose. So we have sensors we're monitoring, even in traditional businesses. But we have new data problems we can solve because we're able to bring in that data. But it changed how we thought about it. Because most of the data we worked with before was transactional. We didn't want to write half a transaction or record bad data. We kept it out, and in a lot of big data solutions, we're saying yeah, bring me the bad data too. And yeah, so a week's worth of data is missing. Okay, we have a way of dealing with that later. So it is a different problem statement. But I'll give you that there's a lot of hype for it. And there, performance was the number one reason why it started to catch on. So the technologies to increase the performance of it. But it really is a new type of data problem as well. Even with all the hype in the term.
So my question for you and maybe you know, both of you who understand this stuff better than me is, is it—like, what's the cart and what's the horse here? Is it a new thing because it's a new thing? Or is it a new thing because it didn't work on the old stuff that we always had?
You know? I mean, do you have to approach this data differently just because we couldn't handle it?
So, I've said more than once that to me, big data is nothing more than a marketing term made by some company that's trying to sell you products and solutions. And I've had push back from people saying no, big data does--I go, yes. Do you know who had a big data problem? NASA--when they put a guy on the moon, right? You know who had a big data problem? It was when we had to break codes during WWII. Those were big data problems. We actually built computers to do that. We just didn't call it big data back then. So it is--it's the same problem just coming at us in a slightly different way and now we're trying to apply technology in order to solve it. And for me, the biggest problem with all of this is just the sprawl, right? Data now is everywhere. This is what we see with our customers, is they have to manage this environment that is almost unmanageable at times. Data is coming from everywhere, not just cloud, right? But it's also--somebody just signs up for some service somewhere and says, give me that data and give me a list of these customers so I can build a new report for somebody else. And then, now that's an asset for the company. And data is the most critical asset for any company. And as a result, now with all the sprawl, it just becomes harder and harder for you to manage your environment. So now, you have to look for ways to make that a little bit easier. So to me, that's what I believe we see as some of the biggest problems.
So, how does one tackle that? Because you guys are talking about process shift, technology shift, a shift in probably one's role and responsibilities as well. What do we tell all the DBAs out there? All the data IT professionals who used to be storage admins. You know, we had this conversation last year at THWACKcamp around, you know, the ever-changing role of the storage admin. Well, what can one do to fortify their career?
Well, that's a big question. So one of the things that hasn't changed in all this new focus on data, years of data, is data quality is still an issue. Data protection is still an issue. The security of it. So, there are a lot of people who are specializing in the technology and tools where there are new tools that are doing great things. But security, privacy, and quality were kind of left behind in tackling how to solve the fire hose of data coming in and the sprawl of the data and everything. So what I'm seeing is, for people who want to advance their career, is they need to understand that getting the wrong answer faster might cause problems. And getting a lot more wrong answers faster could be a career-limiting move. So, that's one of the things that I see with these new technologies. With the old technologies, like relational databases and all of those things, we've learned to build those into those engines. And with some of the new technologies, they just want you to do it in your application. And we learned that in the transactional world, we can't do it that way and still protect the data. So I always tell people that one of their job roles that isn't on their job description is keeping their CIO and CEO out of jail. And protecting the data and making sure that it's valid data, and you're not reporting the wrong person, or arresting the wrong person, or selling something for the wrong price. That's all important.
And those are great points because it's a double-edged sword, right? We want to reduce the barrier of consumption, right? The friction to consume a service and application and so forth. And in some cases, that service is a data service in there. But the security compliance issues are challenges that IT professionals deal with. Stephen, you've been talking about the infrastructure services in those pieces. What challenges have you seen come up because of the impending technology shifts that are coming through? Both on the storage side and, you know, there are buzzworthy terms like ‘hyperconverged infrastructure.’ That's my next-gen storage or let me get a blob in Azure or an S3 bucket. Because that's somebody else's storage they have to manage, but I can just dump my objects, my files there.
Well, there's definitely a lot of technology changes happening right now. And one of the things that you know, for infrastructure people to keep in mind, is that some of the old rules of what we used to do no longer apply simply because of technology changes. So, now that we can have scale-out storage. Now that we can have non-relational massive databases. And now that we have things like localized storage and flash storage and things like that. A lot of the optimizations we used to need to do we may not need to do anymore. And it means that we can approach our technology very differently. You know, one thing in particular is it used to be that we were really optimizing for capacity. Everything we did was to try to conserve space. Not because space was so expensive, necessarily, because of course, it was. But because it was so hard to make systems grow. Well now, with scale-out systems, now that we've kind of punted on you know, ACID and we allow systems to be eventually consistent; you can have a system as big as you want. Optimizing for capacity no longer makes as much sense as it used to. And it's the same with performance. We used to have an absolute limit to storage performance because of spindles. Well, now we have flash. We don't have to worry about that stuff anymore. So it means that we have a lot of new technologies open to us that can allow us to turn our attention to something different. And hopefully turn our attention more to the business and less toward trying to keep everything running.
Or you have people applying the old rules to the new technology, that doesn't help.
Stephen did go old school on us. ACID, for those of you out there: Atomicity, Consistency, Isolation, and Durability.
Of data, yes.
Yes, in there.
Show off. [Laughing]
What are your thoughts, sir? What are your thoughts? I mean, we have those challenges coming down the pipe.
My thoughts on?
How is that affecting the data professional's job? Has it made it better? Has it made it worse?
Yeah, it's definitely made it more challenging. I wouldn't say it's made it worse. I would just say it's more challenging. It's more--they also have more opportunity, right? So when you think of the volume of data and the sprawl that's happening and everything that we see all these companies trying to manage these days--what that means is you have to be able to do more with less, because they're not going to suddenly increase head count by 10X over the next few years just to manage these databases. They're going to find a way to automate or to buy as a service something in order to keep payroll down, right? They're just going to have to. So there's lots of opportunity where you can be the one to step forward and say, all right, there was this process. If I go back to my Six Sigma training. Here was the process. Here are so many different loop backs. And I was able to streamline it and automate it and it should run fine. And if it breaks every now and then, I'll be the one to fix that. But for the most part, I won't be spending this amount of time on a weekly, monthly, yearly basis. And now I can take that time and now I can do something a little more, say, human-oriented. So machines are really-- Computers are-- How does that saying go? Computers are pretty stupid because all they can do is give you answers. It's that humans are the ones that can ask the questions. So let the machines do what they're good at. They want to just process through whatever it is you're doing. And now for databases, that's redoing the indexes, updating the stats, backups, restores, all of that stuff. Making sure everything is working as expected. That's all stuff that gets offered now as a service. So there's opportunity for you to take that on to help make things in your company a little more streamlined. To look to the cloud for some services that will help you. And that frees you up, gives you an opportunity to move into maybe a slightly different role.
Maybe one more focused on security. Maybe one more focused on better storage, right? I think it's the old idea that if technology is moving at such a fast pace, it pays to be more of a generalist. You need to know a little bit about a lot of things as the technology keeps moving. So I think that's the opportunity for data professionals right now.
Awesome. The thing that has struck me with your guys' response is that it all highlights things like, you mentioned automation. You mentioned scale; you mentioned security. And things of this nature. It's those things that, based on the SolarWinds IT Trends Report, folks are going to a hybrid IT model for, because they're going to that as-a-service or, as I call it, AAS. You know, Agility, Availability, Scalability model. To take advantage of those said features because of the things that you mentioned: ease of use, right? Easier to consume said services. And the one thing that caught my attention is you said we have to ask better questions of our data. So let's segue into that piece. You talk about the evolution of the data professional. Asking better questions, that usually leads towards some sort of AI, machine learning, neural networks, deep learning-type models. What can our database professionals, our data professionals who are in IT right now, do to better themselves?
So, one of the exciting things for me is the advent of machine learning, AI cognitive services that are available. So in the olden days, we used to just apply some very basic statistical queries on databases to try to profile the data. So don't just ask the DBA what the data means. Don't ask the headers on the spreadsheet. Actually, go look at the data to see what it tells you. But I was really limited in the amount of time I had to do it. And all of that. But now I can point services that can say things like, what's the sentiment of this data? Do we have comments about customers in the CRM system that might need to be phrased a little bit better? Do we have employee feedback that is overly skewed in one direction or not? But even looking at, you know, what's the distribution of our addresses for our customers and how many of these addresses are likely fake? I mean, there's just all kinds of great stuff that we can do. Actually talking to the data. I find that really exciting. And then we can ask our databases the same questions. Like, what's keeping you happy? What's making you unhappy? Balancing workloads, any of that stuff. So we get to use those tools for our work as well as for the, how the business sees the data.
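The "go look at the data" advice above can be sketched in a few lines. This is a minimal, assumption-laden example in plain Python: profile_column is a hypothetical helper, and the sample addresses are invented for illustration. It only counts missing, distinct, and most frequent values, which is the kind of basic statistical profile mentioned before the newer cognitive services came along.

```python
from collections import Counter

def profile_column(values):
    """Return basic profile statistics for one column of raw values."""
    total = len(values)
    # Treat None and blank strings as missing.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "rows": total,
        "missing": total - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }

# Invented sample data: a heavily repeated address can hint at placeholder entries.
addresses = ["123 Main St", "123 Main St", "", None, "1 Test Ave", "123 Main St"]
report = profile_column(addresses)
```

A value that dominates the distribution, such as the repeated address here, is a cue to ask whether it is real or a default someone typed in to get past a required field.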
Yeah exactly, and data science is not just for those highfalutin business questions, right? It's not just for--you know--how can we be transformative and develop new services? It's also for, how can we make the system work better? The whole system. And how can we make the system more autonomous? And as we move forward in this new world, I think that's important. You're going to have to take a step back and take your hands off the keyboard and kind of let the system run itself. And in order for that to happen, we're going to see a lot more artificial intelligence. You know, kind of machine learning applied to systems management. And it's pretty exciting for me to see how companies are starting to aggregate system information, performance information across customers and across the entire, you know--across the entire universe of their users in order to improve the functioning of each of those systems. And that, I think, is— it's really exciting to a systems administrator/operator you know, kind of person. But it's also a little bit scary because you're saying, "Wait a second." So you know everybody's experience is going to impact my system. But of course, everybody's learning is going to benefit my system too. So I think that's all good.
So, when I think of the types of skills that I would tell like, I'd want data professionals or DBAs to be a little more involved in. What I try to think of is, some of the stuff we talked about during the soft skills presentation. We mentioned empathy. That's an understanding of what somebody else is doing. So I can recall being a younger person. But as a DBA with a handful-of-years experience, I knew the difference between good and bad design. And you come across a system and you find this table and you're having trouble doing your administrative tasks and you say, so why does this table have 450 columns? That seemed wide and it seemed silly to me. It's like, these people don't know database design. And you start thinking that you know everything. Or at least you know more than whoever built this thing. And then I would tell you, go build yourself your own little machine-learning project and you'll understand why something might be 450 columns wide. And that's perfectly fine. And it's getting the job done for them. And so if you learn a little bit about a lot, you start to learn about how people are using the data. And that helps you be a better administrator of that data. It helps you be better in your role when you start figuring out why this one group is doing things that seem all wrong. It's not just because they don't know any better. It's because this is how that function is and they're trying to use the tools that the company is giving them. You can help them and that will be a little bit different than the help you need to give to the people who are doing the accounting in the back office, right? Or the front office traders, or whatever it is you're trying to support. You have to support different teams in different ways. And I think when you start opening up to the idea that you may not know everything and there's different use cases for all this data, it can make you a better professional in the end.
Awesome. Because usually, and I love what you all have stated there. Because usually when one thinks of the data lifecycle, they think of the speeds and feeds. They think of tuning queries. They think of, how do I optimize my relational database? Or how do I tune the underlying system so that I can hit X number of transactions per second and still meet the business needs there. But the business need is changing, as you all have mentioned here. Because you have to ask the proper questions. Because the business is going to take those results and hopefully disrupt the industry, right? Let's talk about practical tips that our THWACKcamp attendees can put into practice to become better database or data professionals. Any of you, you guys have mentioned quite a bit of things.
How much time do we have? [Laughing] Because I have about three days of this.
As much time as you need. And let's cover it from a multitude of acts...
Speak quickly, let's go.
Because I mean, you're absolutely right. You touched on security and compliance. You've touched on the new technologies coming in. How to consume that and still make the data work for you. Tom, you've talked about automating and scaling that. And changing the questions that we ask of this. So, that's a lot to ingest, right? If I'm out there and I may deviate and I'm like holy cow, are they telling me to learn AI, ML, neural networks, and deep learning? Or where do I start? Do they want me to become a security ops professional and learn compliance? You know, HIPAA, and how long am I supposed to retain data and protect that? Am I supposed to learn about NVMe? And all these new storage constructs that are coming down the pipe, in addition to what cloud service providers are? So, what practical tips would you guys give for our attendees?
I don't want to go first. Take them all.
All right, you win.
Ah yes, so okay. So, the answer is yes.
Yeah, all of those.
That was a long question.
Basically all of those. Here's the short of it, is you have to do something. It doesn't matter what it is. But something you enjoy. I'm not going to tell somebody to go learn HIPAA because they have to know HIPAA. But I am going to tell somebody that they should pay attention to the industry trends. And when they see clouds and billions of dollars being spent on clouds and AI and machine learning and things like that, have an awareness of where that technology might be headed. So, if they think they're going to specialize in a particular area of data that is right now, being automated away, might not be the best career choice. Could be just fine, though. They could make a living forever because they would be the one.
We still have COBOL programmers.
We still have people that program for Voyager sitting in NASA. They've been there for 50 years, right? So I mean, you could be that--that's fine. And if you enjoy it, there's nothing wrong with that. But if you really want to, say, hedge your bets for the future, you're going to want to know a little bit about a lot. And I would focus on areas of data security. No question in my mind, if somebody in my business comes to me and says, "Can you tell me how the encryption works for our data?" You better be able to explain that. Is this data encrypted or not? What tier is this? Is this protected, so on and so forth. Data security is one area. I would tell you data analytics is another one. That doesn't mean that you have to be a data scientist. It doesn't mean I'm going to tell you, you have to do machine learning. What I mean is, I want you to understand how people are using Power BI and Excel and Tableau in your company right now. Where's the data coming in from? Where's it going to? Is it sitting on a USB stick on a train somewhere? That ties into security. So understand who's doing the analytics, how they're doing it. How can you help them? Maybe they're struggling with a report that runs for an hour, and it's something you could help with and it would run in only five minutes. But you two haven't really talked. Right, that's part of the empathy in the soft skills as well. So for me, the two big areas I have people focus on right now: security and analytics.
Awesome. How about you, Stephen?
Well, yeah I'd say, from my perspective, you're right, you can't expect to be a master of everything. But I think it would be critical for anyone involved in this data field to understand the challenges of scalability. Basically, making systems grow large. And why that's a problem. The hint is coordination, that's the challenge. It's easy to build a massive number of independent things. It's very difficult to build a massive single thing that all works together. So whether you're talking about storage or servers, or database, or data platforms, or scalable web applications, or whatever. It's all about overcoming scalability. Overcoming scale, overcoming the challenges of coordinating distributed systems. So I think that would be a very important thing to learn about. And then I think the other specific things to investigate are the changing nature of storage in IT as we move from disks to flash. As we move from flash to storage-class memory. That's something to look into. Just Google "storage-class memory," you'll figure it out. [Laughing] Look at how that impacts the design of systems. Whether we need centralized systems anymore. Whether we need RAID anymore. You know, look up erasure coding and understand what that means. Because these technologies are going to impact the future of systems generally and specifically, the future of whatever systems you're going to be designing for the next 10 or 15 years.
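For readers who take the "look up erasure coding" suggestion literally, here is a minimal sketch of the simplest case: a single XOR parity block, the idea behind RAID 5-style parity. It is an illustration only. The function names and sample blocks are hypothetical, and production scale-out systems typically use more general codes (Reed-Solomon style) that survive more than one lost block.

```python
def xor_parity(blocks):
    """Compute a parity block as the byte-wise XOR of equal-length blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(surviving_blocks, parity):
    """Rebuild the one missing data block from the survivors plus parity."""
    # XOR-ing the parity with every surviving block cancels them out,
    # leaving only the contents of the missing block.
    return xor_parity(list(surviving_blocks) + [parity])

# Three equal-length data blocks, as if stored on three separate nodes.
data = [b"disk", b"flsh", b"scm!"]
parity = xor_parity(data)

# Pretend the second block was lost; rebuild it from the other two and parity.
rebuilt = recover_missing([data[0], data[2]], parity)
```

The trade-off is the one the panel keeps returning to: the extra parity block costs some capacity, but any single block can be lost without losing data.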
Yeah, so glad you guys covered all that.
We covered everything, right?
So yeah, everything. I agree with all of them, of course. But the one thing I've been giving people advice on is, a lot of people have been positioning the question as do we stay with on-prem data or do we go to the cloud? And that's the wrong question. It's not either/or. It's a hybrid thing. And I don't even mean that one application will be hybrid. But it would be really easy and affordable for most people to back up to the cloud as maybe a second place to back up. Or your off-site backup. Or to play with that. And people need to learn the lingo and the new way of thinking. Like, you brought up scalability, to understand the nature of it. Why distributed systems are different, and you think about them differently and why your architects need to think of those things differently. Because as we start automating some of the tasks--and that's already happening and many more are going to be automated, to let the computers do the stuff that they're good at. They're accurate; they make very few typos. They do exactly what you tell them, which sometimes is a problem. But someone needs a lot broader knowledge these days, because even if you're specialized in Oracle or SQL Server or DB2, now there's a bunch of options for that. Whether it's cloud-based, whether it's database as a service. There's even now ‘data as a service,’ and that's in a lot of places. We had that before; they were called ‘Import’ and ‘Update,’ and now there's true data as a service. Go ask for this data for this thing and bring it in and now it's much faster. So now, we can do it in real time. So, as things become offered as a service, that's a completely different way of thinking about the cost, benefit, and risks of deploying them. And when someone's sitting in a conference room saying, oh I think we ought to use this thing because it's new, and you don't know what it is, you're not going to be able to contribute to that conversation.
You're just going to be known as the person that specializes in this one widget, fixing it every day. So, cloud thinking or scalability thinking, whatever you want to do, big data thinking, whatever it is: even if you think you're not passionate about it, you won't ever do it, you're going to be a specialist in what you're really good at, you still need to know these things in order to fit your part in, and defend if you think it's right for the company to do that. So there's the cloud thinking about it. And then I can't say enough about automation. If you don't automate some of your tasks, they'll hire someone who will, because that person will be able to do a lot more. And you know, people are saying a lot of my job can't be automated. I get that, but even simple things can be really automated. And part of the reason is so that you're not messing up deployments or something like that. Those are my big things this year.
Karen, thank you for those practical tips on that. And panel, thank you for walking us through how to put into practice the normalization of data before you start to optimize that data lifecycle. Now it's time for us to get into the R&R of the data lifecycle. And I'm not talking about rest and relaxation, even though we're here at THWACKcamp. I'm talking about the fun stuff: retention and retirement. And with that—oh, we're going to go to Mr. Tom LaRock. [Laughing]
Retention, retirement. Yeah, data never dies, right? [Laughing]
It just smells that way.
Yeah, where does it go when you erase it off the disk, right? It goes to the recycle bin in the sky. One of my struggles as a DBA was the idea of proper archiving. Because to me, at some point, data shouldn't be old. You might still want it around, but you might want to get it out of the database. Because if you run a query and if you don't need all 20 years’ worth of sales data, you just need one year's worth, maybe instead of just thinking about all these different ways to get that done--what if that old data could just go somewhere else? Like a reporting database somewhere else that you didn't need to touch every time you're running these other queries. So, I worked for a little startup and we tried doing real proper archiving for data. So when I got into this other company it was kind of a shock to me to see that data just never died. It just stayed in the systems forever. And it would just make things slow over time. And you could try different things but it was always a Band-Aid. The real solution was to archive. Some data had to be retained. Some data needed to be retired. And the argument you get back always is, "Oh, but if I need it, it has to be there." How often do you need it? Maybe once a year. If it took me two hours to recover it from off storage, nope, not good enough. It has to be here. Really, tell me more about this requirement of yours. Because it really sounds like it's just something you want, not necessarily something you need, right? And do you know the cost of this? And it just becomes an administrative overhead for you. So, I've always joked that data never dies. There are very rarely true archiving or retention or retirement policies in place. You probably see that too for storage. Storage is just ridiculous. I've had plenty of storage admins come to me and say, the databases are kind of big. Can you just get rid of some data? And you kind of laugh like, what do you want me to do? Just drop a few tables on the business?
Like, they won't notice? I can't just do that. Plus, it's already allocated, then I'd have to shrink. I'd have to do all these other things in order to reclaim the space. It's just an administrative nightmare. It's just easier to do nothing and to just leave it there.
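[Editor's sketch] The archiving idea described above, moving old rows out of the table that everyday queries touch, could look roughly like the following. This is a minimal sketch, not anything the panel showed: the `sales` and `sales_archive` tables, their columns, and the cutoff date are all hypothetical, and SQLite stands in for whatever database is actually in use. It also does nothing about reclaiming the freed space, which is the separate problem mentioned above.

```python
import sqlite3


def archive_old_sales(conn, cutoff_date):
    """Move sales rows dated before cutoff_date into sales_archive.

    Returns the number of rows removed from the live table.
    Table and column names are hypothetical.
    """
    with conn:  # one transaction: the copy and the delete commit together or not at all
        conn.execute(
            "INSERT INTO sales_archive (id, sold_on, amount) "
            "SELECT id, sold_on, amount FROM sales WHERE sold_on < ?",
            (cutoff_date,),
        )
        cur = conn.execute("DELETE FROM sales WHERE sold_on < ?", (cutoff_date,))
        return cur.rowcount


def demo():
    """Build a tiny in-memory database and archive everything before 2023."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sold_on TEXT, amount REAL)")
    conn.execute("CREATE TABLE sales_archive (id INTEGER PRIMARY KEY, sold_on TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(1, "2004-03-01", 10.0), (2, "2015-06-15", 20.0), (3, "2024-01-10", 30.0)],
    )
    conn.commit()
    moved = archive_old_sales(conn, "2023-01-01")  # ISO dates compare correctly as text
    hot = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    cold = conn.execute("SELECT COUNT(*) FROM sales_archive").fetchone()[0]
    return moved, hot, cold
```

Calling `demo()` leaves one recent row in `sales` and two older rows in `sales_archive`. Doing the copy and the delete in a single transaction is the design choice that matters here: a failure partway through should not lose rows or leave them in both places.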
In fact, it's probably better to keep it. And this is one of those crazy things. So yeah, I mean think about it. There's a couple of aspects here. So number one, as you said, data never dies. Because data is moveable. Because data is not a physical thing, right? You can destroy the thing, but can you really destroy the data? As we've seen in the news, and as we've seen in fiction, somebody else might have a copy of it. And so I actually, I did litigation readiness consulting for a couple years with companies. And it was funny because the general counsel--when I would sit down at the initial meeting, the general counsel's opinion was delete that stuff. Let's get rid of it. We don't want it around because if it's around, then they can discover it and they can see all the bad stuff we've been doing or not. So it's better to get rid of it. But once we discussed it and once we talked about the fact that data can live on and that somebody else might have a copy of it. I mean, you're talking about email. You should never delete email. Why? Because email, by definition, two people got it, right?
At least two people got it. And it went out there, right? Isn't it better to have it than not have it?
So you can refute it.
And isn't it better to have it in context than have it just what the opponents have produced and so on. And so it was funny because it would go from delete, delete, delete to oh ******, don't delete, don't delete. Never delete, never delete. So that's one aspect. And so then, the question is archiving. Okay well, so, I mean I guess the definition of archiving would be moving data from the primary production location to some other location where it could still be accessed if needed, right? The problem with that is that if you archive it somewhere else, can you be sure that it's still protected? Can you be sure that it's still accessible? I once had to recover, I think it was, just five-year-old data off of tapes. And in order to do that, I had to get a tape drive. I had to get a server. I had to get storage. I had to get an operating system that was way out of date. I had to get a backup system that was out of date. I had to restore the data, then I had to get a database system that would be able to basically access that. Because we didn't-- You can't be sure what you're going to be able to access in the future. And so there, again, I've gone from saying archive, archive, archive to saying, just keep it alive. Just keep it alive and hope that Moore's Law and the expansion of storage will allow me to continually have that data available. Because if you think about it, a truism of storage is that any system you buy today will be twice as big as every system you've ever bought in the past combined. Which means there will be space. The challenge is managing it and keeping it alive and keeping it around. Basically, I refute the R&R. [Laughing] There's no retirement. [Laughing]
There's no retirement for data.
It never dies.
You're going to work forever.
But the other lawyers you should be talking to are the people who do compliance. So, in certain jurisdictions, you are required by law to forget somebody if it's PII, if it's protected data.
Certain jurisdictions. And you don't always know if you have data that belonged to people that live or work or were in that jurisdiction. It's a tough problem, right?
How do you fix that?
Well, so there are some people in some jurisdictions. They either have completed this or were starting to. It's not only do you need to forget them in your production system, in your reporting system, but you have to go find them in your backup systems and delete that stuff. Can you imagine? Do you know how many companies?
Good luck with that.
I know. You know, how many companies are going to say, "Yep, we did it." [Laughing] Right, but...
Yup, we tried.
So this whole, the end of the lifecycle--and by the way, I think data does go to the rainbow bridge to play with all the other deleted data, both data that was lost and never recovered.
Oh, data. Is that like going to your dryer and you never see it again?
Yeah, it's like a sock. Because we have this data...
It is like a sock. Because it's still going to be sitting around.
Well we know, we have this data retention problem where things get deleted when they weren't supposed to be and we can never get them back. And some data, we just really want to get rid of and we find out--so, I mean, I'm going through something now where I'm trying to work with my bank and I have to find a copy of my mortgage and I can't find my original mortgage. So I said, but it was with you, so you guys go get it. And they're like, “We're a big company. We can't keep this data all the time.” [Laughing] And I'm like, “Great! I want my money back from the mortgage.” But anyway, I think the data retention problem, it's got legal issues. It's got customer service issues. It's got business continuity issues. Because you relied on a team of people and they all left or hit the lottery and aren't working anymore. It's not just as easy as setting the archival setting on a database or the backup setting on a database. It really is. You have to understand the data. And like Tom said, how it's going to be used? How often? What the workload is? Like, I'm excited about the technologies that we have now from inside databases where your app doesn't know, the spreadsheets don't know, that some of the data is off in the cloud, in an archive, and some of it's here in your production database. And the database system engine itself is the one that goes and gets the data, the older data, and brings it up. Because that means it's always going to be at least in a somewhat usable format. But not cluttering up your local queries. I think it's going to be a combination of all this stuff. Understanding your data, understanding what the--to keep your CIO and CEO out of jail. Of knowing how much storage costs and what format things are in. We've all personally probably been hit with, "There's that floppy disk. I know the pictures that are on that."
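[Editor's sketch] The idea above, that the application queries one name while the engine decides whether rows come from production or from an archive, can be loosely approximated with a view over two tables. This is only an illustration under stated assumptions: the `orders_hot`, `orders_cold`, and `orders` names are hypothetical, both tables live in the same local SQLite database, and no cloud tier is involved. It is not how any particular database product implements this feature.

```python
import sqlite3


def build_demo():
    """Create hot and cold tables plus a view that presents them as one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders_hot  (id INTEGER PRIMARY KEY, placed_on TEXT, total REAL);
        CREATE TABLE orders_cold (id INTEGER PRIMARY KEY, placed_on TEXT, total REAL);
        INSERT INTO orders_hot  VALUES (3, '2024-02-01', 30.0);
        INSERT INTO orders_cold VALUES (1, '2005-05-05', 10.0), (2, '2012-09-09', 20.0);
        -- Callers read from "orders" and never need to know where a row lives.
        CREATE VIEW orders AS
            SELECT id, placed_on, total FROM orders_hot
            UNION ALL
            SELECT id, placed_on, total FROM orders_cold;
        """
    )
    return conn


def all_order_ids(conn):
    """Query through the view, as an application or spreadsheet would."""
    return [row[0] for row in conn.execute("SELECT id FROM orders ORDER BY id")]
```

With `conn = build_demo()`, `all_order_ids(conn)` returns ids from both tables, while a query written directly against `orders_hot` touches only the recent row. That split is the point: day-to-day queries stay small, and the older data remains reachable in a usable format.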
How can I read this thing?
I have a Hi-8 camera with some really nice videos on it.
I can't even read a CD. [Laughing]
I found a CD the other day and I was like, what am I supposed? I can put it in my Xbox. The only thing in my house at this point.
So, essentially you guys have highlighted the challenges with data and data retirement in there. There's tech inertia--tech debt, certainly, in Stephen's example in there. Karen's talked to the accountability of that data and the compliance of it. Tom talked about the archival process. I'm just happy that you didn't say, "Snapshots equals backup." [Laughing]
Now why would I say?
Why would he do that?
Why would I say that?
I know you would not say that. [Laughing] That's why I brought it up. That's why I said that.
Stephen would say it.
Don't be casting these aspersions at me. [Laughing]
But that is, therein lies the challenge with retention and retirement of data. Because it has to persist through in certain situations. And in some other situations, you would like it to go away but when you need it, do you have all the proper technologies? All the proper tools to pull it back out and make your point, right? Because that data still probably has some use if you need to go back and get it in there. So, we've walked through the lifecycle of data. From the creation thereof, the aggregation. All of the data appended to analysis. Big data and so forth. To retention and retirement of data. What I would say is that, it seems to me, a key theme in there--that you guys have talked through--is Tom, to your point of empathy. You have to empathize with the data. You know, embrace the inner zeros and ones, right? And use it for your purpose.
Stephen has talked about integrating that with the incoming technologies, right? Because you're going to be able to do more. To your point on doing more with less, Tom. You're going to be able to do more with the underlying infrastructures in there. So, how can you make that data work efficiently for yourself? And then Karen, you hit on data security and compliance. And it goes to your two key points, Tom, of security, data security--And what was your other point?
Security and analytics.
The big "A" word with data in there. I'd like to thank the panel here. Karen, thank you. Stephen, thank you. Tom, thank you for sharing your knowledge on data. Because I've seen how you guys have evolved your careers. And we've shared enough stories. We spent enough time with each other to see each other grow in our careers in there. And I think at the end of the day, for the THWACKcamp attendees, that is part of the data lifecycle. Is how to tie in your career and make it viable as the data lifecycle changes in there. For THWACKcamp, I'm Kong Yang. Thank you for joining us.