In the past year I have earned a couple of certificates from the Microsoft Professional Program. One certificate was in Data Science, the other in Big Data. I’m currently working on a third certificate, this one in Artificial Intelligence.
You might be wondering why a database guy would be spending so much time on data science, analytics, and AI. Well, I’ll tell you.
The future isn’t in databases, but in the data.
Let me explain why.
Databases Are Cheap and Plentiful
Take a look at the latest DB-Engines rankings. You will find 342 distinct databases listed, 138 of which are relational. And I’m not sure that’s a complete list, either. But it should help make my point: you have no idea which one of those 342 databases is the right one. It could be none of them. It could be all of them.
Sure, you can narrow the list of options by looking at categories. You may know you want a relational database, a key-value store, or even a graph database. Each category will have multiple options, and it will be up to you to decide which one is the right one.
So, a decision is made to go with whatever is easiest. And “easiest” doesn’t always mean “best.” It just means you’ve made a decision that allows the project to move forward.
Here’s the fact I want you to understand: Data doesn’t care where or how it is stored. Neither do the people curating the data. Nobody ever stops and says “wait, I can’t use that, it’s stored in JSON.” If they want (or need) the data, they will take it, no matter what format it is stored in to start.
And the people curating the data don’t care about endless debates on MAXDOP and NUMA and page splits. They just want their processing to work.
And then there is this #hardtruth: it’s often easier to throw hardware at a problem than to talk to the DBA.
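To make the point about storage formats concrete, here is a minimal Python sketch. The `load_records` and `total_orders` helpers and the sample city/order data are made up for illustration; the idea is only that the same records, whether they arrive as JSON or CSV, end up in the same shape once someone wants to analyze them.

```python
import csv
import io
import json


def load_records(text, fmt):
    """Return a list of dicts from JSON or CSV text (hypothetical helper)."""
    if fmt == "json":
        return json.loads(text)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported format: {fmt}")


def total_orders(records):
    """Sum the 'orders' field, whatever format the records came from."""
    return sum(int(r["orders"]) for r in records)


# Made-up sample data: the same two rows in two formats.
json_text = '[{"city": "Boston", "orders": "12"}, {"city": "Austin", "orders": "7"}]'
csv_text = "city,orders\nBoston,12\nAustin,7\n"

from_json = load_records(json_text, "json")
from_csv = load_records(csv_text, "csv")

# The analysis step never needs to know where the data was stored.
print(total_orders(from_json), total_orders(from_csv))
```

The person doing the analysis only ever touches `total_orders`. The format question gets settled once, at load time, and then nobody thinks about it again.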
Technology Trends Over the Past Ten Years
Let’s break down a handful of technology trends over the past ten years. These trends are the technology drivers for the rise of data analytics during that time.
Business Intelligence software – The ability to analyze and report on data has become easier with each passing year. The undisputed king of all business analytics, Excel, is still going strong. Tableau shows no signs of slowing down. Power BI has burst onto the scene in just the past few years. Data analytics is embedded into just about everything. You can even run R and Python through SQL Server.
Real-time analytics – Software such as Hadoop, Spark, and Kafka allows for real-time analytic processing. This has allowed companies to gather quality insights from data at a faster rate than ever before. What used to take weeks or months can now be done in minutes.
Data-driven decisions – Companies can use real-time analytics and enhanced BI reporting to build a culture that is truly data-driven. We can move away from “hey, I think I’m right, and I found data to prove me right” to a world of “hey, the data says we should make a change, so let’s make the change and not worry about who was right or wrong.” In other words, we can remove the human factor from decision making, and let the data help guide our decisions instead.
Cloud computing – It’s easy to leverage cloud providers such as Microsoft Azure and Amazon Web Services to allocate hardware resources for our data analytics needs. Data warehousing can be achieved on a global scale, with low latency and massive computing power. What once cost millions of dollars to implement can be done for a few hundred dollars and some PowerShell scripts.
Technology Trends Over the Next Ten Years
Now, let’s break down a handful of current trends. These are the trends that will affect the data industry for the next ten years.
Predictive analytics – Artificial intelligence, machine learning, and deep learning are just starting to become mainstream. AWS is releasing DeepLens this year. Azure Machine Learning makes it easy to deploy predictive web services. Azure Machine Learning Workbench lets you build your own facial recognition program in just a few clicks. It’s never been easier to develop and deploy predictive analytics solutions.
DBA as a service – Every company that makes database software (Microsoft, AWS, Google, Oracle, etc.) is actively building automation for common DBA tasks. Performance tuning and monitoring, disaster recovery, high availability, low latency, auto-scaling based upon historical workloads, the list goes on. The current DBA role, where lonely people work in a basement rebuilding indexes, is ending, one page at a time.
Serverless functions – Serverless functions are also hip these days. Services such as IFTTT make it easy for a user to configure an automated response to whatever trigger they define. Azure Functions and AWS Lambda are where the hipster programmers hang out, building automated processes to help administrators do more with less.
More chatbots – We are starting to see a rise in the number of chatbots available. It won’t be long before you are having a conversation with a chatbot playing the role of a DBA. The only way you’ll know it is a chatbot and not a DBA is that it will be a pleasant conversation for a change. Chatbots are going to put a conversation on top of the automation of the systems underneath. As new people enter the workforce, interaction with chatbots will be seen as the norm.
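For anyone who hasn’t seen one, a serverless function of the kind mentioned above is usually just a small handler that reacts to a trigger. Here is a minimal sketch in Python, written in the `handler(event, context)` style that AWS Lambda uses. The `disk_usage_pct` field, the 90 percent threshold, and the `expand_volume` action are all assumptions I made up for illustration, not part of any real monitoring product.

```python
import json


def lambda_handler(event, context):
    """Decide what to do about a hypothetical disk-usage alert.

    The event shape ({"disk_usage_pct": <number>}) is an assumption
    for this sketch; a real trigger would define its own payload.
    """
    usage = event.get("disk_usage_pct", 0)
    # Hypothetical rule: ask for more storage once usage hits 90 percent.
    action = "expand_volume" if usage >= 90 else "none"
    return {"statusCode": 200, "body": json.dumps({"action": action})}


if __name__ == "__main__":
    # Local dry run with a made-up event; no cloud account needed.
    print(lambda_handler({"disk_usage_pct": 95}, None))
```

That is the kind of small, automated response that used to be a page to the on-call administrator.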
There is a dearth of people who can analyze data today.
That’s the biggest growth opportunity I see for the next ten years. The industry needs people who can collect, curate, and analyze data.
We also need people that can build data visualizations. Something more than an unreadable pie chart. But that’s a rant for a different post.
We are always going to need an administrator to help keep the lights on. But as time goes on, we will need fewer and fewer of them. This is why I’m advocating a shift for data professionals to start learning more about data analytics.
Well, I’m not just advocating it, I’m doing it.