Culture of Data Protection: Development Counts, Too.

In my earlier post in this data protection series, I mentioned that data security and privacy is difficult:

Building a culture that favours protecting data can be challenging. In fact, most of us who love our data spend a huge amount of time standing up for our data when it seems everyone else wants to take the easiest route to getting stuff done.

Today I'll be talking about one of the most insecure and unprotected ways we harm data during our development processes. And I'm warning you: these are all contentious positions I hold. I'm experienced (old) enough to stand up for these positions. I know I'm going to get flak for stating them. I'm ready. Are you? Let's start with the most contentious one.

USING A BACKUP OF PRODUCTION DATA FOR DEVELOPMENT AND TEST PROCESSES IS WRONG

In all my years of presenting to audiences about data protection, this is the position where we get the most disagreement. I understand those positions. But it is still my opinion that taking a backup and restoring to development and test environments is wrong.

Development environments are notoriously less secure than production ones. In the old days, developers connected to servers to do their development. Most of those services were located in a data center and at least had those protections. Now I'm much more likely to see developers working on their own laptops, often personally owned ones. These are often shared devices, with little or no enterprise protections because developers can't be constrained by such overhead and governance during development.

Because developers are system administrators of their local environments, they often remove most of the security features within their development environment to speed up development and test processes. Encryption, row level security, data masking, and other database features are likely to be removed.

Unlike a production environment, there aren't any data access monitoring and alerting solutions. Developers and DBAs tend to memorize data to facilitate their dev and test processes. They know the interesting customer data, transactions, and financial information that they use over and over for testing. In fact, they often have the data itself memorized. The very thing they would not be allowed to do in production they do every day in the dev environment that hosts production data.

Developers and DBAs tend to treat development data with less respect than production data because "it's just development." If something goes wrong, we can just rebuild it. The challenge with this thinking is that it's a dev environment with production data. It's not just development. This is one of the reasons we have recently seen a spike in data breaches due to IT professionals sharing their dev data in unsecured cloud storage buckets. Or they email production data, log it into bug systems, share it on a file server. Carry it around on unsecured thumb drives. "It's just test data, after all." No, it's not. It's still production data, even though you've moved it to a test environment.

"USING PRODUCTION DATA IS THE ONLY WAY WE CAN HAVE TEST DATA"

When I recommend that teams stop using production data for development, the first thing they say is that this is the only test data they have. Yes, that's true. But it isn't the only way. We could develop test data based on the test cases we are going to run. That way we can test all the features of the applications, not just the features our current customer set reveals. We could generate more test data based on that data. We could require test case writers to create test data. Yes, that takes time.

I believe our industry should be developing a set of what I call lorem ipsum data for People, Places, Things, Events, Transactions, etc. This dataset could be crowdsourced and curated by the entire community. Yes, this will be hard work. Yes, it will require a lot of curation. No, it will be not much fun. But, think of all the places your own personal data is located. How comfortable are you that your data is being used for development and test on perhaps millions of laptops around the planet? How comfortable are you that someone has chosen your home address and the list of your family as the one record they have memorized for running their test-driven development processes?

THERE ARE SOME EXCEPTIONS

I do believe there are valid cases where production data should be used in special testing scenarios. First, if you are doing a data migration project, at some point you still need to do a test migration using a subset of your production data. These tests would be just prior to completing the actual migration. But that production data would still be in its secured formats; it would not be used for the earlier dev and test processes.

Another example is when you are testing the application of new security features in your database environment and you want to see if there are any cases where your existing data or design has challenges implementing those features. Again, this test would be done late in the process.

There may be other use cases where testing of production data is required, but none of them should form the bases of using production data for general development and test methods. You can test them on me in the comments.

FINALLY...

I've come to this position after decades of watching people's data being treated with disrespect because it's a faster way to do lesser testing (more on this “lesser testing” in a future post). That should seem wrong to all of you, and I know a few people who share these positions with me, even if I'm overwhelmingly outnumbered. One of the key things that has changed over the last few years is news of data breaches due to exposure of production data in development environments. I don't believe the number of breaches has increased; I believe the number of breach disclosures has increased due to data privacy notifications. With GDPR, we will also have significant fines and even jail terms due to sloppy data practices. How much do you really love that job that you are willing to risk this?

I recommend you start the discussions about generating real test data now. At least you will have shown you tried to take steps to protect data. Your defense lawyer will thank you.

Thwack - Symbolize TM, R, and C