
Culture of Data Protection: Development Counts, Too.

Level 12

In my earlier post in this data protection series, I mentioned that data security and privacy are difficult:

Building a culture that favours protecting data can be challenging. In fact, most of us who love our data spend a huge amount of time standing up for our data when it seems everyone else wants to take the easiest route to getting stuff done.

Today I'll be talking about one of the most insecure and unprotected ways we harm data during our development processes. And I'm warning you: these are all contentious positions I hold. I'm experienced (old) enough to stand up for these positions. I know I'm going to get flak for stating them. I'm ready. Are you? Let's start with the most contentious one.

USING A BACKUP OF PRODUCTION DATA FOR DEVELOPMENT AND TEST PROCESSES IS WRONG

In all my years of presenting to audiences about data protection, this is the position where we get the most disagreement. I understand those positions. But it is still my opinion that taking a backup and restoring to development and test environments is wrong.

Development environments are notoriously less secure than production ones. In the old days, developers connected to servers to do their development. Most of those services were located in a data center and at least had those protections. Now I'm much more likely to see developers working on their own laptops, often personally owned ones. These are often shared devices, with little or no enterprise protections because developers can't be constrained by such overhead and governance during development.

Because developers are system administrators of their local environments, they often remove most of the security features within their development environment to speed up development and test processes. Encryption, row level security, data masking, and other database features are likely to be removed.

Unlike a production environment, there aren't any data access monitoring and alerting solutions. Developers and DBAs tend to memorize data to facilitate their dev and test processes. They know the interesting customer data, transactions, and financial information that they use over and over for testing. In fact, they often have the data itself memorized. The very thing they would not be allowed to do in production they do every day in the dev environment that hosts production data.

Developers and DBAs tend to treat development data with less respect than production data because "it's just development." If something goes wrong, we can just rebuild it. The challenge with this thinking is that it's a dev environment with production data. It's not just development. This is one of the reasons we have recently seen a spike in data breaches due to IT professionals sharing their dev data in unsecured cloud storage buckets. Or they email production data, log it into bug systems, share it on a file server, or carry it around on unsecured thumb drives. "It's just test data, after all." No, it's not. It's still production data, even though you've moved it to a test environment.

"USING PRODUCTION DATA IS THE ONLY WAY WE CAN HAVE TEST DATA"

When I recommend that teams stop using production data for development, the first thing they say is that this is the only test data they have. Yes, that's true. But it isn't the only way. We could develop test data based on the test cases we are going to run. That way we can test all the features of the applications, not just the features our current customer set reveals. We could generate more test data based on that data. We could require test case writers to create test data. Yes, that takes time.
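As a rough illustration of building test data from test cases rather than from a production backup, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Customer` fields, the name pools, and the `make_customers` and `overdrawn_customer` helpers are hypothetical and not taken from any real system. It shows two ideas: bulk data that is reproducible from a seed, and a hand-written record for a case that current customers may never exhibit.

```python
import random
from dataclasses import dataclass

# Hypothetical name pools; none of these values come from a real system.
GIVEN_NAMES = ["Alex", "Bao", "Chidi", "Dana", "Emeka", "Farah"]
FAMILY_NAMES = ["Example", "Sample", "Testcase", "Placeholder"]


@dataclass(frozen=True)
class Customer:
    customer_id: int
    given_name: str
    family_name: str
    email: str
    balance_cents: int


def make_customers(count: int, seed: int = 0) -> list[Customer]:
    """Build `count` synthetic customers, reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator: same seed, same data
    customers = []
    for customer_id in range(1, count + 1):
        given = rng.choice(GIVEN_NAMES)
        family = rng.choice(FAMILY_NAMES)
        customers.append(
            Customer(
                customer_id=customer_id,
                given_name=given,
                family_name=family,
                # example.com is a reserved documentation domain
                email=f"{given}.{family}.{customer_id}@example.com".lower(),
                balance_cents=rng.randint(-50_000, 500_000),
            )
        )
    return customers


def overdrawn_customer(customer_id: int) -> Customer:
    """Hand-written record for a test case that needs a negative balance."""
    return Customer(customer_id, "Over", "Drawn", "over.drawn@example.com", -1)
```

Because the generator is seeded, every developer laptop can rebuild the same dataset on demand, so nothing needs to be copied out of production or passed around by email.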

I believe our industry should be developing a set of what I call lorem ipsum data for People, Places, Things, Events, Transactions, etc. This dataset could be crowdsourced and curated by the entire community. Yes, this will be hard work. Yes, it will require a lot of curation. No, it will not be much fun. But, think of all the places your own personal data is located. How comfortable are you that your data is being used for development and test on perhaps millions of laptops around the planet? How comfortable are you that someone has chosen your home address and the list of your family as the one record they have memorized for running their test-driven development processes?
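One way such a shared dataset might be organized is as a catalog of records tagged by the situation each one represents, so a team can pull in only the subsets its requirements call for. The sketch below is purely hypothetical: the `PersonRecord` shape, the tag names, and the `select` helper are assumptions for illustration, and every record in it is invented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class PersonRecord:
    given_name: str
    family_name: Optional[str]   # None: not everyone has a family name
    middle_name: Optional[str]   # None: not everyone has a middle name
    tags: frozenset = field(default_factory=frozenset)


# Hypothetical curated entries; all invented for illustration.
CATALOG = [
    PersonRecord("Pat", "Placeholder", "Q", frozenset({"baseline"})),
    PersonRecord("Sari", None, None, frozenset({"mononym"})),
    PersonRecord("Lee", "Sample", None, frozenset({"no-middle-name"})),
]


def select(catalog, wanted_tags):
    """Return the records carrying at least one of the wanted tags."""
    wanted = frozenset(wanted_tags)
    return [record for record in catalog if record.tags & wanted]
```

A team whose application must handle single-name customers would call `select(CATALOG, {"mononym"})`, while a team with no such requirement could ignore that subset entirely.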

THERE ARE SOME EXCEPTIONS

I do believe there are valid cases where production data should be used in special testing scenarios. First, if you are doing a data migration project, at some point you still need to do a test migration using a subset of your production data. These tests would be just prior to completing the actual migration. But that production data would still be in its secured formats; it would not be used for the earlier dev and test processes.

Another example is when you are testing the application of new security features in your database environment and you want to see if there are any cases where your existing data or design has challenges implementing those features. Again, this test would be done late in the process.

There may be other use cases where testing of production data is required, but none of them should form the basis of using production data for general development and test methods. You can test them on me in the comments.

FINALLY...

I've come to this position after decades of watching people's data being treated with disrespect because it's a faster way to do lesser testing (more on this “lesser testing” in a future post). That should seem wrong to all of you, and I know a few people who share these positions with me, even if I'm overwhelmingly outnumbered. One of the key things that has changed over the last few years is news of data breaches due to exposure of production data in development environments. I don't believe the number of breaches has increased; I believe the number of breach disclosures has increased due to data privacy notifications. With GDPR, we will also have significant fines and even jail terms due to sloppy data practices. How much do you really love that job that you are willing to risk this?

I recommend you start the discussions about generating real test data now. At least you will have shown you tried to take steps to protect data. Your defense lawyer will thank you.

22 Comments
MVP

Good article

Level 16

Thanks for the write up! I have seen that going on for years and the auditors rarely ever go through the test environment to see what kind of data is stored there.

Valid points all. Glad to see you were pragmatic enough to admit that data migration tests should involve real data.

And here's your catch - "curated and crowdsourced" to arrive at a homogenized, useful dataset. It's not about comfort, really - it's about the fact that with such a small subset of the population of folks who *could* curate and crowdsource, getting buy-in and contribution would be even harder work than actually building the common-use dataset itself. You've been in this business a long time, so you've met your share of devs and DBAs. Is that really the population you think of when you think 'disparate sources coming together to arrive at consensus and actually make something for the use of all'?

I get what you're saying, and I agree with you - but I don't see that happening.

Level 13

Good Article

Level 20

It is true that developers often don't like security because of the hit they take because of it.

Let's go back and set this right-side-up instead of accepting it upside-down.  I think some of the philosophies that may be widely accepted must be changed instead of accommodated.  Specifically:  Dev and Test should have the same Security requirements as Prod, instead of having little or no security, simply because they aren't Prod.  That just doesn't seem like a safe practice.

Especially when someone puts Prod data into either one, for convenience, instead of creating phony data.

  • We should not be allowing Dev & Test environments to be built with security requirements that are less than Prod's security.  Especially if ANYTHING associated with Prod is included in Dev or Test.
  • Let's not support an environment for Dev and Test that doesn't match the rigid security requirements of Prod.  Developing or supporting a culture that has the idea of Dev & Test environments not needing the same security as Prod doesn't get us a great Dev or Test, which probably results in slowdowns and broken flows in Prod when Security is applied.  How can a business expect their apps & data to work and flow properly with the required security in Prod when the flows and apps were designed and tested without that same security?
  • When we don't budget enough funds for training / resources / people / time to do the builds with the correct and identical security that Prod requires, we're planning to fail.
  • Industry standards (PHI, PCI, HIPAA, SOX, etc.) should NOT allow or accept a Dev or Test environment to exist outside of the same security restrictions required for Prod.  Even when the data in Test or Dev is phony.

Sure, it's harder to do it right--if "doing it right" means applying "Prod-level Security" to Dev and Test.  It takes a lot longer when Security is the foundation and the goal, instead of having faster creation of Dev & Test.  But we're constantly told "There's always time to do it right."

I offer my apologies for the opinion and the apparent irrationality of it on the surface.  Yes, it goes against everything developers want and assume, against all their traditions and customs.  Who can blame them?  Who would WANT to have full security in place before Dev and Test are started?  It would only slow things down, right?  Yet that philosophy leads to products built without enough security, products that are vulnerable, all for the sake of speed to get something into Dev and then into Test quickly.

I think that philosophy still isn't appropriate.  Security needs to be first, to take precedence over development and testing.   I'm sorry for taking a stance that makes life harder for Dev & Test folks.  Yet it remains the right thing to do:  Secure Dev and Test the same way Prod is secured.  Even when Dev or Test do NOT include any relationship to data and permissions and accounts in Prod.

It's sort of like that old saying "Raise up your children in the way they should go, and they will not stray from that path."  That's not bad advice, even if it's not always accurate.  It helps make good people.  Why not treat Dev & Test as your "children", and raise them up in the path they must follow when they mature to "Prod"?

Level 12

That's a good point.  Auditors SHOULD be writing orgs up on this practice.

Level 12

Well, I like to refer to my points of view as being a "cynical Pollyanna".  A hybrid.

I specifically focused on the basics of data.  People, places, things, and events. One could even introduce subsets of data (Indonesian people who have no family names, places with no mailing address. People with no middle names.  Events with no known start date.  Klingon names and titles, etc.)  Then those could be used or ignored based on business requirements.

It's not so much a universal agreement as a crowdsourced curation: agreement that a particular subset meets the needs of some requirement to represent a dataset.

Level 12

Which hits?

Level 12

Since you've started with "what's wrong", I'll come back with "no, it's not wrong".

Dev environments: Modern dev environments often have less security. Many security tools and licenses aren't available for the 450 dev machines in a company.  Not saying it's right, but it's real.  Just like not all dev environments have the same scale of data (please, please, please someone send me a 35 PB storage laptop with 512 GB of memory.)

Testing for the full security suite: this is often done in QA and other pre-production testing environments that are post-dev and pre-prod.  Think scalability and availability testing.  Think load testing and performance testing.  Is it optimal for development tasks? No, because some issues found at this time have to go back all the way to design to be fixed.  But the COST of giving 450 devs their own PB test machine and enterprise licenses for everything in the stack will likely be too high.  I'm not in any way saying these things should not be tested.  But for certain they won't be tested in an individual dev's sandbox laptop.

I agree with you on the last point about compliance issues. And from what I can tell, the entire industry ignores that.  I don't see them buying every dev a complete stack (hardware and software), so my point is DON'T USE PRODUCTION DATA IN AN UNPROTECTED ENVIRONMENT.   It is an issue with SOX, GDPR, PIPEDA and more.  Giving devs a super secure stack STILL doesn't make that access to data they don't need access to go away.

If you personally can get all the devs provisioned with a complete stack, then do that.  And then for all the other reasons I gave in my article, don't let the devs use production data to do their dev and dev testing. 

Now, what did I miss?

Level 12

In earlier years, we used shared dev environments.  In those cases, the environments could more closely resemble production.  But it's been years that we've had highly distributed development environments with a greater focus on shared repositories and smoke testing, not functional testing until later in the process. There are costs, benefits and risks of both environments.  But I don't see us going back to shared dev environments any time soon. 

Even in the narrow SQL Server world a DBA or dev can use SSMS, Code, Visual Studio or a host of other tools for development.  They can use third party tools and even homegrown tools.  Devs and DBAs check in their code to GitHub or some other shared repository after running it through a test.  No one cares about their code until then.  We've shifted some of the testing and work to other places in the dev cycle. That's why we can't mimic production everywhere.

I reworded some of my phrases & paragraphs & bullets to read better, to make more sense to me.

I can accept your point of view.  I'd LOVE for no actual "Prod" data to be in Test or Dev.

My philosophy is that building Dev and Test should be started only after the full security that's applied to Prod has been put in place on Dev and Test.  This must happen before Dev and Test are built, before they have any data--real or phony.

This speaks to the article's title:  "Development Counts, Too."

I think my takeaway from the article is that some folks may always put real data into Dev & Test, when they know they ought not.  So at least lock Dev & Test down with the full security that Prod has.  Then you've done your due diligence from the Security point of view, and we can start working to change the culture that wants to use real data in Dev and Test simply because it's convenient.

Does that make better sense?

I'll borrow some words from a Television Captain to his ship's Doctor after she reads his mind and realizes he's picked an arbitrary path without real knowledge of the right direction to go:   

"I may be wrong, but I'm not uncertain."

http://memory-alpha.wikia.com/wiki/Attached_(episode)

Level 13

I think I need to agree with rschroeder​ on this.  Unless and until DEV environments are regularly audited with the same rigorous standards as PROD environments to guarantee there is no prod data in the DEV environment,  it needs to be locked down the same or possibly even more than the PROD environment.  I say more because I know DEV folks that NEVER work under limited accounts, but accounts with full ADMIN rights.  Even that concerns me.  Admin rights should only be used for Admin tasks, even in DEV and code development.  I know that is a hit to the development folks, but I for one don't want my personal data at risk in an environment that isn't secure.  If anyone thinks about that risk, they shouldn't want their own data unsecured there either.  It is out there at risk badly enough in many production environments.  Unsecured data in a DEV environment is still mistreating the data in my mind.

There's a difference between mimicking production everywhere, and applying the same security posture everywhere. I think our viewpoints differ.

Level 16

Same has been my experience. The Dev folks usually have elevated rights in that environment so they can access the data in ways they can not in production.

Level 12

We share the same concern; we just see the solution as taking different paths.  I think making extra copies of production data is riskier, even with all those security measures in place (and I know those security measures will be removed just as soon as they are put on a dev laptop, just like all the database constraints are removed when a dev works on her local dev database).

My stance is that if we stop making hundreds of copies of production data, then that data is safer.

Level 12

What do you mean by "mimicking" production?

Level 13

I do agree with not making hundreds of copies of production data.  And ideally dummy data is what dev should be using in their riskier environment.  I still think it might be a good idea to audit those environments since we know that these things can happen or are happening.

Data, that is.

Level 12

If by mimicking you mean using non-prod data that has a similar profile to production data, I have no issue with that.  But it would still need to be enhanced to include data that needs to be supported but currently does not exist in production data.

Level 12

I guess the cynical side of me says organizations will never pay for all that, so it's cheaper and easier to use constructed test data and then no production data is at risk.  Of course audit for real data being in the wrong place.

Now that you've italicized it, I agree.

About the Author
Data Evangelist Sr. Project Manager and Architect at InfoAdvisors. I'm a consultant, frequent speaker, trainer, blogger. I love all things data. I'm a Microsoft MVP. I work with all kinds of databases in the relational and post-relational world. I'm a NASA 2016 Datanaut! I want you to love your data, too.