
Geek Speak

23 Posts authored by: datachick

Thomas LaRock and Karen Lopez fighting an octomonster to protect servers.

In my previous posts about Building a Culture of Data Protection (Overview, Development, Features, Expectations) I covered the background of building a culture.  In this post, I'll be going over the Tools, People, and Roles I recommend for successful organizations.




Tools


Given the volume and complexity of modern data systems, teams have to use requirements, design, development, test, and deployment tools to adequately protect data.  Gone are the days of "I'll write a quick script to do this; it's faster." Sure, scripts are important for automation and establishing repeatable processes.  But if you find yourself opening up your favorite text editor to design a database, you are starting in the wrong place. I recommend these as the minimum tool stack for data protection:


  • Data modeling tools for data requirements, design, development, and deployment
  • Security and vulnerability checking tools
  • Database comparison tools (may come with your data modeling tool)
  • Data comparison tools for monitoring changes to reference data as well as pre-test and post-test data states
  • Data movement and auditing tools for tracking what people are doing with data
  • Log protection tools to help ensure no one is messing with audit logs
  • Permissions auditing, including changes to existing permissions
  • Anonymous reporting tools for projects and individuals not following data protection policies


These tools could be locally hosted or provided as services you run against your environments.  There are many more tools and services a shop should have deployed; I've just covered the ones that I expect to see everywhere.
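
To make the data comparison item above concrete, here is a minimal sketch of the pre-test versus post-test check such a tool performs on reference data. It is an illustration, not a replacement for a real comparison tool: the function `diff_reference_data` and the country-code snapshots are made up for this example, and it assumes each snapshot fits in memory as a dictionary keyed by primary key.

```python
from typing import Dict, List, Tuple

# One reference row: the non-key column values, keyed elsewhere by primary key.
Row = Tuple[str, ...]


def diff_reference_data(before: Dict[str, Row], after: Dict[str, Row]) -> Dict[str, List[str]]:
    """Compare two snapshots keyed by primary key and report what changed."""
    added = sorted(k for k in after if k not in before)
    removed = sorted(k for k in before if k not in after)
    changed = sorted(k for k in before if k in after and before[k] != after[k])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots of a country reference table, taken before and after a test run.
pre_test = {
    "CA": ("Canada", "CAD"),
    "US": ("United States", "USD"),
    "MX": ("Mexico", "MXN"),
}
post_test = {
    "CA": ("Canada", "CAD"),
    "US": ("United States", "US$"),   # a test run overwrote the currency code
    "GB": ("United Kingdom", "GBP"),  # a row appeared; MX disappeared
}

report = diff_reference_data(pre_test, post_test)
print(report)
```

Any non-empty list in the report is a prompt to ask who changed the reference data and why.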



People


The people on your team should be trained in best practices for data security and privacy.  This should be regular training, since compliance and legal issues change rapidly. People should be tested on these as well, and I don't mean just during training.


When I worked in high-security locations, we were often tested for compliance with physical security while we went about our regular jobs. They'd send people not wearing badges to our offices asking project questions.  They would leave pages marked SECRET on the copier and printers.  They would fiddle with our desktops to see if we noticed extra equipment.  I recommend to management that they do this with virtual things like data as well.


As I covered in my first post, I believe people should be measured and rewarded based on their data protection actions. If there are no incentives for doing the hard stuff, it will always get pushed to "later" on task lists.



Roles


I'm a bit biased, but I recommend that every project have a data architect, or a portion of one.  This would be a person who is responsible for managing and reviewing data models, is an expert in the data domains being used, is rewarded for ensuring data protection requirements are validated and implemented, and is given a strong governance role for dealing with non-compliance issues.


Teams should also have a development DBA in order to choose the right data protection feature for ensuring data security and privacy requirements are implemented in the best way given the costs, benefits and risks associated with each option.


Developers should have a designated data protection contact. This could be the project lead or any developer with a security-driven mindset. This person would work with the data architect and DBA to ensure data protection is given the proper level of attention throughout the process.


Quality assurance teams should also have a data protection point of contact to ensure test plans adequately test security and privacy requirements.


All of these roles would work with enterprise security and compliance.  While every team member is responsible for data protection, designating specific individuals with these roles ensures that proper attention is given to data.



Given the number of data breaches reported these days, it's clear to me that our industry has not been giving proper attention to data protection.  In this five-post series, I couldn't possibly cover all the things that need to be considered, let alone accomplished.  I hope it has helped you think about what your teams are doing now and how they can be better prepared to love their data better than they have in the past.


And speaking of preparation, I'm going to leave a plug here for my upcoming THWACKcamp session on the Seven Samurai of SQL Server Data Protection.  In this session, Thomas LaRock and I go over seven features in Windows and SQL Server that you should be using.  Don't worry if you aren't lucky enough to use SQL Server; there's plenty of data protection goodness for everyone. Plus a bit of snark, as usual.


I also wrote an eBook for SolarWinds called Ten Ways We Can Steal Your Data with more tips about loving your data.


See you at THWACKcamp!

White cubes with black letters spelling out "MY DATA" on black background

We've talked about building a culture, why it applies to all data environments, and some specific types of data protection features you should be considering.  Today, we'll be considering the culture of protection the actual owners of the data (customers, employees, vendors, financial partners, etc.) expect from your stewardship of their data.


Data owners expect you will:


  • Know what data you collect
  • Know the purpose for which you collected it
  • Tell them the purposes for which you collected the data
  • Be appropriately transparent about data uses and protection
  • Use skilled data professionals to architect and design data protection features
  • Document those purposes so that future users can understand
  • Honor the purposes for which you collected it and not exceed those reasons
  • Categorize the data for its sensitivity and compliance requirements
  • Document those categorizations
  • Track data changes
  • Audit data changes
  • Version reference data
  • Use strong data governance practices throughout
  • Protect non-production environments just as well as production environments
  • Prioritize data defect fixes
  • Make the metadata that describes the data easily available to all users of the data
  • Know the sources and provenance of data used to enhance their data
  • Secure the data as close as possible to the data at rest so that all access, via any means, provides the most security
  • Mask the data where needed so that unintentional disclosure is mitigated
  • Back up the data so that it's there for the customer's use
  • Secure your backups so that it's not there for bad actors to use
  • Limit access to data to just those who need to know it
  • Immediately remove access to their data when staff leaves
  • Do background checks, where allowed, on staff accessing data
  • Test users of data regularly on good data hygiene practices
  • Ensure data quality so that processes provide the right outcomes
  • Ensure applications and other transformations are done correctly
  • Ensure applications and other transformations do not unintentionally apply biases to outcomes of using their data
  • Provide data owners access to review their data
  • Provide data owners the ability to request corrections to their data
  • Provide data owners the ability to have their data removed from your systems
  • Monitor third-party data processors for compliance with your data security requirements
  • Secure the data throughout the processing stream
  • Secure the data even when it is printed or published
  • Secure data even on mobile devices
  • Use strong authentication methods and tools
  • Monitor export and transfer of data outside its normal storage locations
  • Train IT and business users on security and privacy methods and tools
  • Protect user systems from bad actors
  • Monitor uses of sensitive data
  • Monitor systems for exploits, intrusion attempts, and other security risks
  • Securely dispose of storage hardware so that data is protected
  • Securely remove data when its lifecycle comes to an end
  • Accurately report data mis-uses and breaches
  • Treat their data as well as you'd protect your own


And after all that:


  • Actively steward the data, metadata, and data governance processes as business and compliance requirements change


Sound overwhelming? It should. We need to think of data as its own product. With a product manager, data models, metadata repository, a business user portal about the data products, and all the process that we put in place to protect code. Reread the list, changing the word data to code. We do most of this already for applications and other code. We should, at the very least, provide the same sort of process for data.


Your customer might not know they need all those things, but they sure expect them. I'd love to hear other expectations based on your own experiences.

Red and black letter tiles spelling out data protection terms on a black background

As I explained in previous posts on building a culture of data protection, we in the technology world must embrace data protection by design:


To Reward, We Must Measure. How do we fix this?  We start rewarding people for data protection activities. To reward people, we need to measure their deliverables.


An enterprise-wide security policy and framework that includes specific measures at the data category level:

    • Encryption design, starting with the data models
    • Data categorization and modeling
    • Test design that includes security and privacy testing
    • Proactive recognition of security requirements and techniques
    • Data profiling testing that discovers unprotected or under-protected data
    • Data security monitoring and alerting
    • Issue management and reporting


Traditionally, we relied on security features embedded in applications to protect our data. But in modern data stories, data is used across many applications and end-user tools. This means we must help ensure our data is protected as close as possible to where it persists. That means in the database.


Data Categorization


Before we can properly protect data, we have to know what data we steward and what protections we need to give it. That means we need a data inventory and a data categorization/cataloging scheme. There are two ways that we can categorize data: syntactically and semantically.


When we evaluate data items syntactically, we look at the names of tables and columns to understand the nature of data. For this to be even moderately successful, we must have reliable and meaningful naming standards. I can tell you from my 30+ years of looking at data architectures that we aren't good at that. Tools that start here do 80% of the work, but it's that last 20% that takes much more time to complete. Add to this the fact that we also do a shameful job of changing the meaning of a column/data item without updating the name, and we have a lot of manual work to do to properly categorize data.
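
Here is a minimal sketch of what name-based (syntactic) categorization looks like, and of where it runs out of road. The keyword rules and column names are hypothetical examples I made up for illustration; a real catalog would be much larger and specific to your organization.

```python
import re

# Hypothetical keyword rules: category -> name tokens that suggest it.
RULES = {
    "payment": {"card", "ccn", "pan", "iban"},
    "contact": {"email", "phone", "address"},
    "identity": {"ssn", "passport", "dob", "birthdate"},
}


def tokens(column_name: str) -> set:
    """Split snake_case and camelCase names into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", column_name)
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t}


def categorize(column_name: str) -> list:
    """Return every category whose keywords appear as a token in the name."""
    found = tokens(column_name)
    return sorted(cat for cat, words in RULES.items() if found & words)


columns = ["customer_email", "CardNumber", "ship_address", "CUST_FLD_07", "notes"]
catalog = {name: categorize(name) for name in columns}

for name, categories in catalog.items():
    print(f"{name}: {categories or 'uncategorized, needs manual review'}")
```

The well-named columns get categorized; `CUST_FLD_07` and `notes` do not, even though either could hold sensitive data. That leftover is the manual work.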


Semantic data categorization involves looking at both item names and actual data via data profiling. Profiling data allows us to examine the nature of data against known patterns and values. If I showed you a column of fifteen- or sixteen-digit numbers that all had a first character of three, four, five, or six, you'd likely be looking at credit card data. How do I know this? Because these numbers have an established standard that follows those rules. Sure, it might not be credit card numbers. But knowing this pattern means you know you need to focus on this column.
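
A minimal sketch of that kind of profiling check follows. The first test is exactly the pattern described above. The second, the Luhn checksum that payment card numbers carry as their final digit, is my addition and is not described in this post; it helps separate likely card numbers from other long numeric IDs. The function names are illustrative, and the sample values include well-known published test card numbers, not real ones.

```python
def normalize(value: str) -> str:
    """Drop the spaces and dashes people commonly type into card numbers."""
    return value.replace(" ", "").replace("-", "")


def matches_card_pattern(value: str) -> bool:
    """The pattern from the text: 15 or 16 digits, first digit 3, 4, 5, or 6."""
    digits = normalize(value)
    return (digits.isascii() and digits.isdigit()
            and len(digits) in (15, 16) and digits[0] in "3456")


def passes_luhn(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def profile_column(values) -> dict:
    """Count how many non-empty values look like card numbers."""
    non_empty = [v for v in values if v.strip()]
    pattern_hits = [v for v in non_empty if matches_card_pattern(v)]
    luhn_hits = [v for v in pattern_hits if passes_luhn(normalize(v))]
    return {"rows": len(non_empty),
            "pattern_hits": len(pattern_hits),
            "luhn_hits": len(luhn_hits)}


# Hypothetical column sample: two test card numbers, junk, and a near miss.
sample = ["4111 1111 1111 1111", "378282246310005", "n/a", "4111111111111112", ""]
result = profile_column(sample)
print(result)
```

A high hit rate does not prove the column holds card numbers, but it tells you which column to look at first.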


Ideally we'd use special tools to help us catalog our data items, plus we'd throw in various types of machine learning and pattern recognition to find sensitive data, record what we found, and use that metadata to implement data protection features.


Data Modeling


The metadata we collect and design during data categorization should be managed in both logical and physical data models.  Most development projects capture these requirements in user stories or spreadsheets. These formats make these important characteristics hard to find, hard to manage, and almost impossible to share across projects.


Data models are designed to capture and manage this type of metadata from the beginning. They form the data governance deliverables around data characteristics and design. They also allow for business review, commenting, iteration, and versioning of important security and privacy decisions.


In a model-driven development project, they allow a team to automatically generate database and code features required to protect data. It's like magic.




Encryption


As I mentioned in my first post in this series, for years, designers were afraid to use encryption due to performance trade-offs. However, in most current privacy and data breach legislation, the use of encryption is a requirement. At the very least, it significantly lowers the risk that data is actually disclosed to others.


Traditionally, we used server-level encryption to protect data. But this type of encryption only protects data at rest. It does not protect data in motion or in use. Many vendors have introduced end-to-end encryption to offer data security between storage and use. In SQL Server, this feature is called Always Encrypted.  It works with the .NET Framework to encrypt data at the column level and it provides the protection from disk to end use. Because it's managed as a framework, applications do not have to implement any additional features for this to work. I'm a huge fan of this holistic approach to encryption because we don't have a series of encryption/decryption processes that leave data unencrypted between steps.


There are other encryption methods to choose from, but modern solutions should focus on these integrated approaches.


Data Masking


Data masking obscures data at presentation time to help protect the privacy of sensitive data. It's typically not a true security feature because the data isn't stored as masked values, although it can be. In SQL Server, Dynamic Data Masking allows a designer to specify a standard, reusable mask pattern for each type of data. Remember that credit card column above? There's an industry standard for masking that data: all but the last four characters are masked with stars or Xs. This standard exists because the other digits in a credit card number have meanings that could be used to guess or social engineer information about the card and card holder.
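
To show what that mask pattern does, here is a small sketch of the "all but the last four" rule. This is not how SQL Server Dynamic Data Masking is configured; it is a hypothetical helper, `mask_card_number`, that only illustrates the resulting output.

```python
def mask_card_number(value: str, visible: int = 4, mask_char: str = "X") -> str:
    """Mask every digit except the last `visible` ones, keeping separators."""
    total_digits = sum(ch.isdigit() for ch in value)
    to_mask = max(total_digits - visible, 0)
    masked = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            masked.append(mask_char)
            to_mask -= 1
        else:
            # Separators and the trailing visible digits pass through unchanged.
            masked.append(ch)
    return "".join(masked)


print(mask_card_number("4111 1111 1111 1111"))
print(mask_card_number("378282246310005", mask_char="*"))
```

Note that this sketch leaves values with four or fewer digits unmasked, which is one of the edge cases a reusable, centrally defined mask has to decide on once rather than in every application.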


Traditionally, we have used application or GUI logic to implement masks. That means that we have to manage all the applications and client tools that access that data. It's better to set a mask at the database level, giving us a mask that is applied everywhere, the same way.


There are many other methods for data protection (row level security, column level security, access permissions, etc.) but I wanted to cover the types of design changes that have changed recently to better protect our data. In my future posts, I'll talk about why these are better than the traditional methods.

In my earlier post in this data protection series, I mentioned that data security and privacy is difficult:


Building a culture that favours protecting data can be challenging. In fact, most of us who love our data spend a huge amount of time standing up for our data when it seems everyone else wants to take the easiest route to getting stuff done.


Today I'll be talking about one of the most insecure and unprotected ways we harm data during our development processes. And I'm warning you: these are all contentious positions I hold. I'm experienced (old) enough to stand up for these positions. I know I'm going to get flak for stating them. I'm ready. Are you? Let's start with the most contentious one.



In all my years of presenting to audiences about data protection, this is the position where we get the most disagreement. I understand those positions. But it is still my opinion that taking a backup and restoring to development and test environments is wrong.


Development environments are notoriously less secure than production ones. In the old days, developers connected to servers to do their development. Most of those services were located in a data center and at least had those protections. Now I'm much more likely to see developers working on their own laptops, often personally owned ones. These are often shared devices, with little or no enterprise protections because developers can't be constrained by such overhead and governance during development.


Because developers are system administrators of their local environments, they often remove most of the security features within their development environment to speed up development and test processes. Encryption, row level security, data masking, and other database features are likely to be removed.


Unlike a production environment, there aren't any data access monitoring and alerting solutions. Developers and DBAs tend to memorize data to facilitate their dev and test processes. They know the interesting customer data, transactions, and financial information that they use over and over for testing. In fact, they often have the data itself memorized. The very thing they would not be allowed to do in production they do every day in the dev environment that hosts production data.


Developers and DBAs tend to treat development data with less respect than production data because "it's just development." If something goes wrong, we can just rebuild it. The challenge with this thinking is that it's a dev environment with production data. It's not just development. This is one of the reasons we have recently seen a spike in data breaches due to IT professionals sharing their dev data in unsecured cloud storage buckets. Or they email production data, log it into bug systems, share it on a file server, or carry it around on unsecured thumb drives. "It's just test data, after all." No, it's not. It's still production data, even though you've moved it to a test environment.



When I recommend that teams stop using production data for development, the first thing they say is that this is the only test data they have. Yes, that's true. But it isn't the only way. We could develop test data based on the test cases we are going to run. That way we can test all the features of the applications, not just the features our current customer set reveals. We could generate more test data based on that data. We could require test case writers to create test data. Yes, that takes time.
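
As a rough sketch of what test-case-driven data can look like, the example below combines a seeded generator for ordinary rows with hand-written edge-case rows. Everything here is hypothetical: the field names, the name lists, and the edge cases are placeholders for whatever your own test cases require. The `example.com` domain is reserved for documentation, so no generated address can belong to a real person.

```python
import random
import string

# Obviously fake building blocks, so nobody mistakes a row for a real customer.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor"]
LAST_NAMES = ["Example", "Sample", "Placeholder", "Testcase"]


def make_customer(rng: random.Random, customer_id: int) -> dict:
    """Generate one ordinary-looking but entirely synthetic customer row."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return {
        "id": customer_id,
        "name": f"{first} {last}",
        "email": f"{first}.{last}{customer_id}@example.com".lower(),
        "postal_code": "".join(rng.choice(string.digits) for _ in range(5)),
    }


def edge_cases(start_id: int) -> list:
    """Rows written for specific test cases that production data may never contain."""
    return [
        # Optional fields all empty.
        {"id": start_id, "name": "", "email": "", "postal_code": ""},
        # Name at an assumed 200-character column limit.
        {"id": start_id + 1, "name": "X" * 200,
         "email": "long.name@example.com", "postal_code": "00000"},
        # Non-ASCII characters, an apostrophe, and a hyphen.
        {"id": start_id + 2, "name": "Zoë O'Brien-Ångström",
         "email": "zoe@example.com", "postal_code": "99999"},
    ]


def build_test_data(count: int, seed: int = 42) -> list:
    """A fixed seed makes the dataset repeatable from one test run to the next."""
    rng = random.Random(seed)
    rows = [make_customer(rng, i) for i in range(1, count + 1)]
    return rows + edge_cases(count + 1)


for row in build_test_data(3):
    print(row)
```

The edge-case rows are the point: they exercise features of the application that your current customer set may never reveal.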


I believe our industry should be developing a set of what I call lorem ipsum data for People, Places, Things, Events, Transactions, etc. This dataset could be crowdsourced and curated by the entire community. Yes, this will be hard work. Yes, it will require a lot of curation. No, it will not be much fun. But, think of all the places your own personal data is located. How comfortable are you that your data is being used for development and test on perhaps millions of laptops around the planet? How comfortable are you that someone has chosen your home address and the list of your family as the one record they have memorized for running their test-driven development processes?


I do believe there are valid cases where production data should be used in special testing scenarios. First, if you are doing a data migration project, at some point you still need to do a test migration using a subset of your production data. These tests would be just prior to completing the actual migration. But that production data would still be in its secured formats; it would not be used for the earlier dev and test processes.


Another example is when you are testing the application of new security features in your database environment and you want to see if there are any cases where your existing data or design has challenges implementing those features. Again, this test would be done late in the process.


There may be other use cases where testing of production data is required, but none of them should form the basis for using production data for general development and test methods. You can test them on me in the comments.



I've come to this position after decades of watching people's data being treated with disrespect because it's a faster way to do lesser testing (more on this “lesser testing” in a future post). That should seem wrong to all of you, and I know a few people who share these positions with me, even if I'm overwhelmingly outnumbered. One of the key things that has changed over the last few years is news of data breaches due to exposure of production data in development environments. I don't believe the number of breaches has increased; I believe the number of breach disclosures has increased due to data privacy notifications. With GDPR, we will also have significant fines and even jail terms due to sloppy data practices. How much do you really love that job that you are willing to risk this?


I recommend you start the discussions about generating real test data now. At least you will have shown you tried to take steps to protect data. Your defense lawyer will thank you.

Game tile spelling out "DATA"

Building a culture that favors protecting data can be challenging. In fact, most of us who love our data spend a huge amount of time standing up for our data when it seems everyone else wants to take the easiest route to getting stuff done. I can hear the pleas from here:


  • We don't have time to deal with SQL injection now. We will get to that later.
  • If we add encryption to this data, our queries will run longer. It will make the database larger, which will also affect performance. We can do that later if we get the performance issues fixed.
  • I don't want to keep typing out these long, complex passwords. They are painful.
  • Multi-factor authentication means I have to keep my phone near me. Plus, it's a pain.
  • Security is the job of the security team. They are a painful bunch of people.


…and so on. What my team members don't seem to understand is that these pain points are supposed to be painful. The locks on my house doors are painful. The keys to my car are painful. The PIN on my credit card is painful. All of these are set up, intentionally, as obstacles to access -- not my access, but unauthorized access. Why is it that team members who lock their doors, shred sensitive documents, and keep their collector action figures under glass don't want to protect the data we steward on behalf of customers? In my experience, these people don't want to protect data because they are measured, compensated, and punished in ways that take away almost all the incentives to do so. Developers and programmers are measured on the speed of delivery. DBAs are measured on uptime and performance. SysAdmins are measured on provisioning resources. And rarely have these roles been measured and rewarded for security and privacy compliance.


To Reward, We Must Measure


How do we fix this? We start rewarding people for data protection activities. To reward people, we need to measure their deliverables.


  • An enterprise-wide security policy and framework that includes specific measures at the data category level
  • Encryption design, starting with the data models
  • Data categorization and modeling
  • Test design that includes security and privacy testing
  • Proactive recognition of security requirements and techniques
  • Data profiling testing that discovers unprotected or under-protected data
  • Data security monitoring and alerting
  • Issue management and reporting


As for the rewards, they need to focus on the early introduction of data protection features and service. This includes reviewing designs and user stories for security requirements.


Then we get to the hard part: I'm of a thought that specific rewards for doing what was expected of me are over the top. But I recognize that this isn't always the best way to motivate positive actions. Besides, as I will get into later in this series, the organizational punishments for not protecting data may be so large that a company will not be able to afford the lack of data protection culture we currently have. Plus, we don't want to have to use a prison time measurement to encourage data protection.


In this series, I'll be discussing data protection actions, why they are important, and how we can be better at data. Until then, I'd love to hear about what, if any, data protection reward (or punishment) systems your organization has in place today.

In this last post of my 5 More Ways I Can Steal Your Data series, I focus on my belief that all data security comes down to empathy. Yes, that one trait that we in technology stereotypically aren't known for displaying. But I know there are IT professionals out there who have and use it. These are the people I need on my teams to help guide them toward making the right decisions.


Empathy? That's Not a Technical Skill!

If we all recognize that the personal data we steward actually belongs to people who need to have their data treated securely, then we will make decisions that make that data more secure. But what about people who just don't have that feeling? We see attitudes like this:


"I know the data model calls for encryption, but we just don't have the time to implement it now. We'll do it later."


"Encryption means making the columns wider. That will negatively impact performance."


"We have a firewall to protect the data."


"Encryption increases CPU pressure. That will negatively impact performance."


"Security and privacy aren't my jobs. Someone needs to do those parts after the software is done."


"We don't have to meet European laws unless our company is in Europe." [I'm not a lawyer, but I know this isn't true.]


What's missing from all those statements is empathy for the people whose data we are storing: the people who will be forced to deal with the consequences of bad data practices once any of the 10+ Ways I Can Steal Your Data I've been writing about in the eBook and this series are put to use. Those consequences might be as minor as having to reset their passwords. But bad data practices could also lead to identity theft, financial losses, and personal safety issues.


Hiring for Empathy


I rarely see any interview techniques that focus on screening candidates for empathy skills or experiences. Maybe we should be adding such items to our hiring processes. I believe the best way to do this is to ask candidates to talk about:

  • Examples of times they had to choose the right type of security to implement for Personally Identifiable Information (PII)
  • A time they had to trade performance in favor of meeting a requirement
  • The roles they think are responsible for data protection
  • The methods they would use in projects focused on protecting data
  • The times they have personally experienced having their own data exposed


If I were asking these questions of a candidate, I'd be looking not so much for their answers, but the attitude they convey while answering. Did they factor in risks? Trade-offs? How a customer might be impacted?  This is what Jerry Weinberg writes about in Secrets of Consulting when he says, "Words are useful, but always listen to the music."


By the way, this concept applies to consultants as well. Sure, we tend to retain consultants who can just get things done, but they also need to have empathy to help clients make the right decisions. Consultants who lack empathy tend to not care much about your customers, just their own.


Wrapping it Up

I encourage you to read the eBook, go back through the series, then take steps to help ensure data security and empathy. Empathy is about feeling their pain and taking a stand to mitigate that pain as much as you can.


Oh, and as I said in a previous post, keeping your boss out of jail.  Do that.


UPDATE: My eBook, 10 Ways We Can Steal Your Data, is now available.  Go download it.

10 Ways We Can Steal Your Data eBook cover: spaceship, robot, data center

Datachick LEGO at a SolarWinds Desk with a water cooler

In my recent post 5 More Ways I Can Steal Your Data - Work for You & Stop Working for You I started telling the story of a security guard who helped a just-fired contractor take servers with copies of production data out of the building:


Soon after he was rehired, the police called to say they had raided his home and found servers and other computer equipment with company asset control tags on them. They reviewed surveillance video that showed a security guard holding the door for the man as he carried equipment out in the early hours of the morning. The servers contained unencrypted personal data, including customer and payment information. Why? These were development servers where backups of production data were used as test data.

Apparently, the contractor was surprised to be hired back by a company that had caught him stealing, so he decided since he knew about physical security weaknesses, he would focus not on taking equipment, but the much more valuable customer and payment data.


How the Heck Was He Able to Do This?


You might think he was able to get away with this by having insider help, right?  He did, sort of.  But it didn't come from the security guard.  It came from poor management practices, not enough resources, and more. I'm going to refer to the thief here as "Our Friend".


Not Enough Resources


Our Friend had insider information about how lax physical security was at this location.  There was only ever one security person working at a time.  When she took breaks, or had to deal with a security issue elsewhere, no one else was there to cover the entrance.  Staff could enter with badges and anyone could exit.  Badging systems were old and nearly featureless.  Printers and other resources available to the security group were old and nearly non-functioning.  Staff in security weren't required or tested to be security minded.


In this case, it was easy to figure out the weaknesses in this system.


Poor Security Practices


In the case of Our Friend, he was rehired by a different group who had no access to a "do not hire" list because he was a contractor, not an employee.  He was surprised at being rehired (as were others).  The culture of this IT group was very much "mind your own business" and "don't make waves".  I find that a toxic management culture plays a key role in security approaches.  When security issues were raised, the response was more often than not "we don't have time to worry about that" or "focus on your own job".


Poor Physical Security


Piggybacking or Tailgating (following a person with access through a door without scanning a badge) is a common unenforced practice in many facilities.  Sometimes employees would actually hold the door open for complete strangers.  This seems like being nice, but it's not. Another contractor, who had recently been let go, was let in several times during off hours to wander the hallways looking for his former work laptop.  He wanted to remove traces of improper files and photos.  He accomplished this by tailgating his way into the building.  This happened just weeks before Our Friend carried out his acts.


When Our Friend was rehired, there was a printout of his old badge photo hanging on the wall at the security area.  It was a low-resolution photo printed on a cheap inkjet printer running low on ink.  The guard working that day couldn't even tell that this guy had a "no entry" warning.  The badge printing software had no checks for "no new badge".


After being rehired, Our Friend was caught again stealing networking equipment and was let go.  Security was notified and another poorly printed photo was put up in the security area. Then Our Friend came back in the early morning hours on the weekend, said he forgot his badge and was issued a new one.  Nothing in the system set up an alert.


He spent some time gathering computers that were installed in development and QA labs, then some running in other unsecured areas.  He got a cart, and the security guard held the door open while he took them out to his car.  How do we know this?  There were video tapes. How do we know this? The security guard sold the tapes to a local news station. News stations love when there is video.


Data Ignorance


As I mentioned in the previous post, the company didn't even know the items were missing. It took several calls from the local police to get a response.  And even then the company denied anything was missing.  Because they didn't know.   Many of us knew that these computers would have production data on them because this organization used production data in their development and test processes.


But the company itself had no data inventory system. They had no way of knowing just what data was on those computers.  It was also common to find these systems had virtually no security or they had a single login for the QA environment that was written on the whiteboard in the QA labs. No one knew just what data was copied where.  Anyone could deploy production data anywhere they could find. Requests for production data were pretty much allowed for anyone in IT or the rest of the company.   Requests could be made verbally.  There were no records of any request or the provision of data.  Employees were given no indication that any set of data held sensitive or otherwise protected data.


The lack of inventory let the company spokesperson say something like "These were just test devices; we have no indication that any customer data was involved in this theft".


Fixing It


I could go on with a list of tips on how to fix these issues. But the main fix, that no one wants to embrace, is to stop using production data for dev and test.  I have some more writing on this topic, but this will be my agenda for 2018.  If this company had embraced this option, the theft would have been just of equipment and some test data with no value.




If we as IT professionals started following the practice of having real test data, many of the breaches we know of would not have been breaches of real data.  Yes, we need to fix physical security issues.  But let's keep production data in production.  Unless we are testing a production migration, there's no need to use production data for any reason.  In fact, many data protection compliance schemes forbid it.
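To make "real test data" a little more concrete, here is a minimal sketch of generating fully synthetic customer rows instead of copying or masking production data. Everything in it is an assumption for illustration: the function name `make_test_customers`, the column names, and the name lists are hypothetical, and a real project would shape the data from its own data model.

```python
import random

# Hypothetical name pools; nothing here is derived from production data.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Morgan"]
LAST_NAMES = ["Rivera", "Chen", "Okafor", "Novak", "Singh"]


def make_test_customers(count, seed=42):
    """Generate synthetic customer rows without reading any production data."""
    rng = random.Random(seed)  # seeded so every test run gets the same rows
    customers = []
    for customer_id in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        customers.append({
            "customer_id": customer_id,
            "name": f"{first} {last}",
            # example.com is reserved for documentation and testing
            "email": f"{first.lower()}.{last.lower()}{customer_id}@example.com",
            # Obviously fake card number: fixed "0000" prefix plus 12 random digits
            "card_number": "0000" + "".join(str(rng.randint(0, 9)) for _ in range(12)),
        })
    return customers


if __name__ == "__main__":
    for row in make_test_customers(3):
        print(row)
```

Because the generator is seeded, the same rows come back on every run, which keeps tests repeatable. If a development server full of rows like these walks out the door on a cart, the loss is equipment, not customer data.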

Have you developed real test data, not based on just trying to obscure production data, for all your dev/test needs?

tiles spelling out DATA THEFT

In my eBook, 10 Ways We Can Steal Your Data, I reveal ways that people can steal or destroy the data in your systems. In this blog post, I'm focusing on un-monitored and poorly monitored systems.


Third-party Vendors


The most notorious case of this type is the 2013 Target data theft incident in which 40 million credit and debit cards were stolen from Target's systems. This data breach is a case study on the role of monitoring and alerting. It led to fines and costs in the hundreds of millions of dollars for the retailer. Target had security systems in place, but the company wasn't monitoring the security of their third-party supplier. And, among other issues, Target did not respond to their monitoring reports.


The third-party vendor, an HVAC services provider, had a public-facing portal for logging in to monitor their systems. Access to this system was breached via an email phishing attack. This information, together with a detailed security case study and architecture published by another Target vendor, gave the attackers the information they needed to successfully install malware on Target Point-of-Sale (POS) servers and systems.


Target listed their vendors on their website. This list provided a funnel for attackers to find and exploit vendor systems. The attackers found the right vulnerability to exploit with one of the vendors, then leveraged the details from the other vendor to do their work.


Misconfigured, Unprotected, and Unsecured Resources


The attackers used vulnerabilities (backdoors, default credentials, and misconfigured domain controllers) to work their way through the systems. These are easy things to scan for and monitor. So much so that "script kiddies" can do this without even knowing how their scripts work. Why didn't IT know about these misconfigurations? Why were default credentials left in enterprise data center applications?  Why was information about ports and other configurations published publicly? Any one of these issues alone may not have led to the same outcome, but as I'll cover below, together they formed the perfect storm of mismanaged resources that made the data breach possible.



When all this was happening, Target's offsite monitoring team was alerted that unexpected activities were happening on a large scale. They notified Target, but there was no response.


Some of the reasons given were that there were too many false positives, so security staff had grown slow to respond to all reports. Alert tuning would have helped this issue. Other issues included having too few and undertrained security staff.
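As a rough illustration of what alert tuning can start from, the sketch below looks at past alerts and lists the rules that cry wolf most often. It is only a sketch under assumptions: the function name `rules_needing_tuning`, the thresholds, and the `(rule_name, was_real_incident)` record shape are all hypothetical, and it says nothing about how Target's tooling actually worked.

```python
from collections import defaultdict


def rules_needing_tuning(alert_history, max_false_positive_rate=0.5, min_alerts=10):
    """Return {rule_name: false_positive_rate} for rules that are mostly noise.

    alert_history is an iterable of (rule_name, was_real_incident) pairs,
    a hypothetical export of how past alerts were dispositioned.
    """
    totals = defaultdict(int)
    false_positives = defaultdict(int)
    for rule, was_real in alert_history:
        totals[rule] += 1
        if not was_real:
            false_positives[rule] += 1

    noisy = {}
    for rule, total in totals.items():
        if total < min_alerts:
            continue  # too little history to judge this rule fairly
        rate = false_positives[rule] / total
        if rate > max_false_positive_rate:
            noisy[rule] = rate
    return noisy


if __name__ == "__main__":
    history = (
        [("disk_full", False)] * 9 + [("disk_full", True)]
        + [("malware_detected", True)] * 10
    )
    print(rules_needing_tuning(history))
```

The rules it returns are candidates for new thresholds or better context, not for switching off. The goal is that when an alert does fire, a human still believes it.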


Pulling it All Together


There were monitoring controls in place at Target, as well as security staff, third-party monitoring services, and up-to-date compliance auditing. But the system as a whole failed due to not having an integrated, system-wide approach to security and threat management.



How can we mitigate these types of events?


  • Don't use many, separate monitoring and alerting systems
  • Follow data flows through the whole system, not just one system at a time
  • Tune alerts so that humans respond
  • Test responders to see if the alerts are working
  • Read the SANS case study on this breach
  • Don't let DevOps performance get in the way of threat management
  • Monitor for misconfigured resources
  • Monitor for unpatched resources
  • Monitor for rogue software installs
  • Monitor for default credentials
  • Monitor for open ports
  • Educate staff on over-sharing about systems
  • Monitor the press for reports about technical resources
  • Perform regular pen testing
  • Treat security as a daily operational practice for everyone, not just an annual review
  • Think like a hacker
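A couple of the items above, monitoring for default credentials and for open ports, are simple enough to sketch. The example below audits an inventory export rather than scanning a network. The record layout, the `audit_inventory` name, and the short default-credential list are assumptions for illustration; a real shop would feed this from its own scanner or configuration management output, and would not keep plain-text passwords in an inventory.

```python
# Hypothetical short list of vendor-default logins to check against.
DEFAULT_CREDENTIALS = {("admin", "admin"), ("admin", "password"), ("root", "root")}


def audit_inventory(resources, allowed_ports):
    """Return (resource_name, finding) tuples for risky configurations.

    resources: list of dicts with 'name', 'username', 'password', 'open_ports'
               (an assumed export format, not any particular product's).
    allowed_ports: set of port numbers approved for these resources.
    """
    findings = []
    for res in resources:
        if (res["username"], res["password"]) in DEFAULT_CREDENTIALS:
            findings.append((res["name"], "default credentials"))
        # Any open port that is not on the approved list is worth a look.
        for port in sorted(set(res["open_ports"]) - allowed_ports):
            findings.append((res["name"], f"unexpected open port {port}"))
    return findings


if __name__ == "__main__":
    inventory = [
        {"name": "pos-01", "username": "admin", "password": "admin", "open_ports": [443, 23]},
        {"name": "web-01", "username": "svc_web", "password": "x9!long-unique", "open_ports": [443]},
    ]
    for name, finding in audit_inventory(inventory, allowed_ports={443}):
        print(name, "->", finding)
```

Run on a schedule, even a check this small turns "why didn't IT know?" into a report someone has to read.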


I could just keep adding to this list.  Do you have items to add? List them below and I'll update.

AdventureWorks sample data

In my soon-to-be-released eBook, 10 Ways We Can Steal Your Data, we talk about The People Problem, how people not even trying to be malicious end up exposing data to others without even understanding how their actions put data at risk. But in this post, I want to talk about intentional data theft.


What happens when insiders value the data your organization stewards? There have been several newsworthy cases where insiders have recognized that they could profit from taking data and making it available to others. In today’s post, I cover two ways I can steal your data that fall under that category.

1. Get hired at a company where security is an afterthought

When working with one of my former clients (this organization is no longer in business, so I feel a bit freer to talk about this situation), an IT contractor with personal financial issues was hired to help with networking administration. From what I heard, he was a nice guy and a hard worker. One day, network equipment belonging to the company was found in his car and he was let go. However, he was rehired to work on a related project just a few months later. During this time, he was experiencing even greater financial pressures than before. 

Soon after he was rehired, the police called to say they had raided his home and found servers and other computer equipment with company asset control tags on them. They reviewed surveillance video that showed a security guard holding the door for the man as he carried equipment out in the early hours of the morning. The servers contained unencrypted personal data, including customer and payment information. Why? These were development servers where backups of production data were used as test data.

Apparently, the contractor was surprised to be hired back by a company that had caught him stealing, so he decided since he knew about physical security weaknesses, he would focus not on taking equipment, but the much more valuable customer and payment data. 

In another case, a South Carolina Medicaid worker requested a large number of patient records, then emailed that data to his personal address. This breach was discovered and he was fired. My favorite quotes from this story were:

Keck said that in hindsight, his agency relied too much on “internal relationships as our security system.”




Given his position in the agency, Lykes had no known need for the volume of information on Medicaid beneficiaries he transferred, Keck said.

How could this data breach be avoided?

It seems obvious to me, but rehiring a contractor who has already breached security seems like a bad idea. Having physical security that does not require paperwork to remove large quantities of equipment in the middle of the night also seems questionable. Don't let staffing pressures persuade you to make bad rehire decisions.

2. Get hired, then fired, but keep friends and family close


At one U.S. hospital, a staff member was caught stealing patient data for use in identity theft (apparently this is a major reason why health data theft happens) and let go. But his wife, who worked at the hospital in a records administration role, maintained her position after he was gone. Not surprisingly, at least in hindsight, the data thefts continued.

There have also been data breach scenarios in which one employee paid another employee or employees to gather small numbers of records to send to a third party who aggregated those records into a more valuable stockpile of sellable data.

In other data breach stories, shared logins and passwords have led to former employees stealing data, locking out onsite teams, or even destroying data. I heard a story about one employee, who was swamped with work, who provided his credentials to a former employee who had agreed to assist with the workload. That former employee used the information he was given to steal and resell valuable trade secrets to his new employer.

How can these data breaches be avoided?

In the previously mentioned husband and wife scenario, I'm not sure what the impact should have been regarding the wife’s job. There was no evidence that she had been involved in the previous data breach. That said, it would have been a good idea to ensure that data access monitoring was focused on any family members of the accused.

Sharing logins and passwords is a security nightmare when employees leave. They rarely get reset, and even when they do they are often reset to a slight variation of the former password.


This reminds me of one more much easier way to steal data, one I covered in the 10 Ways eBook: If you use production data as test and development data, it’s likely there is no data access monitoring on that same sensitive data. And no “export controls” on it, either. This is a gaping hole in data security and it’s our job as data professionals to stop this practice.
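For a sense of what even basic data access monitoring could look like, here is a minimal sketch that flags requests for far more records than a role normally needs, the pattern in the Medicaid case above. The `flag_unusual_requests` name, the request fields, and the per-role limits are hypothetical; real limits would come from your own access policies.

```python
def flag_unusual_requests(requests, role_limits, default_limit=100):
    """Return the requests whose record count exceeds the limit for the requester's role.

    requests: list of dicts with 'user', 'role', and 'record_count' (assumed fields).
    role_limits: dict mapping role name to the most records a single request should need.
    """
    flagged = []
    for req in requests:
        limit = role_limits.get(req["role"], default_limit)
        if req["record_count"] > limit:
            # Keep the limit alongside the request so a reviewer sees why it was flagged.
            flagged.append({**req, "limit": limit})
    return flagged


if __name__ == "__main__":
    todays_requests = [
        {"user": "analyst_a", "role": "project_manager", "record_count": 228000},
        {"user": "clerk_b", "role": "records_clerk", "record_count": 40},
    ]
    limits = {"project_manager": 500, "records_clerk": 200}
    for hit in flag_unusual_requests(todays_requests, limits):
        print(hit)
```

A check like this only works if requests are recorded in the first place, which is one more reason verbal, unlogged requests for data are a problem.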

What data breach causes have you heard about that allowed people to use unique approaches to stealing or leaking data? I'd love to hear from you in the comments below.



In my soon-to-be-released eBook, 10 Ways I Can Steal Your Data, I cover the not-so-talked-about ways that people can access your enterprise data. It covers things like you're just GIVING me your data, ways you might not realize you are giving me your data, and how to keep those things from happening.


The 10 Ways eBook was prepared to complement my upcoming panel during next week's ThwackCamp on the data management lifecycle. You've registered for ThwackCamp, right? In this panel, a group of fun and sometimes irreverent IT professionals, including Thomas LaRock (sqlrockstar), Stephen Foskett (sfoskett), and me, talk with Head Geek Kong Yang (kong.yang) about things we want to see in the discipline of monitoring and systems administration. We also did a fun video about stealing data. I knew I couldn't trust that Kong guy!


In this blog series, I want to talk a bit more about other ways I can steal your data. In fact, there are so many ways this can happen I could do a semi-monthly blog series from now until the end of the world. Heck, with so many data breaches happening, the end of the world might just be sooner than we think.


More Data, More Breaches

We all know that data protection is getting more and wider attention. But why is that? Yes, there are more breaches, but I also think legislation, especially the regulations coming out of Europe, such as the General Data Protection Regulation (GDPR), means we are getting more reports. In the past, organizations would keep quiet about failures in their infrastructure and processes because they didn't want us to know about how poorly they treated our data. In fact, during the "software is eating the world" phase of IT professionals making software developers kings of the world, most data had almost no protection and was haphazardly secured. We valued performance over privacy and security. We favored developer productivity over data protection. We loved our software more than we loved our data.


But this is all changing due to an increased focus on the way the enterprise values data.


I have some favorite mantras for data protection:


  • Data lasts longer than code, so treat it right
  • Data privacy is not security, but security is required to protect data privacy
  • Data protection must begin at requirements time
  • Data protection cannot be an after-production add-on
  • Secure your data and secure your job
  • Customer data is valuable to the customers, so if you value it, your customers will value your company
  • Data yearns to be free, but not to the entire world
  • Security features are used to protect data, but they have to be designed appropriately
  • Performance desires should never trump security requirements



And my favorite one:


  • ROI also stands for Risk of Incarceration: Keeping your boss out of jail is part of your job description



So keep an eye out for the announcement of the eBook release and return here in two weeks when I'll share even more ways I can steal your data.


As we come to the end of this series on infrastructure and application data analytics, I thought I'd share my favorite quotes, thoughts, and images from the past few weeks of posts leading up to the PerfStack release.


SomeClown leads the way in The One Where We Abstract a Thing


"Mean time to innocence (MTTI) is a somewhat tongue-in-cheek metric in IT shops these days, referring to the amount of time it takes an engineer to prove that the domain for which they have responsibility is not, in fact, the cause of whatever problem is being investigated. In order to quantify an assessment of innocence you need information, documentation that the problem is not yours, even if you cannot say with any certainty who does own the problem. To do this, you need a tool which can generate impersonal, authoritative proof you can stand on, and which other engineers will respect. This is certainly helped if a system-wide tool, trusted by all parties, is a major contributor to this documentation."


Karen:  Mean Time To Innocence! I'm so stealing that. I wrote a bit about this effect in my post Improving your Diagnostic and Troubleshooting Skills. When there's a major problem, the first thing most of us think is, "PLEASE DON'T LET IT BE ME!"  So I love this thought.


demitassenz wrote in PerfStack for Multi-dimensional Performance Troubleshooting


"My favorite part was adding multiple different performance counters from the different layers of infrastructure to a single screen. This is where I had the Excel flashback, only here the consolidation is done programmatically. No need for me to make sure the time series match up. I loved that the performance graphs were re-drawing in real-time as new counters were added. Even better was that the re-draw was fast enough that counters could be added on the off chance that they were relevant. When they are not relevant, they can simply be removed. The hours I wasted building Excel graphs translate into minutes of building a PerfStack workspace."


Karen:  OMG! I had completely forgotten my days of downloading CSVs or other outputs of tools and trying to correlate them in Excel. As a data professional, I'm happy that we now have a way to quickly and dynamically bring metrics together to make data tell the story it wants to tell.


cobrien in NPM 12.1 Sneak Peek - Using Perfstack for Networks


"I was exploring some of the data the other day. It’s like the scientific method in real-time. Observe some data, come up with a hypothesis, drag on related data to prove or disprove your hypothesis, rinse, and repeat."


Karen:  Data + Science.  What's not to love?


SomeClown mentioned in Perfstack Changes the Game


"PerfStack can now create dashboards on the fly, filled with all of the pertinent pieces of data needed to remediate a problem. More than that, however, they can give another user that same dashboard, who can then add their own bits and bobs. You are effectively building up a grouping of monitoring inputs consisting of cross-platform data points, making troubleshooting across silos seamless in a way that it has never been before."


Karen: In my posts, I focused a lot on the importance of collaboration for troubleshooting. Here, Teren gets right to the point. We can collaboratively build analytics based on our own expertise to get right to the point of what we are trying to resolve.  And we have data to back it up.


aLTeReGo in a post demo-ing how it works, Drag & Drop Answers to Your Toughest IT Questions


"Sharing is caring. The most powerful PerfStack feature of all is the ability to collaborate with others within your IT organization; breaking down the silo walls and allowing teams to triage and troubleshoot problems across functional areas. Anything built in PerfStack is sharable. The only requirement is that the individual you're sharing with has the ability to login to the Orion web interface. Sharing is as simple as copying the URL in your browser and pasting it into email, IM, or even a help desk ticket."


Karen: Yes! I also wrote about how important collaboration is to getting problems solved fast.


demitassenz shared in Passing the Blame Like a Boss


"One thing to keep in mind is that collaborative troubleshooting is more productive than playing help desk ticket ping pong. It definitely helps the process to have experts across the disciplines working together in real time. It helps both with resolving the problem at hand and with future problems. Often each team can learn a little of the other team’s specialization to better understand the overall environment. Another underappreciated aspect is that it helps people to understand that the other teams are not complete idiots. To understand that each specialization has its own issues and complexity."


Karen: Help desk ticket ping pong. If you've ever suffered through this, especially when someone passes the ticket back to you right before the emergency "why haven't we fixed this yet" meeting with the CEO, you'll know the pain of it all.


SomeClown observed in More PerfStack - Screenshot Edition


"In a nutshell, what it allows you to do is to find all sorts of bits of information that you're already monitoring, and view it all in one place for easy consumption. Rather than going from this page to that, one IT discipline-domain to another, or ticket to ticket, PerfStack gives you more freedom to mix and match, to see only the bits pertinent to the problem at hand, whether those are in the VOIP systems, wireless, applications, or network. Who would have thought that would be useful, and why haven't we thought of that before?"


Karen: "Why haven't we thought of that before?" That last bit hit home for me. I remember working on a project for a client to do a data model about IT systems. This was at least 20 years ago. We were going to build an integrated IT management system so that admins could break through the silo-based systems and approaches to solve a major SLA issue for our end-users. We did a lot of work until the project was deferred when a legislative change meant that all resources needed to be redirected to meet those requirements. But I still remember how difficult it was going to be to pull all this data together. With PerfStack, we aren't building a new collection system.  We are applying analytics on top of what we are already collecting with specialized tools.


DataChick's Thoughts


This next part is cheating a bit, because the quotes are from my own posts. But hey, I also like them and want to focus on them again.


datachick in Better Metrics. Better Data. Better Analytics. Better IT.


"As a data professional, I'm biased, but I believe that data is the key to successful collaboration in managing complex systems. We can't manage by "feelings," and we can't manage by looking at silo-ed data. With PerfStack, we have an analytics system, with data visualizations, to help us get to the cause faster, with less pain-and-blame. This makes us all look better to the business. They become more confident in us because, as one CEO told me, "You all look like you know what you are doing." That helped when we went to ask for more resources."


Karen: We should all look good to the CEO, right?


datachick ranted in 5 Anti-Patterns to IT Collaboration: Data Will Save You


"These anti-patterns don't just increase costs, decrease team function, increase risk, and decrease organizational confidence, they also lead to employee dissatisfaction and low morale. That leads to higher turnover (see above) and more pressure on good employees. Having the right data, at the right time, in the right format, will allow you to get to the root cause of issues, and better collaborate with others faster, cheaper, and easier.  Also, it will let you enjoy your 3:00 ams better."


I enjoyed sharing my thoughts on these topics and reading other people's posts as well. It seems bloggers here shared the same underlying theme of collaboration and teamwork. That made this Canadian Data Chick happy. Go, everyone. Solve problems together.  Do IT better.  And don't let me catch you trying to do any of that without data to back you up. Be part of #TeamData.



I've worked in IT for a long time (I stopped counting at twenty years.  Quite a while ago.)  This experience means that I generally do well in troubleshooting in data-related areas.  In other areas, like networking, I'm pretty much done at "do I have an IP address" and "is it plugged in?"


This is why team collaboration on IT issues, as I posted before, is so important.


What Can Go Wrong?


One of the things I've noticed is that while people can be experts in deploying solutions, this doesn't mean they are great at diagnosing issues. You've worked with that guy.  He's great at getting things installed and working.  But when things go wrong, he just starts pulling out cables and grumbling about other people's incompetence.  He keeps making changes and does several at the same time.  He's a nightmare.  And when you try to step in to help him get back on a path, he starts laying blame before he starts diagnosing the issue. You don't have to be that guy, though, to have challenges in troubleshooting.


Some of the effects that can contribute to troubleshooting challenges:


Availability Heuristic


If you have recently solved a series of NIC issues, the next time someone reports slow response times, you're naturally going to first consider a NIC issue.  And many times, this will work out just fine.  But if it constrains your thinking, you may be slow to get to the actual cause.  The best way to fight this cognitive issue is to gather data first, then assess the situation based on your entire troubleshooting experience.


Confirmation Bias


Confirmation Bias goes hand in hand with the availability heuristic. Once you have narrowed down what you think is causing this response time issue, your brain will want you to go look for evidence that the problem is indeed the network cards.   The best way to fight this is to recognize when you are looking for proof instead of looking for data.  Another way to overcome confirmation bias is to collaborate with others on what they are seeing.  While groupthink can be an issue, it's less likely for a group to share the same confirmation bias equally.


Anchoring Heuristic


So to get here, you have limited your guesses to recent issues, you have searched out data to prove the correctness of your diagnosis and now you are anchored there.  You want to believe.  You may start rejecting and ignoring data that contradicts your assumptions. In a team environment, this can be one of the most frustrating group troubleshooting challenges. You definitely don't want to be that gal.  The one who won't look at all the data. Trust me on this.




I use intuition a lot when I diagnose issues.  It's a good thing, in general.  Intuition helps professionals take a huge amount of data and narrow it down to a manageable set of causes. It's usually based on having dealt with similar issues hundreds or thousands of times over the course of your career.  But intuition without follow-up data analysis can be a huge issue.  This often happens due to ego or lack of experience.  The Dunning-Kruger effect (not knowing what you don't know) can also be a factor here.


There are other challenges in diagnosing causes and effects of IT issues. I highly recommend reading up on them so you can spot these behaviours in others and yourself.


Improving Troubleshooting Skills


  1. Be Aware.
    The first thing you can do to improve the speed and accuracy of your troubleshooting is to recognize these behaviours when you are doing them.  Being self-aware, especially when you are under pressure to bring systems back online or have a boss pacing behind your desk asking "when will this be fixed?" will help you focus on the right things.  In a truly collaborative, high trust environment, team members can help others check whether they are having challenges in diagnosing based on the biases above.
  2. Get feedback.
    We are generally lucky in IT that we, unlike other professions, can almost always immediately see the impact of our fixes to see if they actually fixed the problem.  We have tools that report metrics and users who will let us know if we were wrong.  But even so, post-event analyses documenting what we got right and what we got wrong can help us improve our methods.
  3. Practice.
    Yes, every day we troubleshoot issues.  That counts as practice.  But we don't always test ourselves like other professions do.  Disaster Recovery exercises are a great way to do this, but I've always thought we needed troubleshooting code camps/hackathons to help us hone our skills. 
  4. Bring Data.
    Data is imperative to punching through the cognitive challenges listed above.  Imagine diagnosing a data-center wide outage and having to start by polling each resource to see how it's doing.  We must have data for both intuitive and analytical responses.
  5. Analyze.
    I love my data.  But it's only an input into a diagnostic process.  Considering metrics in a holistic, cross-platform, cross-team view is the next step.  A shared analysis platform for combining and overlaying data to get to the real answers makes all this smoother and faster.
  6. Log What Happened. 
    This sounds like a lot of overhead when you are under pressure (is your boss still there?), but keeping a quick list of what was done, what your thought process was, and what others did can be an important part of professional practice.  Teams can even share the load of writing stuff down.  This sort of knowledgebase is also important for when you run into the rare things that have a simple solution but you can't remember exactly what to do (or even what not to do).

A person with experience can be an experienced non-expert. But with data, analysis and awareness of our biases and challenges in troubleshooting, we can get problems solved faster and with better accuracy. The future of IT troubleshooting will be based more and more on analytical approaches.


Do you have other tips for improving your troubleshooting and diagnostic skills?  Do you think we should get formal training in troubleshooting?

In our pursuit of Better IT, I bring you a post on how important data is to functional teams and groups. Last week we talked about anti-patterns in collaboration, covering things like data mine-ing and other organizational dysfunctions. In this post we will be talking about the role shared data, information, visualizations, and analytics play in helping ensure your teams can avoid all those missteps from last week.


Data! Data! Data!

These days we have data. Lots and lots of data. Even Big Data, data so important we capitalize it! As much as I love my data, we can't solve problems with just raw data, even if we enjoy browsing through pages of JSON or log data. That's why we have products like NPM (Network Performance Monitor), SAM (Server & Applications Monitor), and DPA to help us collect and parse all that data.  Each of those products collects specialized metrics, applies meaning to them, and provides visualizations to help specialized sysadmins leverage that data. These administrators probably don't think of themselves as data professionals, but they are. They choose which data to collect, which levels to be alerted on, and which to report upon. They are experts in this data and they have learned to love it all.

Shared Data about App and Infrastructure Resources

Within the SolarWinds product solutions, data about the infrastructure and application graph is collected and displayed on the Orion Platform. This means that cross-team admins share the same set of resources and components and the data about their metrics. Now we have PerfStack, with features to do cross-team collaboration via data. We can see entities we want to analyze, then see all the other entities related to them. This is what I call the Infrastructure and Application Graph, which I'll be writing about later. After choosing Entities, we can discover the metrics available for each of the entities and choose the ones that make the most sense to analyze based on the troubleshooting we are doing now.




Metrics Over Time


Another data feature that's critical to analyzing infrastructure issues is the ability to see data "over time." It's not enough to know how CPU is doing right now; we need to know what it was doing earlier today, yesterday, last week, and maybe even last month, on the same day of the month. By having a view into the status of resources over time, we can intelligently make sense of the data we are seeing today. End-of-month processing going on? Now we know why there might be a slight spike in CPU pressure.
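To show why history matters, here is a small sketch that compares today's reading against the same day of the month in earlier months. It is not how any particular product does it; the function names `same_day_baseline` and `is_unusual`, the 25% tolerance, and the sample numbers are all assumptions made up for illustration.

```python
from datetime import date
from statistics import mean


def same_day_baseline(history, today):
    """Average the metric over earlier dates that share today's day of the month.

    history: dict mapping datetime.date to a metric value (for example, CPU %).
    Returns None when there is no comparable history.
    """
    values = [value for day, value in history.items()
              if day.day == today.day and day < today]
    return mean(values) if values else None


def is_unusual(history, today, current_value, tolerance=0.25):
    """True when today's value is more than `tolerance` away from the baseline."""
    baseline = same_day_baseline(history, today)
    if baseline is None:
        return False  # nothing to compare against, so don't raise a flag
    return abs(current_value - baseline) > tolerance * baseline


if __name__ == "__main__":
    # Made-up CPU percentages: month-end processing runs hot every time.
    cpu_history = {
        date(2017, 1, 31): 90.0,
        date(2017, 3, 31): 94.0,
        date(2017, 3, 15): 40.0,
    }
    print(is_unusual(cpu_history, date(2017, 5, 31), 93.0))
```

With only today's number, 93% CPU looks alarming. With the month-end history beside it, it looks like a normal 31st.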


Visualizations and Analyses


The beauty of PerfStack is that by choosing these Entities and metrics we can easily build data visualizations of the metrics and overlay them to discover correlations and causes. We can then interact with the information we now have by working with the data or the visualizations. By overlaying the data, we can see how statuses of resources are impacting each other. This collaboration of data means we are performing "team troubleshooting" instead of silo-based "whodunits." We can find the issue, which until now might have been hiding in data in separate products.
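For readers who want to see the idea behind overlaying metrics in code, the sketch below lines up two time series on their shared timestamps and computes a Pearson correlation. This is my own illustration of the concept, not PerfStack's implementation; the function names and sample values are hypothetical.

```python
from math import sqrt


def align(series_a, series_b):
    """Keep only timestamps present in both series, in time order."""
    shared = sorted(set(series_a) & set(series_b))
    return [series_a[t] for t in shared], [series_b[t] for t in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two shared samples")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up samples keyed by minute: database CPU % and app response time (ms).
    db_cpu = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0}
    app_latency = {2: 200.0, 3: 300.0, 4: 400.0, 5: 999.0}
    cpu_values, latency_values = align(db_cpu, app_latency)
    print(pearson(cpu_values, latency_values))
```

A value near 1 or -1 says the two metrics move together and are worth a closer look by both teams. It does not say which one is the cause; that part is still the troubleshooting.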




So we've gone from data to information to analysis in just minutes. Another beautiful feature of PerfStack is that once we've built the analyses that show our troubleshooting results, we can copy the URL, send it off to team members, and they can see the exact same analysis -- complete with visualizations -- that we saw. If we've done similar troubleshooting before and saved projects, we might be doing this in seconds.


This is often hours, if not days, faster than how we did troubleshooting in our previous silo-ed, data mine-ing approach to application and infrastructure support. We accomplished this by having quick and easy access to shared information that united differing views of our infrastructure and application graph.


Data -> Information -> Visualization -> Analysis -> Action


It all starts with the data, but we have to love the data into becoming actions. I'm excited about this data-driven workflow in keeping applications and infrastructure happy.

Karen Lego Figures (c) Karen Lopez InfoAdvisors.com

As promised in my previous post on Better IT, in this series I will be talking about collaboration. Today I'm sharing with you anti-patterns in collaboration.

Anti-pattern - Things you shouldn't be doing because they get in the way of success in your work, or your organization's efforts.  Antonym of "pattern."

In my troubled project turnaround work, when I start to talk about collaboration, I usually get many eye rolls. People think we're going to start doing team-building exercises, install an arcade game, and initiate hourly group hugs. (Not that these would be so bad.)  But most collaboration missteps I see are the result of anti-patterns that show up in how teams work. So in this post, let's look at the not-so-great-things that will get your team and your organization into trouble.


IT admins who don't know who is responsible for what, or can't find them

This is often the case in geo-diverse teams, spread over several time zones, and teams with a high staff turnover. Their process (their "pattern") is to go on a "responsibility safari" to find the person and their contact information for a resource. On one project, it took me almost a month to find the person, who lived on another continent, who was responsible for the new networks we were going to deploy to our retail locations. By the time I found him, he was planning on moving to another company within a week. Having to hunt down people first, then their tools, then their data, is both costly and time-consuming, which delays one's ability to resolve issues. Having to find people before you find data is not the right way to manage.


IT admins who collect duplicate data about resources and their metrics, often in difficult to integrate formats and units of measure

This is almost always the result of using a hodgepodge of tools across teams, many of which are duplicate tools because one person has a preference of toolsets. This duplication of tools leads to duplication of data.  And many of these tools keep their data locked in, with no way to share that data with other tools. This duplication of data and effort is a huge waste of time and money for everyone. The cost of incompatible tool sets producing data in incompatible formats and levels of granularity is large and often not measured. It slows down access to data and the sharing of data across resource types.
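As a small illustration of the unit problem, here is a sketch of normalizing the same metric reported by two tools in different units. The tool readings, field names, and the choice of binary multiples (1 KB = 1024 bytes) are assumptions for the example, not the format of any real product.

```python
# Assumed binary multiples; a real tool may use decimal (1000-based) units.
BYTES_PER_UNIT = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

def normalize(reading):
    """Convert a reading to a common shape with the value expressed in bytes."""
    factor = BYTES_PER_UNIT[reading["unit"].upper()]
    return {
        "resource": reading["resource"],
        "metric": reading["metric"],
        "bytes": int(reading["value"] * factor),
    }

# Hypothetical readings for the same server from two different tools.
tool_a = {"resource": "db01", "metric": "memory_used", "value": 2, "unit": "GB"}
tool_b = {"resource": "db01", "metric": "memory_used", "value": 2048, "unit": "mb"}

same = normalize(tool_a) == normalize(tool_b)
print(same)
```

Every pair of tools that disagrees on units needs a mapping like this, and someone has to maintain it. That is part of the unmeasured cost of duplicate tooling.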


IT pros who want to keep their data "private" 

This dysfunction is one my friend Len Silverston calls "data mine-ing," keeping data to yourself for personal use only. This is derived from the fact that data is indeed power. Keeping information about the status of the resources you manage gives you control of the messaging about those systems. This is a terrible thing for collaboration.


Data mine-ing - Acting in a manner that says, "This data is mine."

- Len Silverston

Agile blocking is horrible

A famous Agilista wants people to report false statuses, pretend to do work, tell teams that "all is good" so he can keep doing what he is doing without interruption. He also advocates for sharing incorrect data and data that makes it look like other teams are to blame. I refuse to link to this practice, but if you have decent search skills, you can find it. Teams that practice blocking are usually in the worst shape possible, and also build systems that are literally designed to fail and send their CEO to jail.  It's that bad. Of all these anti-patterns, this is the most dangerous and selfish.


IT admins who use a person-focused process

We should ensure that all of our work is personable. And collaborative. But "person-focused" here means "sharing only via personal intervention." When I ask these admins how they solve a problem, they often answer with, "I just walk over to the guy who does it and ask them to fix it." This is seen as Agile, because it's reactionary, and needs no documentation or planning. It does not scale on real-life projects. It is the exact opposite of efficiency. "Just walking over" is an interruption to someone else who may not even manage one of the actual resources related to the issue. Also, she might not even work in the same building or country. Finally, these types of data-less visits increase the us-versus-them mentality that negatively impacts collaboration success. Sharing data about an instance is just that: data. It's the status of a set of resources. We can blame a dead router without having to blame a person. Being able to focus on the facts allows us to depersonalize the blame game.


Data will save you

These anti-patterns don't just increase costs, decrease team function, increase risk, and decrease organizational confidence; they also lead to employee dissatisfaction and low morale. That leads to higher turnover (see above) and more pressure on good employees. Having the right data, at the right time, in the right format, will allow you to get to the root cause of issues, and better collaborate with others faster, cheaper, and easier. Also, it will let you enjoy your 3:00 ams better.


Are there other anti-patterns related to collaboration that you've seen when you've tried to motivate cross-team collaboration? Share one in the comments if you have.

A few years ago I was working on a project as a project manager and architect when a developer came up to me and said, "You need to denormalize these tables…" and he handed me a list of about 10 tables that he wanted collapsed into one big table. When I asked him why, he explained that his query was taking four minutes to run because the database was "overnormalized." Our database was small: our largest table had only 40,000 rows. His query was pulling from a lot of tables, but it was only pulling back data on one transaction.  I couldn't even think of a way to write a query to do that and force it to take four minutes. I still can't.


I asked him to show me the data he had on the duration of his query against the database. He explained that he didn't have data; he had just timed his application from button push to results showing up on the screen. He believed that because there could be nothing wrong with his code, it just *had* to be the database that was causing his problem.


I ran his query against the database, and the results set came back in just a few milliseconds. No change to the database was going to make his four-minute query run faster. I told him to go find the cause that was happening between the database and the application. It wasn't my problem.

He eventually discovered that the issue was a complex one involving duplicate IP addresses and other network configuration issues in the development lab.
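The difference between "button push to results" and "time spent in the database" is something you can measure. Below is a minimal sketch using Python's built-in `sqlite3` and `time.perf_counter`. The table, the query, and the `time.sleep` call standing in for a delay outside the database are all hypothetical; this is not the system from the story.

```python
import sqlite3
import time

def timed(fn):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

# Small in-memory table, roughly the size of the largest table in the story.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO txn (amount) VALUES (?)",
    [(i * 1.0,) for i in range(40000)],
)

def run_query():
    """The database work only: fetch one transaction by primary key."""
    return conn.execute(
        "SELECT id, amount FROM txn WHERE id = ?", (123,)
    ).fetchone()

def handle_request():
    """End-to-end request: delay outside the database, then the query."""
    time.sleep(0.2)  # hypothetical stand-in for a network or app-side delay
    return timed(run_query)

(row, query_seconds), total_seconds = timed(handle_request)
outside_db_seconds = total_seconds - query_seconds

print(f"query:      {query_seconds * 1000:.2f} ms")
print(f"end to end: {total_seconds * 1000:.2f} ms")
print(f"outside db: {outside_db_seconds * 1000:.2f} ms")
```

With both numbers in hand, the conversation changes from "the database is slow" to "most of the time is spent somewhere between the application and the database," which is a question more than one team can help answer.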


Looking back on that interaction, I realize that this is how most of us in IT work: someone brings us a problem ("the system is slow"), we look into our tools and our data, and we give a yes-or-no answer about whether we caused it. If we can't find a problem, we close the ticket or send the problem over to another IT group. If we are in the database group, we send it over to the network or storage guys. If they get the report, they send it over to us. These sorts of silo-based responses take longer to resolve issues and often lead to a lot of chasing down and re-blaming. It costs time and money because we aren't responding as a team, just as a loose collection of groups.

Why does this happen?

The main reason we do this is that we typically don't have insights into anyone else's systems' data and metrics. And even if we did, we wouldn't understand it. Then we throw in the fact that most teams have their own set of specialized tools that we don't have access to. I had no access to network monitoring tools nor permissions to run any. It wasn't my job.


We are typically measured and rewarded based on working within our own groups, be it systems, storage, or networks, not on troubleshooting issues with other parts of infrastructure. It's like we build giant walls around our "stuff" and hope that someone else knows how to navigate around them. This "not my problem" response to complex systems issues doesn't help anyone.



What if it didn't have to be that way?


Another contributing factor is the intense complexity of the architecture of modern application systems. There are more options, more metadata, more metrics, more interfaces, and more layers than ever before. In the past, we attempted to build one giant tool to manage them all. What if we could still use specialty tools to monitor and manage all our components *and* pull the graph of resources and their data into one place so that we could analyze and diagnose issues in a common and sharable way?


True collaboration requires data that is:

  • Integrated
  • Visualized
  • Correlated
  • Traceable across teams and groups
  • Understandable


That's exactly what SolarWinds' PerfStack does. PerfStack builds upon the Orion Platform to help IT pros troubleshoot problems in one place, using a common interface, to help cross-platform teams figure out where a bottleneck is, what is causing it, and get on to fixing it.





PerfStack combines metrics you choose from across tools like Network Performance Monitor and Server & Applications Monitor from the Orion Platform into one easy-to-consume data visualization, matching them up by time. You can see in the figure above how it's easy to spot a correlated data point that is likely the cause of performance less spectacular than your work normally delivers. PerfStack allows you to highlight exactly the data you want to see, ignore the parts that aren't relevant, and get right to the outliers.


As a data professional, I'm biased, but I believe that data is the key to successful collaboration in managing complex systems. We can't manage by "feelings," and we can't manage by looking at silo-ed data. With PerfStack, we have an analytics system, with data visualizations, to help us get to the cause faster, with less pain-and-blame. This makes us all look better to the business. They become more confident in us because, as one CEO told me, "you all look like you know what you are doing." That helped when we went to ask for more resources.


Do you have a story?


Later in this series, I'll be writing about the nature of collaboration and how you can benefit from shared data and analytics in delivering better and more confidence-instilling results to your organization. Meanwhile, do you have any stories of being sent on a chase to find the cause of a problem?  Do you have any great stories of bizarre causes you've found to a systems issue?
