The Tricky Thing About Doing Tricky Things in Your Database

stencil.linkedin-post (1).jpg

In my previous posts, I shared my tips on being an Accidental DBA - what things you should focus on first and how to prioritize your tasks.  Today at 1PM CDT, Thomas LaRock, HeadGeek and Kevin Sparenberg, Product Manager, will be talking about what Accidental DBAs should know about all the stuff that goes on inside the Black Box of a database.  I'm going to share with you some of the other things that Accidental DBAs need to think about inside the tables and columns of a database.

I'm sure you're thinking "But Karen, why should I care about database design if my job is keeping databases up and running?"  Accidental DBAs need to worry about database design because bad design has significant impacts on database performance, data quality, and availability. Even though an operational DBA didn't build it, they get the 3 AM alert for it.

Tricks

People use tricks for all kinds of reasons: they don't fully understand the relational model or databases, they haven't been properly trained, they don't know a feature already exists, or they think they are smarter than the people who build database engines. All but the last one are easily fixed.  Tricky things are support nightmares, especially at 3 AM, because all your normal troubleshoot techniques are going to fail.  They impact the ability to integrate with other databases, and they are often so fragile no one wants to touch the design or the code that made all these tricks work. In my experience, my 3 AM brain doesn't want to see any tricks.

Tricky.png

Tricky Things

Over my career I've been amazed by the variety and volume of tricky things I've seen done in database designs.  Here I'm going to list just 3 examples, but if you've seen others, I'd love to hear about them in the comments. Some days I think we need to create a Ted Codd Award for the worst database design tricks.  But that's another post...

Building a Database Engine Inside Your Database

You've seen these wonders…a graph database build in a single table.  A key-value pair (or entity attribute value) database in a couple of tables. Or my favourite, a relational database engine within a relational database engine.  Now doing these sorts of things for specific reasons might be a good idea.  But embracing these designs as your whole database design is a real problem.  More about that below.

Wrong Data Types

One of the goals of physical database design is to allocate just the right amount of space for data. Too little and you lose data (or customers), too much and performance suffers.  But some designers take this too far and reach for the smallest one possible, like INTEGER for a ZIPCode.  Ignoring that some postal codes have letters, this is a bad idea because ZIPCodes have leading zeros.  When you store 01234 as an INTEGER, you are storing 1234.  That means you need to do text manipulation to find data via postal code and you need to "fix" the data to display it.

Making Your Application Do the Hard Parts

It's common to see solutions architected to do all the data integrity and consistency checks in the application code instead of in the database.  Referential integrity (foreign key constraints), check constraints, and other database features are ignored and instead hundreds of thousands of lines of code are used to ensure these data quality features. This inevitably leads to data quality problems.  However, the worst thing is that these often lead to performance issues, too, and most developers have no idea why.

Why Do We Care?

While most of the sample tricks above are the responsibility of the database designer, the Accidental DBA should care because:

  • DBAs are on-call, not the designers
  • If there are Accidental DBAs, it's likely there are Accidental Database Designers
  • While recovery is job number one, all the other jobs involve actually getting the right data to business users
  • Making bad data move around faster isn't actually helping the business
  • Making bad data move around slower never helps the business
  • Keeping your bosses out of jail is still in your job description, even if they didn't write it down

But the most important reason why production DBAs should care about this is that relational database engines are optimized to work a specific way - with relational database structures.  When you build that fancy Key-Value structure for all your data, the database optimizer is clueless how to handle all the different types of data. All your query tuning tricks won't help, because all the queries will be the same.  All your data values will have to be indexed in the same index, for the most part.  Your table sizes will be enormous and full table scans will be very common.  This means you, as the DBA, will be getting a lot of 3 AM calls. I hope you are ready.

With applications trying to do data integrity checks, they are going to miss some. A database engine is optimized to do integrity checks quickly and completely. Your developers may not.  This means the data is going to be mangled, with end users losing confidence in the systems. The system may even harm customers or lead to conflicting financial results.  Downstream systems won't be able to accept bad data.  You will be getting a lot of 3 AM phone calls as integration fails.

Incorrect data types will lead to running out of space for bigger values, slower performance as text manipulation must happen to process the data, and less confidence in data quality.  You will be getting a lot of 3 AM and 3 PM phone calls from self-serve end users.

In other words, doing tricky things with your database is tricky. And often makes things much worse than you anticipate.

In Thwack Camp today, sqlrockstar Thomas and Kevin will be covering the mechanics of databases and how to think about troubleshooting all those 3 AM alerts.  While you are attending, I'd like you to also think about how design issues might have contributed to that phone call.  Database design and database configurations are both important.  A great DBA, accidental or not, understands how all these choices impact performance and data integrity.

Some tricks are proper use of unique design needs. But when I see many of them, or over use of tricks, I know that there will be lots and lots of alerts happening in some poor DBA's future.  You should take steps to ensure a good design lets you get more sleep.  Let the database engine do what it is meant to do.

Thwack - Symbolize TM, R, and C