My First Month as a DBRE at Stack Overflow

Not too long ago, someone asked me, “Hey Aaron, what did you do in your first month at Stack Overflow?” It’s harder to answer than you would think. Though companies vary in their onboarding processes, from nonexistent to tediously overbearing, one relative constant is that your first month at any new company will almost certainly be a blur.

In my previous article, “Six Tips for a Productive DBA/DBRE,” one of my tips was “Ask, Explore, Absorb.” This was one of the biggest parts of my first few weeks—I basically sat in front of a firehose, with numerous people (myself included) controlling the flow. We have internal documentation to read, videos to watch, and plenty of folks happy to answer questions. Aside from my mentor, though, I didn’t have an endless stream of people dedicated to sitting down with me and explaining our architecture inside and out. And there shouldn’t be—they all have ongoing project work of their own. I decided to be proactive and do some exploring, trying to answer my own questions before giving in and asking around.

I immediately started looking for some smaller, easy wins as well as some larger goals to start focusing on.

One of the first things I wanted to address was consistency across our SQL Server instances, which fell into both the “smaller” and “larger” categories. Things like adjusting the power plan for the underlying operating system and switching to indirect checkpoints were pretty straightforward, and we implemented those changes almost as quickly as I could write them up. Other improvements, such as using backup compression by default, optimizing MAXDOP and max server memory, and turning off deadlock trace flags, would take more discussion and effort.

I started collecting this information right away in a handy little color-coded spreadsheet that I’d ultimately reference in several future tickets. I used red for “really” wrong (or too low), orange for close (or too high), and light green for no change needed:

This centralized spreadsheet gave me a single place to look at specific details about our progress and gave leadership and cross-functional teams a color-coded, bird’s-eye view of the positive changes we were making. I kept the original orange and red heatmap in its own sheet and tracked the changes in a separate sheet over time. When we finished a ticket to implement one of the changes, we’d update the affected servers to dark green to indicate “fixed” (and to distinguish them from “no change needed”). Now that we’ve completed all the tickets, which certainly didn’t happen within my first month, the sheet looks like this:

Even if you don’t want to think about any of the positive changes—or dwell too much in the weeds of why an individual change is beneficial (this was all expressed in the individual tickets for any given change)—it’s hard to compare the sheets without acknowledging how successful that effort has been. Something I’ve already learned from the exercise, though, is to avoid trying to express changes (or good/bad in general) using colors like red and green, as this is a poor visual indicator for those with certain types of color blindness.
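For readers who would rather script this kind of heatmap than maintain it by hand, here is a minimal Python sketch of the color coding described above. It is not the spreadsheet I actually used: the server names, target values, and the 25% tolerance band are all made up for illustration, and treating every setting as a plain number is a simplification (MAXDOP 0, for example, means “no limit” and would need special handling). Each status carries a text label alongside its color, so the meaning doesn’t depend on telling red from green.

```python
from enum import Enum


class Status(Enum):
    # (color, text label): the label keeps the status readable without color.
    WRONG = ("red", "really wrong or too low")
    CLOSE = ("orange", "close or too high")
    OK = ("light green", "no change needed")
    FIXED = ("dark green", "fixed")  # set by hand once a ticket is completed


def classify(current, target, tolerance=0.25):
    """Compare one numeric setting against its target.

    The tolerance band is an assumption made for this sketch.
    """
    if current == target:
        return Status.OK
    # Too high, or only slightly below the target, counts as "close".
    if current > target or current >= target * (1 - tolerance):
        return Status.CLOSE
    return Status.WRONG


def heatmap(servers, targets):
    """Build {server: {setting: Status}} for every setting that has a target."""
    return {
        name: {s: classify(settings[s], targets[s]) for s in targets}
        for name, settings in servers.items()
    }


# Hypothetical targets and servers, purely for illustration.
targets = {"maxdop": 8, "max_server_memory_mb": 100000}
servers = {
    "sql-a": {"maxdop": 8, "max_server_memory_mb": 50000},
    "sql-b": {"maxdop": 16, "max_server_memory_mb": 90000},
}

for server, row in heatmap(servers, targets).items():
    for setting, status in row.items():
        color, label = status.value
        print(f"{server:6} {setting:22} {color} ({label})")
```

Keeping the original classification and the “fixed” status as separate values mirrors the way I kept the original heatmap in one sheet and tracked changes in another.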

The next thing I wanted to do was simply take note of some of the bigger things in our architecture I don’t agree with 100%. To be successful, you have to make sure you assume everything in place was built with good intent and for good reasons. It’s important to keep this in mind even when writing up your own notes about pros and cons, as it’s easy for things to get slanted with bias. When there’s bias in the change you want, it’s far too easy to speak negatively about the status quo, and people can often feel defensive about code they maintain—even if they didn’t write it. Even if your plan isn’t necessarily to speak up and fight it, it’s important to empathize and understand what’s in place today.

One thing that struck me immediately here at Stack Overflow was the conscious decision to steer completely clear of stored procedures. Jeff Atwood wrote up some of his reasons for this decision in this blog post back in 2004: “Who Needs Stored Procedures, Anyways?” I don’t want to belabor the point and make my case here, but while I agree we don’t need a “stored procedure for every stupid little simple query,” there’s probably some wiggle room in there for a little balance. For example, we could use inline queries for simple OLTP and CRUD operations but stored procedures for reporting or other advanced queries likely to need tuning or DBA/DBRE visibility. In my first few weeks, I started collecting data about this in a document. And to give some idea of how big a change like this would be at Stack Overflow, six months later, I’m still collecting this data and have yet to surface it to the teams that would need to be on board to even consider such a shift in thinking.

A few other things that happened within my first 30 days:

  • I came up with a strategy to eliminate multiple redundant indexes on one of our largest and most active tables (Posts)—like with server consistency, this work wasn’t finished until a few months later.
  • I helped several colleagues with different issues around T-SQL logic and/or performance.
  • I contributed to planning quarterly patching and failover performance testing.
  • I helped analyze and thwart a denial-of-service attack.
  • I submitted an important vernacular change to OpServer.
  • I created indexes and queries for our community management team to reduce the time and effort they spend investigating malicious behavior on the network. 
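
As an illustration of the redundant index item in the list above, here is a small Python sketch of one common heuristic: an index is a candidate for removal when its key columns are a leading prefix of another index’s key columns on the same table. This is not necessarily the strategy I used for Posts, and the index names below are hypothetical. The check also ignores included columns, uniqueness, filters, and sort direction, so it produces candidates to review, not a drop list.

```python
from typing import NamedTuple


class Index(NamedTuple):
    name: str
    key_columns: tuple  # key columns in index order


def redundant_candidates(indexes):
    """Return (narrower, wider) name pairs where the narrower index's key
    columns are a leading prefix of the wider index's key columns."""
    pairs = []
    for a in indexes:
        for b in indexes:
            if a is b:
                continue
            n = len(a.key_columns)
            if n <= len(b.key_columns) and b.key_columns[:n] == a.key_columns:
                # Exact duplicates match in both directions; report them once.
                if a.key_columns == b.key_columns and a.name > b.name:
                    continue
                pairs.append((a.name, b.name))
    return pairs


# Hypothetical indexes on a single table, purely for illustration.
indexes = [
    Index("IX_OwnerUserId", ("OwnerUserId",)),
    Index("IX_OwnerUserId_CreationDate", ("OwnerUserId", "CreationDate")),
    Index("IX_ParentId", ("ParentId",)),
]

for narrower, wider in redundant_candidates(indexes):
    print(f"{narrower} may be covered by {wider}")
```

On a busy table, each candidate still needs to be checked against real usage before anything is dropped.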

There were, of course, the other role-agnostic and unavoidable housekeeping things—enrolling in benefits, watching security and harassment training, installing required software, testing VPN and RDP software, making sure I’m in the correct set of Active Directory groups, and applying the right stickers in the right places to my new laptop. I also posted this “hello world” image to our all-hands Slack channel:

That was my first month at Stack Overflow. It sounds busy, but I’ve enjoyed my time so far and am excited about seeing how my early efforts impact our team in the longer term. And just wait until I tell you about some of the months since!
