Development databases in Docker aren’t good enough

June 22, 2021 · 3 min read

Chris Heppell

Development databases in Docker aren’t good enough on their own. Why? Because they’re almost always so far from the characteristics of the production environment that you get a false sense of security in development.

Having isolated databases is far better than a shared environment where other developers trample over your changes. But because dev databases tend to either be empty, or have “happy path” data within them, they never truly demonstrate the behaviours you’ll end up seeing in production.

This leads to several problems:

Unexpected data loss during schema migrations
Unacceptable latency on specific queries because of vastly different data sizes
Poor UX due to unanticipated user-provided data
UI glitches or performance issues not caught in lower environments because of unrealistic data
Entire branches of code left unexercised due to conditions on the data not caught in lower environments

I think I lost more data due to database bugs in production than anything else.
— JBD ヤナドガン (@rakyll) June 19, 2021

It can be slightly better

One thing you could do is grab a production-like backup and restore it to your lower environments, such as development. The problem is how long that can take. If your production data is on the order of gigabytes, you’re looking at potentially hours to restore it. And what happens if you mess it up in dev? You’ll have to restore all over again.

Even then, you need somewhere to store that data on your machine, and you need to dedicate the necessary compute to it, taking those precious CPU cycles away from your responsive development experience.

It can be much better

Docker means you no longer need to worry about how to install database engines on your machine. Because everything is containerised, you get a deterministic environment that is the same as the rest of your development team’s.

But it’s not as easy as that. You still need to know the specifics about how to mount persistent volumes so your data doesn’t disappear when your container stops. You still need to know how to handle readiness checks of those databases. You still need to know the specific flags or environment variables to pass on startup to ensure that your database is ready to go. That’s extra overhead and expertise for devs to apply in addition to the task they are actually working on.
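To make that overhead concrete, here is a minimal sketch of what a plain Docker setup for PostgreSQL might look like. It assumes the official postgres:11 image; the container name, volume name and password are placeholders made up for illustration, and the retry count is arbitrary.

```shell
#!/bin/sh
# Sketch only: dev-postgres, dev-pgdata and devpassword are hypothetical
# placeholders, not values from any real project.

# Run a readiness command repeatedly until it succeeds or the attempts run out.
# Usage: wait_until_ready <attempts> <command> [args...]
wait_until_ready() {
  attempts=$1
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Pause between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
  done
  return 1
}

# Start PostgreSQL with a named volume so the data outlives the container,
# pass the startup password the official image requires, then block until
# the server accepts connections.
start_dev_postgres() {
  docker run -d --name dev-postgres \
    -e POSTGRES_PASSWORD=devpassword \
    -v dev-pgdata:/var/lib/postgresql/data \
    -p 5432:5432 \
    postgres:11
  wait_until_ready 30 docker exec dev-postgres pg_isready -U postgres
}
```

Calling start_dev_postgres brings the database up and returns once pg_isready reports success. Even this small script needs three pieces of engine-specific knowledge (the data directory to mount, the required environment variable and the readiness tool), and each database engine has its own equivalents.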

Thankfully, spinning up a variety of different database engines from production-like datasets instantly, regardless of data size, is possible with Spawn. Now your entire dev team have not only the same database environment but also exactly the same data.

Whether it’s MongoDB, PostgreSQL, SQL Server, MySQL or Redis, there’s a single unified command for bringing up a copy of a production-like data set instantly for you to start using in development and CI:

spawnctl create data-container --image postgres11:prod --name my-copy-of-prod

Once you’ve got that instance, you can manipulate it in ways you’ve never been able to before.

Accidentally deleted all the rows because of a missed WHERE clause?

spawnctl reset data-container my-copy-of-prod

Managed to reproduce a bug you want to share with a colleague?

spawnctl save data-container my-copy-of-prod
spawnctl graduate data-container my-copy-of-prod --revision rev.1 --team devs

Want to swap between branches and keep your database changes with them?

spawnctl create data-container --name feature-branch1
spawnctl create data-container --name feature-branch2
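One way to wire this into a branch workflow is to derive the data container name from the current git branch. The sketch below is an assumption rather than a Spawn feature: the helper names and the naming convention are hypothetical, the image name is borrowed from the earlier example, and it uses only the spawnctl flags already shown above.

```shell
#!/bin/sh
# Sketch only: the function names and the naming convention are hypothetical.

# Turn a git branch name into a simple container name by replacing every
# character other than letters, digits and hyphens with a hyphen.
container_name_for_branch() {
  printf '%s' "$1" | tr -c 'A-Za-z0-9-' '-'
}

# Create a data container named after the branch that is currently checked out.
# Assumes the postgres11:prod image from the earlier example.
create_container_for_current_branch() {
  branch=$(git rev-parse --abbrev-ref HEAD)
  spawnctl create data-container \
    --image postgres11:prod \
    --name "$(container_name_for_branch "$branch")"
}
```

Running create_container_for_current_branch after checking out a new branch would give that branch its own copy of the data, so schema changes made on one branch stay out of the other.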

Forget about remembering flags, environment variables, and Docker config. Get back your time, CPU cycles and disk space with Spawn right now for free.
