Running tests against databases in CI pipelines is an essential part of testing your application.
Provisioning databases in CI pipelines can be hard work, however. Broadly speaking, you have two options:
- Have all your pipelines use shared databases
- Use Docker to run containerised database instances
The first option has the advantage that you can test against real data, perhaps a recently restored copy of production, but it effectively serializes your pipelines as they contend for the shared database. You may be able to scale by adding more database servers, but ultimately the parallelism of your CI pipelines is capped by the number of servers available to them.
The second option is a substantial improvement in terms of parallelism, as each pipeline run gets a dedicated database spun up and torn down for its exclusive use. However, the problem of testing against realistic data becomes more acute. Where we see Docker used to provision databases in CI pipelines, the containerised database is typically populated from seed data stored in the code repository. This means you lose the confidence gained from testing against a realistic data set. If you don't want to go down the route of using seed data, you need to manage a Docker volume inside your pipeline, or run a lengthy database restore operation in each pipeline run.
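To make the seed-data approach concrete, here is a minimal sketch of how it commonly looks in a GitHub Actions workflow. The Postgres image, credentials, and the `./test/seed.sql` path are illustrative assumptions, not part of any particular project:

```yaml
# Sketch of the Docker-plus-seed-data approach described above.
# The database image, credentials, and seed file path are hypothetical.
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - name: Load seed data from the repository
        run: psql -h localhost -U postgres -f ./test/seed.sql
        env:
          PGPASSWORD: postgres
```

Every pipeline run starts from the same small, hand-maintained seed file, which is exactly why the resulting tests drift away from production-like data.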
Fortunately, there is a third way of testing against databases in CI that combines the advantages of both approaches. Spawn allows you to run an arbitrary number of pipelines in parallel, use realistic test data in all your tests, and avoid both Docker volume management and lengthy restore operations. The remainder of this article assumes that you are familiar with the basic concepts of Spawn: data images and data containers. If not, sign up for Spawn (it's free!) and get started.
We'll create a simple GitHub Actions workflow that uses Spawn to provision some databases for us to run database migration tests against.
The action takes the name of the data image from which to create the data container, and a lifetime after which the data container will be automatically destroyed.
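A step using the create container action might look like the sketch below. The action name `red-gate/create-spawn-data-container` and the input names are assumptions for illustration; consult the Spawn documentation for the actual identifiers:

```yaml
steps:
  - name: Create Spawn data container
    id: create-data-container
    # The action name and input names below are illustrative
    # assumptions, not confirmed Spawn identifiers.
    uses: red-gate/create-spawn-data-container@v1
    with:
      dataImage: my-database:latest   # data image to create the container from
      lifetime: 1h                    # auto-destroy the container after one hour
```

Giving the step an `id` lets later steps reference its outputs via the standard `steps.<id>.outputs` syntax.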
The create container action has a number of outputs that allow us to connect to the new database server in later steps. The example workflow below shows how to connect to the new data container and run some database migration tests on it:
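Continuing the sketch, a later step can read those outputs and pass them to a test script. The output names (`dataContainerHost`, `dataContainerPort`, and so on) and the `run-migration-tests.sh` script are hypothetical placeholders:

```yaml
  - name: Run database migration tests
    # Output names below are assumed for illustration; see the
    # Spawn action documentation for the real output names.
    run: ./run-migration-tests.sh
    env:
      DB_HOST: ${{ steps.create-data-container.outputs.dataContainerHost }}
      DB_PORT: ${{ steps.create-data-container.outputs.dataContainerPort }}
      DB_USER: ${{ steps.create-data-container.outputs.dataContainerUsername }}
      DB_PASSWORD: ${{ steps.create-data-container.outputs.dataContainerPassword }}
```

Because the connection details arrive as step outputs, the test script itself needs no knowledge of Spawn at all.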
Every run of this workflow runs against a freshly provisioned, cloud-hosted, isolated database server. The server spins up in seconds, regardless of the size of the data image from which it is created. We are testing our migrations against a real database with realistic data, and we can do so in parallel with other pipeline runs.
During a test run, it is often desirable to save the state of the database so that it can be rolled back later in the run. There is no good way to do this with shared database servers or with servers provisioned via Docker, but Spawn allows any data container to be saved and reset easily. We can take advantage of this functionality using the save and reset actions:
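A save/reset sequence might be sketched as follows. The action names `red-gate/save-spawn-data-container` and `red-gate/reset-spawn-data-container`, the `dataContainerName` output, and the test script are assumptions for illustration:

```yaml
  - name: Save database state
    uses: red-gate/save-spawn-data-container@v1    # hypothetical action name
    with:
      dataContainer: ${{ steps.create-data-container.outputs.dataContainerName }}

  - name: Run destructive tests
    run: ./run-destructive-tests.sh                # hypothetical test script

  - name: Reset database to the saved state
    uses: red-gate/reset-spawn-data-container@v1   # hypothetical action name
    with:
      dataContainer: ${{ steps.create-data-container.outputs.dataContainerName }}
```

The reset step returns the data container to exactly the state captured by the save step, however much the intervening tests mutated it.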
One application of save and reset is ensuring that tests in a workflow don't mutate the state of the database for any subsequent tests.
Working with databases in CI has often been a choice between connecting to live servers and using containers, each with its own benefits and drawbacks. Spawn gives us the best of both worlds: realistic live data that spins up in seconds, plus the parallelism and isolation we get from containerised instances. Coupled with GitHub Actions to reduce the scripting required to invoke the Spawn CLI, we get smooth, frictionless database CI in our workflows.
If you want to experience how Spawn can make working with databases in CI so much easier, sign up for free now.