Running tests against databases in CI pipelines is an essential part of testing your application.
Provisioning databases in CI pipelines can be hard work, however. Broadly speaking you have two options:
- Have all your pipelines use shared databases
- Use Docker to run containerised database instances
The first option has the advantage that you can test against real data, perhaps a recently restored copy of production, but it effectively serializes your pipelines as they contend for the shared database. You may be able to scale by adding multiple database servers, but ultimately the parallelism of your CI pipelines is limited by the number of database servers you have available to the pipelines.
The second option is a substantial improvement in terms of parallelism as each pipeline run now has a dedicated database spun up and torn down for exclusive use. However, the problem of testing against realistic data is now more acute. Typically where we see Docker being used to provision databases in CI pipelines, we see the use of seed data stored in the code repository used to populate the containerised database. This means you lose the confidence that you gain from testing against a realistic data set. If you don't want to go down the route of using seed data, you need to manage a docker volume inside your pipeline, or run a lengthy database restore operation in each pipeline run.