Blocking Deploys on Failed Migration Dry-Runs
A migration dry-run that prints a plan but does not stop the pipeline is decoration, not a gate. The classic incident: a developer runs migrate --plan locally, eyeballs the output, and merges; the CI job runs the same dry-run, the command exits non-zero because a referenced column no longer exists, and the deploy proceeds anyway because the step was marked continue-on-error or its output was never asserted. The migration then fails halfway through against production, leaving a partially applied schema. If you have ever seen a green pipeline ship a broken ALTER TABLE, the fix is to turn the dry-run from an advisory log line into a blocking status check whose exit code and emitted DDL both decide whether the deploy continues. This page shows how to wire migrate --plan, flyway migrate -dryRunOutput, and prisma migrate diff as hard gates, and how to diff the planned DDL against an allow-list so an unexpected DROP fails the build rather than reaching production.
Symptom / Error Signatures
The failure shows up in one of three shapes. The first is a dry-run that errored but did not stop the job. In a GitHub Actions log you see the plan command print a stack of red text and then the very next step — the actual deploy — start anyway, because the dry-run step lacked an explicit failure assertion. Flyway prints ERROR: Unable to obtain connection from database or Validate failed: Detected resolved migration not applied to database during -dryRunOutput, yet the pipeline marches on.
The second shape is a dry-run that succeeds but emits DDL nobody expected. prisma migrate diff prints DROP COLUMN or DROP TABLE in its --script output where the author intended only an additive change — typically because a model rename was interpreted as a drop-plus-add. The command exit code is 0; only inspecting the emitted SQL reveals the danger.
The third is the silent skip: the dry-run step has a rules/if condition that did not match the changed paths, so it never ran, and the build is green because the gate was absent. Look for log lines like Job migration_dryrun was skipped next to a migration file in the diff. Any of these means the gate is not actually gating.
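One way to catch the silent skip is to recompute, independently of the CI rules, whether a change set touches migrations, and to require the dry-run result whenever it does. The following is a minimal sketch under stated assumptions: the helper name `touches_migrations` is hypothetical, and the path patterns (`migrations/`, `prisma/schema.prisma`) are simply the ones used in the examples on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads changed paths on stdin, one per line.
# Succeeds (exit 0) if any path is a migration file or the Prisma schema.
touches_migrations() {
  grep -E -q '^(migrations/|prisma/schema\.prisma$)'
}

# Example input; in CI the list would typically come from `git diff --name-only`.
changed_paths=$'src/app.ts\nmigrations/0042_add_nickname.sql'
if touches_migrations <<< "$changed_paths"; then
  gate_required=yes
else
  gate_required=no
fi
echo "dry-run gate required: $gate_required"
```

A later step can then fail the build when `gate_required` is `yes` but no plan artifact was produced, which turns a skipped job into a red result instead of a green one.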
Root Cause Analysis
A dry-run is two separable signals — an exit code and a body of generated DDL — and a real gate must honor both. Most broken gates honor neither.
The exit-code half fails when CI swallows the status. Shell pipelines mask it: migrate --plan | tee plan.log returns the exit code of tee, not of migrate, so a failed plan looks successful. The fix is set -o pipefail (or capturing ${PIPESTATUS[0]} immediately after the pipeline). YAML-level continue-on-error: true and GitLab allow_failure: true do the same thing more explicitly. The dry-run must be allowed to fail the job.
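The masking effect is easy to reproduce without any migration tool. In this sketch, `failing_plan` is a hypothetical stand-in for a dry-run command that exits non-zero; everything else is standard bash behavior.

```shell
#!/usr/bin/env bash
# `failing_plan` is a hypothetical stand-in for a dry-run command that errors.
failing_plan() {
  echo "plan error: referenced column does not exist" >&2
  return 3
}

# 1. No pipefail: the pipeline reports the status of tee, so the failure is masked.
set +o pipefail
failing_plan 2>/dev/null | tee /dev/null >/dev/null
masked_status=$?

# 2. No pipefail, but PIPESTATUS still records the status of the first stage.
failing_plan 2>/dev/null | tee /dev/null >/dev/null
first_stage_status=${PIPESTATUS[0]}

# 3. pipefail: the pipeline fails if any stage fails.
set -o pipefail
pipefail_status=0
failing_plan 2>/dev/null | tee /dev/null >/dev/null || pipefail_status=$?

echo "masked=$masked_status pipestatus=$first_stage_status pipefail=$pipefail_status"
# prints: masked=0 pipestatus=3 pipefail=3
```

Note that `PIPESTATUS` must be read by the very next command after the pipeline; any intervening command overwrites it.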
The DDL-body half fails because the three tools express “unexpected change” differently, and a gate that only checks exit codes never inspects the body at all:
| Tool | Dry-run command | Signals a problem via | What the gate must assert |
|---|---|---|---|
| Generic CLI | migrate --plan | Non-zero exit on plan error | set -o pipefail; fail job on non-zero |
| Flyway | flyway migrate -dryRunOutput=plan.sql | Writes planned SQL to a file; validation errors exit non-zero | Grep plan.sql for destructive statements against an allow-list |
| Prisma | prisma migrate diff --script | Always exits 0 for a valid diff; --exit-code makes a non-empty diff exit 2 | Inspect --script output for DROP/RENAME; use --exit-code to detect drift |
Prisma is the sharpest trap: prisma migrate diff without --exit-code returns 0 even when the diff is enormous, so a pipeline that only checks the exit status learns nothing. The body is the signal. This is the same backward-compatibility contract that pull-request backward-compatibility checks enforce more broadly; the dry-run gate is the first and cheapest line of defense.
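Because `--exit-code` gives three distinct outcomes, a gate can tell "the command broke" apart from "the diff is non-empty". A small sketch, assuming the statuses Prisma documents for `--exit-code` (0 empty, 1 error, 2 non-empty); the helper name `classify_diff_status` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps the exit status of
# `prisma migrate diff --script --exit-code` to a gate decision.
# Assumed statuses with --exit-code: 0 = empty diff, 1 = error, 2 = non-empty diff.
classify_diff_status() {
  case "$1" in
    0) echo "empty" ;;
    2) echo "non-empty" ;;
    *) echo "error" ;;   # anything unexpected is treated as a failure (fail closed)
  esac
}

for status in 0 1 2; do
  echo "exit $status -> $(classify_diff_status "$status")"
done
```

Treating every unrecognized status as `error` keeps the gate fail-closed if the tool ever returns a code the pipeline has not classified.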
Immediate Mitigation
If a bad migration is mid-deploy right now, stop the rollout and treat the schema as possibly half-applied — the recovery for a partially applied step depends on whether your engine has transactional DDL. Then close the gate before the next deploy:
- Make the dry-run able to fail the job. Remove continue-on-error/allow_failure from the step and add pipefail so a masked exit code surfaces.
# CI shell step · read-only DB role · no writes · must fail the job on a bad plan
# Context: runs on every PR touching migrations; non-zero exit blocks the merge/deploy.
set -euo pipefail
migrate --plan --database-url "$PROD_READONLY_URL" | tee plan.log
# pipefail ensures the exit code of `migrate`, not `tee`, decides the step
- Check the emitted DDL before it ships. As a stopgap, fail on any destructive verb; the allow-list under Permanent Fix / Long-Term Pattern is the stricter form, where any statement outside the additive set is a hard failure.
# CI shell step · read-only DB role generates the plan · grep reads plan.sql only
# Context: fails the build if the plan contains destructive DDL (deny-list stopgap).
set -euo pipefail
flyway migrate -dryRunOutput=plan.sql -url="$PROD_READONLY_URL"
if grep -E -i '\b(DROP|TRUNCATE|RENAME)\b' plan.sql; then
  echo "Unexpected destructive DDL in dry-run plan" >&2
  exit 1
fi
- For Prisma, force a non-empty diff to be an error. Use --exit-code so drift between the schema and the live database stops the build.
# CI shell step · read-only connection to live DB · no migration applied
# Context: --exit-code makes a non-empty diff exit 2, which fails the job.
status=0
npx prisma migrate diff \
  --from-url "$PROD_READONLY_URL" \
  --to-schema-datamodel prisma/schema.prisma \
  --script --exit-code > plan.sql || status=$?
# Inspect the body before acting on the status, so a DROP gets its own message.
if grep -E -i '\bDROP\b' plan.sql; then echo "Drop in planned DDL" >&2; exit 1; fi
if [ "$status" -ne 0 ]; then echo "Schema drift detected (exit $status)" >&2; exit 1; fi
- Make the gate required. In branch protection, mark the dry-run check as a required status so a red result cannot be merged past.
Permanent Fix / Long-Term Pattern
A durable dry-run gate has three properties: it always runs when a migration changes, it fails closed, and it classifies the emitted DDL rather than trusting the exit code alone. Encode all three in one stage.
# .gitlab-ci.yml — dry-run as a required, fail-closed stage
# Context: read-only DB role; emits a plan, then asserts the plan is additive-only.
migration_dryrun:
  stage: verify
  rules:
    - changes: [ "migrations/**/*", "prisma/schema.prisma" ]
  script:
    - set -euo pipefail
    - flyway migrate -dryRunOutput=plan.sql -url="$PROD_READONLY_URL" | tee plan.log
    - ./scripts/assert-additive-ddl.sh plan.sql   # allow-list: CREATE/ADD COLUMN/CREATE INDEX
  artifacts:
    when: always                    # upload even when the job fails
    paths: [ plan.sql, plan.log ]   # keep the plan for review on every run
  allow_failure: false              # blocks the deploy
The allow-list — not a deny-list — is the part that ages well. A deny-list of DROP, TRUNCATE, RENAME will miss the next destructive verb someone discovers. An allow-list permits exactly the statements your safe-change policy sanctions (ADD COLUMN, CREATE INDEX CONCURRENTLY, CREATE TABLE) and fails everything else, including DDL you have not yet classified. Generate the plan into a CI artifact so reviewers see the exact SQL that will run, and so the dry-run output becomes the reviewed unit rather than the developer’s local memory. For the full set of checks this gate belongs to — checksum verification, lock budgets, merge-queue determinism — see the migration pipeline gating overview, and pair the plan-time gate with testing migrations against production-like snapshots so a plan that looks additive but rewrites the table on real data is also caught.
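The allow-list check itself can be small. The sketch below is one possible shape for a script like assert-additive-ddl.sh, not the actual script: the function name `assert_additive_ddl`, the exact allow-list regex, and the naive split on `;` are all assumptions made for illustration. It deliberately fails closed: an ALTER TABLE that contains a comma (a second action, or even a type such as numeric(10,2)) is rejected until someone widens the policy, and any tool-generated bookkeeping statements in a real plan would need their own entries.

```shell
#!/usr/bin/env bash
# Sketch of an allow-list check. Assumed policy: only CREATE TABLE,
# CREATE [UNIQUE] INDEX, and single-action ALTER TABLE ... ADD COLUMN pass.
assert_additive_ddl() {
  local plan_file=$1
  local allow='^(CREATE TABLE |CREATE (UNIQUE )?INDEX |ALTER TABLE [^ ]+ ADD COLUMN [^,]*$)'
  local violations
  # A missing plan must not pass silently.
  [ -r "$plan_file" ] || { echo "Plan file not readable: $plan_file" >&2; return 2; }
  violations=$(
    sed -e 's/--.*$//' "$plan_file" |          # strip line comments
      tr '\n' ' ' |                            # join multi-line statements
      tr ';' '\n' |                            # one statement per line
      sed -e 's/[[:space:]][[:space:]]*/ /g' -e 's/^ //' -e 's/ $//' |
      grep -v '^$' |                           # drop empty fragments
      grep -E -i -v "$allow"                   # keep only what the policy rejects
  ) || true
  if [ -n "$violations" ]; then
    echo "Statements outside the allow-list:" >&2
    echo "$violations" >&2
    return 1
  fi
}

# Usage: a plan with one additive and one destructive statement is blocked.
plan=$(mktemp)
printf '%s\n' \
  'CREATE TABLE audit_log (id bigint PRIMARY KEY, note text);' \
  'ALTER TABLE users DROP COLUMN email;' > "$plan"
assert_additive_ddl "$plan" || echo "gate: blocked"
```

Splitting on `;` is not a SQL parser, so statements with semicolons inside string literals or function bodies will be cut into fragments; those fragments will not match the allow-list, which errs on the side of blocking.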
Verification Checklist
- The dry-run step sets set -o pipefail (or captures PIPESTATUS) so a masked exit code fails the job.
- No continue-on-error/allow_failure: true remains on the dry-run step.
- The Prisma step passes --exit-code so a non-empty diff exits non-zero.
Frequently Asked Questions
Why does prisma migrate diff exit 0 even when the plan is wrong?
By default prisma migrate diff is informational and returns 0 for any valid comparison, regardless of how large or destructive the diff is. Add --exit-code to make a non-empty diff exit 2, then inspect the --script output for DROP or RENAME so the gate reacts to the content, not just the status.
Should the dry-run connect to production or to a snapshot?
Connect with a read-only role to a snapshot or to production for the diff, but never let the dry-run apply anything. The plan must reflect the real live schema to catch drift, yet a dry-run by definition performs no writes. Apply-and-measure belongs to a separate snapshot-based test stage.
A rename keeps showing up as DROP plus ADD — how do I let it through?
Express the rename as an explicit, reviewed migration (for example ALTER TABLE ... RENAME COLUMN) rather than letting the tool infer it, and add that statement to the allow-list for that specific migration. An inferred drop-plus-add is exactly the backward-incompatible change the gate exists to stop, so it should fail until a human encodes the safe form.