Blocking Deploys on Failed Migration Dry-Runs

A migration dry-run that prints a plan but does not stop the pipeline is decoration, not a gate. The classic incident: a developer runs migrate --plan locally, eyeballs the output, and merges; the CI job runs the same dry-run, the command exits non-zero because a referenced column no longer exists, and the deploy proceeds anyway because the step was marked continue-on-error or its output was never asserted. The migration then fails halfway through against production, leaving a partially applied schema. If you have ever seen a green pipeline ship a broken ALTER TABLE, the fix is to turn the dry-run from an advisory log line into a blocking status check whose exit code and emitted DDL both decide whether the deploy continues. This page shows how to wire migrate --plan, flyway migrate -dryRunOutput, and prisma migrate diff as hard gates, and how to diff the planned DDL against an allow-list so an unexpected DROP fails the build rather than reaching production.

[Figure: Dry-run as a blocking gate. The plan from migrate --plan emits DDL into a gate that checks the exit code and the DDL allow-list; a non-zero exit or a DROP blocks the deploy, and a clean plan promotes it.]
The dry-run feeds a single gate: the command's exit code and the emitted DDL together decide whether the deploy is blocked or promoted.

Symptom / Error Signatures

The failure shows up in one of three shapes. The first is a dry-run that errored but did not stop the job. In a GitHub Actions log you see the plan command print a stack of red text and then the very next step — the actual deploy — start anyway, because the dry-run step lacked an explicit failure assertion. Flyway prints ERROR: Unable to obtain connection from database or Validate failed: Detected resolved migration not applied to database during -dryRunOutput, yet the pipeline marches on.

The second shape is a dry-run that succeeds but emits DDL nobody expected. prisma migrate diff prints DROP COLUMN or DROP TABLE in its --script output where the author intended only an additive change — typically because a model rename was interpreted as a drop-plus-add. The command exit code is 0; only inspecting the emitted SQL reveals the danger.

The third is the silent skip: the dry-run step has a rules/if condition that did not match the changed paths, so it never ran, and the build is green because the gate was absent. Look for log lines like Job migration_dryrun was skipped next to a migration file in the diff. Any of these means the gate is not actually gating.
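One way to catch the silent skip is a deploy-side guard that fails closed when a migration changed but no plan was produced. The sketch below is illustrative only: the function name, the migrations/ and prisma/schema.prisma paths, and the plan.sql artifact name are assumptions taken from the examples on this page, not a fixed convention.

```shell
#!/usr/bin/env bash
# Hypothetical guard: names and paths are assumptions for illustration.
# Reads changed file paths on stdin. Fails if a migration-related path
# changed but the dry-run plan artifact is missing or empty.
require_plan_for_migrations() {
  local plan_file=$1
  # Consume all of stdin (no -q) so the upstream command never sees SIGPIPE.
  if grep -E '^(migrations/|prisma/schema\.prisma$)' >/dev/null; then
    if [ ! -s "$plan_file" ]; then
      echo "Migration changed but $plan_file is missing or empty: the dry-run gate did not run" >&2
      return 1
    fi
  fi
  return 0
}
```

A deploy job could feed it the changed paths, for example `git diff --name-only "$BASE_SHA" HEAD | require_plan_for_migrations plan.sql`, where BASE_SHA is whatever variable your CI exposes for the merge base.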

Root Cause Analysis

A dry-run is two separable signals — an exit code and a body of generated DDL — and a real gate must honor both. Most broken gates honor neither.

The exit-code half fails when CI swallows the status. Shell pipelines mask it: migrate --plan | tee plan.log returns the exit code of tee, not of migrate, so a failed plan looks successful. The fix is set -o pipefail (or capturing ${PIPESTATUS[0]}). YAML-level continue-on-error: true and GitLab allow_failure: true swallow the status too, only more explicitly. The dry-run must be allowed to fail the job.
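The masking is easy to reproduce without any migration tool. In the sketch below, `false` is a stand-in for a failing plan command (an assumption for illustration); the same pipeline is run with and without pipefail, and PIPESTATUS is read immediately after the pipeline, before any other command overwrites it.

```shell
#!/usr/bin/env bash
# Illustration only: `false` stands in for a failing `migrate --plan`.
# No `set -e` here, so the script keeps running and can record each status.

set +o pipefail
false | tee /dev/null
masked=$?                       # status of tee: the failure is hidden

set -o pipefail
false | tee /dev/null
surfaced=$?                     # status of the failing stage

false | tee /dev/null
per_stage=("${PIPESTATUS[@]}")  # must be captured on the very next line

echo "masked=$masked surfaced=$surfaced first_stage=${per_stage[0]}"
```

PIPESTATUS is a bash feature, so a CI step that relies on it needs bash rather than a plain POSIX sh.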

The DDL-body half fails because the three tools express “unexpected change” differently, and a gate that only checks exit codes never inspects the body at all:

Tool | Dry-run command | Signals a problem via | What the gate must assert
Generic CLI | migrate --plan | Non-zero exit on plan error | set -o pipefail; fail job on non-zero
Flyway | flyway migrate -dryRunOutput=plan.sql | Writes planned SQL to a file; validation errors exit non-zero | Grep plan.sql for destructive statements against an allow-list
Prisma | prisma migrate diff --script | Always exits 0 for a valid diff; --exit-code makes a non-empty diff exit 2 | Inspect --script output for DROP/RENAME; use --exit-code to detect drift

Prisma is the sharpest trap: prisma migrate diff without --exit-code returns 0 even when the diff is enormous, so a pipeline that only checks the exit status learns nothing. The body is the signal. The same backward-compatibility contract is applied more broadly when enforcing backward-compatibility checks in pull requests; the dry-run gate is its first and cheapest line of defense.

Immediate Mitigation

If a bad migration is mid-deploy right now, stop the rollout and treat the schema as possibly half-applied — the recovery for a partially applied step depends on whether your engine has transactional DDL. Then close the gate before the next deploy:

  1. Make the dry-run able to fail the job. Remove continue-on-error/allow_failure from the step and add pipefail so a masked exit code surfaces.
# CI shell step · read-only DB role · no writes · must fail the job on a bad plan
# Context: runs on every PR touching migrations; non-zero exit blocks the merge/deploy.
set -euo pipefail
migrate --plan --database-url "$PROD_READONLY_URL" | tee plan.log
# pipefail ensures the exit code of `migrate`, not `tee`, decides the step
  2. Inspect the emitted DDL. As a stopgap, fail on any destructive statement; the long-term pattern replaces this deny-list with a true allow-list of additive statements.
# CI shell step · flyway needs a read-only DB connection · the grep reads generated SQL only
# Context: fails the build if the plan contains destructive DDL.
flyway migrate -dryRunOutput=plan.sql -url="$PROD_READONLY_URL"
if grep -E -i '\b(DROP|TRUNCATE|RENAME)\b' plan.sql; then
  echo "Unexpected destructive DDL in dry-run plan" >&2
  exit 1
fi
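  3. For Prisma, make the exit code meaningful, then inspect the body. Use --exit-code so a non-empty diff exits 2 instead of 0; treat exit 1 as a hard error and, on exit 2, scan the --script output for destructive statements.
# CI shell step · read-only connection to live DB · no migration applied
# Context: with --exit-code, 0 = empty diff, 1 = error, 2 = non-empty diff.
status=0
npx prisma migrate diff \
  --from-url "$PROD_READONLY_URL" \
  --to-schema-datamodel prisma/schema.prisma \
  --script --exit-code > plan.sql || status=$?
# Exit 1 means the diff itself failed (bad URL, invalid schema): block the job.
if [ "$status" -eq 1 ]; then echo "prisma migrate diff failed" >&2; exit 1; fi
# Exit 2 means the plan is non-empty, so the emitted SQL decides.
if grep -E -i '\b(DROP|RENAME)\b' plan.sql; then
  echo "Destructive DDL in planned diff" >&2
  exit 1
fi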
  4. Make the gate required. In branch protection, mark the dry-run check as a required status so a red result cannot be merged past.

Permanent Fix / Long-Term Pattern

A durable dry-run gate has three properties: it always runs when a migration changes, it fails closed, and it classifies the emitted DDL rather than trusting the exit code alone. Encode all three in one stage.

# .gitlab-ci.yml — dry-run as a required, fail-closed stage
# Context: read-only DB role; emits a plan, then asserts the plan is additive-only.
migration_dryrun:
  stage: verify
  rules:
    - changes: [ "migrations/**/*", "prisma/schema.prisma" ]
  script:
    - set -euo pipefail
    - flyway migrate -dryRunOutput=plan.sql -url="$PROD_READONLY_URL" | tee plan.log
    - ./scripts/assert-additive-ddl.sh plan.sql   # allow-list: CREATE/ADD COLUMN/CREATE INDEX
  artifacts:
    paths: [ plan.sql, plan.log ]   # keep the plan for review on every run
  allow_failure: false              # blocks the deploy

The allow-list — not a deny-list — is the part that ages well. A deny-list of DROP, TRUNCATE, RENAME will miss the next destructive verb someone discovers. An allow-list permits exactly the statements your safe-change policy sanctions (ADD COLUMN, CREATE INDEX CONCURRENTLY, CREATE TABLE) and fails everything else, including DDL you have not yet classified. Generate the plan into a CI artifact so reviewers see the exact SQL that will run, and so the dry-run output becomes the reviewed unit rather than the developer’s local memory. For the full set of checks this gate belongs to — checksum verification, lock budgets, merge-queue determinism — see the migration pipeline gating overview, and pair the plan-time gate with testing migrations against production-like snapshots so a plan that looks additive but rewrites the table on real data is also caught.
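The stage above calls scripts/assert-additive-ddl.sh without showing it. A minimal sketch of what such a script might contain follows; it is hypothetical, not the original pipeline's script. It assumes statements are separated by `;`, that no semicolon or `--` appears inside a string literal or function body, and that the three allowed forms match your own safe-change policy. Anything it cannot classify is rejected.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an allow-list check; patterns are assumptions.
# Naive parsing: splits on ';' and strips '--' line comments, so it is
# only suitable for simple generated DDL, not arbitrary SQL.
set -uo pipefail

# A statement passes only if the WHOLE statement matches one allowed form.
# ADD COLUMN forbids commas, so a multi-action ALTER TABLE fails closed
# (at the cost of rejecting types such as numeric(10,2)).
ALLOWED='^(CREATE TABLE .*|CREATE (UNIQUE )?INDEX (CONCURRENTLY )?.*|ALTER TABLE [^ ]+ ADD COLUMN [^,]*)$'

assert_additive_ddl() {
  local plan_file=$1 rejected=0 stmt
  # `|| [ -n "$stmt" ]` keeps a final statement that lacks a trailing ';'.
  while IFS= read -r -d ';' stmt || [ -n "$stmt" ]; do
    # Collapse newlines and runs of whitespace, then trim both ends.
    stmt=$(printf '%s' "$stmt" | tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//')
    [ -z "$stmt" ] && continue
    if ! printf '%s\n' "$stmt" | grep -E -i "$ALLOWED" >/dev/null; then
      echo "Not on allow-list: $stmt" >&2
      rejected=1
    fi
  done < <(sed -e 's/--.*$//' "$plan_file")
  return "$rejected"
}
```

Saved as a standalone script, it would end with a call such as `assert_additive_ddl "${1:?usage: assert-additive-ddl.sh PLAN.sql}"` so that a missing argument also fails the job. A real plan may contain statements the tool generates for its own bookkeeping; those would need an explicit allow-list entry rather than a blanket exemption.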

Verification Checklist

  • The dry-run step uses set -o pipefail (or captures PIPESTATUS) so a masked exit code fails the job.
  • No continue-on-error / allow_failure: true remains on the dry-run step.
  • prisma migrate diff runs with --exit-code so a non-empty diff exits non-zero.

Frequently Asked Questions

Why does prisma migrate diff exit 0 even when the plan is wrong?
By default prisma migrate diff is informational and returns 0 for any valid comparison, regardless of how large or destructive the diff is. Add --exit-code to make a non-empty diff exit 2, then inspect the --script output for DROP or RENAME so the gate reacts to the content, not just the status.

Should the dry-run connect to production or to a snapshot?
Connect with a read-only role, either to production or to a recent snapshot, and never let the dry-run apply anything. The plan must reflect the real live schema to catch drift, yet a dry-run by definition performs no writes. Apply-and-measure belongs to a separate snapshot-based test stage.

A rename keeps showing up as DROP plus ADD. How do I let it through?
Express the rename as an explicit, reviewed migration (for example ALTER TABLE ... RENAME COLUMN) rather than letting the tool infer it, and add that statement to the allow-list for that specific migration. An inferred drop-plus-add is exactly the backward-incompatible change the gate exists to stop, so it should fail until a human encodes the safe form.