Read/Write Splitting Tactics
Implementing read/write routing during schema transitions requires strict proxy controls, deterministic traffic classification, and continuous replication monitoring to prevent data divergence. This operational guide details the exact lag thresholds, failover sequences, and validation gates required for safe execution, building directly on the foundational principles outlined in Zero-Downtime Schema Evolution Patterns.
Phase 1: Proxy Routing & Circuit Breaker Configuration
Route read traffic through a connection proxy that enforces strict query classification before initiating schema changes. The proxy must parse query semantics, route SELECT statements to replicas, and enforce circuit-breaker thresholds to prevent stale reads during replication lag spikes. Reference Configuring Read Replicas for Seamless Schema Updates for baseline topology validation.
Configuration & Routing
Deploy a proxy configuration that explicitly separates read/write hosts and respects transaction boundaries.
-- ProxySQL / PgBouncer equivalent routing rules
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply) VALUES
(1, 1, '^SELECT .*', 1, 1), -- Route reads to hostgroup 1 (replicas)
(2, 1, '^SET | ^START TRANSACTION | ^COMMIT | ^ROLLBACK', 0, 1); -- Force writes/tx to hostgroup 0 (primary)
-- Application-level transaction boundary enforcement
BEGIN;
SET @@session.sql_mode = 'STRICT_TRANS_TABLES';
INSERT INTO audit_log (event, ts) VALUES ('schema_migration_start', NOW());
COMMIT;
Dry-Run & Validation
# Dry-run: Validate routing rules without applying to production traffic
proxyadmin -e "LOAD MYSQL QUERY RULES TO RUNTIME; SELECT * FROM stats_mysql_query_digest WHERE digest_text LIKE 'SELECT%';"
# Circuit breaker validation (simulate 600ms lag)
./simulate_replication_lag.sh --target=replica-02 --lag-ms=600 --duration=10s
Rollback & Forward Paths
- Forward: Apply routing rules to runtime, monitor
stats_mysql_query_digestfor 5 minutes, then commit to disk. - Rollback:
proxyadmin -e "RESET MYSQL QUERY RULES; LOAD MYSQL QUERY RULES FROM DISK;". Revert connection strings to direct primary-only routing.
Compatibility Window
- Scope: 99.9% backward compatible. Requires application connection pool to support dynamic endpoint resolution.
- Duration: 15–30 minutes for validation, zero application downtime during rule application.
Environment Context
| Environment | Execution Strategy | Validation Gate |
|---|---|---|
| Dev | Mock proxy with synthetic lag injection | Unit tests assert SELECT hits replica hostgroup |
| Staging | Canary deployment (10% traffic) | Circuit breaker triggers at 400ms threshold |
| Prod | Blue/green proxy rollout | Automated rollback if error rate > 0.5% |
Phase 2: Schema Version Alignment & Drift Detection
During the migration window, enforce strict schema version parity across all nodes. When applying additive or non-blocking changes, align your routing logic with the Expand and Contract Methodology to ensure backward-compatible reads and prevent query failures on lagging replicas.
Configuration & Routing
Run automated checksum validation against the primary every 60 seconds. Wrap validation in explicit transaction boundaries to prevent phantom reads during drift checks.
-- Drift detection query (MySQL/PostgreSQL compatible)
BEGIN;
SELECT
TABLE_NAME,
CHECKSUM(TABLE_SCHEMA, TABLE_NAME) AS schema_hash,
COUNT(*) AS row_count
FROM information_schema.tables
WHERE TABLE_SCHEMA = 'production_db'
GROUP BY TABLE_NAME;
COMMIT;
Dry-Run & Validation
# Dry-run: Compare primary vs replica schema hashes without locking
pt-table-checksum --replicate=production_db.checksums --databases=production_db --no-check-binlog-format --dry-run
# Validate replica catch-up
./check_replication_lag.sh --max-lag-seconds=2 --timeout=60
Rollback & Forward Paths
- Forward: If checksums match across all nodes, proceed to Phase 3. Update application routing flags to
schema_version=v2. - Rollback: Pause migration, force replica resync via
STOP SLAVE; START SLAVE;(or equivalent), verify connection pool respects updated routing rules, and revert schema flag tov1.
Compatibility Window
- Scope: Additive changes only (new columns, indexes, nullable fields). Destructive or type-altering changes require separate contract phase.
- Duration: 2–4 hours for full cluster sync. Must complete before peak traffic windows.
Environment Context
| Environment | Execution Strategy | Validation Gate |
|---|---|---|
| Dev | Rapid forward/backward sync cycles | Automated drift alerts in CI pipeline |
| Staging | Chaos engineering: kill replica mid-sync | Verify app handles ER_SLAVE_HEARTBEAT_FAILURE gracefully |
| Prod | Strict gating via deployment orchestrator | Block cutover if drift > 0.1% or lag > 500ms |
Phase 3: Traffic Cutover & Consistency Gates
Transitioning write traffic requires deterministic routing and explicit consistency verification. Deploy Read/Write Splitting for High-Availability Migrations to handle failover routing without triggering application-level connection pool exhaustion. If your architecture requires simultaneous writes during the transition window, integrate Dual-Write Synchronization to maintain idempotent transaction logs and prevent primary key collisions.
Configuration & Routing
Implement a feature-flagged routing layer that atomically switches write endpoints while preserving transaction isolation.
# Pseudocode: Feature-flagged write router with explicit TX boundaries
def execute_migration_write(payload):
if feature_flags.is_enabled("write_routing_v2"):
conn = get_primary_v2()
else:
conn = get_primary_v1()
with conn.transaction() as tx:
tx.execute("INSERT INTO orders (id, status, migrated_at) VALUES (%s, %s, NOW())",
payload.id, payload.status)
# Read-after-write consistency gate
row = tx.execute("SELECT status FROM orders WHERE id = %s", payload.id)
assert row[0] == payload.status, "Consistency gate failed"
Dry-Run & Validation
# Dry-run: Simulate cutover with shadow writes
curl -X POST /api/internal/db-cutover/dry-run \
-H "X-Feature-Flag: write_routing_v2" \
-d '{"mode": "shadow", "duration": "300s"}'
# Validate read-after-write consistency
./verify_consistency.sh --endpoint=/api/orders --read-after-write=true --threshold=50ms
Rollback & Forward Paths
- Forward: Flip feature flag to
100%, monitor connection pool saturation, verifyread_after_writelatency < 100ms. - Rollback: Revert flag to
0%, halt all schema application, trigger automated reconciliation scripts to sync divergent writes, and validate primary key sequences.
Compatibility Window
- Scope: Requires idempotent write patterns and sequence synchronization. Dual-write mode adds ~15% CPU overhead.
- Duration: 15–30 minutes. Must execute during maintenance window if dual-write is active.
Environment Context
| Environment | Execution Strategy | Validation Gate |
|---|---|---|
| Dev | Local flag toggle with mock DBs | Unit tests assert idempotency on duplicate payloads |
| Staging | 50/50 traffic split for 1 hour | Pool exhaustion alert if active_connections > 80% |
| Prod | Canary rollout (1% → 10% → 50% → 100%) | Auto-rollback if error rate > 0.1% or latency > 200ms |
Phase 4: Rollback Protocols & Parity Verification
Every split configuration must include a tested, automated rollback path. Maintain a snapshot of the pre-migration routing rules and enforce strict configuration versioning. If data inconsistency, elevated error rates, or unacceptable latency is detected post-cutover, immediately revert write routing to the original primary, halt all schema application, and trigger automated reconciliation scripts.
Configuration & Routing
Version control proxy configs and routing flags. Wrap reconciliation in explicit transaction boundaries to prevent partial state.
#!/bin/bash
# rollback.sh
set -euo pipefail
TARGET_VERSION="${1:-pre-migration}"
echo "[$(date)] Initiating rollback to ${TARGET_VERSION}..."
# 1. Revert proxy routing
proxyadmin -e "LOAD MYSQL QUERY RULES FROM DISK WHERE version='${TARGET_VERSION}';"
# 2. Revert feature flags
curl -X PATCH /api/internal/flags/write_routing_v2 -d '{"enabled": false}'
# 3. Reconcile divergent transactions
psql -c "BEGIN; SELECT reconcile_divergent_writes(); COMMIT;"
Dry-Run & Validation
# Dry-run: Validate rollback sequence without applying state changes
./rollback.sh --dry-run --target=pre-migration --verify-only
# Post-rollback parity check
./verify_parity.sh --primary=old-writer --replica=replica-01 --timeout=120
Rollback & Forward Paths
- Forward: If rollback succeeds and parity is confirmed, archive migration artifacts, update runbooks, and resume normal operations.
- Rollback: If reconciliation fails, isolate the affected shard, switch to read-only mode, and escalate to DBA on-call for manual binlog/WAL replay.
Compatibility Window
- Scope: Full backward compatibility required. Reconciliation scripts must handle out-of-order transactions.
- Duration: < 5 minutes for automated execution. Manual escalation adds 15–30 minutes.
Environment Context
| Environment | Execution Strategy | Validation Gate |
|---|---|---|
| Dev | Automated CI rollback simulation | Asserts row_count matches pre-migration snapshot |
| Staging | Full failover drill weekly | Validates reconciliation completes within SLA |
| Prod | Runbook-driven, human-in-the-loop | Requires dual-approval for manual escalation |
Read/write splitting during schema migrations demands rigorous operational discipline. By enforcing strict proxy routing, continuous lag monitoring, and deterministic rollback sequences, engineering teams can execute complex versioning transitions without service interruption. Always validate environment parity before deployment and maintain explicit safety gates at every routing transition point.