Scaling to 10M Daily Requests

This wasn't a traffic spike. It was sustained, compounding growth—30% month-over-month for eight consecutive months. Our monolithic payments API was the bottleneck: in-memory session state prevented horizontal scaling, database connection pools were exhausted, and P95 latency climbed from 180ms to 420ms. We had to decompose the architecture while serving production traffic with zero downtime.

The Growth Curve

A B2B payments API processing 800K requests/day in January hit 2.4M by April and 6M by July. Traditional vertical scaling (bigger instances) bought us weeks, not months. The monolith's shared-memory architecture made horizontal scaling impossible.

The Zero-Downtime Constraint

Payment processing cannot stop. No maintenance windows. No "deploy and pray." Every architectural change had to happen transparently to customers while handling live transaction volume.

Latency ceiling: 420ms (P95 at peak load inside the monolith)

Architecture: 1 service (tightly coupled monolith, shared DB)

Database limit: 250 conns (connection pool exhaustion during business hours)

Daily volume: 800K (requests per day when we started)

Service Extraction

We decomposed the monolith into payments, accounts, and fraud services. Each domain now owns its data store and scaling policies.

Three bounded contexts · independent deploys

Stateless Front Door

Session state moved out of process memory and into signed JWTs with Redis-backed refresh, so we could scale application nodes horizontally without coordination between them.

JWT auth · Redis session fabric

Feature-Flagged Strangler Fig

API gateway routing stayed under LaunchDarkly control. Dual writes validated each new service before traffic ramped to 10%, then 50%, then 100%.

Zero downtime · instant rollback

How We Decomposed the Monolith

Domain-Driven Service Boundaries

We identified three bounded contexts in the monolith: payment processing, account lifecycle, and fraud detection. Each became an independent service with its own database. The monolith's shared PostgreSQL instance was the bottleneck: its connection pool was exhausted at 250 connections, so adding application nodes only increased contention for the same connections.

                        Monolith: 1 service, 1 DB, 250 connection limit

                        After: 3 services, 3 DBs, 750 total connections

                        Payment Service: RDS PostgreSQL (100 conn)

                        Account Service: Aurora PostgreSQL (200 conn)

                        Fraud Service: DynamoDB (unlimited)

                        Result: 3x connection capacity, independent scaling

Event-Driven Communication

Services communicate over an SNS/SQS event bus instead of synchronous HTTP calls. Payment events trigger account updates asynchronously, and fraud checks run in parallel. Because no service waits on another, one slow service can't cascade failures to the others.

                        POST /api/payment

                        1. Payment service validates & writes to DB

                        2. Publishes PaymentCreated event to SNS

                        3. Returns 202 Accepted (18ms)

                        4. Account service consumes event (async)

                        5. Fraud service consumes event (async)

                        No blocking calls, no cascading timeouts
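The flow above can be sketched with an in-memory stand-in for the event bus. This is an illustration under stated assumptions, not the production code: `Topic` imitates an SNS topic fanning out to one SQS queue per subscriber, `payments_db` stands in for the payment service's database, and the event fields are hypothetical. No AWS SDK calls are involved.

```python
import json
import queue
import uuid
from dataclasses import dataclass, field


@dataclass
class Topic:
    """In-memory stand-in for an SNS topic with one SQS queue per subscriber."""

    subscribers: list = field(default_factory=list)

    def subscribe(self) -> queue.Queue:
        q = queue.Queue()
        self.subscribers.append(q)
        return q

    def publish(self, event: dict) -> None:
        message = json.dumps(event)
        for q in self.subscribers:
            q.put(message)  # every subscriber receives its own copy


payment_events = Topic()
account_queue = payment_events.subscribe()  # consumed by the account service
fraud_queue = payment_events.subscribe()    # consumed by the fraud service
payments_db = {}                            # stand-in for the payment database


def create_payment(amount_cents: int, account_id: str):
    """Steps 1-3: validate, write, publish, then return without waiting."""
    if amount_cents <= 0:
        return 400, {"error": "amount must be positive"}
    payment_id = str(uuid.uuid4())
    payments_db[payment_id] = {"amount_cents": amount_cents, "account_id": account_id}
    payment_events.publish({
        "type": "PaymentCreated",
        "payment_id": payment_id,
        "amount_cents": amount_cents,
        "account_id": account_id,
    })
    return 202, {"payment_id": payment_id}


def drain(q: queue.Queue, handler) -> int:
    """Steps 4-5: a consumer processes whatever is queued, on its own schedule."""
    handled = 0
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            return handled
        handler(json.loads(message))
        handled += 1
```

The request handler never calls the account or fraud code directly. Each consumer drains its own queue, so a slow consumer only delays its own processing.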

Strangler Fig Pattern with Feature Flags

We built new services behind feature flags in the API gateway. Traffic routing was controlled via LaunchDarkly: start at 0% to the new service, validate correctness with dual writes, then ramp to 10%, 50%, and 100%. Rollback was a flag change, so it was instant. The monolith stayed live until each service proved itself at full load.

                        API Gateway routing logic:

                        # feature_flag() returns the rollout fraction, 0.0 to 1.0
                        if feature_flag('payment_service_v2') > random():
                            route to new Payment Service
                        else:
                            route to Monolith

                        Dual-write validation for 2 weeks at 10%

                        Full cutover after 6 weeks of gradual ramp
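A runnable sketch of the percentage routing above. It deliberately swaps `random()` for a stable hash of a request key, which is a substitution for illustration and not the routing described in this write-up: hashing keeps a given caller on the same backend between requests and guarantees that raising the percentage never moves a caller back to the monolith. The rollout fraction is passed in as a plain argument instead of being read from the LaunchDarkly SDK, and all names are hypothetical.

```python
import hashlib

MONOLITH = "monolith"
PAYMENT_SERVICE = "payment-service"

BUCKETS = 10_000  # basis points, so a 0.01% rollout step is representable


def bucket_for(request_key: str) -> int:
    """Map a key (for example a merchant ID) to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def choose_backend(request_key: str, rollout_fraction: float) -> str:
    """Route to the new service when the key's bucket falls under the rollout.

    rollout_fraction is the flag value: 0.0 sends everything to the monolith,
    1.0 sends everything to the new service.
    """
    threshold = round(rollout_fraction * BUCKETS)
    if bucket_for(request_key) < threshold:
        return PAYMENT_SERVICE
    return MONOLITH


if __name__ == "__main__":
    keys = [f"merchant-{i}" for i in range(10_000)]
    for fraction in (0.0, 0.10, 0.50, 1.0):
        routed = sum(choose_backend(k, fraction) == PAYMENT_SERVICE for k in keys)
        print(f"flag={fraction:.2f} -> {routed} of {len(keys)} keys on the new service")
```

Setting the fraction back to 0.0 returns every key to the monolith, which is the rollback path.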

Six Months of Architectural Evolution

Months 1-2 · Domain analysis

Mapped bounded contexts, sketched API contracts, and traced cross-service calls to see where coupling lived. Defined event schemas before writing code.

Month 3 · Payment service

Payment logic moved to its own service and database, shipped behind a feature flag at 0%. Dual-write validation compared monolith vs service responses before we ramped to 10% traffic.

Month 4 · Event bus + accounts

Accounts service came online with SNS/SQS fan-out. Payments now publishes events instead of calling accounts synchronously, letting each service scale separately.

Months 5-6 · Fraud + decommission

Fraud detection moved to a DynamoDB-backed service. After two weeks of shadow mode we drained the monolith entirely and handled 10M+ requests per day on the new services.

What Changed

Monolithic Architecture vs. Distributed Services

P95 Latency: 420ms before, 175ms after (58% improvement)

Daily Request Volume: 800K before, 10M+ after (12.5x growth handled)

Services: 1 monolith before, 3 independent after (isolated scaling)

DB Connection Pool: 250 max before, 750+ total after (3x capacity)

Deployment Frequency: 1x/week before, 5x/day after (independent deploys)

What We Actually Learned

Strangler Fig Beats Big Bang Every Time

We considered a full rewrite. It would have taken 18 months and risked everything. Extracting services gradually kept the monolith alive until each new service proved itself.

Shared Databases Are the Real Coupling

As long as services shared the monolith's database, we couldn't scale independently. The breakthrough was giving each service its own persistence layer.

Event-Driven Architecture Eliminates Cascading Failures

Synchronous service-to-service calls created tight coupling, because a slow downstream call held up its caller. With SNS/SQS events, payments publishes once and downstream services act asynchronously.

Dual-Write Validation Builds Confidence

Shadow traffic surfaced discrepancies we would have missed. By the time we ramped to 100%, we trusted the new services because the data said so.
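A minimal sketch of the kind of comparison that surfaces such discrepancies, assuming both backends can be called as plain functions that return dictionaries. The function names, the response fields, and the `ignore_fields` default are hypothetical. The live caller always receives the monolith's answer; the candidate's answer is only recorded.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowReport:
    """Running tally of how often the candidate agreed with the monolith."""

    compared: int = 0
    mismatches: list = field(default_factory=list)

    @property
    def match_rate(self) -> float:
        if self.compared == 0:
            return 1.0
        return 1 - len(self.mismatches) / self.compared


def handle_with_shadow(request, primary, candidate, report,
                       ignore_fields=("latency_ms",)):
    """Serve from `primary`, replay on `candidate`, and record any difference."""
    primary_response = primary(request)
    report.compared += 1
    try:
        candidate_response = candidate(request)
    except Exception as exc:  # the candidate must never break the live path
        report.mismatches.append({"request": request, "error": repr(exc)})
        return primary_response

    # Compare field by field, skipping values expected to differ.
    diff = {
        key: (primary_response.get(key), candidate_response.get(key))
        for key in set(primary_response) | set(candidate_response)
        if key not in ignore_fields
        and primary_response.get(key) != candidate_response.get(key)
    }
    if diff:
        report.mismatches.append({"request": request, "diff": diff})
    return primary_response
```

A report like this gives a concrete number to watch during the ramp: the percentage only moves up while `match_rate` stays where it needs to be.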