TechAni

Scaling to 10M Daily Requests

How we evolved a fintech API from 800K to 10M+ daily requests through architectural decomposition, eliminating the monolith bottleneck while maintaining zero downtime and cutting P95 latency by 58%.
Timeline: 6 months
Team: Platform Engineering + SRE
Industry: Fintech payments platform
Published: September 15, 2023
This wasn't a traffic spike. It was sustained, compounding growth—30% month-over-month for eight consecutive months. Our monolithic payments API was the bottleneck: in-memory session state prevented horizontal scaling, database connection pools were exhausted, and P95 latency climbed from 180ms to 420ms. We had to decompose the architecture while serving production traffic with zero downtime.

The Growth Curve

A B2B payments API processing 800K requests/day in January hit 2.4M by April and 6M by July. Traditional vertical scaling (bigger instances) bought us weeks, not months. The monolith's shared-memory architecture made horizontal scaling impossible.

The Zero-Downtime Constraint

Payment processing cannot stop. No maintenance windows. No "deploy and pray." Every architectural change had to happen transparently to customers while handling live transaction volume.

Latency ceiling: 420ms P95 at peak load inside the monolith
Architecture: 1 service (tightly coupled monolith, shared DB)
Database limit: 250 connections (connection pool exhaustion during business hours)
Daily volume: 800K requests per day when we started
01

Service Extraction

We decomposed the monolith into payments, accounts, and fraud services. Each domain now owns its data store and scaling policies.

Three bounded contexts · independent deploys
02

Stateless Front Door

Session state moved out of process memory and into JWTs with Redis-backed refresh, so application nodes could scale horizontally without coordination.

JWT auth · Redis session fabric
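
To illustrate why signed tokens remove the need for shared session memory, here is a minimal sketch of HS256 token issuing and verification using only the Python standard library. The secret, the claim names, and the function names are assumptions made for illustration. The article does not describe the actual token format or library, and the Redis-backed refresh step is left out.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-signing-key"  # hypothetical; a real key would come from a secret store


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> str:
    return _b64url(hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest())


def issue_token(subject: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Build an HS256-signed JWT carrying a subject and an expiry."""
    issued_at = int(time.time() if now is None else now)
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": subject, "exp": issued_at + ttl_seconds}).encode())
    return f"{header}.{payload}.{_sign(f'{header}.{payload}')}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature and expiry check out, else None."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header, payload, signature = parts
    expected = _sign(f"{header}.{payload}")
    # Constant-time comparison on bytes avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    claims = json.loads(_b64url_decode(payload))
    current = time.time() if now is None else now
    if claims["exp"] <= current:
        return None
    return claims
```

Any node holding the signing key can call `verify_token` without contacting another node, which is the property that makes the front door stateless.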
03

Feature-Flagged Strangler Fig

API gateway routing stayed under LaunchDarkly control. Dual writes validated each new service before traffic ramped to 10%, then 50%, then 100%.

Zero downtime · instant rollback

How We Decomposed the Monolith

Domain-Driven Service Boundaries

We identified three bounded contexts in the monolith: payment processing, account lifecycle, and fraud detection. Each became an independent service with its own database. The monolith's shared PostgreSQL instance was the bottleneck—connection pool exhaustion at 250 connections meant we couldn't scale horizontally.

Monolith: 1 service, 1 DB, 250 connection limit
After: 3 services, 3 DBs, 750 total connections
Payment Service: RDS PostgreSQL (100 conn)
Account Service: Aurora PostgreSQL (200 conn)
Fraud Service: DynamoDB (unlimited)
Result: 3x connection capacity, independent scaling

Event-Driven Communication

Services communicate through an SNS/SQS event bus instead of synchronous HTTP calls. Payment events trigger account updates asynchronously, and fraud checks run in parallel. Because the services are decoupled, one slow service can't cascade failures to the others.

POST /api/payment
1. Payment service validates & writes to DB
2. Publishes PaymentCreated event to SNS
3. Returns 202 Accepted (18ms)
4. Account service consumes event (async)
5. Fraud service consumes event (async)
No blocking calls, no cascading timeouts
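
The five steps above can be sketched with an in-memory stand-in for the event bus. This is not the team's implementation and it does not call AWS: the `Topic` class, the queue names, and the event fields are all hypothetical, chosen only to show the shape of publish-then-return with independent consumers.

```python
import json
from collections import deque


class Topic:
    """In-memory stand-in for an SNS topic that fans out to SQS-style queues."""

    def __init__(self):
        self.queues = {}

    def subscribe(self, name):
        queue = deque()
        self.queues[name] = queue
        return queue

    def publish(self, event):
        body = json.dumps(event)
        for queue in self.queues.values():
            queue.append(body)  # every subscriber receives its own copy


payments_db = {}  # stand-in for the payment service's own database
topic = Topic()
account_queue = topic.subscribe("account-service")
fraud_queue = topic.subscribe("fraud-service")


def create_payment(payment_id, amount_cents):
    """Steps 1-3: persist, publish PaymentCreated, and return 202 without waiting."""
    payments_db[payment_id] = {"amount_cents": amount_cents, "status": "pending"}
    topic.publish(
        {"type": "PaymentCreated", "payment_id": payment_id, "amount_cents": amount_cents}
    )
    return 202


def drain(queue):
    """Steps 4-5: a consumer pulls its events on its own schedule."""
    events = []
    while queue:
        events.append(json.loads(queue.popleft()))
    return events
```

Because `create_payment` returns as soon as the event is published, a slow or stopped consumer only lets its own queue grow. The request path and the other consumer are unaffected.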

Strangler Fig Pattern with Feature Flags

We built new services behind feature flags in the API gateway and controlled traffic routing through LaunchDarkly: start at 0% to the new service, validate correctness with dual writes, then ramp to 10%, 50%, and 100%. Rollback was instant. The monolith stayed live until each service proved itself at full load.

API Gateway routing logic:
# feature_flag() returns the rollout fraction, from 0.0 to 1.0
if feature_flag('payment_service_v2') > random():
    route to new Payment Service
else:
    route to Monolith

Dual-write validation for 2 weeks at 10%
Full cutover after 6 weeks of gradual ramp
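
A runnable variant of the routing rule is sketched below. It swaps the per-request `random()` draw for a deterministic hash of a customer identifier, so a given customer stays on the same backend as the rollout fraction rises. That substitution is an assumption for illustration, not something the article states about the gateway or about LaunchDarkly, and the function names and backend labels are hypothetical.

```python
import hashlib


def rollout_bucket(customer_id: str, flag_key: str) -> float:
    """Map a customer to a stable value in [0, 1) for a given flag."""
    digest = hashlib.sha256(f"{flag_key}:{customer_id}".encode()).digest()
    # Four bytes give 2**32 buckets, which a float represents exactly.
    return int.from_bytes(digest[:4], "big") / 2**32


def choose_backend(
    customer_id: str,
    rollout_fraction: float,
    flag_key: str = "payment_service_v2",
) -> str:
    """Send the customer to the new service when their bucket is under the fraction."""
    if rollout_bucket(customer_id, flag_key) < rollout_fraction:
        return "payment-service"
    return "monolith"
```

Setting `rollout_fraction` to `0.0` sends every customer back to the monolith, which is the rollback path. Raising it from `0.10` to `0.50` only adds customers to the new service and never moves one back.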

Six Months of Architectural Evolution

Months 1-2 · Domain analysis

Mapped bounded contexts, sketched API contracts, and traced cross-service calls to see where coupling lived. Defined event schemas before writing code.

Month 3 · Payment service

Payment logic moved to its own service and database, shipped behind a feature flag at 0%. Dual-write validation compared monolith vs service responses before we ramped to 10% traffic.

Month 4 · Event bus + accounts

Accounts service came online with SNS/SQS fan-out. Payments now publishes events instead of calling accounts synchronously, letting each service scale separately.

Months 5-6 · Fraud + decommission

Fraud detection moved to a DynamoDB-backed service. After two weeks of shadow mode, we drained the monolith entirely and handled 10M+ daily requests on the new services.

What Changed

Monolithic Architecture

P95 Latency: 420ms
Daily Request Volume: 800K
Services: 1 monolith
DB Connection Pool: 250 max
Deployment Frequency: 1x/week

Distributed Services

P95 Latency: 175ms (58% improvement)
Daily Request Volume: 10M+ (12.5x growth handled)
Services: 3 independent (isolated scaling)
DB Connection Pool: 750+ total (3x capacity)
Deployment Frequency: 5x/day (independent deploys)
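
As a quick check, the headline ratios follow directly from the before and after figures quoted above. The variable names are illustrative, and the 10M+ and 750+ figures are treated as exactly 10,000,000 and 750.

```python
before_p95_ms, after_p95_ms = 420, 175
before_daily, after_daily = 800_000, 10_000_000
before_conns, after_conns = 250, 750

latency_reduction_pct = (before_p95_ms - after_p95_ms) / before_p95_ms * 100
volume_multiple = after_daily / before_daily
connection_multiple = after_conns / before_conns

print(f"P95 latency reduction: {latency_reduction_pct:.1f}%")         # 58.3%
print(f"Daily volume multiple: {volume_multiple:.1f}x")               # 12.5x
print(f"Connection capacity multiple: {connection_multiple:.0f}x")    # 3x
```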

What We Actually Learned

Strangler Fig Beats Big Bang Every Time

We considered a full rewrite. It would have taken 18 months and risked everything. Extracting services gradually kept the monolith alive until each new service proved itself.

Shared Databases Are the Real Coupling

As long as services shared the monolith's database, we couldn't scale independently. The breakthrough was giving each service its own persistence layer.

Event-Driven Architecture Eliminates Cascading Failures

Synchronous service-to-service calls created tight coupling. SNS/SQS events decoupled everything—payments publish once, downstream services act asynchronously.

Dual-Write Validation Builds Confidence

Shadow traffic surfaced discrepancies we would have missed. By the time we ramped to 100%, we trusted the new services because the data said so.
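
One way to picture this kind of validation is a shadow comparison: the monolith's response is always the one returned, while the candidate service runs on the side and any field-level differences are recorded. The sketch below is written under that assumption. The handler signatures, the ignored field names, and the mismatch log are hypothetical, and the article does not describe the actual comparison tooling.

```python
MISSING = "<missing>"  # marker for a field present in only one response
IGNORED_FIELDS = frozenset({"request_id", "served_by"})  # hypothetical metadata fields


def diff_responses(monolith: dict, candidate: dict) -> dict:
    """Return {field: (monolith_value, candidate_value)} for each mismatching field."""
    fields = (monolith.keys() | candidate.keys()) - IGNORED_FIELDS
    return {
        name: (monolith.get(name, MISSING), candidate.get(name, MISSING))
        for name in sorted(fields)
        if monolith.get(name, MISSING) != candidate.get(name, MISSING)
    }


def shadow_compare(request, monolith_handler, candidate_handler, mismatches: list) -> dict:
    """Serve from the monolith, run the candidate on the side, record any differences."""
    primary = monolith_handler(request)
    try:
        shadow = candidate_handler(request)
    except Exception as exc:  # a failing candidate must never affect the customer
        mismatches.append({"request": request, "error": repr(exc)})
        return primary
    delta = diff_responses(primary, shadow)
    if delta:
        mismatches.append({"request": request, "diff": delta})
    return primary
```

An empty mismatch log over a long enough window is the kind of evidence that would justify moving from one ramp stage to the next.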