This is the story of how I reduced real-world latency from about 14 seconds to around 200 milliseconds for a production website without paying for any infrastructure.

The project was the official GDG On Campus website at my university. I built the stack solo, deployed it myself, and then spent roughly a week optimizing it post-launch.

The non-negotiable constraint was simple: zero budget. No paid tiers, no trials, no credit card.

Stack and context

  • Frontend: Vite app on Cloudflare Pages
  • Backend: Django API
  • Database: Supabase PostgreSQL (SQLite during local development)
  • Primary users: India

Locally, everything felt fast. The problems only became visible in real deployment conditions.

Part 1: Anatomy of a 14-second response

When “working” is not production-ready

My first public backend was exposed from my laptop through ngrok. It worked functionally, but not operationally. A laptop is not production infrastructure.

Moving to hosted platforms immediately made one limitation obvious: SQLite is fragile in ephemeral container filesystems. Restarts and filesystem resets make it unreliable for persistent production state.

So migration to managed Postgres became mandatory.

Migration friction

The migration was useful in its own way:

  • PowerShell exported JSON as UTF-16 by default.
  • PostgreSQL expected UTF-8.
  • Encoding conversion introduced minor corruption.

I corrected critical records manually, validated outputs, and continued. Perfect migration was less important than correct production behavior.
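The fix boils down to one re-encode pass before loading the dump into Postgres. A minimal sketch of that step (file names and the dumpdata record shape are illustrative, not the project's actual script):

```python
# Re-encode a Django dumpdata export from UTF-16 (PowerShell's default
# redirection encoding) to the UTF-8 that PostgreSQL expects.
import json

def reencode_dump(src: str, dst: str) -> None:
    # The utf-16 codec also consumes the BOM that PowerShell prepends.
    with open(src, encoding="utf-16") as f:
        data = json.load(f)  # fails loudly if the text is already corrupted
    with open(dst, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII characters readable for
        # manual spot-checks of critical records.
        json.dump(data, f, ensure_ascii=False)
```

Round-tripping through `json.load` has a useful side effect: records that were damaged by the encoding mix-up tend to fail parsing here, instead of surfacing later as silent corruption in production.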

Render instability under free-tier constraints

My first hosted backend target was Render free tier. The process looked alive, but requests eventually timed out and the service became functionally dead.

Root cause pattern:

  • Supabase moved to IPv6.
  • Render free tier was IPv4-only.
  • An IPv4-reachable connection pooler was required as a bridge.
  • Django persistent DB connections plus sleeping containers produced stale or exhausted sockets.

HTTP health checks could keep the service awake, but they did not heal broken database-layer state.
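The database-layer half of the fix is mostly configuration. A sketch of the relevant Django settings, assuming Supabase's pooler endpoint (hosts, ports, and env var names here are illustrative, not the project's real config):

```python
# settings.py (sketch) — point Django at the IPv4-reachable pooler instead
# of the direct IPv6 database host, and disable persistent connections so a
# container waking from sleep never reuses a socket the pooler already closed.
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "HOST": os.environ.get("DB_POOLER_HOST", ""),   # pooler endpoint, not the direct DB host
        "PORT": os.environ.get("DB_POOLER_PORT", "6543"),  # pooler port; value illustrative
        "NAME": os.environ.get("DB_NAME", "postgres"),
        "USER": os.environ.get("DB_USER", ""),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        # CONN_MAX_AGE=0 closes the connection after each request, trading a
        # little per-request latency for immunity to stale sockets after sleeps.
        "CONN_MAX_AGE": 0,
    }
}
```

`CONN_MAX_AGE=0` is the blunt instrument here: it gives up Django's persistent connections entirely, which is an acceptable trade once the pooler, rather than Django, owns connection reuse.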

At that point, this was no longer a tuning issue. It was an architecture mismatch under free-tier limits.

Stable but geographically expensive

I moved backend hosting to Northflank. Stability improved immediately, but latency was still severe:

  • free backend region: US/UK
  • database region: Singapore
  • user base: India

The system was not CPU-bound or memory-bound. It was distance-bound.

Stage                 Approx latency    Primary bottleneck
Baseline              ~14000 ms         Cross-region backend + DB round trips
After backend cache   ~1000-2000 ms     Long user-to-backend hop
Final state           ~200 ms           Mostly cache-hit path at the edge

Part 2: Remove cross-region work from the hot path

Backend caching: remove database as default path

The dataset was small (~320KB), read-heavy, and rarely written (mostly admin updates). Selective query-level optimization would add complexity without solving the core problem.

So I cached the effective database payloads in backend memory and treated Postgres as the source of truth.

Design choices:

  • stale-while-revalidate semantics
  • warm cache via periodic endpoint hits
  • explicit invalidation on admin writes

This removed most cross-continent DB round trips from request-time paths. Latency dropped to around 1-2 seconds and variance became far tighter.
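The three design choices above fit in a very small amount of code. A minimal sketch of a stale-while-revalidate cache for this kind of small, read-heavy payload (a sketch of the idea, not the project's actual implementation; names are illustrative):

```python
# Stale-while-revalidate cache: always answer from memory once warm;
# refresh from the database in the background when the entry goes stale.
import threading
import time

class SWRCache:
    def __init__(self, fetch, ttl=300):
        self._fetch = fetch          # e.g. a function that queries Postgres
        self._ttl = ttl              # seconds before an entry counts as stale
        self._lock = threading.Lock()
        self._value = None
        self._stamp = 0.0

    def get(self):
        if self._value is None:      # cold start: block once on the fetch
            with self._lock:
                if self._value is None:
                    self._value = self._fetch()
                    self._stamp = time.time()
            return self._value
        if time.time() - self._stamp > self._ttl:
            # Stale: serve the old value now, refresh in the background.
            threading.Thread(target=self._refresh, daemon=True).start()
        return self._value

    def _refresh(self):
        value = self._fetch()
        with self._lock:
            self._value = value
            self._stamp = time.time()

    def invalidate(self):            # call from admin write paths
        with self._lock:
            self._value = None
```

A module-level instance can back the read endpoints; the periodic warmup hits simply call `get()` so the cache is never cold for real users, and admin save/delete hooks call `invalidate()`.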

Frontend/edge caching: remove backend as default path

Even after backend cache, users still paid the geographic cost of contacting backend origin.

Next step:

  • stale-while-revalidate caching at frontend/edge
  • backend used as fallback path instead of primary path

Most users were now served directly from cache. Backend calls became rare and predictable.
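Pushing stale-while-revalidate out to the edge needs almost nothing from the backend beyond a header: the origin declares the caching policy and the CDN enforces it. A framework-agnostic sketch (the decorator only assumes the response supports item assignment for headers, as Django's HttpResponse does; the numeric values are illustrative):

```python
# Emit a Cache-Control policy so the edge serves cached responses and
# only falls back to the origin when revalidating in the background.
def edge_cached(max_age=60, stale=600):
    """Decorator adding a stale-while-revalidate policy to a view's response."""
    def decorate(view):
        def wrapper(*args, **kwargs):
            response = view(*args, **kwargs)
            # s-maxage: how long the edge may serve the copy as fresh;
            # stale-while-revalidate: how long it may keep serving the stale
            # copy while refetching from the origin in the background.
            response["Cache-Control"] = (
                f"public, s-maxage={max_age}, stale-while-revalidate={stale}"
            )
            return response
        return wrapper
    return decorate
```

With this shape, the backend stops being the primary path by declaration: a cache hit never leaves the edge, and a stale hit still answers instantly while the refresh happens off the user's critical path.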

I initially drove warmup traffic from a cross-domain worker and ran into adblockers, because the requests resembled tracking behavior. Moving the worker logic onto a first-party domain and splitting warmup into its own job avoided both the trust problem and the filtering problem.

What changed technically

  1. I stopped trying to “optimize” a slow path and instead removed slow path dependencies from normal requests.
  2. I used layered caching where each layer removed one class of cross-region round trips.
  3. I added explicit cache lifecycle controls (warmup + invalidation), not just passive TTL waiting.

Outcome

The final result was:

  • latency reduction from ~14000ms to ~200ms
  • ~98.5% improvement
  • significantly lower variance and fewer long-tail stalls

More important than the average speed, the system became predictable. Cache misses were visible and explainable, not random user-facing failures.

For a zero-budget stack, this made the website practically usable for our community at GDG On Campus NEHU.