It worked on localhost. Obviously.
Systems Optimization & Performance Engineering

Jozef ehj

I was profiling an application under load the other night, not because anyone asked me to, not because there was a deadline, but because I could hear it being slow from across the room, and something clicked into place that I have technically known for years but apparently needed to rediscover at midnight for the forty-seventh time.

Localhost is a lie we tell ourselves so we can sleep. A comfortable, fast, consequence-free fantasy where every API responds instantly, every database query is basically RAM, and the only user is you, clicking politely, in a controlled environment, having a great time.

Production is where software goes to find out what it actually is.

And production has opinions.


// localhost. 2am. just you and your machine.

GET  /api/users    200 11ms  // a dream
GET  /api/orders   200 8ms   // beautiful 
POST /auth/login   200 14ms  // i am a genius 

// production. 9am monday. 4,000 users.

GET  /api/users    200 847ms // hmmm 
GET  /api/orders   503      // oh no 
POST /auth/login   504      // oh NO 
GET  /health       500     // the health check. the HEALTH CHECK is down.

The thing nobody warns you about in bootcamp, in tutorials, in any of the seventeen YouTube videos with "Master" in the title: production does not break your software. It reveals it. Every cut corner, every "I'll fix this later," every index you skipped because the table only had twelve rows when you tested it, production collects all of these debts and presents them to you simultaneously, in front of real users, on a Monday.

  • Junior devs think the code is broken. It is not. You are.
  • Senior devs think they have seen this before. They have. They still cannot fix it faster.
  • DevOps already knew. They have been waiting for you to figure it out.
  • The PM is asking if we can just "turn it off and back on."
  • The CEO has sent a Slack message that is just: "thoughts?"
  • Me, writing this at 3am — also part of the problem.
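
The skipped index is easy to watch happen. Here is a minimal sketch using Python's built-in `sqlite3`; the `orders` table, its columns, and the index name are hypothetical stand-ins, and the exact plan wording varies between SQLite versions. The same query is explained before and after the index exists.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan detail.
    return "; ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical table: twelve rows on localhost, a lot more in production.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (user_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(50_000)],
)

sql = "SELECT total FROM orders WHERE user_id = ?"

plan_before = query_plan(conn, sql, (42,))  # no index: every row is read
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan_after = query_plan(conn, sql, (42,))   # index: direct lookup

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan should report a SCAN of the table and the second a SEARCH using the index. With twelve rows nobody can tell the difference, which is exactly how the debt gets taken on.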

The real insult, the part that makes you stare at a flamegraph with the specific expression of someone being told terrible news by a doctor, is that the slow part is never where you think it is. The database query you spent three hours optimizing? That takes nine milliseconds. The actual bottleneck is a serializer you did not write, inside an ORM you imported without reading, wrapping a response in two layers of middleware you added because a blog post said it was best practice in 2019.

The bottleneck is always somewhere between where you looked and where you should have looked.

This is why load testing exists. Not because it is fun (it is not fun; it is the software equivalent of going to the doctor after convincing yourself for months that you are fine), but because tools like k6 will simply show you the truth. No opinion. No feelings. Just: here is what your application does when seventeen hundred people hit it at the same time, and here is the part that started crying first.


$ k6 run load-test.js

OK   status is 200                  [ 94% ]
FAIL response time under 500ms      [ 23% ] // you, specifically
FAIL no 5xx errors                  [ 81% ] // the middleware said hi

http_req_duration  avg=1.24s  min=12ms  med=890ms  max=8.3s 
http_req_failed    19.4%               // a fun number 
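
Those summary numbers reward a second look, because the average and the median tell different stories. The sketch below is not k6; it is a few lines of standard-library Python over a hypothetical latency sample (the values are invented for illustration), showing how a handful of terrible requests drags the average far from what the typical user saw.

```python
import statistics

def summarize(latencies_ms):
    """Summarize request latencies roughly the way a load-test report does."""
    ordered = sorted(latencies_ms)
    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "avg": statistics.fmean(ordered),
        "min": ordered[0],
        "med": statistics.median(ordered),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": ordered[-1],
    }

# Hypothetical sample: most requests are fine, a few are catastrophic.
samples = [12] * 40 + [300] * 35 + [900] * 20 + [8300] * 5
stats = summarize(samples)

for name, value in stats.items():
    print(f"{name:>4} = {value:8.1f} ms")
```

In this sample the average lands well above the median, because five slow requests out of a hundred are enough to move it. That is why the tail percentiles are the numbers worth chasing.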

Senior engineers, I have noticed, are not faster because they are smarter. They are faster because they have already been embarrassed by every single one of these mistakes at least once, and embarrassment is a remarkably effective teacher. They do not optimize before measuring because they have spent enough time optimizing the wrong thing to know exactly how that story ends.

So here we are. Writing production-hardened software, profiling at midnight, adding indexes nobody asked for, chasing p99 latency on applications that will load in 200ms for a user who will immediately complain it feels slow. Applications that will make someone's checkout seamless, their dashboard instant, their experience effortless, because we chose to suffer the flamegraph so they never have to know it exists.

We make the hard invisible so the easy feels obvious. That is the job. It has always been the job. It just did not come with a warning label that said: you will do this at 3am, voluntarily, for systems that will never thank you, for users who will only notice when something breaks.

Profile before you optimize. Measure before you architect. Add the index. And try, at some point, to go to bed before the health check does.

written at 03:47   commit: "fix: hopefully"

deployed at 04:12   commit: "fix: for real this time"
rolled back at 04:31   commit: "revert: for real for real this time"
// localhost tests: passing. obviously.

Frequently Asked Questions

Why does everything work perfectly on localhost and immediately fall apart in production?

Because localhost is not the real world. On your machine, you are the only user, the database lives next door, the network has effectively zero latency, and nothing is competing for resources. Production introduces every single thing localhost kindly removed: real users, real traffic, real network round trips, real contention. Your application was never actually fast. It just had no competition.

Is high traffic really what causes production failures?

Rarely. Traffic is what reveals failures, not what creates them. The N+1 query was always there. The missing index was always there. The serializer doing three unnecessary transformations on every request was always there. Scale just turns up the volume until you can no longer pretend you do not hear it.
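
For anyone who has not met the N+1 query in person, here is a minimal sketch of it, again with Python's built-in `sqlite3`. The `users` and `orders` tables and the helper names are hypothetical; a real ORM hides the loop behind a lazy-loaded attribute, but the shape is the same. The trace callback records every statement so the two approaches can be compared by query count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
""")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO orders (user_id, total) VALUES (?, ?)",
                 [(i, 10.0 * i) for i in range(1, 101)])
conn.commit()

# Record every statement SQLite runs so the queries can be counted.
statements = []
conn.set_trace_callback(statements.append)

def count_selects(fn):
    """Run fn and return (result, number of SELECT statements it issued)."""
    statements.clear()
    result = fn()
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    return result, len(selects)

def totals_n_plus_one():
    """One query for the users, then one more query per user."""
    result = {}
    for user_id, name in conn.execute("SELECT id, name FROM users").fetchall():
        row = conn.execute(
            "SELECT SUM(total) FROM orders WHERE user_id = ?", (user_id,)
        ).fetchone()
        result[name] = row[0]
    return result

def totals_joined():
    """The same answer from a single joined query."""
    rows = conn.execute("""
        SELECT u.name, SUM(o.total)
        FROM users u JOIN orders o ON o.user_id = u.id
        GROUP BY u.id
    """).fetchall()
    return dict(rows)

slow, n_plus_one_count = count_selects(totals_n_plus_one)
fast, joined_count = count_selects(totals_joined)

print("N+1 queries:   ", n_plus_one_count)
print("joined queries:", joined_count)
```

With 100 users the first version issues 101 SELECTs and the second issues 1, for identical results. On localhost each of those round trips is nearly free, so nothing looks wrong until the database is across a network.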

I optimized the database query. Why is the endpoint still slow?

Because the database query was probably never the problem. The actual cost is usually hiding inside the layers wrapped around it: ORM hydration, response serialization, middleware that runs on every request without anyone remembering why, abstraction stacked on abstraction. Profile the full execution path with something like a flamegraph before you touch anything. The bottleneck is almost never where you are looking.
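
A minimal sketch of what profiling the full path looks like, using Python's built-in `cProfile` as a stand-in for whatever profiler or flamegraph tool your stack provides. The functions `run_query`, `serialize`, and `handle_request` are hypothetical, and the serializer is deliberately wasteful to play the role of the layer nobody suspected.

```python
import cProfile
import io
import json
import pstats

def run_query():
    """Stand-in for the query that was already optimized."""
    return [{"id": i, "name": f"user{i}"} for i in range(200)]

def serialize(rows):
    """Stand-in for a wasteful serializer that re-encodes every row repeatedly."""
    out = []
    for row in rows:
        for _ in range(50):  # pointless repeated transformations
            row = json.loads(json.dumps(row))
        out.append(row)
    return json.dumps(out)

def handle_request():
    """The whole request path: query first, then serialization."""
    return serialize(run_query())

profiler = cProfile.Profile()
profiler.enable()
body = handle_request()
profiler.disable()

# Sort by cumulative time so the most expensive call paths come first.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(8)
report = stream.getvalue()
print(report)
```

In the printed report, `serialize` and the `json` calls beneath it should account for nearly all of the cumulative time, while `run_query` barely registers. A flamegraph would show the same shape as one very wide bar sitting on top of the part you never looked at.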

What does k6 actually tell you that manual testing does not?

Manual testing tells you the application works when one person, who built it, uses it carefully. k6 tells you what happens when four hundred people hit the same endpoint at the same time, three of them with slow connections, one of them sending a payload you did not account for. It does not ask whether your app works. It asks how it behaves under pressure. Those are completely different questions and only one of them matters in production.

Why do senior engineers seem to fix things faster without writing more code?

They have already been embarrassed by every shortcut at least once. They know that the first explanation is usually wrong, that the bottleneck is rarely where it looks like it is, and that optimizing before measuring is how you spend four hours making the wrong thing 30% faster. The speed comes from not repeating the same expensive mistakes. Which is a polite way of saying production has already taught them everything the hard way.

When is an application actually ready for production?

Not when it has enough features. Features are invisible to a user who is staring at a 503. An application is production-ready when it continues behaving correctly under conditions that were never ideal: partial failures, unexpected inputs, degraded dependencies, traffic it was not expecting on a Monday morning. The goal was never to build something that runs. The goal was to build something that keeps running once reality arrives, unannounced, at scale, at 2am.

Studies and Development Engineer