From 500ms to 900ms: How AI-Assisted “Optimizations” Turned a Fast Query into a Slow One — and What Brought It Back to 43ms
Systems Optimization & Performance Engineering5 min read

From 500ms to 900ms: How AI-Assisted “Optimizations” Turned a Fast Query into a Slow One — and What Brought It Back to 43ms

JeJozef ehj··5 min read

Modern AI coding assistants are increasingly positioned as tools for improving performance, suggesting optimizations, and accelerating backend development. In practice, their strengths are often concentrated in code generation and pattern completion rather than in understanding the actual behavior of running systems under real constraints.

A recent debugging case illustrates this gap clearly. An API endpoint returning approximately 2,000 database records exhibited a response time of ~500ms on the client side. Initial investigation ruled out the database layer almost immediately, as the raw query execution time inside PostgreSQL was around 2ms. The system was not database-bound; it was execution-path bound.

When the issue was analyzed with an AI assistant, the suggested direction followed a conventional optimization path: increase indexing coverage, adjust ORM loading strategies, introduce scalars().unique(), and refine relationship handling through additional abstraction at the ORM layer. In isolation, these are standard recommendations in ORM-based systems and often valid in large-scale workloads. In this case, they were misaligned with the actual constraint.

After implementation, response time increased from ~500ms to ~900ms. The reason was not a single incorrect change, but the introduction of additional computational layers into a workflow that did not require them. The dataset size remained small enough that PostgreSQL naturally favored hash joins and avoided heavy index utilization. Forcing additional indexing and ORM uniqueness enforcement introduced repeated UUID validation checks and extra iteration steps across the query lifecycle, effectively shifting cost from the database to the application layer.

The resolution came from removing abstraction rather than adding it. Replacing ORM-heavy relationship loading patterns such as joinedload and selectinload with controlled raw SQL queries, combined with explicit pagination and reduced object hydration, eliminated unnecessary overhead in the execution path. The final response time dropped from ~900ms to ~43ms.

The key observation is not that AI suggestions were wrong in principle, but that they optimized for structural correctness rather than contextual necessity. Performance engineering is rarely about applying more patterns. It is about identifying when those patterns no longer fit the scale or shape of the workload.

In database systems, especially those built on ORMs like SQLAlchemy, the most expensive performance issues are often not missing optimizations, but accumulated abstraction: unnecessary joins, redundant object mapping, hidden execution paths, and implicit behavior that only becomes visible under profiling.

As AI-generated code becomes more common in production systems, the core risk is not syntactic error but systematic over-engineering. Code that is correct, idiomatic, and still suboptimal under real constraints. In performance-critical systems, the hardest skill is not knowing what to add. It is knowing what to remove.

Frequently Asked Questions

Why database queries are fast but API requests are slow+

A query execution time inside the database only measures the database engine’s work. It does not include serialization, ORM hydration, network transfer, or object mapping. In systems returning thousands of records, the overhead of converting rows into Python objects and then into JSON often dominates total latency even when SQL execution is near-instant.

Why can ORM optimizations sometimes make performance worse+

ORM optimizations are context-sensitive. Techniques like eager loading, relationship joins, and uniqueness enforcement reduce database round trips in large relational traversals, but they also increase memory allocation, object graph construction, and CPU work in the application layer. When the dataset is small or already efficiently fetched, these abstractions add overhead without removing meaningful bottlenecks.

How do you determine whether a performance issue is database-bound or application-bound+

The separation comes from measuring each layer independently. Database-bound issues show high query execution time, slow joins, or inefficient execution plans. Application-bound issues show fast SQL execution but slow total response time, usually caused by serialization, ORM mapping, business logic loops, or payload size. Profiling must isolate query time from full request time.

Why does reducing abstraction sometimes improve performance in ORM-based systems+

Abstraction introduces generality, not efficiency. ORMs prioritize developer ergonomics by modeling relational data as object graphs, but this mapping layer is computational work. Removing unnecessary object hydration or replacing complex relationship loading with targeted SQL reduces CPU usage and memory pressure, especially when full ORM features are not required for a specific endpoint.

Why do AI coding assistants often suggest ORM-heavy optimizations+

AI assistants tend to generalize from common patterns in large-scale systems where ORM inefficiency is a frequent real problem. They optimize toward structural correctness and best-practice templates rather than runtime-specific constraints. This leads to recommendations that are technically valid but not always aligned with actual bottlenecks in small or medium workloads.

What is the main lesson from performance regressions caused by over-optimization+

Performance regressions often come from adding layers rather than missing them. Each abstraction layer introduces hidden cost in CPU, memory, and execution path complexity. Effective optimization requires identifying the actual bottleneck first and resisting changes that do not directly reduce measured latency in that specific layer.

Je
Studies and Development Engineer
More

Continue reading

It worked on localhost. Obviously.

Localhost is a comfortable, consequence-free fantasy. Production is where software goes to find out what it actually is. A 3am field report from someone who has been profiling things nobody asked them to profile.

AI Coding Assistants Fail at Production Debugging

AI coding assistants generate correct-looking code but often fail in production debugging. Learn why runtime profiling, system constraints, and execution paths matter more than generated solutions.

3 min

Cutting Through the Noise: A Late-Night Rant on Directness in Systems

Another 3 AM production incident survived. Time to talk about why we make our systems so damn complicated, and why sometimes, the most elegant solution is the one that just gets straight to the point.

6 min