Topics
Browse posts by category and tag — every topic we cover, with the latest pieces under each.
Categories
ops (3 posts)
- Closing the Eval-Prod Gap: Online Evaluation as Observability
  Offline eval scores are green and production is worse. The gap is not a measurement error; it is structural. Here is how to instrument online evaluation so production quality becomes observable.
- Embedding and Vector-Store Observability: The Unwatched Layer
  RAG systems fail at the embedding and index layer long before the LLM does. Here is what to actually monitor: embedding drift, index staleness, recall decay, and retrieval quality in production.
- End-to-End Tracing for LLM Applications: What Belongs in a Span
  Production LLM apps span multiple model calls, tool invocations, retrieval steps, and retries. A complete trace makes them debuggable; a sparse one leaves you guessing.