Insights
Field notes from the edge
Research and opinion from the consultants doing the work.
Compute·5 min read
GPU utilization is the only metric that matters
Throughput, latency, queue depth — useful, sure. But if your GPUs are sitting at 38% you don't have a serving problem, you have a billing problem.
May 2, 2026
Inference·6 min read
The hidden cost of cold starts in inference
Autoscale-to-zero looks cheap until p99 latency hits 14 seconds and you learn what your users actually feel.
Apr 8, 2026
Retrieval·5 min read
When to use a vector DB — and when to use plain Postgres
The dedicated vector database is the most over-bought piece of infrastructure in 2026. Most teams already had the answer in their main database.
Mar 20, 2026
Agents·7 min read
Designing agentic systems that don't burn money
Agents fail loudly when they crash. They fail expensively when they don't.
Feb 26, 2026
AI Security·8 min read
LLM red-teaming without lobotomizing your product
It's easy to make a model refuse everything. The hard part is making it refuse the right things while still being useful.
Jan 11, 2026