Insights · Vertotech


Insights

Field notes from the edge

Research and opinion from the consultants doing the work.

Compute · 5 min read

GPU utilization is the only metric that matters

Throughput, latency, queue depth: useful, sure. But if your GPUs are sitting at 38%, you don't have a serving problem; you have a billing problem.

May 2, 2026

Inference · 6 min read

The hidden cost of cold starts in inference

Autoscale-to-zero looks cheap until your p99 latency hits 14 seconds and you learn what your users actually feel.

Apr 8, 2026

Retrieval · 5 min read

When to use a vector DB — and when to use plain Postgres

The dedicated vector database is the most over-bought piece of infrastructure in 2026. Most teams already had the answer in their main database.

Mar 20, 2026

Agents · 7 min read

Designing agentic systems that don't burn money

Agents fail loudly when they crash. They fail expensively when they don't.

Feb 26, 2026

AI Security · 8 min read

LLM red-teaming without lobotomizing your product

It's easy to make a model refuse everything. The hard part is making it refuse the right things while still being useful.

Jan 11, 2026