How We Rebuilt Tool Calling: From 75 Schemas to Smart Selection
Anthropic's advanced tool use research showed us the path. We adapted it for every provider — and for local models that can't even call tools natively.
Opinionated technical deep dives into agent architecture, local LLMs, and prompt economics.
Dumping 200 MCP tool schemas into a local model's context is a great way to ensure it never answers your question. Here is how we fixed the "amnesiac expert" problem.
AI doesn't return clean JSON — it streams chaotic fragments of text, tool calls, and errors in real time. Here's what it took to render that smoothly, and why it still breaks.
When you build a local-first AI app, you hit a brutal hardware constraint almost immediately: the context window. Every token is precious. Here's how we're managing it.
Tool calling through native cloud APIs is easy. Getting a local Llama-3 model to execute shell commands requires the dark arts of prompt engineering and a lot of regex.
Everyone wants an infinite context window. The math says no. Here's how we compress history to keep the system prompt from getting lost in the middle.
We love JSON mode because it makes agents reliable. But forcing a model to think in JSON burns 3x more tokens than letting it reason in plain text.
If you aren't using your own API key, you're trusting a middleman with your codebase. Here's how the big three handle your data differently depending on how you access them.