How We Route Cloud APIs, Hermes, OpenClaw, and Your Mac
Hermes and OpenClaw run on your own machine or VM. Each can call cloud APIs or a local Ollama instance; AICoven does not make that choice for them. We route HTTP traffic to your agent and WebSocket traffic to your Mac.
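The split is easiest to see as a dispatch rule: the destination decides the transport, and nothing in the router decides which model answers. What follows is a minimal sketch under stated assumptions, not AICoven's implementation. `Target`, `Route`, `route_request`, and both example URLs are hypothetical. The sketch only selects a transport and checks the URL scheme; it does not open any connections.

```python
from dataclasses import dataclass
from enum import Enum
from urllib.parse import urlparse


class Target(Enum):
    """Hypothetical destinations for a request."""
    AGENT = "agent"  # Hermes or OpenClaw on your machine or VM
    MAC = "mac"      # your Mac


@dataclass(frozen=True)
class Route:
    transport: str  # "http" or "websocket"
    url: str


# URL schemes accepted for each transport.
_SCHEMES = {"http": {"http", "https"}, "websocket": {"ws", "wss"}}


def route_request(target: Target, agent_url: str, mac_url: str) -> Route:
    """Pick a transport from the destination alone (hypothetical helper).

    Deliberately absent: any choice between a cloud API and local Ollama.
    The agent makes that decision from its own configuration.
    """
    if target is Target.AGENT:
        transport, url = "http", agent_url
    elif target is Target.MAC:
        transport, url = "websocket", mac_url
    else:
        raise ValueError(f"unknown target: {target!r}")

    # Reject a URL whose scheme does not match the chosen transport.
    scheme = urlparse(url).scheme
    if scheme not in _SCHEMES[transport]:
        raise ValueError(f"{target.value} expects {transport}, got {scheme!r} in {url!r}")
    return Route(transport, url)


if __name__ == "__main__":
    agent = "http://192.168.1.20:8080"  # hypothetical VM running Hermes or OpenClaw
    mac = "wss://my-mac.local:9443"     # hypothetical WebSocket endpoint on the Mac
    print(route_request(Target.AGENT, agent, mac))
    # Route(transport='http', url='http://192.168.1.20:8080')
    print(route_request(Target.MAC, agent, mac))
    # Route(transport='websocket', url='wss://my-mac.local:9443')
```

The sketch deliberately leaves the cloud-versus-Ollama choice out of `route_request`. Because Hermes and OpenClaw make that call themselves, the router only needs to know where the agent and the Mac live.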
Opinionated field notes on agent architecture, BYOK orchestration, local LLMs, and prompt economics — from building AICoven on iPhone, iPad, and Mac.
Anthropic's research on advanced tool use showed us the path. We adapted it for every provider, and for local models that can't even call tools natively.
Dumping 200 MCP tool schemas into a local model's context is a great way to ensure it never answers your question. Here is how we fixed the 'amnesiac expert' problem.
AI doesn't return clean JSON — it streams chaotic fragments of text, tool calls, and errors in real time. Here's what it took to render that smoothly, and why it still breaks.
When you build a local-first AI app, you hit a brutal hardware constraint almost immediately: the context window. Every token is precious. Here's how we're managing it.
Wiring up cloud APIs with native tool calling is easy. Getting a local Llama 3 model to execute shell commands requires the dark arts of prompt engineering and a lot of regex.
Everyone wants an infinite context window. The math says no. Here's how we compress history to keep the system prompt from getting lost in the middle.
We love JSON mode because it makes agents reliable. But forcing a model to think in JSON burns 3x more tokens than letting it reason in plain text.
If you aren't using your own API key, you're trusting a middleman with your codebase. Here's how the big three handle your data differently depending on how you access them.