How We Route Cloud APIs, Hermes, OpenClaw, and Your Mac
I wrote about BYOK and regex-based local tool calling. Those posts cover trust and parsers.
This one covers routing: how AICoven picks where a turn goes, how traffic reaches a server on your Mac when the API runs in GCP, and what we do not control (which model Hermes or OpenClaw calls internally).
Three Lanes (Not "Cloud vs Local")
Most diagrams use two boxes. We run three lanes:
1. Cloud BYOK providers. OpenAI, Anthropic, Google, Mistral, Azure OpenAI, Vertex, Bedrock, Codex. Your key. The API calls the vendor directly.
2. Hermes / OpenClaw provider accounts. OpenAI-compatible HTTP to an agent runtime you host (same Mac as AICoven, a home-lab VM, a Tailscale host, etc.). The slug in our catalogue is hermes or openclaw; default models look like hermes-agent and OpenClaw's compatible surface.
3. AICoven Local Agent (macOS WebSocket). A desktop daemon the Mac app connects to the API with. Used when a turn needs your filesystem or shell while the brain might still be Claude or GPT.
Lane 2 and lane 3 are both "local" in the sense of your hardware. They are not the same pipe.
Privacy note (read with middleman-attack): All lanes still go through AICoven's backend for threading, memory, tools, and context. "Local routing" means where inference or shell execution happens, not that we never see the conversation. We do not store your chats in plaintext for staff to read; message content is encrypted with your key and decrypted in RAM to run the agent. If you need "never touches our cloud," that is a different product shape than what we ship.
We Route to the Agent, Not the Model Inside It
Hermes Agent (Nous) and OpenClaw are not "a GGUF in RAM." They are full agent stacks: tools, MCP, skills, memory. They run where you put them: laptop, Mac Studio, Proxmox VM.
Can they use local models? Yes. Point Hermes at Ollama or vLLM on the same box. Run OpenClaw against a local endpoint.
Can they use cloud models? Yes. OpenRouter, OpenAI, Anthropic, configured inside Hermes or OpenClaw, not in AICoven's provider picker for that lane.
What does AICoven know? Almost nothing about that choice. We store base URL (where your agent listens) and an optional API key so our API can authenticate AICoven → your agent server. We call the OpenAI-compatible surface (/v1/chat/completions, Hermes /v1/runs, etc.). Whatever model or provider the agent uses to produce tokens is their config. We do not inspect it, enforce it, or bill it.
So "local provider" in AICoven means:
- Routing target is on your machine or VM (reachable directly or via the Mac WebSocket relay).
- Not a guarantee that every token was generated on-GPU in your basement.
Local here means you host the agent process. Local-local means you also chose offline inference inside that agent. We only promise the first.
Hermes is listed with tools scope beside OpenClaw because you are routing to an agent OS, not to a completion URL you fully describe in our UI.
| Question | Cloud role (e.g. Claude) | Hermes / OpenClaw role |
|---|---|---|
| Who streams the reply? | Anthropic's API (AICoven picks account + model) | Your agent server (AICoven picks base URL; agent picks backend model) |
| Where do tools run? | Mostly cloud tool services; Mac tools via WebSocket (below) | Inside Hermes/OpenClaw |
| Whose API keys? | Yours, in AICoven BYOK | Inside the agent (cloud and/or local stack) + optional gateway key in AICoven for our HTTP calls |
| Typical host | Vendor cloud | localhost, LAN, VM, Tailscale |
The Problem: Cloud Run Cannot See Your Basement
The API runs on Cloud Run. Your Hermes gateway is at http://127.0.0.1:… or http://100.x.x.x:… on a Mac. Those URLs are not reachable from a container in us-central1.
Early failure mode: user adds Hermes, turn times out, we look incompetent.
Fix: transport policy on the local LLM client (auto, local_agent, direct):
auto (default)
→ if base_url is loopback / private / .ts.net:
require a connected Mac local agent; relay HTTP over WebSocket
→ else:
try direct HTTP to the public VM URL
→ on failure, fall back to local agent relay if one is connected
local_agent
→ always relay via WebSocket (llm.inference.stream)
direct
→ always HTTP from the API host (only works if the URL is public)
If you use a public base_url, treat it like exposing an API: HTTPS, authentication on the agent, firewall rules. An open OpenAI-compatible port on the internet is a relay for strangers.
The relay is not tunneling of your API keys. The Mac app maintains a WebSocket to the API. When a turn needs inference at a private base_url, we send a tool-shaped message to that daemon:
tool: llm.inference.stream
args: { base_url, payload, api_key? }
The daemon performs the HTTP request where the URL is actually reachable, streams SSE chunks back, and Cloud Run forwards them to your thread. Same mechanism for Hermes durable /v1/runs when we need a run to survive worker boundaries.
If you are on iPhone and Hermes is on a Mac at home: the Mac must be online with the local agent connected, or private URLs will fail with an explicit "start the local agent" error instead of a vague model failure.
Cloud Models Using Local Tools on the Mac
Separate from "this role uses Hermes."
You can assign a coven role to Claude and still let it run shell.execute, shell.writeFile, and related tools. Anthropic does not run the shell on your Mac. The command still flows through AICoven's cloud orchestration (tool call → your thread context) before we WebSocket it to the daemon:
Agent turn (cloud LLM) → tool call → AgentToolsService (our API)
→ connected local agent id (memory or DB registry)
→ WebSocket → macOS daemon → real shell / real files
→ stdout/stderr back through the same path into the turn
Common setup:
- Reasoning on a cloud BYOK model you trust for context length and schema fidelity.
- Execution on the Mac via the local agent when the task touches disk.
That is how we ship "ChatGPT-class brain + my repo on disk" without claiming Anthropic SSHs into your laptop. Shell output does transit our backend when the model needs it in context.
(Still bounded by tool selection and context budgets. Routing does not forgive 200 MCP schemas.)
Hermes vs OpenClaw in the Router
Same integration shape (OpenAI-compatible client, optional durable runs for Hermes). Different runtimes you install.
- Hermes → Hermes Agent: Nous stack,
/v1/runsfor durable continuations, attachment flattening so multimodal turns do not break run creation. - OpenClaw → OpenClaw: different skill/tool ecosystem, often
:3333.
AICoven does not merge them into one slug. The provider gateway picks the account bound to the role, builds the right client, applies fallback ordering if that account is unhealthy, and classifies hermes-agent / openclaw as high-tier so cloud fallbacks prefer capable models (Opus/Sonnet-class), not Flash/Haiku, when a local stream dies mid-turn.
Per-role assignment is the routing knob. Users (and covens) bind accounts to roles; the gateway executes that contract.
What This Is Not
- Not "Hermes = LM Studio only." Pointing the Hermes slug at a bare GGUF server without the agent runtime is a configuration mismatch, not a supported happy path.
- Not the on-device MLX path from local-vs-api-tools. That is another engine (regex, on-phone/macOS MLX). Hermes/OpenClaw here means HTTP to your hosted agent.
- Not "local = private" or "local = offline." Your Hermes setup might be 100% Ollama; we do not know. Your Claude role may still call cloud APIs; only the shell/file lane strictly requires your Mac with the daemon.
What We Have Not Nailed Yet
- Automatic discovery of Hermes/OpenClaw on the LAN. Still manual base URL.
- One mental model in the UI. "Local provider" vs "local agent for tools" still confuses people.
- Phone-only workflows against private home URLs. Need the Mac bridge; we should keep saying that.
- Unified observability across direct HTTP, relayed HTTP, and cloud tool calls in one timeline.
TL;DR
- Routing: cloud BYOK + HTTP to your Hermes/OpenClaw + WebSocket to your Mac for private URLs and shell tools, all orchestrated through our API. See middleman-attack for what we do and do not store in plaintext.
- Hermes/OpenClaw: agent runtimes you host; you configure cloud or local models inside them. AICoven only routes HTTP to the agent.
- Cloud roles can still touch local disk through the local agent WebSocket; commands and output pass through our backend for context.
- Transport
autoexists because Cloud Run cannot curl yourlocalhostsuccessfully on its own.
Questions: @aicoven. Beta: aicoven.com/beta.
Related posts
The Reality of 'Local Agents': API Tools vs. Prompt Engineering
Building native cloud APIs is easy. Getting a local Llama-3 model to execute shell commands requires the dark arts of prompt engineering and a lot of regex.
Read →The Middleman Attack: Why Your AI Wrapper is Logging Your Code
If you aren't using your own API key, you're trusting a middleman with your codebase. Here's how the big three handle your data differently depending on how you access them.
Read →
About the Author
I'm Andreea, the creator of AICoven. I build local-first tools for developers who care about architecture, privacy, and prompt economics.
See more of my work at papillonmakes.tech →