The Reality of 'Local Agents': API Tools vs. Prompt Engineering
When people talk about building "AI Agents," they are usually talking about an API proxy.
If you are building a coding assistant on top of OpenAI or Anthropic, giving the agent "tools" (like the ability to read your repository or execute shell commands) is largely a solved problem. You pass a clean JSON Schema describing your function, the model natively supports "Function Calling," and you get back a well-formed, schema-conforming JSON response.
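In practice, the cloud round-trip looks roughly like this. This is a minimal sketch, not a live API call: the tool declaration follows the OpenAI-style `tools` format, and `read_file` plus the response shape are illustrative assumptions.

```python
import json

# A hypothetical tool, declared as a JSON Schema (OpenAI-style "tools" format).
read_file_tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the user's repository.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Repo-relative file path"},
            },
            "required": ["path"],
        },
    },
}

# The provider returns a *structured* tool call -- no text parsing required.
# This dict mimics the shape of a typical function-calling response:
response_tool_call = {
    "name": "read_file",
    "arguments": json.dumps({"path": "src/main.py"}),
}

# The arguments field is guaranteed-parseable JSON, so this never needs a regex.
args = json.loads(response_tool_call["arguments"])
print(args["path"])
```

The key point is the last two lines: the heavy lifting of producing valid, schema-matching JSON happens server-side, inside a model fine-tuned for it.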
It feels like magic. But it only works because you are paying a cloud provider to do the heavy lifting of fine-tuning their model to explicitly understand that schema.
When you try to run that exact same Agent workflow entirely locally (on a MacBook using models like Llama-3 or Mistral), the illusion shatters.
The Local Toolkit Challenge
Off-the-shelf open-weights models are trained to predict the next token in a conversational sequence. They do not ship with the robust, native <function_call> support that GPT-4o has.
To give a local agent tools, you have to revert to the dark arts of Prompt Engineering and Regex.
Instead of a clean JSON Schema, your System Prompt becomes a desperate plea:
"You are an AI with tools. To use a tool, you MUST output the exact syntax
[TOOL_CALL: name(arg="value")]. Do not write anything else. I am begging you to adhere to this format strictly."
And on the backend? You aren't unwrapping a nice JSON object. You are running brute-force Regex parsers over the LLM's streaming text, hoping the model remembered to close its parentheses and didn't hallucinate a parameter name.
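A minimal sketch of what that backend parser ends up looking like. The `[TOOL_CALL: ...]` syntax is the format from the prompt above; the regexes and the helper name are illustrative, not AICoven's actual code:

```python
import re

# Matches [TOOL_CALL: name(arg="value", other="x")] -- the ad-hoc syntax
# the system prompt begs the model to emit.
TOOL_CALL_RE = re.compile(r'\[TOOL_CALL:\s*(\w[\w.]*)\((.*?)\)\]')
ARG_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

def parse_tool_call(text):
    """Return (tool_name, args_dict), or None if the model broke the format."""
    m = TOOL_CALL_RE.search(text)
    if not m:
        return None  # model forgot the syntax, or never closed the bracket
    name, raw_args = m.group(1), m.group(2)
    args = dict(ARG_RE.findall(raw_args))
    return name, args

print(parse_tool_call('Sure! [TOOL_CALL: github.searchCode(query="parse")]'))
# -> ('github.searchCode', {'query': 'parse'})
print(parse_tool_call("I'm sorry, as an AI, I cannot access your local files."))
# -> None  (the failure mode: free text with no tool call at all)
```

Note how much is silently fragile here: a stray newline inside the parentheses, a single quote instead of a double quote, or a trailing comma, and the match fails.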
Why It's Still a Work in Progress
Building an agentic app like AICoven that works with both Cloud APIs and 100% Local Models requires two completely distinct orchestration engines.
Our Cloud LLMService converts internal tools into compliant JSON schemas and runs flawlessly.
But our LocalAgent orchestrator is essentially a glorified text-parser holding on for dear life. Even with strict system prompts, local models are inconsistent. Because prompt engineering is fundamentally less reliable than actual API function calling, local models will occasionally just... forget that they have tools.
You'll ask it to search your codebase, and instead of emitting the [TOOL_CALL: github.searchCode] syntax we explicitly instructed it to use, it will apologize and say, "I'm sorry, as an AI, I cannot access your local files."
To combat this, we built a self-correction loop. If the local model hallucinates a parameter or breaks the syntax, the orchestrator intercepts the error, injects a system correction directly back into the local context window, and forces the model to try again.
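The loop described above can be sketched like this. Everything here is an assumption-laden illustration of the pattern, not AICoven's implementation: `model` stands in for a local LLM call, `parse` for a tool-call parser, and the correction message is made up.

```python
def run_with_correction(model, messages, parse, max_retries=3):
    """Call the local model; if its tool-call syntax is broken,
    feed a correction back into the context and let it retry."""
    for _ in range(max_retries):
        reply = model(messages)
        result = parse(reply)
        if result is not None:
            return result  # well-formed tool call
        # Inject a system correction and force another attempt.
        messages = messages + [
            {"role": "assistant", "content": reply},
            {"role": "system", "content":
             'Invalid output. Respond ONLY with [TOOL_CALL: name(arg="value")].'},
        ]
    return None  # gave up: fall back to treating the reply as plain text

# Toy model that refuses once, then complies on the retry:
attempts = iter(["I cannot access files.",
                 '[TOOL_CALL: github.searchCode(query="parse")]'])
fake_model = lambda msgs: next(attempts)
fake_parse = lambda t: ("call", t) if t.startswith("[TOOL_CALL:") else None

print(run_with_correction(fake_model, [{"role": "user", "content": "search"}],
                          fake_parse))
```

The retry cap matters: without it, a model that has genuinely lost the plot will loop forever, burning local compute on apologies.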
But it's a constant battle. Building a true local-first agent isn't just about finding a smarter model; it's about building an incredibly resilient parser that can handle an AI that hallucinates its own instruction manual.
About the Author
I'm Andreea, the creator of AICoven. I build local-first tools for developers who care about architecture, privacy, and prompt economics.
See more of my work at papillonmakes.tech →