The Reality of 'Local Agents': API Tools vs. Prompt Engineering
When people talk about building "AI Agents," they're usually talking about an API proxy.
If you're building a coding assistant with OpenAI or Anthropic, giving the agent tools is a solved problem. You pass a clean JSON Schema, the model natively supports function calling, and you get a guaranteed, well-formatted JSON response back. It feels like magic.
But it only works because you're paying a cloud provider whose model was explicitly fine-tuned to understand that schema.
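The whole handshake looks roughly like this (a minimal sketch using OpenAI's Python SDK; the search_code tool is a hypothetical example, not a real AICoven tool):

from openai import OpenAI

client = OpenAI()

# A tool is just a JSON Schema the model was fine-tuned to respect.
tools = [{
    "type": "function",
    "function": {
        "name": "search_code",
        "description": "Search the user's codebase for a string.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Text to search for."},
            },
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Where is LLMService defined?"}],
    tools=tools,
)

# The provider hands back structured fields, not free text to regex apart.
call = response.choices[0].message.tool_calls[0]
print(call.function.name)       # "search_code"
print(call.function.arguments)  # '{"query": "LLMService"}'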
When you try to run that same agent workflow entirely locally — on a MacBook using Llama-3 or Mistral — the illusion shatters.
The Local Toolkit Challenge
Most off-the-shelf open-weights models are trained to predict the next token in a conversation. They don't have robust, native <function_call> structures the way GPT-4o does.
To give a local agent tools, you revert to prompt engineering and regex:
You are an AI with tools. To use a tool, output the
exact syntax: [TOOL_CALL: name(arg="value")]
Do not write anything else around the tool call.
And on the backend? You aren't unwrapping a nice structured response. You're running brute-force regex parsers over the LLM's streaming text, hoping the model remembered to close its parenthesis:
import re

# Matches [TOOL_CALL: name(arg="value")] anywhere in the generated text.
TOOL_PATTERN = re.compile(
    r'\[TOOL_CALL:\s*(\w+)\((.*?)\)\]'
)

def parse_tool_calls(stream_text: str):
    matches = TOOL_PATTERN.findall(stream_text)
    if not matches:
        return None  # Model forgot it has tools. Again.
    # The args are still one raw string like 'query="foo"' at this point.
    return [{"name": m[0], "args": m[1]} for m in matches]
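And because the arguments arrive as one raw string, a second brute-force pass gets bolted onto the parser above (a sketch that assumes the model stuck to the arg="value" syntax, which it often doesn't):

ARG_PATTERN = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

def parse_args(raw_args: str) -> dict:
    # 'query="LLMService", path="src"' -> {"query": "LLMService", "path": "src"}
    return dict(ARG_PATTERN.findall(raw_args))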
Why It's Still a Work in Progress
Building an app like AICoven that works with both cloud APIs and 100% local models requires two completely distinct orchestration engines.
Our cloud LLMService converts internal tools into compliant JSON schemas and runs flawlessly. Our LocalAgent orchestrator is essentially a glorified text parser holding on for dear life.
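In rough terms, both engines sit behind the same interface, but only one of them can trust what comes back (a simplified sketch, not AICoven's actual code; client and tools are reused from the cloud example above, and local_model is a stand-in for whatever runs inference locally):

from abc import ABC, abstractmethod

class AgentEngine(ABC):
    @abstractmethod
    def get_tool_calls(self, messages: list) -> list | None: ...

class CloudEngine(AgentEngine):
    def get_tool_calls(self, messages):
        # Structured output, guaranteed by the provider's fine-tuning.
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools
        )
        return response.choices[0].message.tool_calls

class LocalEngine(AgentEngine):
    def get_tool_calls(self, messages):
        # Free text in, regex out. Hope for the best.
        text = local_model.generate(messages)  # hypothetical local runner
        return parse_tool_calls(text)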
Even with strict system prompts, local models are inconsistent. You'll ask a Llama-3 to search your codebase, and instead of emitting the [TOOL_CALL: github.searchCode] syntax, it will apologize and say, "I'm sorry, as an AI, I cannot access your local files."
To combat this, we built a self-correction loop: if the local model hallucinates a parameter or breaks the syntax, the orchestrator intercepts the error, injects a correction directly back into the context window, and forces the model to try again.
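The loop itself is simple to sketch (a simplified version; validate_call and local_model.generate are hypothetical stand-ins for our real validation and inference code):

MAX_RETRIES = 3

def run_with_correction(messages: list):
    for _ in range(MAX_RETRIES):
        text = local_model.generate(messages)  # hypothetical local runner
        calls = parse_tool_calls(text)
        error = validate_call(calls)           # hypothetical schema check
        if error is None:
            return calls
        # Inject the failure straight back into the context window
        # and force the model to try again.
        messages.append({"role": "assistant", "content": text})
        messages.append({
            "role": "user",
            "content": f"Your tool call was invalid: {error}. "
                       "Re-emit it using the exact [TOOL_CALL: ...] syntax.",
        })
    return None  # Give up and fall back to plain text.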
It's a constant battle. Building a true local-first agent isn't about finding a smarter model — it's about building a parser resilient enough to handle an AI that hallucinates its own instruction manual.
If you're grinding through the same local tool-calling problems, we're comparing notes — @aicoven.
About the Author
I'm Andreea, the creator of AICoven. I build local-first tools for developers who care about architecture, privacy, and prompt economics.
See more of my work at papillonmakes.tech →