The Reality of 'Local Agents': API Tools vs. Prompt Engineering

Andreea

When people talk about building "AI Agents," they're usually talking about an API proxy.

If you're building a coding assistant with OpenAI or Anthropic, giving the agent tools is a solved problem. You pass a clean JSON Schema, the model natively supports function calling, and you get a guaranteed, well-formatted JSON response back. It feels like magic.

But it only works because you're paying a cloud provider whose model was explicitly fine-tuned to understand that schema.

When you try to run that same agent workflow entirely locally — on a MacBook using Llama-3 or Mistral — the illusion shatters.


The Local Toolkit Challenge

Off-the-shelf open-weights models are trained to predict the next token in conversation. They don't have robust, native <function_call> structures in the same way GPT-4o does.

To give a local agent tools, you revert to prompt engineering and regex:

You are an AI with tools. To use a tool, output the
exact syntax: [TOOL_CALL: name(arg="value")]
Do not write anything else around the tool call.
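In practice that prompt is usually generated from the same internal tool definitions, so the model always sees an up-to-date tool list. A minimal sketch, assuming a simple dict-based tool spec (the spec shape and wording here are illustrative, not AICoven's actual code):

```python
def build_system_prompt(tools: list[dict]) -> str:
    """Render a tool-calling system prompt from simple tool specs.

    Each spec is a dict with "name", "description", and a list of "args".
    (Hypothetical format, for illustration only.)
    """
    lines = [
        "You are an AI with tools. To use a tool, output the",
        'exact syntax: [TOOL_CALL: name(arg="value")]',
        "Do not write anything else around the tool call.",
        "",
        "Available tools:",
    ]
    for tool in tools:
        args = ", ".join(f'{a}="..."' for a in tool["args"])
        lines.append(f'- [TOOL_CALL: {tool["name"]}({args})]: {tool["description"]}')
    return "\n".join(lines)
```

Regenerating the prompt on every request keeps the instructions and the parser in sync, which matters once tools start changing.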

And on the backend? You aren't unwrapping a nice structured response. You're running brute-force regex parsers over the LLM's streaming text, hoping the model remembered to close its parentheses:

import re

TOOL_PATTERN = re.compile(
    r'\[TOOL_CALL:\s*(\w+)\((.*?)\)\]'
)

def parse_tool_calls(stream_text: str):
    matches = TOOL_PATTERN.findall(stream_text)
    if not matches:
        return None  # Model forgot it has tools. Again.
    # Note: "args" is still the raw, unparsed argument string at this point.
    return [{"name": m[0], "args": m[1]} for m in matches]
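Even when the outer pattern matches, the captured argument text is still a raw string like `arg="value"`. A second regex pass can split it into key/value pairs; this sketch only handles flat, double-quoted string arguments:

```python
import re

# Matches key="value" pairs, allowing escaped characters inside the quotes.
ARG_PATTERN = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')

def parse_args(raw: str) -> dict:
    """Turn 'arg="value", other="x"' into {"arg": "value", "other": "x"}."""
    return {key: value for key, value in ARG_PATTERN.findall(raw)}
```

Nested structures, numbers, or single quotes would all need more grammar than a regex comfortably handles, which is part of why this approach stays fragile.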

Why It's Still a Work in Progress

Building an app like AICoven that works with both cloud APIs and fully local models requires two completely distinct orchestration engines.

Our cloud LLMService converts internal tools into compliant JSON schemas and runs flawlessly. Our LocalAgent orchestrator is essentially a glorified text-parser holding on for dear life.
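Conceptually, the cloud side reduces to a straightforward conversion step. A sketch, assuming a hypothetical internal `Tool` dataclass and targeting the OpenAI-style function-calling schema:

```python
from dataclasses import dataclass, field

@dataclass
class Tool:
    # Hypothetical internal representation; AICoven's real classes may differ.
    name: str
    description: str
    params: dict[str, str] = field(default_factory=dict)  # param name -> JSON type

def to_function_schema(tool: Tool) -> dict:
    """Convert an internal tool into an OpenAI-style function-calling schema."""
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            "parameters": {
                "type": "object",
                "properties": {p: {"type": t} for p, t in tool.params.items()},
                "required": list(tool.params),
            },
        },
    }
```

The cloud provider validates the model's output against this schema for you; the local path has no equivalent guarantee, which is where the parser comes in.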

Even with strict system prompts, local models are inconsistent. You'll ask a Llama-3 to search your codebase, and instead of emitting the [TOOL_CALL: github.searchCode] syntax, it will apologize and say, "I'm sorry, as an AI, I cannot access your local files."

To combat this, we built a self-correction loop: if the local model hallucinates a parameter or breaks the syntax, the orchestrator intercepts the error, injects a correction directly back into the context window, and forces the model to try again.
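A minimal sketch of that loop, with `model` and `parse` as placeholder callables standing in for the real orchestrator interfaces (not AICoven's actual code):

```python
def run_with_correction(model, parse, prompt: str, max_retries: int = 3):
    """Self-correction loop: if the model's output fails to parse as a tool
    call, append the failure to the context and ask again.

    `model` maps a message list to a text completion; `parse` returns parsed
    tool calls or None. Both are placeholder interfaces for this sketch.
    """
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_retries):
        text = model(messages)
        calls = parse(text)
        if calls is not None:
            return calls
        # Inject the broken reply plus a correction back into the context.
        messages.append({"role": "assistant", "content": text})
        messages.append({
            "role": "user",
            "content": 'Your last reply did not contain a valid '
                       '[TOOL_CALL: name(arg="value")]. Emit only the tool call.',
        })
    return None  # Gave up: the model never produced parseable syntax.
```

The key design choice is that the correction lives in the context window itself, so the model sees its own mistake rather than a fresh prompt.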

It's a constant battle. Building a true local-first agent isn't about finding a smarter model — it's about building a parser resilient enough to handle an AI that hallucinates its own instruction manual.

If you're grinding through the same local tool-calling problems, we're comparing notes — @aicoven.

About the Author

I'm Andreea, the creator of AICoven. I build local-first tools for developers who care about architecture, privacy, and prompt economics.

See more of my work at papillonmakes.tech →