The JSON Mode Tax: Why Your Agent Bills Are Bloated
Reliability costs money.
We love "JSON Mode" because it makes agents reliable. But it also bloats your model's output with brackets, quotes, and keys.
We found that forcing a model to "think" in JSON burns 3x more tokens than letting it think in plain text.
The Hidden Math of Output Tokens
If you look at your typical agent logs, a chat response might use 2,500 total tokens. About 2,200 of those are Input Tokens (the system prompt, the file definitions, the chat history). In the grand scheme of API pricing, input tokens are cheap.
The real opportunity for cost optimization lies in the remaining 300 Output Tokens.
Output tokens are where the model actually "thinks" and generates a response, and they cost a premium (Claude 3.5 Sonnet charges $15.00 per 1M output tokens, compared to just $3.00 for input).
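To make that split concrete, here is a quick sketch of the per-request cost math using the Sonnet rates above. The token counts are the illustrative figures from the example, not measurements:

```python
# Illustrative cost split for a single agent turn at Claude 3.5 Sonnet rates.
INPUT_RATE = 3.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 15.00 / 1_000_000  # dollars per output token

input_tokens = 2_200   # system prompt, definitions, history
output_tokens = 300    # the model's actual generation

input_cost = input_tokens * INPUT_RATE
output_cost = output_tokens * OUTPUT_RATE

print(f"input:  ${input_cost:.6f}")   # $0.006600
print(f"output: ${output_cost:.6f}")  # $0.004500
```

Roughly 12% of the tokens account for about 40% of the bill, which is why output-side bloat hurts so much.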
When developers build native AI agents, the easy architectural choice is to force the model to output its entire response—including its internal reasoning, its plan, and the final message to the user—as a giant, structured JSON object so the backend can parse it predictably.
But look at the cost of that convenience:
If an agent wants to reason:
"The user wants to search for cats. I will query the database." (14 tokens)
When forced into a rigid JSON schema, that same thought becomes:
{
  "thought_process": "The user wants to search for cats.",
  "action_plan": "I will query the database.",
  "status_update": "Searching..."
}
(38 tokens)
You are paying nearly 3x the premium output price to compute and transmit syntax characters ({, ", :, }) that provide zero actual reasoning value to the model. Multiply this by hundreds of turns in an autonomous loop, and the "JSON Tax" becomes a massive financial burden.
Worse, because conversation history is fed back into the context window, you are paying that 3x penalty again on every subsequent turn.
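A small simulation makes the compounding visible. This sketch uses the illustrative 14-token vs. 38-token thought sizes from the example above and assumes every earlier thought is re-sent as input on each later turn:

```python
# Sketch: how per-turn token bloat compounds when history is re-fed as input.
# 14 vs 38 tokens are the illustrative thought sizes from the example above.
INPUT_RATE = 3.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 15.00 / 1_000_000  # dollars per output token

def loop_cost(tokens_per_thought: int, turns: int) -> float:
    cost, history = 0.0, 0
    for _ in range(turns):
        cost += history * INPUT_RATE               # re-sending earlier thoughts
        cost += tokens_per_thought * OUTPUT_RATE   # generating this turn's thought
        history += tokens_per_thought              # thought joins the context window
    return cost

plain = loop_cost(14, 200)       # natural-language reasoning
json_mode = loop_cost(38, 200)   # the same reasoning forced into JSON
print(f"plain text: ${plain:.4f}, JSON mode: ${json_mode:.4f}")
```

Because both the output cost and the replayed-history cost scale linearly with thought size, the full-loop ratio stays at 38/14, roughly 2.7x, no matter how many turns you run.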
How AICoven Solves the JSON Tax
Optimization isn't just about code execution speed. It's about prompt economics.
In AICoven, we explicitly designed our architecture to avoid the "JSON-in, JSON-out" trap. We decouple the agent's "Brain" from its "Hands":
- The Brain (Natural Language): Our custom System Prompts instruct the agent to reason, plan, and critique its own work in raw markdown (often wrapped in <scratchpad> or <thought> tags). This is incredibly token-efficient and allows the model to "think" out loud cheaply.
- The Hands (Native Tool Calling): The agent only switches to strict, structured JSON at the exact moment it needs to invoke a function (like github.readFile).
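A minimal sketch of this decoupling, assuming a hypothetical output format: the <thought> tag and the TOOL: line are illustrative conventions, not AICoven's actual wire format:

```python
import json
import re

# Hypothetical model output: free-form reasoning in a cheap <thought> block,
# followed by a single strict-JSON tool call only when action is needed.
raw_output = """<thought>
The user wants to search for cats. I will query the database.
</thought>
TOOL: {"name": "github.readFile", "arguments": {"path": "search.py"}}"""

def parse_turn(text: str):
    """Split a turn into (natural-language thought, structured tool call)."""
    thought = re.search(r"<thought>(.*?)</thought>", text, re.DOTALL)
    tool = re.search(r"^TOOL: (\{.*\})$", text, re.MULTILINE)
    return (
        thought.group(1).strip() if thought else None,
        json.loads(tool.group(1)) if tool else None,
    )

thought, call = parse_turn(raw_output)
print(thought)        # unstructured reasoning: never pays the JSON tax
print(call["name"])   # strict JSON appears only at the action boundary
```

The backend still gets a predictable, parseable tool call; only the reasoning escapes the schema.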
By letting agents "think" like humans in plain text, and only "act" like machines in JSON, we get the exact same reliability and automation of a structured agent—but without paying the bloated API tax.
About the Author
I'm Andreea, the creator of AICoven. I build local-first tools for developers who care about architecture, privacy, and prompt economics.
See more of my work at papillonmakes.tech →