← Back to Blog

The JSON Mode Tax: Why Your Agent Bills Are Bloated

Andreea

Reliability costs money.

We love JSON Mode because it makes agents reliable. But it also bloats your model's output with brackets, quotes, and keys. Forcing a model to "think" in JSON burns roughly 3x more tokens than letting it reason in plain text.

The Hidden Math of Output Tokens

Look at a typical agent log: a chat response might use 2,500 total tokens. About 2,200 of those are input tokens (system prompt, file definitions, chat history). Input tokens are cheap.

The cost lives in the remaining 300 output tokens. Output is where the model reasons and generates, and it costs a premium — Claude 3.5 Sonnet charges $15.00/M output versus $3.00/M input. That's a 5x multiplier.

When developers build native AI agents, the easy choice is to force the model to output its entire response — reasoning, plan, and final message — as a single structured JSON object. The backend parses it predictably. Ship it.

But look at the cost of that convenience:

Plain-text reasoning:

"The user wants to search for cats. I will query the database."14 tokens

The same thought forced into JSON:

{
  "thought_process": "The user wants to search for cats.",
  "action_plan": "I will query the database.",
  "status_update": "Searching..."
}

38 tokens.

You're paying nearly 3x the premium output price to compute and transmit syntax characters ({, ", :, }) that add zero reasoning value. Multiply that by hundreds of turns in an autonomous loop, and the JSON Tax becomes a real financial problem.

Worse: conversation history is fed back into the context window on every subsequent turn. That 3x penalty compounds.

How AICoven Handles This

We decouple the agent's "Brain" from its "Hands":

  1. The Brain (Natural Language): The agent reasons, plans, and critiques its own work in raw markdown, often wrapped in <scratchpad> tags. Token-efficient thinking.
  2. The Hands (Native Tool Calling): The agent only switches to strict JSON at the exact moment it invokes a function (like github.readFile).
<scratchpad>
The user wants to search for cats in the database.
I'll use the search tool with query="cats".
</scratchpad>

[TOOL_CALL: database.search(query="cats")]

By letting agents think like humans in plain text and act like machines in JSON, we get the same reliability and automation without paying the bloated API tax.

About the Author

I'm Andreea, the creator of AICoven. I build local-first tools for developers who care about architecture, privacy, and prompt economics.

See more of my work at papillonmakes.tech →