JSON vs TOON — Feeding Data to LLMs Efficiently
LLMs and Tokens
In an LLM (Large Language Model), all text is represented as tokens: words, spaces, punctuation, and symbols. Tokens are counted on both input and output, so every extra token adds to processing cost and can slow responses.
The JSON Problem
Data for LLMs is commonly provided as raw text or JSON.
However, LLMs don’t parse JSON; they read it as plain text. That means all of JSON’s braces, brackets, commas, colons, and quotation marks are billed as tokens.
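To get a feel for this overhead, the short Python sketch below serializes the two-user example from this article as compact JSON and counts the characters that only encode structure. Character counts are a rough proxy: real tokenizers may merge adjacent punctuation into a single token, so the exact token overhead depends on the model.

```python
import json

# The two-user example from this article, as a Python structure.
data = {
    "users": [
        {"id": 1, "name": "Alice", "role": "Admin"},
        {"id": 2, "name": "Bob", "role": "Editor"},
    ]
}

# Compact serialization: no spaces after separators.
text = json.dumps(data, separators=(",", ":"))

# Characters that describe structure rather than data.
STRUCTURAL = set('{}[],:"')

structural = sum(1 for ch in text if ch in STRUCTURAL)
print(text)
print(f"{structural} of {len(text)} characters are structural "
      f"({structural / len(text):.0%})")
# → 42 of 88 characters are structural (48%)
```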
Why JSON falls short for LLMs:
JSON is ideal for APIs, but as LLM input it is wasteful because of:
- Token Waste: Structural characters (braces, brackets, commas, colons, and quotation marks) add tokens that carry no data.
- Redundancy: Every object in an array repeats the same keys.
- Poor Readability: Dense, nested JSON is hard to scan in logs and prompts.
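The redundancy grows linearly with the number of objects. This sketch counts the characters JSON spends on key names (each `"key":` pair) for `n` records with the article's `id`/`name`/`role` fields, versus listing the keys once in a header. `key_overhead` is a hypothetical helper, and the record values are synthetic placeholders; only the keys matter here.

```python
import json


def key_overhead(n: int) -> tuple[int, int]:
    """Return (JSON key chars, header key chars) for n uniform records."""
    # Synthetic placeholder records; only their keys are measured.
    records = [{"id": i, "name": f"user{i}", "role": "Viewer"} for i in range(n)]
    keys = list(records[0])
    # JSON writes every key as "key": inside every object.
    json_key_chars = sum(len(json.dumps(k)) + 1 for rec in records for k in rec)
    # A tabular header lists each key once, comma-separated.
    header_key_chars = len(",".join(keys))
    return json_key_chars, header_key_chars


for n in (1, 10, 100, 1000):
    j, h = key_overhead(n)
    print(f"{n:>5} records: JSON spends {j} chars on keys, a header spends {h}")
```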
Introducing TOON
TOON (Token-Oriented Object Notation) is a data format designed to minimize token usage when passing structured data to LLMs. It drops most of JSON’s punctuation and lays data out cleanly: CSV-style tables for arrays of uniform objects, and YAML-like indentation for nesting.
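A minimal sketch of the tabular layout, assuming a non-empty list of flat dicts that share the same keys in the same order. `encode_table` is a hypothetical helper that covers only the tabular-array case with simplified quoting; it is not the reference TOON encoder, and the full specification defines more cases (nested objects, non-uniform arrays, stricter quoting rules).

```python
import json


def encode_table(name, rows, indent="  "):
    """Encode a non-empty list of flat dicts with identical keys as a
    TOON-style tabular array (hypothetical helper, simplified quoting)."""
    if not rows:
        raise ValueError("rows must be non-empty")
    fields = list(rows[0])
    if any(list(row) != fields for row in rows):
        raise ValueError("all rows must have the same keys in the same order")

    def fmt(value):
        # Render primitives; JSON-quote strings that would break the row.
        if isinstance(value, bool):
            return "true" if value else "false"
        if value is None:
            return "null"
        text = str(value)
        if text == "" or text != text.strip() or any(c in text for c in ',"\n'):
            return json.dumps(text)
        return text

    # Header: array name, row count, and field names, e.g. users[2]{id,name,role}:
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    body = [indent + ",".join(fmt(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + body)


users = [
    {"id": 1, "name": "Alice", "role": "Admin"},
    {"id": 2, "name": "Bob", "role": "Editor"},
]
print(encode_table("users", users))
# users[2]{id,name,role}:
#   1,Alice,Admin
#   2,Bob,Editor
```

Because the header names the fields once and declares the row count (`[2]`), each row only needs to carry its values.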
Key Benefits:
- Tabular arrays
- Minimal syntax
- 30–60% fewer tokens than JSON, depending on the shape of the data
- Clean indentation-based structure
- Great for streaming and LLM inputs
Example
JSON:
{ "users": [
  { "id": 1, "name": "Alice", "role": "Admin" },
  { "id": 2, "name": "Bob", "role": "Editor" }
]}
TOON:
users[2]{id,name,role}:
  1,Alice,Admin
  2,Bob,Editor
Savings Highlight: For uniform, tabular data like this, TOON uses roughly 50% fewer tokens than JSON. Because field names appear once in the header rather than once per object, the absolute savings grow with the number of rows.
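The scaling claim can be checked roughly by using characters as a stand-in for tokens. This sketch builds compact JSON and a TOON-style table for `n` synthetic rows (hypothetical data) and reports both sizes. Character savings will not match token savings exactly, since those depend on the model's tokenizer.

```python
import json


def sizes(n: int) -> tuple[int, int]:
    """Character counts of compact JSON vs. a TOON-style table for n rows."""
    # Synthetic rows; none of the values contain commas or quotes,
    # so the TOON rows need no quoting.
    rows = [{"id": i, "name": f"user{i}", "role": "Editor"} for i in range(1, n + 1)]
    as_json = json.dumps({"users": rows}, separators=(",", ":"))
    as_toon = "\n".join(
        ["users[%d]{id,name,role}:" % n]
        + ["  " + ",".join(str(v) for v in r.values()) for r in rows]
    )
    return len(as_json), len(as_toon)


for n in (2, 20, 200, 2000):
    j, t = sizes(n)
    print(f"{n:>5} rows: JSON {j:>6} chars, TOON {t:>6} chars, "
          f"{j - t} saved ({1 - t / j:.0%})")
```

The absolute savings grow roughly linearly with `n`, while the percentage levels off, because each additional row saves about the same number of characters.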
Flattening JSON for Better TOON Efficiency
The efficiency of TOON depends on the structure of the source JSON.
- Flat JSON → fewer tokens
- Nested JSON → more tokens
To maximize token savings, flatten nested JSON before encoding it into TOON: replace inner objects and arrays with top-level fields so that each record becomes a single row.
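One common way to flatten is to join nested keys into dotted paths. The `flatten` helper below is a hypothetical sketch; the dot-path naming (`address.city`, `tags.0`) is an assumption of this example, not something TOON prescribes.

```python
def flatten(obj, prefix="", sep="."):
    """Flatten nested dicts and lists into one level (hypothetical helper).

    Nested keys are joined with `sep` (e.g. "address.city") and list items
    use their index (e.g. "tags.0"). Empty dicts/lists are kept as values.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}  # a bare primitive at the top level

    flat = {}
    for key, value in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, (dict, list)) and value:
            flat.update(flatten(value, path, sep))  # recurse into non-empty containers
        else:
            flat[path] = value
    return flat


user = {
    "id": 1,
    "name": "Alice",
    "address": {"city": "Paris", "zip": "75001"},
    "tags": ["admin", "ops"],
}
print(flatten(user))
# {'id': 1, 'name': 'Alice', 'address.city': 'Paris', 'address.zip': '75001', 'tags.0': 'admin', 'tags.1': 'ops'}
```

Flattened records only form a clean table when they end up with the same keys; records whose lists differ in length will produce different `tags.N` columns.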
When to Use
Use TOON for LLM input where token efficiency matters. Keep JSON for APIs and data interchange.
When TOON Works Best
- Flat Data: Best performance with non-nested JSON structures.
- LLM Prompts: Ideal for embedding structured data into model prompts.
- Readable Logs: Maintains human-readable format for easy debugging and tuning.
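Before choosing the tabular form, a quick check can confirm that the data really is flat and uniform. `is_tabular` is a hypothetical helper: it accepts a non-empty list of dicts that share the same set of keys and hold only primitive values.

```python
def is_tabular(rows):
    """Return True if rows is a non-empty list of flat dicts that share
    the same set of keys (hypothetical helper)."""
    primitives = (str, int, float, bool, type(None))
    if not isinstance(rows, list) or not rows:
        return False
    if not all(isinstance(row, dict) for row in rows):
        return False
    keys = set(rows[0])
    return all(
        set(row) == keys and all(isinstance(v, primitives) for v in row.values())
        for row in rows
    )


users = [
    {"id": 1, "name": "Alice", "role": "Admin"},
    {"id": 2, "name": "Bob", "role": "Editor"},
]
nested = [{"id": 1, "address": {"city": "Paris"}}]
print(is_tabular(users))   # True
print(is_tabular(nested))  # False: "address" holds an object
```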
Summary
| Feature | JSON | TOON |
|---|---|---|
| Structure | Nested | Flat / Tabular |
| Syntax | Heavy | Minimal |
| Token Use | High | Low |
| Best For | APIs | LLM Prompts |