Introduction
JSON, a familiar format for APIs, storage, and application logic, often leads to token overload when used in Large Language Model (LLM) pipelines. The repeated use of braces, quotes, commas, and field names adds little value to the model. Enter TOON (Token-Oriented Object Notation), a format crafted to maintain the same JSON data model while minimizing token usage and providing models with clearer structural cues. Official documentation describes TOON as a compact, lossless representation of JSON, particularly effective on uniform arrays of objects.
This article explores TOON, its optimal use cases, and a step-by-step guide to incorporating it into your LLM workflow. We will also delve into the trade-offs, as TOON shines in some scenarios but not all.
Why JSON Wastes Tokens in LLM Pipelines
JSON’s structure is repeatedly expressed in LLM prompts, leading to inefficiencies. To an LLM, JSON is just a stream of tokens; the model gains nothing from its status as a standard. For instance, when processing multiple support tickets, product lines, or user records, the same field names are repeated for every record. TOON counters this by declaring fields once and streaming row values in a compact, tabular form. Here’s a basic illustration:
JSON:
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" },
    { "id": 3, "name": "Charlie", "role": "user" }
  ]
}
TOON:
users[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Charlie,user
The data remains the same, but the clutter is significantly reduced. This is where TOON derives much of its utility.
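To make the transformation concrete, here is a minimal sketch of how a uniform array of flat objects maps into TOON's tabular form. This is illustrative code only, not the official encoder; the `@toon-format` packages handle quoting, escaping, nesting, and other edge cases that this hand-rolled helper ignores.

```python
import json

def to_toon_table(key, rows):
    """Encode a uniform list of flat dicts in a TOON-like tabular form.

    Minimal sketch for illustration only -- the official @toon-format
    packages handle quoting, nesting, and edge cases this code ignores.
    """
    fields = list(rows[0].keys())
    # Declare the key, row count, and field names once in the header.
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    # Then stream each record as a single comma-separated row.
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)

users = json.loads(
    '[{"id": 1, "name": "Alice", "role": "admin"},'
    ' {"id": 2, "name": "Bob", "role": "user"},'
    ' {"id": 3, "name": "Charlie", "role": "user"}]'
)
print(to_toon_table("users", users))
```

The key observation is that the field names `id`, `name`, and `role` appear exactly once in the output, no matter how many rows follow.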
What is TOON and When is it Worth Using?
TOON is a serialization format for the JSON data model, capable of representing objects, arrays, strings, numbers, booleans, and nulls compactly for model input. The TOON project markets it as lossless compared to JSON, allowing seamless conversion between the two without information loss. Importantly, you don’t need to replace JSON in your application.
Maintaining JSON in your backend, APIs, and storage is advisable, converting to TOON only when sending structured data into an LLM. TOON excels when prompts contain repeated structured records with identical fields. Ideal examples include support tickets, catalog lines, analysis records, tool outputs, CRM entries, or memory snapshots for agent systems. Conversely, if the structure is deeply nested, irregular, purely flat, or very small, TOON’s benefits may diminish.
Getting Started with TOON
Step 1: Installing TOON CLI
The simplest method to experiment with TOON is via the official TOON Project Command Line Interface (CLI). The TOON website provides links to its CLI, and the main repository introduces the format within a comprehensive ecosystem of SDKs and tools.
Install the package:
npm install -g @toon-format/cli
Step 2: Converting a JSON File to TOON
Create a folder first:
mkdir toon-test
cd toon-test
Next, create a users.json file with the following content:
[
  { "id": 1, "name": "Alice", "role": "admin" },
  { "id": 2, "name": "Bob", "role": "user" },
  { "id": 3, "name": "Charlie", "role": "user" }
]
Convert it using:
npx @toon-format/cli users.json -o users.toon
This generates a compact result akin to:
[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Charlie,user
This is the fundamental TOON pattern: declare the structure once, then list values line by line, aligning with the official design goal of a tabular layout for uniform objects.
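Because TOON is lossless relative to JSON, the tabular form can be read back into ordinary records. The sketch below parses a simple TOON table into a list of dicts; it is a toy decoder that assumes flat, uniform rows with no quoted or escaped commas, unlike the official tooling.

```python
def parse_toon_table(text):
    """Parse a simple TOON tabular block back into a list of dicts.

    Toy sketch: assumes flat, uniform rows and no quoted commas.
    Use the official @toon-format tooling for real decoding.
    """
    header, *rows = [ln.strip() for ln in text.strip().splitlines()]
    # The header looks like: users[3]{id,name,role}:
    fields = header[header.index("{") + 1 : header.index("}")].split(",")
    return [dict(zip(fields, row.split(","))) for row in rows]

toon = """users[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Charlie,user"""
print(parse_toon_table(toon))
```

Note that this naive version returns every value as a string; a real decoder also restores numbers, booleans, and nulls.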
Step 3: Using TOON as Model Input
TOON is most beneficial on the input side of your pipeline. Instead of embedding a large JSON blob into a prompt, pass the TOON version and simplify instructions.
Example:
The following data is in TOON format.

users[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Charlie,user

Summarize user roles and report anything unusual.
This approach works well because TOON lets the model follow repeated structure with minimal overhead. This aligns with how the official project evaluates the format: comprehension tests across different structured input formats.
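In application code, this prompt shape is just string assembly: a short framing line, the TOON block, then the instruction. The helper name below is illustrative, not part of any TOON SDK.

```python
def build_prompt(toon_block, instruction):
    """Wrap a TOON block in a simple prompt.

    Hypothetical helper for illustration; the framing text is the
    same pattern shown in the article, not an official API.
    """
    return (
        "The following data is in TOON format.\n\n"
        f"{toon_block}\n\n"
        f"{instruction}"
    )

prompt = build_prompt(
    "users[3]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user\n  3,Charlie,user",
    "Summarize user roles and report anything unusual.",
)
print(prompt)
```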
Step 4: Preserve JSON for Outputs
This is a critical practical decision. While TOON is advantageous for input, JSON remains the preferred choice for output when another system must parse the model response. JSON boasts robust tool support, and modern APIs effectively handle structured JSON output with schemas.
In practice, the safest approach is:
- JSON within your application.
- TOON for structured and voluminous prompt context.
- JSON again for machine-parsable model responses.
This strategy ensures efficiency on the input side and reliability on the output side.
Step 5: Benchmarking in Your Own Pipeline
Avoid changing formats purely due to hype.
Conduct a small benchmark in your workflow:
- Count input tokens for JSON.
- Count input tokens for TOON.
- Compare latency.
- Assess answer quality.
- Compare total cost.
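A benchmark along these lines can start very small. The sketch below compares the two encodings with a crude character-based token proxy; for real numbers, swap in your provider's actual tokenizer (for example, the tiktoken library), and measure latency, quality, and cost separately against your own tasks.

```python
import json

def rough_tokens(text):
    """Crude proxy: roughly 4 characters per token.

    For real measurements, use your provider's tokenizer instead.
    """
    return max(1, len(text) // 4)

# Synthetic uniform records; substitute your own payloads.
records = [{"id": i, "name": f"user{i}", "role": "user"} for i in range(50)]

as_json = json.dumps(records)
fields = list(records[0].keys())
as_toon = f"records[{len(records)}]{{{','.join(fields)}}}:\n" + "\n".join(
    "  " + ",".join(str(r[f]) for f in fields) for r in records
)

print("JSON tokens (approx):", rough_tokens(as_json))
print("TOON tokens (approx):", rough_tokens(as_toon))
```

On uniform data like this, the TOON estimate comes out well below the JSON one; on small, irregular, or deeply nested payloads, the gap can shrink or vanish, which is exactly why measuring on your own data matters.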
The TOON project highlights token savings as a primary benefit, and third-party coverage echoes these claims. However, community discussions reveal that outcomes depend heavily on data structure. Thus, the better question is not “Is TOON better than JSON?”
The pertinent question is: “Is TOON better for this specific stage of my LLM pipeline?”
Final Thoughts
TOON is not a universal solution. It serves as a targeted optimization for a specific issue: token waste from repeated JSON structure in LLM prompts. If your pipeline involves numerous repeated structured records in a template, TOON is worth testing. If your payloads are small, irregular, or heavily nested, JSON may remain the better choice.
The wisest adoption strategy is straightforward: preserve JSON where it already excels, employ TOON for large structured inputs in prompts, and evaluate results on your tasks before committing.
Kanwal Mehreen is a machine learning engineer and technical writer with a passion for data science and the intersection of AI and medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a 2022 Google Generation Scholar for APAC, she champions diversity and academic excellence. She is also recognized as a Teradata Diversity in Tech Fellow, Mitacs Globalink Research Fellow, and Harvard WeCode Fellow. Kanwal is a strong advocate for change, having founded FEMCodes to empower women in STEM fields.

