Every token you send to an LLM API costs money. Not much per token — but when your prompt includes a dataset with 50 rows of user data formatted as JSON, you're paying for every curly brace, every repeated key name, every quote mark. On a dataset with 10 fields and 100 rows, JSON burns roughly 60–70% of its tokens on structural noise. TOON was built to fix exactly this. This article shows you how to use @toon-format/toon to cut your prompt token counts — sometimes by more than half — without losing any structure the model needs.
How LLM Token Pricing Works
Models like GPT-4o and Claude charge per input token and per output token. The OpenAI Tokenizer tool lets you paste any text and see exactly how many tokens it costs. Roughly speaking, 1 token ≈ 4 characters of English text — but JSON's structural characters (quotes, colons, braces) are often tokenised individually, so JSON-heavy prompts tokenise worse than plain prose at the same character count. You're paying a structural tax on every request.
For a typical SaaS app running thousands of requests per day with structured data in each prompt, that tax adds up fast. And the model genuinely doesn't need the formatting noise: an LLM can read structured data just fine without JSON's verbosity, as long as the format is unambiguous and clearly explained.
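To see the structural tax for yourself, you can count how many characters of a JSON string are pure syntax rather than data. This is a rough character-level sketch, not a tokeniser (real token counts depend on the model's vocabulary), but it makes the overhead visible:

```javascript
// Rough illustration, not a tokeniser: what fraction of a JSON string
// is structural syntax (braces, brackets, quotes, colons, commas)
// rather than actual data?
function structuralOverhead(jsonString) {
  const structural = (jsonString.match(/[{}\[\]":,]/g) || []).length;
  return structural / jsonString.length;
}

const row = JSON.stringify({ id: 1, username: 'alice_dev', plan: 'pro' });
console.log(structuralOverhead(row)); // ≈ 0.39, over a third of the characters are syntax
```

And that ratio understates the problem for arrays of objects, because every row repeats every key name in full.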
JSON vs TOON — A Concrete Token Comparison
Let's make this concrete. Here's a 10-row user dataset in JSON:
```json
[
  { "id": 1, "username": "alice_dev", "email": "[email protected]", "plan": "pro", "active": true },
  { "id": 2, "username": "bob_writer", "email": "[email protected]", "plan": "free", "active": true },
  { "id": 3, "username": "carol_ops", "email": "[email protected]", "plan": "pro", "active": false },
  { "id": 4, "username": "dan_qa", "email": "[email protected]", "plan": "team", "active": true },
  { "id": 5, "username": "eve_design", "email": "[email protected]", "plan": "pro", "active": true },
  { "id": 6, "username": "frank_sec", "email": "[email protected]", "plan": "team", "active": true },
  { "id": 7, "username": "grace_ml", "email": "[email protected]", "plan": "pro", "active": false },
  { "id": 8, "username": "henry_be", "email": "[email protected]", "plan": "free", "active": true },
  { "id": 9, "username": "iris_fe", "email": "[email protected]", "plan": "pro", "active": true },
  { "id": 10, "username": "jack_devrel", "email": "[email protected]", "plan": "team", "active": true }
]
```

That JSON block tokenises to roughly 310–330 tokens. Here's the exact same data in TOON tabular notation:
```
users[10]{id,username,email,plan,active}:
  1,alice_dev,[email protected],pro,true
  2,bob_writer,[email protected],free,true
  3,carol_ops,[email protected],pro,false
  4,dan_qa,[email protected],team,true
  5,eve_design,[email protected],pro,true
  6,frank_sec,[email protected],team,true
  7,grace_ml,[email protected],pro,false
  8,henry_be,[email protected],free,true
  9,iris_fe,[email protected],pro,true
  10,jack_devrel,[email protected],team,true
```

The TOON version tokenises to roughly 135–150 tokens — about 55% fewer. At scale, that's not a rounding error. If you're running 10,000 such queries per day at GPT-4o pricing, the difference between JSON and TOON in your prompt alone is material.
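To make the notation concrete, here is a hypothetical minimal encoder for the tabular form. It handles flat arrays of same-shape objects only; the real @toon-format/toon package also handles nesting, indentation options, and quoting of values that contain commas:

```javascript
// Illustrative sketch only: emit TOON tabular notation for a flat array
// of objects that all share the same keys. No quoting, no nesting.
function toTabular(name, rows) {
  const cols = Object.keys(rows[0]);
  const header = `${name}[${rows.length}]{${cols.join(',')}}:`;
  const body = rows.map(r => cols.map(c => String(r[c])).join(','));
  return [header, ...body].join('\n');
}

const users = [
  { id: 1, username: 'alice_dev', plan: 'pro', active: true },
  { id: 2, username: 'bob_writer', plan: 'free', active: true },
];
console.log(toTabular('users', users));
// users[2]{id,username,plan,active}:
// 1,alice_dev,pro,true
// 2,bob_writer,free,true
```

The token savings fall out of the structure itself: the column names appear once in the header instead of once per row.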
Installing and Converting Data in Node.js
Install the package from npm. It works in Node.js 18+ and modern browsers:
```shell
npm install @toon-format/toon
```

The import is a single destructure:
```javascript
import { encode, decode } from '@toon-format/toon';

// Convert a JS array/object to TOON before sending to an LLM
const users = [
  { id: 1, username: 'alice_dev', email: '[email protected]', plan: 'pro', active: true },
  { id: 2, username: 'bob_writer', email: '[email protected]', plan: 'free', active: true },
  // ... more rows
];

const toonData = encode(users, { indent: 2 });
console.log(toonData);
// users[2]{id,username,email,plan,active}:
//   1,alice_dev,[email protected],pro,true
//   2,bob_writer,[email protected],free,true
```

Practical Pattern: Fetch → Encode → Prompt → Decode
The workflow for using TOON with an LLM API is four steps. Fetch your data from whatever source (database, REST API, CSV file), encode it to TOON, build your prompt, and optionally decode any structured TOON back out of the model's response.
```javascript
import { encode, decode } from '@toon-format/toon';
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function analyseUsers(users) {
  // Step 1: encode data to TOON
  const toonData = encode(users, { indent: 2 });

  // Step 2: build the prompt
  const systemPrompt = [
    'You analyse user datasets. Data is provided in TOON (Token-Optimised Object Notation).',
    'TOON tabular format: name[count]{col1,col2,...}: followed by comma-separated rows, one per line.',
    'Respond with plain prose unless asked to return data, in which case use TOON format.'
  ].join('\n');

  const userPrompt = `Here is the user dataset:\n\n${toonData}\n\nWhich plan has the most active users?`;

  // Step 3: call the API
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt }
    ]
  });

  return response.choices[0].message.content;
}

// Step 4 (optional): if the model returns TOON, decode it back
const rawResponse = await analyseUsers(myUserArray);
try {
  const structured = decode(rawResponse);
  console.log('Structured result:', structured);
} catch {
  console.log('Prose response:', rawResponse);
}
```

The same pattern works for the Anthropic Claude API — just swap the client. The prompt structure and TOON encoding are identical regardless of provider. You can check the full OpenAI API docs for authentication setup and model selection details.
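One way to keep that provider swap trivial is to build the prompt strings separately from any client. `buildToonPrompt` below is our own helper, not part of @toon-format/toon or either SDK:

```javascript
// Hypothetical helper: build provider-agnostic prompt strings once,
// then hand them to whichever client you use.
function buildToonPrompt(toonData, question) {
  const system = [
    'You analyse user datasets. Data is provided in TOON (Token-Optimised Object Notation).',
    'TOON tabular format: name[count]{col1,col2,...}: followed by comma-separated rows, one per line.',
  ].join('\n');
  const user = `Here is the dataset:\n\n${toonData}\n\n${question}`;
  return { system, user };
}

const { system, user } = buildToonPrompt('users[1]{id,plan}:\n  1,pro', 'Which plan?');
// OpenAI: pass both as chat messages:
//   messages: [{ role: 'system', content: system }, { role: 'user', content: user }]
// Anthropic: roughly, system is a top-level parameter instead:
//   anthropic.messages.create({ model, max_tokens, system, messages: [{ role: 'user', content: user }] })
```

The TOON encoding step never changes; only the transport around it does.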
When It Matters Most
TOON pays off most in these scenarios:
- Large datasets in context. Any time you're putting more than ~20 rows of structured data into a prompt, TOON tabular notation will save you significant tokens over JSON arrays of objects.
- Repeated structured queries. If your application makes the same shape of query thousands of times per day (think: "analyse this user's activity" with a user record in each prompt), the cumulative savings are substantial.
- Batch processing jobs. Scripts that process thousands of records through an LLM — classification, tagging, enrichment, summarisation — benefit enormously. Fewer tokens per call means faster throughput and lower cost.
- Context-window-constrained tasks. When you're trying to fit a large dataset into a 128k context window alongside a long system prompt and few-shot examples, every token matters. TOON lets you fit more rows in the same window.
- Cost-sensitive production APIs. Free-tier hobby projects won't notice. Production apps serving paying users at scale absolutely will.
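A back-of-envelope estimate makes the "at scale" point concrete. The per-million-token price below is a placeholder, not current pricing; substitute your model's real input-token rate:

```javascript
// Estimate daily savings from smaller prompts. All inputs are assumptions
// you should replace with your own measured token counts and current pricing.
function dailySavingsUSD({ jsonTokens, toonTokens, callsPerDay, pricePerMillionInputTokens }) {
  const savedPerCall = jsonTokens - toonTokens;
  return (savedPerCall * callsPerDay * pricePerMillionInputTokens) / 1_000_000;
}

// Using the article's example dataset: ~320 JSON tokens vs ~140 TOON tokens,
// 10,000 calls/day, at a hypothetical $2.50 per million input tokens:
console.log(dailySavingsUSD({
  jsonTokens: 320,
  toonTokens: 140,
  callsPerDay: 10_000,
  pricePerMillionInputTokens: 2.5,
})); // 4.5 → $4.50/day, roughly $135/month, from one ten-row dataset per prompt
```

Scale the row count or the call volume up and the savings scale linearly with them.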
The One Caveat: LLMs Need to Know the Format
TOON has nowhere near JSON's presence in LLM training data, so you can't assume the model recognises it. This means you must include a brief format description in your system prompt; otherwise the model may misparse the data. The good news is the description is short, and you only pay for it once per conversation or request.
A minimal system prompt addition that works reliably:
```
Data is provided in TOON (Token-Optimised Object Notation).
TOON syntax:
- Objects: {key:value,key2:value2} — keys are never quoted
- Arrays: [val1,val2,val3]
- Tabular: name[rowCount]{col1,col2,...}:
    rowval1,rowval2,...
    rowval1,rowval2,...
  Parse each row by matching values to the column headers in order.
  Strings containing commas are double-quoted.
```

That block adds roughly 60 tokens to your system prompt — a one-time cost that's quickly recovered on any dataset larger than 5–6 rows. For applications where you're making many API calls with TOON data, the format description tokens are amortised to near-zero. Use JSON to TOON to convert your data before building the prompt, and TOON Formatter to verify it looks right.
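As a sanity check that a description like the one above really is enough to parse the format mechanically, here is a hypothetical minimal parser for the tabular form. It handles flat rows only, with no quoted strings; in real code, use decode() from @toon-format/toon instead:

```javascript
// Illustrative sketch only: parse a single TOON tabular block into objects.
// Values come back as strings; a real decoder restores numbers and booleans.
function parseTabular(toon) {
  const [header, ...rows] = toon.trim().split('\n');
  const m = header.match(/^(\w+)\[(\d+)\]\{([^}]*)\}:$/);
  if (!m) throw new Error('not a tabular TOON block');
  const cols = m[3].split(',');
  return rows.map(line => {
    const vals = line.trim().split(',');
    return Object.fromEntries(cols.map((c, i) => [c, vals[i]]));
  });
}

console.log(parseTabular('users[2]{id,plan}:\n  1,pro\n  2,free'));
// [ { id: '1', plan: 'pro' }, { id: '2', plan: 'free' } ]
```

If a few dozen lines of JavaScript can parse it unambiguously, a model with the format description in its system prompt can too.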
Wrapping Up
The case for TOON in LLM workflows is straightforward: you're paying per token, structured data in JSON is token-inefficient, and TOON is a direct fix. The tabular notation alone cuts token usage by 50–60% on typical row-based datasets. The npm package — @toon-format/toon — is tiny, the API is two functions, and the integration into an existing API call is a five-minute job. The only thing to remember is the format description in your system prompt — without it, the model is guessing. With it, you get a model that reads your data correctly at half the token cost. Start with JSON to TOON to convert your existing data, validate it with the TOON Validator, and use TOON to JSON or decode() to convert structured responses back when needed.