YAML and TOON are both described as "more readable than JSON" — and that description is technically true for both, which makes it useless. The real question is: readable to whom, and for what purpose? YAML is optimised for human editing: comments, anchors, multi-line strings, indentation that mirrors how developers think about configuration. TOON is optimised for machine processing: minimal tokens, no ambiguity, no syntax that exists purely for human comfort. These are different jobs. Confusing them leads to YAML in LLM prompts — which is worse than JSON — and TOON in Kubernetes manifests, which nobody wants to hand-edit. This article draws the line clearly.

YAML in Two Sentences

YAML is a human-friendly data serialisation format designed for configuration files, CI/CD pipelines, and any structured document a developer will read, write, and maintain by hand. Its defining features — inline comments, anchor/alias DRY mechanics, multi-line string literals, and indentation-based structure — all exist to make the format pleasant for humans, not efficient for machines.

The canonical use cases are GitHub Actions workflows, Kubernetes manifests, Docker Compose files, and application config that ships alongside code. If a human being is expected to open the file and edit it, YAML is a strong choice. The YAML 1.2 spec formalised a number of edge cases that plagued YAML 1.1 — most infamously the Norway Problem, where the country code NO parsed as boolean false in YAML 1.1 parsers. Modern parsers targeting YAML 1.2 handle this correctly, but it is a useful reminder that YAML's apparent simplicity hides real parser complexity.
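The failure mode is easy to see in two lines (a minimal illustration, not from any real config):

```yaml
country: NO          # YAML 1.1 parsers: boolean false. YAML 1.2 parsers: the string NO
country_quoted: "NO" # quoted, so always the string NO under either spec version
```

Quoting any scalar that could be mistaken for a boolean, number, or null is the standard defensive habit, regardless of which spec version your parser claims to target.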

  • Comments. # this is a comment — YAML supports inline and full-line comments. This alone makes it the right choice for any config a human will maintain.
  • Anchors and aliases. Define a block once with &anchor, reuse it anywhere with *alias. Essential for DRY Kubernetes configs and multi-environment CI pipelines.
  • Multi-line strings. Literal (|) and folded (>) block scalars let you embed shell scripts, SQL queries, or certificate data cleanly inside a YAML file.
  • Readable indentation. Structure is defined by whitespace, which maps naturally to how developers think about nested config hierarchies.
  • Weaknesses. Indent-sensitivity means a misplaced space is a parse error. Tab characters are forbidden. YAML 1.1 boolean coercion (Norway Problem, yes/no, on/off) has caused real production bugs. Tabular data expressed as an array of objects is more verbose than even JSON.
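The comment, block-scalar, and indentation features above combine like this in practice (an illustrative fragment, not from any real pipeline):

```yaml
# Deploy step for the staging environment
deploy:
  retries: 3
  script: |        # literal block scalar: newlines preserved exactly
    #!/bin/sh
    set -e
    ./build.sh
    ./push.sh staging
  summary: >       # folded block scalar: newlines become spaces
    Runs after every merge to main and
    should finish in under a minute.
```

The literal scalar keeps the embedded shell script byte-for-byte intact, which is exactly what you want when the YAML file is just a carrier for another language's source.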

TOON in Two Sentences

TOON is a compact serialisation format designed for passing structured data to and from large language models, where every token costs money and context window space is finite. Its key innovation is tabular notation: for datasets where every record shares the same fields, keys are declared once in a header and omitted from every subsequent row — which is the opposite of what JSON and YAML do.

TOON is not a config format and was never meant to be. There is no comment syntax. There are no anchors. You would not want to hand-edit a 500-row TOON dataset any more than you would hand-edit a binary file. What TOON gives you is a format that encodes the same information as JSON in significantly fewer tokens — and fewer tokens means lower API costs, larger effective datasets per prompt, and less pressure on context window limits. The OpenAI tokenizer is the fastest way to see this in practice: paste the same dataset in both formats and compare.

  • Tabular notation. name[count]{col1,col2,...}: followed by one row of values per line. Keys appear exactly once regardless of row count.
  • Object notation. {key:value,key2:value2} — no quotes on keys, no extra whitespace.
  • Unambiguous parsing. No boolean coercion, no indent-sensitivity, no spec version divergence.
  • No comment syntax. TOON has no mechanism for inline comments — by design. It is a data format, not a document format.
  • Weaknesses. Niche tooling, no human-editing story, not appropriate for any file a developer will open and edit directly.

Side-by-Side: Each Format in Its Element

The most honest comparison shows each format doing what it is actually good at — not forcing a head-to-head where one is clearly the wrong tool.

First, a Kubernetes Deployment manifest. This is YAML's home territory: a human-maintained config file with comments, anchors for shared values, and deep nesting that maps to the logical hierarchy of a Kubernetes object:

yaml
# Deployment manifest for the payments-service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-service
  namespace: production
  labels:
    app: payments-service
    version: "2.4.1"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payments-service
  template:
    metadata:
      labels:
        app: payments-service
    spec:
      containers:
        - name: payments-service
          image: registry.example.com/payments-service:2.4.1
          ports:
            - containerPort: 8080
          env:
            - name: DB_HOST
              valueFrom:
                secretKeyRef:
                  name: payments-db-secret
                  key: host
            - name: LOG_LEVEL
              value: "info"
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"

Writing that in TOON would be pointless — it is not tabular data, it will be edited by humans, and it benefits from comments that explain non-obvious values. YAML is the right tool here, and there is no contest.

Now a dataset in TOON's home territory: user records being passed to an LLM for analysis. This is where TOON's tabular notation does the work:

text
users[12]{id,email,plan,mrr,country,signupDate,churned}:
  1001,user1001@example.com,pro,99.00,US,2024-01-15,false
  1002,user1002@example.com,starter,19.00,GB,2024-02-03,false
  1003,user1003@example.com,enterprise,499.00,DE,2023-11-20,false
  1004,user1004@example.com,pro,99.00,CA,2024-03-10,true
  1005,user1005@example.com,starter,19.00,AU,2024-01-28,false
  1006,user1006@example.com,pro,99.00,US,2023-12-05,false
  1007,user1007@example.com,enterprise,499.00,FR,2024-02-14,false
  1008,user1008@example.com,starter,19.00,IN,2024-03-22,true
  1009,user1009@example.com,pro,99.00,US,2024-01-09,false
  1010,user1010@example.com,enterprise,499.00,JP,2023-10-31,false
  1011,user1011@example.com,starter,19.00,BR,2024-04-01,false
  1012,user1012@example.com,pro,99.00,US,2024-02-19,true

Writing that as YAML — an array of 12 objects, each with 7 keys — would repeat all 7 key names 12 times. That is 84 key declarations for 84 values. TOON declares each key once.

Where YAML Beats TOON Every Time

Any file that a human being will open, read, and edit belongs in YAML (or JSON, for simpler cases). The decisive advantages are comments and anchors — two features TOON simply does not have.

  • CI/CD pipelines. GitHub Actions, GitLab CI, CircleCI — all YAML-native. The ability to comment out a step during debugging is genuinely useful.
  • Kubernetes and Helm. Every manifest, every values file, every chart template. The YAML anchor system is actively used in complex Helm charts to avoid repeating environment configs.
  • Docker Compose. Multi-service definitions with comments explaining non-obvious port bindings, volume mounts, and network configs.
  • Application config files. Application settings, environment defaults, and feature flags with explanatory comments inline.
  • Any file in version control that humans review in PRs. Comments in YAML config are part of the documentation. TOON cannot participate in this workflow at all.

Note on anchors: YAML anchors (&anchor) and aliases (*alias) are underused but powerful. A Kubernetes config that shares the same environment variables across multiple containers can define them once with &common-env and reference the block with *common-env — keeping the file DRY without any templating engine. TOON has no equivalent mechanism.
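The &common-env pattern looks like this (a standalone sketch with illustrative variable names, shown outside a complete manifest for brevity):

```yaml
# Define the shared env block once, anchored as common-env
shared: &common-env
  - name: LOG_LEVEL
    value: "info"
  - name: REGION
    value: "eu-west-1"
containers:
  - name: api
    env: *common-env   # alias expands to the full block above
  - name: worker
    env: *common-env
```

Both containers receive identical environment variables, and a change to the anchored block propagates to every alias automatically.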

Where TOON Beats YAML Every Time

Any data that is programmatically generated and passed to an LLM belongs in TOON. YAML is actually worse than JSON for this use case — its indentation-heavy syntax and repeated key names add tokens without adding any information the model needs.

  • LLM prompt payloads. Feeding a dataset to GPT-4o, Claude, or Gemini for analysis, classification, or enrichment. TOON's tabular notation cuts token count by 40–60% compared to JSON, and compared to YAML the gap is even larger.
  • LLM output instructions. Instructing a model to respond in TOON produces shorter, cheaper output. YAML output from an LLM is verbose and indent-sensitive — one misaligned space and parsing breaks.
  • Programmatically-generated datasets. If your code is building the data, it should build TOON. There is no human editor to benefit from comments or readable indentation.
  • High-volume batch pipelines. Running 10,000 records through an LLM per day? A 50% token reduction is a 50% reduction in that line of your API bill.
  • Context window pressure. When you need to fit more data within a model's context limit, TOON lets you pack in more rows at the same token cost.
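When a model returns TOON tabular output, a small defensive parser is enough to recover rows and catch truncated responses. This is a minimal hand-rolled sketch of the tabular grammar described above, not the official @toon-format/toon decoder, and it assumes values contain no embedded commas:

```typescript
// Parse a TOON tabular block like:
//   users[2]{id,plan}:
//     1,pro
//     2,starter
// into an array of plain string-valued objects.
function parseToonTable(input: string): Record<string, string>[] {
  const lines = input.trim().split("\n");
  const header = lines[0].match(/^\w+\[(\d+)\]\{([^}]+)\}:\s*$/);
  if (!header) throw new Error("not a TOON tabular header");
  const expectedRows = Number(header[1]);
  const cols = header[2].split(",");
  const parsed = lines.slice(1).map((line) => {
    const values = line.trim().split(",");
    if (values.length !== cols.length) {
      throw new Error(`row has ${values.length} values, expected ${cols.length}`);
    }
    return Object.fromEntries(cols.map((c, i) => [c, values[i]]));
  });
  // The declared count doubles as a cheap truncation check on LLM output
  if (parsed.length !== expectedRows) {
    throw new Error(`declared ${expectedRows} rows, found ${parsed.length}`);
  }
  return parsed;
}
```

The `[count]` in the header is worth validating: if the model ran out of output tokens mid-table, the row count mismatch surfaces immediately instead of silently dropping records.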

The Token Count Reality

Here is the same 10-row dataset in three formats. The numbers are approximate but consistent with what the OpenAI tokenizer reports for GPT-4o's tokenisation.

YAML array of objects:

yaml
- id: 1
  username: alice_chen
  plan: pro
  mrr: 99.00
  country: US
- id: 2
  username: bob_martin
  plan: starter
  mrr: 19.00
  country: GB
- id: 3
  username: carol_white
  plan: enterprise
  mrr: 499.00
  country: DE
- id: 4
  username: dan_patel
  plan: pro
  mrr: 99.00
  country: CA
- id: 5
  username: eve_torres
  plan: starter
  mrr: 19.00
  country: AU
- id: 6
  username: frank_liu
  plan: pro
  mrr: 99.00
  country: US
- id: 7
  username: grace_kim
  plan: enterprise
  mrr: 499.00
  country: FR
- id: 8
  username: henry_obi
  plan: starter
  mrr: 19.00
  country: IN
- id: 9
  username: iris_novak
  plan: pro
  mrr: 99.00
  country: US
- id: 10
  username: james_sato
  plan: enterprise
  mrr: 499.00
  country: JP

TOON tabular notation, same data:

text
users[10]{id,username,plan,mrr,country}:
  1,alice_chen,pro,99.00,US
  2,bob_martin,starter,19.00,GB
  3,carol_white,enterprise,499.00,DE
  4,dan_patel,pro,99.00,CA
  5,eve_torres,starter,19.00,AU
  6,frank_liu,pro,99.00,US
  7,grace_kim,enterprise,499.00,FR
  8,henry_obi,starter,19.00,IN
  9,iris_novak,pro,99.00,US
  10,james_sato,enterprise,499.00,JP

Approximate token counts for this dataset: YAML ≈ 290 tokens. JSON (equivalent array of objects) ≈ 230 tokens. TOON ≈ 115 tokens. YAML is not just worse than TOON — it is worse than JSON for tabular data, because its indentation syntax adds tokens that JSON's braces do not. TOON wins by roughly 2.5× over YAML and 2× over JSON on this shape of data. Verify with the OpenAI tokenizer.

The reason YAML performs worse than JSON on tabular data is structural: YAML uses one line per key-value pair with indentation, so a 5-field object costs 5 lines plus the list marker. JSON at least wraps the whole object in one set of braces. TOON eliminates key repetition entirely — keys appear once, values are packed into rows. The savings compound with row count and field count.
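Character count is only a rough proxy for token count, but the structural argument is easy to check mechanically. This sketch serialises the same rows as a JSON array and as a naive TOON-style table (hand-rolled here for illustration, not the official encoder) and compares sizes:

```typescript
// Three rows of the dataset shape used throughout this article
const rows = [
  { id: 1, username: "alice_chen", plan: "pro", mrr: 99.0, country: "US" },
  { id: 2, username: "bob_martin", plan: "starter", mrr: 19.0, country: "GB" },
  { id: 3, username: "carol_white", plan: "enterprise", mrr: 499.0, country: "DE" },
];

// JSON repeats every key name in every row
const asJson = JSON.stringify(rows);

// Naive TOON-style table: keys declared once, values packed per row
function toToonTable(name: string, data: Record<string, unknown>[]): string {
  const cols = Object.keys(data[0]);
  const header = `${name}[${data.length}]{${cols.join(",")}}:`;
  const body = data.map((r) => "  " + cols.map((c) => String(r[c])).join(","));
  return [header, ...body].join("\n");
}
const asToon = toToonTable("users", rows);

console.log(asJson.length, asToon.length); // TOON is the shorter of the two
```

The gap grows with every row added, because JSON pays the full key overhead per row while the TOON header cost is fixed.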

Using TOON in Your Code

The @toon-format/toon package handles encoding and decoding:

bash
npm install @toon-format/toon

ts
import { encode, decode } from '@toon-format/toon';

// Your dataset — could come from a database query, API response, anywhere
const users = [
  { id: 1001, email: 'user1001@example.com', plan: 'pro',        mrr: 99.00,  country: 'US' },
  { id: 1002, email: 'user1002@example.com', plan: 'starter',    mrr: 19.00,  country: 'GB' },
  { id: 1003, email: 'user1003@example.com', plan: 'enterprise', mrr: 499.00, country: 'DE' },
  // ...more rows
];

// Encode to TOON before inserting into your LLM prompt
// (note: 99.00 is just the number 99 in JS, so trailing zeros do not survive)
const toonPayload = encode(users);
// users[3]{id,email,plan,mrr,country}:
//   1001,user1001@example.com,pro,99,US
//   1002,user1002@example.com,starter,19,GB
//   1003,user1003@example.com,enterprise,499,DE

const prompt = `Analyse this user dataset and identify churn risk signals.
Return your findings as a TOON dataset with columns: id, riskScore, reason.

Dataset:
${toonPayload}`;

// After the LLM responds with TOON output, decode it back
const llmResponse = '...'; // TOON string from the model
const findings = decode(llmResponse);
console.log(findings[0]); // { id: 1001, riskScore: 'low', reason: 'Active, pro plan' }

Decision Guide

The choice between YAML and TOON is almost never ambiguous in practice:

  • Use YAML if a human will read or edit the file — CI/CD pipelines, Kubernetes manifests, Docker Compose, application config, Ansible playbooks.
  • Use YAML if you need inline comments to explain non-obvious values.
  • Use YAML if you need anchors and aliases to keep a complex config DRY.
  • Use YAML if you are working with a tool that expects YAML by convention (Helm, GitHub Actions, kubectl apply).
  • Use TOON if the data is going into an LLM prompt — especially tabular data with multiple rows.
  • Use TOON if you are asking an LLM to return structured data and you want shorter, cheaper output.
  • Use TOON if token count matters — high-volume pipelines, long datasets, context window pressure.
  • Use TOON if the data is programmatically generated and no human will edit it directly.
  • Use JSON (not YAML or TOON) if you are building a REST API, storing data in a database, or integrating with third-party tooling that expects JSON.

Wrapping Up

YAML and TOON occupy completely different positions in your stack. YAML belongs in your repository alongside your code — config files, pipeline definitions, infrastructure manifests. TOON belongs at the boundary between your application and LLM APIs, where it converts your structured data into the most token-efficient representation before sending and converts the model's response back on the way out. There is no meaningful overlap between these jobs, which is why the question is not "which is better" but "which belongs here".

If you are working with TOON, the TOON Formatter and TOON Validator are the fastest way to inspect and verify TOON strings. The JSON to TOON converter converts existing JSON payloads into TOON for LLM use, and the TOON to JSON converter handles the return trip when a model responds in TOON and your downstream system expects JSON. See also the Wikipedia article on YAML for a concise history of the format and a summary of its known edge cases.