Three formats walk into a planning meeting. JSON says it can handle anything. CSV says it's fast and lean. TOON says it's here to talk to the AI. They're all right — for different jobs. The frustrating part isn't that these formats exist, it's that picking the wrong one for your use case costs you in real, concrete ways: verbose payloads, broken imports, expensive LLM bills, or parsing headaches. This guide gives you a clear framework for choosing.
A Quick Portrait of Each Format
JSON (JavaScript Object Notation) emerged from JavaScript in the early 2000s and became the dominant format for web APIs thanks to its simplicity and expressiveness. It handles nested structures natively, distinguishes between strings, numbers, booleans, and nulls without any extra ceremony, and is specified by RFC 8259. Every modern language has a first-class JSON library.
CSV (Comma-Separated Values) is older than the web. It's defined by RFC 4180 and is essentially the lingua franca of flat tabular data. Open any CSV in Excel, Google Sheets, or Numbers and it just works. For pure flat tables, it's the most compact and universally importable format that exists.
TOON is a newer format built specifically for LLM workflows. It takes inspiration from both — like CSV it uses a header-once-then-rows structure for tabular data, but like JSON it can also encode nested objects and arrays. Its entire design is oriented around minimising token count when passing data to and from large language models.
The Same Data in All Three Formats
To make this concrete, let's use a product catalogue that includes a nested specs
object — a real-world shape that exercises all three formats meaningfully. Here are four products
from an electronics store:
JSON — expressive, handles nesting naturally, but verbose for repetitive row data:
[
{
"id": "P001",
"name": "Wireless Headphones",
"category": "Audio",
"price": 79.99,
"inStock": true,
"specs": { "weight": "250g", "connectivity": "Bluetooth 5.2", "battery": "30h" }
},
{
"id": "P002",
"name": "USB-C Docking Station",
"category": "Peripherals",
"price": 129.99,
"inStock": true,
"specs": { "ports": 11, "maxPower": "100W", "display": "4K@60Hz" }
},
{
"id": "P003",
"name": "Mechanical Keyboard",
"category": "Input",
"price": 94.99,
"inStock": false,
"specs": { "layout": "TKL", "switches": "Cherry MX Red", "backlight": "RGB" }
},
{
"id": "P004",
"name": "27" IPS Monitor",
"category": "Display",
"price": 299.99,
"inStock": true,
"specs": { "resolution": "2560x1440", "refreshRate": "165Hz", "panel": "IPS" }
}
]CSV — compact, Excel-friendly, but the nested specs object has
to be flattened. No standard way to represent it, so we either lose the nesting or mangle it into
a string:
id,name,category,price,inStock,specs_weight,specs_ports,specs_layout,specs_resolution
P001,Wireless Headphones,Audio,79.99,true,250g,,,
P002,USB-C Docking Station,Peripherals,129.99,true,,11,,
P003,Mechanical Keyboard,Input,94.99,false,,,TKL,
P004,27" IPS Monitor,Display,299.99,true,,,,2560x1440TOON — header declared once, rows are compact, nested objects encoded inline:
products[4]{id,name,category,price,inStock,specs}:
P001,Wireless Headphones,Audio,79.99,true,{weight:250g,connectivity:Bluetooth 5.2,battery:30h}
P002,USB-C Docking Station,Peripherals,129.99,true,{ports:11,maxPower:100W,display:4K@60Hz}
P003,Mechanical Keyboard,Input,94.99,false,{layout:TKL,switches:Cherry MX Red,backlight:RGB}
P004,27" IPS Monitor,Display,299.99,true,{resolution:2560x1440,refreshRate:165Hz,panel:IPS}Where Each Format Breaks Down
Understanding the failure modes is as important as knowing the strengths.
- JSON breaks down for tables. Repeating every key name on every row is genuinely wasteful. A 1000-row dataset where each row has 8 keys means "id", "name", "price" and so on written 1000 times each. For LLM input this translates directly to token cost; for human inspection it's just noise.
- CSV breaks down for nested data. The format has no concept of nesting.
You can stringify a nested object into a cell, but then the consumer has to know to parse that
cell — you've just moved the problem. CSV also has no native type system:
trueis a string,42is a string,nullis ambiguous. Tools handle this inconsistently. - TOON breaks down outside LLM contexts. It's a niche format with a narrower ecosystem. You won't find native TOON support in PostgreSQL, in your REST framework, or in Excel. The npm package covers JavaScript/TypeScript workflows well, but if you're reaching for TOON in a context that has nothing to do with AI, you're probably over-engineering it.
- CSV breaks down for values containing commas or newlines. The RFC 4180 quoting
rules handle it, but many CSV producers and consumers implement them inconsistently. A product
name like
27" IPS Monitor, Blackor a description with line breaks becomes a reliability hazard.
Decision Matrix
Here's a practical guide. Match your situation to the appropriate format:
- Your data is going into an LLM prompt → TOON. Especially if it's tabular. Use JSON to TOON to convert before the API call.
- Your data comes back from an LLM and needs to be processed → TOON (for output), then decode to JSON or a native object for downstream use with TOON to JSON.
- Your data is purely flat tabular (same keys on every row) and goes to/from spreadsheets → CSV. It's the smallest representation and universally importable.
- Your data is flat tabular but will be transformed by code (not spreadsheets)
→ Either CSV or JSON. JSON is safer if types matter (you don't want
"true"instead oftrue). - Your data has nested objects or arrays → JSON. Don't fight CSV's flat-only limitation. If nesting goes to an LLM, encode to TOON after building the structure.
- Your data is going to a REST API or stored in a database → JSON. Always.
- Your data is a config file → JSON or YAML. Neither CSV nor TOON belongs here.
- You need maximum portability and a zero-dependency parser → JSON or CSV. Both have native or near-native support everywhere.
Using All Three in One Pipeline
A realistic pipeline might use all three. Imagine an e-commerce workflow that enriches product data with AI-generated descriptions:
import { encode, decode } from '@toon-format/toon';
// Step 1: Products arrive as JSON from your REST API
const products = await fetch('/api/products').then(r => r.json());
// Step 2: Encode to TOON to minimise tokens before the LLM call
const toonInput = encode(products);
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'user',
content: `Here is our product catalogue in TOON format:\n\n${toonInput}\n\n
For each product, write a one-sentence marketing description.
Return results as TOON with fields: id, description`
}
]
});
// Step 3: Decode the LLM's TOON response back to objects
const enriched = decode(response.choices[0].message.content);
// Step 4: Export to CSV for the marketing team's spreadsheet
const csvRows = enriched.map(row => `${row.id},"${row.description}"`);
const csv = ['id,description', ...csvRows].join('\n');JSON for the API layer, TOON through the LLM layer, CSV for the human-consumable output. Each format is doing exactly the job it was designed for.
Quick Tool Reference
If you're moving between these formats, these tools will cover the common conversions. The TOON Formatter is the quickest way to validate and clean up TOON strings. The JSON to TOON converter handles the LLM preparation step. Going the other direction after an LLM returns TOON, TOON to CSV can produce a spreadsheet-ready export directly, and CSV to JSON is the go-to for normalising imports from Excel or third-party data providers. For TOON-specific details, the official package lives on npm.
Wrapping Up
JSON, CSV, and TOON each have a clear domain. JSON is the universal format for structured data exchange — nested or flat, APIs, config, storage. CSV is the universal format for flat tabular data that needs to travel between systems and humans — spreadsheets, imports, exports. TOON is the format for passing structured data through AI systems efficiently, where every token counts. The mistake most developers make is defaulting to JSON for everything including LLM prompts, or defaulting to CSV and then discovering nesting requirements mid-project. Know the shape of your data, know where it's going, and the right format usually picks itself.
For a deeper dive into the JSON vs TOON tradeoff specifically, check out the TOON vs JSON comparison. For background on the CSV spec and its quirks, the Wikipedia article on CSV covers the history and format variations well.