You've read the docs; you know TOON cuts token counts in half on tabular data. Now you want to actually wire it into something. This article is about the plumbing: reading and writing .toon files, validating TOON at system boundaries, building an Express middleware that parses TOON request bodies, and assembling a database-to-prompt pipeline that feeds TOON directly to an LLM. Real code, real patterns — no toy examples.
Setup
Install the package from npm. It's ESM-only, so you'll need "type": "module" in your package.json or use a .mjs extension. Node.js 18+ is all you need — no config files, no plugins.
npm install @toon-format/toon
import { encode, decode } from '@toon-format/toon';
// That's it. encode() → TOON string, decode() → JS value.
Reading and Writing TOON Files
The Node.js fs module handles the I/O. Pass the file contents straight into decode(), or pass your data to encode() and write the result to disk. Below are both patterns — sync for scripts and CLI tools, async for server routes.
// --- Sync (scripts, CLI tools) ---
import { readFileSync, writeFileSync } from 'fs';
import { encode, decode } from '@toon-format/toon';
// Read a .toon file and decode it to a JS value
const raw = readFileSync('./data/products.toon', 'utf8');
const products = decode(raw);
console.log(products); // → JS array or object
// Encode a JS value and write it to a .toon file
const inventory = [
{ sku: 'WDG-001', name: 'Widget A', qty: 142, price: 9.99 },
{ sku: 'WDG-002', name: 'Widget B', qty: 87, price: 14.49 },
{ sku: 'GDG-001', name: 'Gadget X', qty: 31, price: 49.99 },
];
writeFileSync('./data/inventory.toon', encode(inventory, { indent: 2 }), 'utf8');
// --- Async (server routes, pipelines) ---
import { promises as fs } from 'fs';
import { encode, decode } from '@toon-format/toon';
// Read
async function loadReportData(filePath) {
const raw = await fs.readFile(filePath, 'utf8');
return decode(raw); // throws if malformed — handle upstream
}
// Write
async function saveSnapshot(data, filePath) {
const toon = encode(data, { indent: 2 });
await fs.writeFile(filePath, toon, 'utf8');
}
Note the 'utf8' encoding in every call. TOON files are plain text. Omitting the encoding argument returns a Buffer — decode() expects a string and will throw a type error if passed a Buffer.
Validating TOON at System Boundaries
decode() throws on invalid input, which is the right behaviour for a parser but inconvenient at an API boundary or message-queue consumer where you need a structured result, not an uncaught exception. The fix is a thin wrapper that turns the throw into a return value. This is the pattern you'll reach for in Express route handlers, queue processors, and anywhere else external data enters your system.
import { decode } from '@toon-format/toon';
/**
* Safely parse a TOON string.
* Returns { valid: true, data } on success,
* or { valid: false, error } on failure — never throws.
*/
export function validateToon(input) {
if (typeof input !== 'string') {
return { valid: false, error: 'Input must be a string' };
}
try {
const data = decode(input);
return { valid: true, data };
} catch (err) {
return { valid: false, error: err.message };
}
}
Usage in an Express route or a queue consumer looks identical — call validateToon(), branch on valid, and either proceed with data or return a 400 / dead-letter the message with the error string. The try/catch pattern keeps the calling code clean and predictable.
// Example: queue consumer
queue.process('ingest-toon', async (job) => {
const result = validateToon(job.data.payload);
if (!result.valid) {
console.error('Rejecting malformed TOON:', result.error);
return; // dead-letter, skip, or throw depending on your queue
}
await db.insert(result.data);
});
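For the Express side of that sentence, here is a minimal sketch. The handler factory `makeImportHandler` and the response shapes are illustrative names, not part of the library; `validateToon` is the wrapper defined above, injected as a parameter so the handler can be exercised with plain mock objects.

```javascript
// Sketch of the equivalent Express route handler. makeImportHandler and db
// are illustrative; validateToon is the wrapper from above, injected so the
// handler can be unit-tested with mock req/res objects.
export function makeImportHandler(validateToon, db) {
  return async (req, res) => {
    // req.body is the raw TOON text (e.g. collected via express.text())
    const result = validateToon(req.body);
    if (!result.valid) {
      return res.status(400).json({ error: 'Invalid TOON body', detail: result.error });
    }
    await db.insert(result.data);
    res.json({ inserted: true });
  };
}
```

Wired up, that might look like `app.post('/api/import', express.text({ type: 'application/toon' }), makeImportHandler(validateToon, db))`, or you can skip the text parser entirely and use the dedicated middleware from the next section.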
Building a TOON Middleware for Express
express.json() parses application/json bodies and puts the result on req.body. Here's the same thing for application/toon. Drop it in before your route handlers and the rest of the stack never knows the difference.
import { decode } from '@toon-format/toon';
/**
* Express middleware: parses application/toon request bodies
* and attaches the decoded value to req.body.
*/
export function toonBodyParser(req, res, next) {
const contentType = req.headers['content-type'] ?? '';
if (!contentType.includes('application/toon')) {
return next(); // not our content type, pass through
}
let body = '';
req.setEncoding('utf8');
req.on('data', (chunk) => { body += chunk; });
req.on('end', () => {
try {
req.body = decode(body);
next();
} catch (err) {
res.status(400).json({ error: 'Invalid TOON body', detail: err.message });
}
});
req.on('error', (err) => {
res.status(500).json({ error: 'Request stream error', detail: err.message });
});
}
// Wire it up:
// app.use(toonBodyParser);
// app.post('/api/import', (req, res) => {
// // req.body is already the decoded JS value
// res.json({ received: Array.isArray(req.body) ? req.body.length : 1 });
// });
Converting Database Results to TOON Before Sending to LLM
This is the pattern TOON was built for. You query the database, get back an array of rows, encode to TOON, and drop it straight into your prompt. The LLM gets all the structure with none of JSON's key-repetition overhead. Here's a realistic pipeline using node-postgres (pg):
import pg from 'pg';
import { encode } from '@toon-format/toon';
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
async function buildOrderPrompt(customerId) {
// Step 1: query the database
const { rows } = await pool.query(
`SELECT order_id, created_at, status, total_cents, item_count
FROM orders
WHERE customer_id = $1
ORDER BY created_at DESC
LIMIT 50`,
[customerId]
);
if (rows.length === 0) {
return null;
}
// Step 2: encode rows to TOON
// encode() handles all quoting automatically — no pre-processing needed
const toonData = encode(rows, { indent: 2 });
// Step 3: build the prompt
return [
'Analyse the following order history for a customer support case.',
'Data is in TOON tabular format: name[count]{col1,col2,...}: followed by one row per line.',
'',
toonData,
'',
'Summarise any patterns that suggest the customer has a recurring issue.'
].join('\n');
}
// Calling code:
const prompt = await buildOrderPrompt('cust_8821');
if (prompt) {
const reply = await callLlm(prompt); // your LLM client here
console.log(reply);
}
The same pattern works for any SQL client or ORM — Prisma, Drizzle, Knex, Sequelize — as long as your query returns plain JS objects. encode() picks up the key names from the first row and uses them as column headers; subsequent rows are written as comma-separated values. A 50-row result set that would cost ~1,500 tokens as a JSON array typically costs ~600–700 tokens as TOON.
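To make the saving concrete, here is a hand-rolled illustration of the tabular layout described in the prompt above: the header appears once, then each row is bare comma-separated values. This is a toy for flat, uniform rows only; it does none of the quoting or nesting the real encode() handles, so treat it as an explanation of the shape, not a replacement for the library.

```javascript
// Illustration only: a hand-rolled sketch of TOON's tabular layout for a
// uniform array of flat objects. It skips quoting, escaping, and nesting,
// all of which the real encode() handles for you.
function sketchTabular(name, rows) {
  const cols = Object.keys(rows[0]); // column headers come from the first row
  const header = `${name}[${rows.length}]{${cols.join(',')}}:`;
  const lines = rows.map((r) => '  ' + cols.map((c) => r[c]).join(','));
  return [header, ...lines].join('\n');
}

const rows = [
  { order_id: 1001, status: 'shipped', total_cents: 2499 },
  { order_id: 1002, status: 'pending', total_cents: 1250 },
];
console.log(sketchTabular('orders', rows));
// orders[2]{order_id,status,total_cents}:
//   1001,shipped,2499
//   1002,pending,1250
```

Each key name appears exactly once instead of once per row, which is where the token reduction on tabular data comes from.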
Handling Errors and Edge Cases
A few things worth knowing before you ship:
- LLM returns malformed TOON. Models don't always reproduce a format perfectly, especially on first attempt. Wrap decode() in a try/catch (or use validateToon() from above). If it fails, log the raw response, return an error to the caller, and — if you need structured output reliably — add a retry with an explicit correction prompt: "Your last response was not valid TOON. Please reformat it."
- Values containing commas or colons. TOON uses commas to separate values and colons in object syntax — both are significant characters. encode() detects these automatically and wraps the affected value in double quotes. You never need to pre-process your data; just pass raw strings.
- Null and undefined. encode() serialises null as null (bare, unquoted) and omits undefined properties entirely — the same behaviour as JSON.stringify(). When decoding, bare null is returned as JS null.
- Empty arrays. encode([]) returns a valid empty TOON array. decode() round-trips it cleanly. Guard upstream if your LLM prompt shouldn't include an empty dataset.
- Very large result sets. There's no hard limit in the library, but LLMs have context-window limits. Paginate or LIMIT your queries before encoding — 100–200 rows is a reasonable ceiling for most prompts.
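The retry-with-correction pattern from the first bullet fits in a small loop. Everything here is a sketch under assumptions: callLlmForToon is an illustrative name, and callLlm and decodeFn are injected parameters so the loop stays testable (in real code decodeFn would be decode from @toon-format/toon and callLlm your LLM client).

```javascript
// Sketch: call the LLM, try to decode its output, and retry once with an
// explicit correction prompt if decoding throws. callLlm and decodeFn are
// injected dependencies, not part of the @toon-format/toon API.
async function callLlmForToon(callLlm, decodeFn, prompt, maxAttempts = 2) {
  let lastError;
  let currentPrompt = prompt;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callLlm(currentPrompt);
    try {
      return decodeFn(raw); // success: return the decoded JS value
    } catch (err) {
      lastError = err;
      console.error(`Attempt ${attempt} returned invalid TOON:`, raw);
      currentPrompt = `${prompt}\n\nYour last response was not valid TOON. Please reformat it.`;
    }
  }
  throw new Error(`No valid TOON after ${maxAttempts} attempts: ${lastError.message}`);
}
```

Keep maxAttempts low: if the model can't produce valid TOON in two tries, the failure is usually in the prompt, not the sampling.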
When in doubt, run incoming payloads through validateToon() first. Letting a malformed payload reach your DB layer makes debugging much harder than catching it at the boundary.
Wrapping Up
The patterns in this article cover most of what you need to integrate TOON into a real Node.js codebase: fs for file I/O, a validateToon() wrapper for safe boundary parsing, a drop-in Express middleware for application/toon bodies, and a DB-to-prompt pipeline that turns SQL rows into token-efficient LLM input. The library itself — @toon-format/toon — stays out of your way: two functions, no config, throws on invalid input. Use TOON Validator to check outputs during development, TOON Formatter to inspect encoded data, JSON to TOON to convert existing datasets before pasting them into a prompt, and TOON to JSON if you need to hand off a decoded response to a downstream system that expects JSON.