TOON in Node.js: From File I/O to LLM Pipelines

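You've read the docs, and you understand that TOON can cut the token count of tabular data roughly in half. Now you want to wire it into something real. This article is about the plumbing: reading and writing .toon files, validating TOON at system boundaries, building Express middleware that parses TOON request bodies, and assembling a database-to-prompt pipeline that feeds TOON straight to an LLM. Real code, real patterns, no toy examples.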

Setup

Install the package from npm. It is ESM-only, so you need "type": "module" in package.json or the .mjs extension. Node.js 18 or later is the only requirement: no config files, no plugins.

npm install @toon-format/toon

import { encode, decode } from '@toon-format/toon';

// That's it. encode() → TOON string, decode() → JS value.

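Reading and Writing TOON Files

The Node.js fs module handles the I/O. Pass file contents straight to decode(), or pass your data to encode() and write the result to disk. Both patterns are shown below: sync for scripts and CLI tools, async for server routes.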

// --- Sync (scripts, CLI tools) ---
import { readFileSync, writeFileSync } from 'fs';
import { encode, decode } from '@toon-format/toon';

// Read a .toon file and decode it to a JS value
const raw = readFileSync('./data/products.toon', 'utf8');
const products = decode(raw);
console.log(products); // → JS array or object

// Encode a JS value and write it to a .toon file
const inventory = [
  { sku: 'WDG-001', name: 'Widget A', qty: 142, price: 9.99 },
  { sku: 'WDG-002', name: 'Widget B', qty: 87,  price: 14.49 },
  { sku: 'GDG-001', name: 'Gadget X', qty: 31,  price: 49.99 },
];
writeFileSync('./data/inventory.toon', encode(inventory, { indent: 2 }), 'utf8');

// --- Async (server routes, pipelines); a separate module from the sync example ---
import { promises as fs } from 'fs';
import { encode, decode } from '@toon-format/toon';

// Read
async function loadReportData(filePath) {
  const raw = await fs.readFile(filePath, 'utf8');
  return decode(raw); // throws if malformed — handle upstream
}

// Write
async function saveSnapshot(data, filePath) {
  const toon = encode(data, { indent: 2 });
  await fs.writeFile(filePath, toon, 'utf8');
}

Always pass the 'utf8' encoding. TOON files are plain text. If you omit the encoding argument, readFile and readFileSync return a Buffer; decode() expects a string and will throw a type error if handed a Buffer.
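If a caller might hand you either a string or a Buffer (for example, from a read that skipped the encoding argument), one option is to normalize the input at the edge before decoding. This is a minimal sketch; toText is a hypothetical helper name, not part of @toon-format/toon.

```javascript
// Hypothetical helper (not a library API): normalize file contents
// to a UTF-8 string before handing them to decode().
function toText(input) {
  if (typeof input === 'string') return input;
  if (Buffer.isBuffer(input)) return input.toString('utf8');
  throw new TypeError('Expected a string or Buffer');
}

// Usage, with decode imported from '@toon-format/toon':
// const products = decode(toText(readFileSync('./data/products.toon')));
```

Passing 'utf8' at the read site remains the simpler fix; a helper like this only earns its place when you do not control how the file was read.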

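Validating TOON at System Boundaries

decode() throws on invalid input. That is the right behavior for a parser, but it is inconvenient at API boundaries and in message queue consumers, where you want a structured result rather than an uncaught exception. The fix is a thin wrapper that converts the throw into a return value. Use this pattern in Express route handlers, queue processors, and anywhere else external data enters your system.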

import { decode } from '@toon-format/toon';

/**
 * Safely parse a TOON string.
 * Returns { valid: true, data } on success,
 * or { valid: false, error } on failure — never throws.
 */
export function validateToon(input) {
  if (typeof input !== 'string') {
    return { valid: false, error: 'Input must be a string' };
  }
  try {
    const data = decode(input);
    return { valid: true, data };
  } catch (err) {
    return { valid: false, error: err.message };
  }
}

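Usage is the same in an Express route or a queue consumer: call validateToon(), branch on valid, then either continue with data or reject the input (return a 400 with the error string, or dead-letter the message). Because the try/catch lives inside the wrapper, the calling code stays clean and predictable.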

// Example: queue consumer
queue.process('ingest-toon', async (job) => {
  const result = validateToon(job.data.payload);
  if (!result.valid) {
    console.error('Rejecting malformed TOON:', result.error);
    return; // dead-letter, skip, or throw depending on your queue
  }
  await db.insert(result.data);
});
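A successful decode only shows that the payload is syntactically valid TOON, not that it has the shape your code expects. The following is a minimal sketch of a follow-up shape check that returns the same result-object form as validateToon(). The name checkRows and the required key names are illustrative assumptions, not part of the library.

```javascript
// Hypothetical follow-up check (not a library API): the decoded value must be
// a non-empty array of plain objects that each carry every required key.
function checkRows(data, requiredKeys) {
  if (!Array.isArray(data) || data.length === 0) {
    return { valid: false, error: 'Expected a non-empty array of rows' };
  }
  for (let i = 0; i < data.length; i++) {
    const row = data[i];
    if (row === null || typeof row !== 'object' || Array.isArray(row)) {
      return { valid: false, error: `Row ${i} is not an object` };
    }
    const missing = requiredKeys.filter((key) => !(key in row));
    if (missing.length > 0) {
      return { valid: false, error: `Row ${i} is missing: ${missing.join(', ')}` };
    }
  }
  return { valid: true, data };
}

// Chaining after the syntax check:
// const parsed = validateToon(payload);
// const checked = parsed.valid ? checkRows(parsed.data, ['sku', 'qty']) : parsed;
```

Keeping the same { valid, data } / { valid, error } shape means callers can branch on one field regardless of which check failed.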

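Building Express Middleware for TOON

express.json() parses application/json bodies and stores the result on req.body. Here is the equivalent for application/toon. Register it before your route handlers and the rest of the stack won't notice the difference.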

import { decode } from '@toon-format/toon';

/**
 * Express middleware: parses application/toon request bodies
 * and attaches the decoded value to req.body.
 */
export function toonBodyParser(req, res, next) {
  const contentType = req.headers['content-type'] ?? '';
  if (!contentType.includes('application/toon')) {
    return next(); // not our content type, pass through
  }

  let body = '';
  req.setEncoding('utf8');
  req.on('data', (chunk) => { body += chunk; });
  req.on('end', () => {
    try {
      req.body = decode(body);
      next();
    } catch (err) {
      res.status(400).json({ error: 'Invalid TOON body', detail: err.message });
    }
  });
  req.on('error', (err) => {
    res.status(500).json({ error: 'Request stream error', detail: err.message });
  });
}

// Wire it up:
// app.use(toonBodyParser);
// app.post('/api/import', (req, res) => {
//   // req.body is already the decoded JS value
//   res.json({ received: Array.isArray(req.body) ? req.body.length : 1 });
// });
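The middleware above buffers the entire body in memory with no upper bound. One way to cap it is to track the byte count as chunks arrive and stop accumulating once a limit is exceeded. This is a sketch under assumptions: createBodyCollector and the 1 MB default are hypothetical choices, not Express or TOON APIs.

```javascript
// Hypothetical helper (not a library API): accumulates string chunks
// and flags bodies that exceed a byte limit.
function createBodyCollector(limitBytes = 1024 * 1024) {
  let body = '';
  let size = 0;
  let overflowed = false;
  return {
    // Returns false once the limit has been exceeded.
    push(chunk) {
      if (overflowed) return false;
      size += Buffer.byteLength(chunk, 'utf8');
      if (size > limitBytes) {
        overflowed = true;
        body = ''; // release what was buffered so far
        return false;
      }
      body += chunk;
      return true;
    },
    get overflowed() { return overflowed; },
    get text() { return body; },
  };
}
```

In the middleware, the 'data' handler would call collector.push(chunk), and the 'end' handler would check collector.overflowed first, responding with 413 instead of calling decode() when it is set.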

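Converting Database Results to TOON for an LLM

This is the pattern TOON was built for. Query the database, get an array of rows, encode it to TOON, and embed it directly in the prompt. The LLM receives all the structure without JSON's repeated-key overhead. Here is a realistic pipeline using node-postgres (pg):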

import pg from 'pg';
import { encode } from '@toon-format/toon';

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

async function buildOrderPrompt(customerId) {
  // Step 1: query the database
  const { rows } = await pool.query(
    `SELECT order_id, created_at, status, total_cents, item_count
       FROM orders
      WHERE customer_id = $1
      ORDER BY created_at DESC
      LIMIT 50`,
    [customerId]
  );

  if (rows.length === 0) {
    return null;
  }

  // Step 2: encode rows to TOON
  // encode() handles all quoting automatically — no pre-processing needed
  const toonData = encode(rows, { indent: 2 });

  // Step 3: build the prompt
  return [
    'Analyse the following order history for a customer support case.',
    'Data is in TOON tabular format: name[count]{col1,col2,...}: followed by one row per line.',
    '',
    toonData,
    '',
    'Summarise any patterns that suggest the customer has a recurring issue.'
  ].join('\n');
}

// Calling code:
const prompt = await buildOrderPrompt('cust_8821');
if (prompt) {
  const reply = await callLlm(prompt); // your LLM client here
  console.log(reply);
}

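The same pattern works with any SQL client or ORM (Prisma, Drizzle, Knex, Sequelize), as long as the query returns plain JS objects. For uniform rows, encode() takes the key names from the first row and uses them as the column header; each following row is written as comma-separated values. A 50-row result set that costs ~1,500 tokens as a JSON array typically comes to ~600-700 tokens in TOON.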
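To see where the saving comes from, compare the same rows as JSON and in a tabular layout where the keys appear only once. The sketch below hand-rolls an approximation of that layout for flat rows with simple values. approxTabular is a hypothetical illustration, not the library's encoder, and it skips the quoting that encode() performs. Character counts are only a rough proxy for token counts.

```javascript
// Illustration only (not the library's encoder): header once, then one row per line.
// Assumes every row has the same keys and no value contains commas, colons, or quotes.
function approxTabular(rows) {
  const keys = Object.keys(rows[0]);
  const header = `[${rows.length}]{${keys.join(',')}}:`;
  const lines = rows.map(
    (row) => '  ' + keys.map((key) => String(row[key])).join(',')
  );
  return [header, ...lines].join('\n');
}

// Made-up sample rows, shaped like a subset of the orders query above.
const sampleRows = [
  { order_id: 1001, status: 'shipped', total_cents: 4599, item_count: 3 },
  { order_id: 1002, status: 'refunded', total_cents: 1250, item_count: 1 },
  { order_id: 1003, status: 'shipped', total_cents: 8999, item_count: 5 },
];

const asJson = JSON.stringify(sampleRows);
const asTable = approxTabular(sampleRows);
console.log(asTable);
console.log(`JSON: ${asJson.length} chars, tabular: ${asTable.length} chars`);
```

The gap widens with row count, because JSON repeats every key on every row while the tabular header is paid for once.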

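Handling Errors and Edge Cases

A few things worth knowing before you deploy:

The LLM returns malformed TOON. Models do not always reproduce the format perfectly, especially on the first attempt. Wrap decode() in try/catch (or use validateToon() from above). On failure, log the raw response, return an error to the caller, and, if you truly need structured output, add a retry with an explicit correction prompt: "Your last response was not valid TOON. Please reformat it."
Values containing commas or colons. TOON uses commas to delimit values and colons in its object syntax, so both characters are significant. encode() detects them automatically and wraps the affected values in double quotes. You don't need to pre-process your data; just pass the raw strings.
Null and undefined. encode() serializes null as a bare, unquoted null and omits undefined properties entirely, the same behavior as JSON.stringify(). On decode, a bare null comes back as JS null.
Empty arrays. encode([]) returns a valid empty TOON array, and decode() round-trips it cleanly. If an empty dataset should not reach your LLM prompt, guard against it upstream.
Very large result sets. The library has no hard limit, but LLMs have context window limits. Paginate your queries or use LIMIT before encoding; 100-200 rows is a reasonable ceiling for most prompts.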
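The retry advice for malformed LLM output can be sketched as a small loop. Everything here is an assumption for illustration: decodeWithRetry is a hypothetical helper, callLlm stands in for whatever LLM client you use, and decode is injected (pass the one from @toon-format/toon) so the sketch does not depend on a particular setup.

```javascript
// Hypothetical retry helper (not a library API). callLlm is an async function
// (prompt) => reply string; decode is a function that throws on invalid input.
async function decodeWithRetry({ callLlm, decode, prompt, maxAttempts = 2 }) {
  let currentPrompt = prompt;
  let lastError = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const reply = await callLlm(currentPrompt);
    try {
      return { valid: true, data: decode(reply), attempts: attempt };
    } catch (err) {
      lastError = err;
      // Log the raw response so failures can be inspected later.
      console.error(`Attempt ${attempt}: reply was not valid TOON:`, reply);
      // Ask the model to repair its own output on the next attempt.
      currentPrompt = [
        prompt,
        '',
        'Your last response was not valid TOON. Please reformat it:',
        reply,
      ].join('\n');
    }
  }
  return {
    valid: false,
    error: lastError ? lastError.message : 'No attempts were made',
    attempts: maxAttempts,
  };
}
```

Keeping maxAttempts small bounds cost and latency; if the model still fails after a correction prompt, returning the error to the caller is usually more useful than retrying further.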

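Validate before you store. If your pipeline receives TOON from an external source (a webhook, a queue, an API client) and stores the decoded result in a database, always run validateToon() first. A malformed payload that reaches the DB layer is far harder to debug than one caught at the boundary.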

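Summary

The patterns in this article cover most of what you need to integrate TOON into a real Node.js codebase: fs for file I/O, the validateToon() wrapper for safe parsing at boundaries, drop-in Express middleware for application/toon bodies, and a database-to-prompt pipeline that turns SQL rows into token-efficient LLM input. The library itself, @toon-format/toon, stays out of your way: two functions, no configuration, and a throw on invalid input. Use the TOON Validator to check output during development, the TOON Formatter to inspect encoded data, JSON to TOON to convert existing datasets before pasting them into a prompt, and TOON to JSON when you need to hand a decoded response to a downstream system that expects JSON.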
