Output Parsers: Structured Output Extraction

Part 2: Composition - Lesson 2

LLMs return text. You need data.

Overview

You've learned to create great prompts. But an LLM returns unstructured text, and your application often needs structured data:

// LLM returns this:
"The sentiment is positive with a confidence of 0.92"

// You need this:
{
    sentiment: "positive",
    confidence: 0.92
}

Output parsers transform LLM text into structured data you can use in your applications.

Why This Matters

The Problem: Parsing Chaos

Without parsers, your code is full of brittle string manipulation:

const response = await llm.invoke("Classify: I love this product!");

// Fragile parsing code everywhere
if (response.includes("positive")) {
    sentiment = "positive";
} else if (response.includes("negative")) {
    sentiment = "negative";
}

// What if format changes?
// What if LLM adds extra text?
// How do you handle errors?

Problems:

  • Brittle regex and string matching
  • No validation of output format
  • Hard to test parsing logic
  • Inconsistent error handling
  • Parser code duplicated everywhere

The Solution: Output Parsers

const parser = new JsonOutputParser();

const prompt = new PromptTemplate({
    template: `Classify the sentiment. Respond in JSON:
{{"sentiment": "positive/negative/neutral", "confidence": 0.0-1.0}}

Text: {text}`,
    inputVariables: ["text"]
});

const chain = prompt.pipe(llm).pipe(parser);

const result = await chain.invoke({ text: "I love this!" });
// { sentiment: "positive", confidence: 0.95 }

Benefits:

  • ✅ Reliable structured extraction
  • ✅ Format validation
  • ✅ Error handling built-in
  • ✅ Reusable parsing logic
  • ✅ Type-safe outputs

Learning Objectives

By the end of this lesson, you will:

  • ✅ Build a BaseOutputParser abstraction
  • ✅ Create a StringOutputParser for text cleanup
  • ✅ Implement JsonOutputParser for JSON extraction
  • ✅ Build ListOutputParser for arrays
  • ✅ Create StructuredOutputParser with schemas
  • ✅ Use parsers in chains with prompts
  • ✅ Handle parsing errors gracefully

Core Concepts

What is an Output Parser?

An output parser transforms LLM text output into structured data.

Flow:

LLM Output (text) → Parser → Structured Data
    ↓                ↓              ↓
"positive: 0.95"  parse()    {sentiment: "positive", confidence: 0.95}

The Parser Hierarchy

BaseOutputParser (abstract)
    ├── StringOutputParser (clean text)
    ├── JsonOutputParser (extract JSON)
    ├── ListOutputParser (extract lists)
    ├── RegexOutputParser (regex patterns)
    └── StructuredOutputParser (schema validation)

Each parser handles a specific output format.

Key Operations

  1. Parse: Extract structured data from text
  2. Get Format Instructions: Tell LLM how to format response
  3. Validate: Check output matches expected structure
  4. Handle Errors: Gracefully handle malformed outputs
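
The four operations can be seen together in a deliberately tiny parser. The sketch below is illustrative only: yesNoParser is a hypothetical object, not part of the lesson's class hierarchy, and it is synchronous to keep the example short.

```javascript
// Hypothetical minimal parser covering the four key operations
const yesNoParser = {
    // 2. Get format instructions: text to append to the prompt
    getFormatInstructions() {
        return 'Answer with exactly one word: "yes" or "no".';
    },

    // 1. Parse: turn raw text into a boolean
    parse(text) {
        // Normalize case, whitespace, and trailing punctuation
        const answer = text.trim().toLowerCase().replace(/[.!]+$/, '');

        // 3. Validate and 4. Handle errors: reject anything unexpected
        if (answer !== 'yes' && answer !== 'no') {
            throw new Error(`Expected "yes" or "no", got: ${text}`);
        }
        return answer === 'yes';
    }
};

console.log(yesNoParser.parse('  Yes. ')); // true
console.log(yesNoParser.parse('no'));      // false
```

The real parsers in this lesson follow the same shape, with parse() made async and the object turned into a class that extends Runnable.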

Implementation Guide

Step 1: Base Output Parser

Location: src/output-parsers/base-parser.js

This is the abstract base class all parsers inherit from.

What it does:

  • Defines the interface for all parsers
  • Extends Runnable (so parsers work in chains)
  • Provides format instruction generation
  • Handles parsing errors

Implementation:

import { Runnable } from '../core/runnable.js';

/**
 * Base class for all output parsers
 * Transforms LLM text output into structured data
 */
export class BaseOutputParser extends Runnable {
    constructor() {
        super();
        this.name = this.constructor.name;
    }

    /**
     * Parse the LLM output into structured data
     * @abstract
     * @param {string} text - Raw LLM output
     * @returns {Promise<any>} Parsed data
     */
    async parse(text) {
        throw new Error(`${this.name} must implement parse()`);
    }

    /**
     * Get instructions for the LLM on how to format output
     * @returns {string} Format instructions
     */
    getFormatInstructions() {
        return '';
    }

    /**
     * Runnable interface: parse the output
     */
    async _call(input, config) {
        // Input can be a string or a Message
        const text = typeof input === 'string' 
            ? input 
            : input.content;
        
        return await this.parse(text);
    }

    /**
     * Parse with error handling
     */
    async parseWithPrompt(text, prompt) {
        try {
            return await this.parse(text);
        } catch (error) {
            throw new OutputParserException(
                `Failed to parse output from prompt: ${error.message}`,
                text,
                error
            );
        }
    }
}

/**
 * Exception thrown when parsing fails
 */
export class OutputParserException extends Error {
    constructor(message, llmOutput, originalError) {
        super(message);
        this.name = 'OutputParserException';
        this.llmOutput = llmOutput;
        this.originalError = originalError;
    }
}

Key insights:

  • Extends Runnable so parsers can be piped in chains
  • _call extracts text from strings or Messages
  • getFormatInstructions() helps prompt the LLM
  • Error handling wraps parse failures with context
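
The extra context carried by OutputParserException pays off at the call site. The sketch below copies the exception class from above so that it runs on its own; parseJsonStrict and safeParse are hypothetical helpers used only for illustration.

```javascript
// Copied from the lesson so this snippet is self-contained
class OutputParserException extends Error {
    constructor(message, llmOutput, originalError) {
        super(message);
        this.name = 'OutputParserException';
        this.llmOutput = llmOutput;
        this.originalError = originalError;
    }
}

// Hypothetical helper: wrap JSON.parse failures with the raw LLM text
function parseJsonStrict(text) {
    try {
        return JSON.parse(text);
    } catch (error) {
        throw new OutputParserException(
            `Failed to parse JSON: ${error.message}`,
            text,
            error
        );
    }
}

// Hypothetical helper: turn parser failures into a result object
function safeParse(text) {
    try {
        return { ok: true, data: parseJsonStrict(text) };
    } catch (error) {
        if (error instanceof OutputParserException) {
            // The raw output and the underlying cause are both available
            return { ok: false, raw: error.llmOutput, cause: error.originalError.name };
        }
        throw error; // Not a parsing problem: let it propagate
    }
}

console.log(safeParse('{"a": 1}'));
console.log(safeParse('not json')); // ok: false, raw: 'not json', cause: 'SyntaxError'
```

Checking with instanceof keeps unrelated bugs (for example a TypeError in your own code) from being silently treated as malformed model output.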

Step 2: String Output Parser

Location: src/output-parsers/string-parser.js

The simplest parser - cleans up text output.

What it does:

  • Strips leading/trailing whitespace
  • Strips markdown code fences by default, keeping the code inside (pass stripMarkdown: false to disable)
  • Returns a clean string

Use when:

  • You just need clean text
  • No structure needed
  • Want to remove formatting artifacts

Implementation:

import { BaseOutputParser } from './base-parser.js';

/**
 * Parser that returns cleaned string output
 * Strips whitespace and optionally removes markdown
 * 
 * Example:
 *   const parser = new StringOutputParser();
 *   const result = await parser.parse("  Hello World  ");
 *   // "Hello World"
 */
export class StringOutputParser extends BaseOutputParser {
    constructor(options = {}) {
        super();
        this.stripMarkdown = options.stripMarkdown ?? true;
    }

    /**
     * Parse: clean the text
     */
    async parse(text) {
        let cleaned = text.trim();

        if (this.stripMarkdown) {
            cleaned = this._stripMarkdownCodeBlocks(cleaned);
        }

        return cleaned;
    }

    /**
     * Remove markdown code blocks (```code```)
     */
    _stripMarkdownCodeBlocks(text) {
        // Remove ```language\ncode\n```
        return text.replace(/```[\w]*\n([\s\S]*?)\n```/g, '$1').trim();
    }

    getFormatInstructions() {
        return 'Respond with plain text. No markdown formatting.';
    }
}

Usage:

const parser = new StringOutputParser();

// Handles various formats
await parser.parse("  Hello  ");           // "Hello"
await parser.parse("```\ncode\n```");      // "code"
await parser.parse("   \n  Text  \n   "); // "Text"
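
One behavior worth knowing: the fence-stripping regex requires a newline after the opening fence and before the closing one, so fences written inline on a single line are left untouched. The sketch below demonstrates this with a standalone copy of the same pattern. stripFences is a hypothetical name, and the fence marker is built with repeat() only so the snippet itself contains no literal fence.

```javascript
// Three backticks, built programmatically
const FENCE = '`'.repeat(3);

// Same pattern as _stripMarkdownCodeBlocks, assembled from strings
const fencePattern = new RegExp(FENCE + '[\\w]*\\n([\\s\\S]*?)\\n' + FENCE, 'g');

function stripFences(text) {
    return text.replace(fencePattern, '$1').trim();
}

// A fenced block on its own lines: the fences go, the code stays
const multiLine = 'Here:\n' + FENCE + 'js\nlet x = 1;\n' + FENCE + '\nDone';
console.log(stripFences(multiLine));
// Here:
// let x = 1;
// Done

// An inline fence has no newline after the marker, so nothing changes
const inline = FENCE + 'code' + FENCE;
console.log(stripFences(inline) === inline); // true
```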

Step 3: JSON Output Parser

Location: src/output-parsers/json-parser.js

Extracts and validates JSON from LLM output.

What it does:

  • Finds JSON in text (handles markdown, extra text)
  • Parses and validates JSON
  • Optionally validates against a schema

Use when:

  • Need structured objects
  • Want type-safe data
  • Need validation

Implementation:

import { BaseOutputParser, OutputParserException } from './base-parser.js';

/**
 * Parser that extracts JSON from LLM output
 * Handles markdown code blocks and extra text
 * 
 * Example:
 *   const parser = new JsonOutputParser();
 *   const result = await parser.parse('```json\n{"name": "Alice"}\n```');
 *   // { name: "Alice" }
 */
export class JsonOutputParser extends BaseOutputParser {
    constructor(options = {}) {
        super();
        this.schema = options.schema;
    }

    /**
     * Parse JSON from text
     */
    async parse(text) {
        try {
            // Try to extract JSON from the text
            const jsonText = this._extractJson(text);
            const parsed = JSON.parse(jsonText);

            // Validate against schema if provided
            if (this.schema) {
                this._validateSchema(parsed);
            }

            return parsed;
        } catch (error) {
            throw new OutputParserException(
                `Failed to parse JSON: ${error.message}`,
                text,
                error
            );
        }
    }

    /**
     * Extract JSON from text (handles markdown, extra text)
     */
    _extractJson(text) {
        // Try direct parse first
        try {
            JSON.parse(text.trim());
            return text.trim();
        } catch {
            // Not direct JSON, try to find it
        }

        // Look for JSON in markdown code blocks
        const markdownMatch = text.match(/```(?:json)?\s*\n?([\s\S]*?)\n?```/);
        if (markdownMatch) {
            return markdownMatch[1].trim();
        }

        // Look for JSON object/array patterns
        const jsonObjectMatch = text.match(/\{[\s\S]*\}/);
        if (jsonObjectMatch) {
            return jsonObjectMatch[0];
        }

        const jsonArrayMatch = text.match(/\[[\s\S]*\]/);
        if (jsonArrayMatch) {
            return jsonArrayMatch[0];
        }

        // Give up, return original
        return text.trim();
    }

    /**
     * Validate parsed JSON against schema
     */
    _validateSchema(parsed) {
        if (!this.schema) return;

        for (const [key, type] of Object.entries(this.schema)) {
            if (!(key in parsed)) {
                throw new Error(`Missing required field: ${key}`);
            }

            const actualType = typeof parsed[key];
            if (actualType !== type) {
                throw new Error(
                    `Field ${key} should be ${type}, got ${actualType}`
                );
            }
        }
    }

    getFormatInstructions() {
        let instructions = 'Respond with valid JSON.';
        
        if (this.schema) {
            const schemaDesc = Object.entries(this.schema)
                .map(([key, type]) => `"${key}": ${type}`)
                .join(', ');
            instructions += ` Schema: { ${schemaDesc} }`;
        }

        return instructions;
    }
}

Usage:

const parser = new JsonOutputParser({
    schema: {
        name: 'string',
        age: 'number',
        active: 'boolean'
    }
});

// Handles various JSON formats
await parser.parse('{"name": "Alice", "age": 30, "active": true}');
await parser.parse('```json\n{"name": "Bob", "age": 25, "active": false}\n```');
await parser.parse('Sure! Here\'s the data: {"name": "Charlie", "age": 35, "active": true}');

Step 4: List Output Parser

Location: src/output-parsers/list-parser.js

Extracts lists/arrays from text.

What it does:

  • Parses numbered lists, bullet points, and comma- or newline-separated items
  • Returns an array of items
  • Trims each item and drops empty ones

Use when:

  • Need arrays of strings
  • LLM outputs lists
  • Want simple arrays

Implementation:

import { BaseOutputParser } from './base-parser.js';

/**
 * Parser that extracts lists from text
 * Handles: numbered lists, bullets, comma-separated
 * 
 * Example:
 *   const parser = new ListOutputParser();
 *   const result = await parser.parse("1. Apple\n2. Banana\n3. Orange");
 *   // ["Apple", "Banana", "Orange"]
 */
export class ListOutputParser extends BaseOutputParser {
    constructor(options = {}) {
        super();
        this.separator = options.separator;
    }

    /**
     * Parse list from text
     */
    async parse(text) {
        const cleaned = text.trim();

        // If separator specified, use it
        if (this.separator) {
            return cleaned
                .split(this.separator)
                .map(item => item.trim())
                .filter(item => item.length > 0);
        }

        // Try to detect format
        if (this._isNumberedList(cleaned)) {
            return this._parseNumberedList(cleaned);
        }

        if (this._isBulletList(cleaned)) {
            return this._parseBulletList(cleaned);
        }

        // Try comma-separated
        if (cleaned.includes(',')) {
            return cleaned
                .split(',')
                .map(item => item.trim())
                .filter(item => item.length > 0);
        }

        // Try newline-separated
        return cleaned
            .split('\n')
            .map(item => item.trim())
            .filter(item => item.length > 0);
    }

    /**
     * Check if text is numbered list (1. Item\n2. Item)
     */
    _isNumberedList(text) {
        return /^\d+\./.test(text);
    }

    /**
     * Check if text is bullet list (- Item\n- Item or * Item)
     */
    _isBulletList(text) {
        return /^[-*•]/.test(text);
    }

    /**
     * Parse numbered list
     */
    _parseNumberedList(text) {
        return text
            .split('\n')
            .map(line => line.replace(/^\d+\.\s*/, '').trim())
            .filter(item => item.length > 0);
    }

    /**
     * Parse bullet list
     */
    _parseBulletList(text) {
        return text
            .split('\n')
            .map(line => line.replace(/^[-*•]\s*/, '').trim())
            .filter(item => item.length > 0);
    }

    getFormatInstructions() {
        if (this.separator) {
            return `Respond with items separated by "${this.separator}".`;
        }
        return 'Respond with a numbered list (1. Item) or bullet list (- Item).';
    }
}

Usage:

const parser = new ListOutputParser();

// Handles various list formats
await parser.parse("1. Apple\n2. Banana\n3. Orange");
// ["Apple", "Banana", "Orange"]

await parser.parse("- Red\n- Green\n- Blue");
// ["Red", "Green", "Blue"]

await parser.parse("cat, dog, bird");
// ["cat", "dog", "bird"]

// Custom separator
const csvParser = new ListOutputParser({ separator: ',' });
await csvParser.parse("apple,banana,orange");
// ["apple", "banana", "orange"]
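
The detection helpers above inspect only the start of the text and recognize "1." but not "1)". A more tolerant single-pass variant is sketched below. parseListLines is a hypothetical function, and requiring whitespace after the marker is a design choice so that items such as "-5 degrees" or "3.14 is pi" are left intact.

```javascript
// Hypothetical helper: strip any common list marker from each line
function parseListLines(text) {
    return text
        .split('\n')
        // Marker = "1." / "1)" / "-" / "*" / "•", followed by whitespace
        .map(line => line.replace(/^\s*(?:\d+[.)]|[-*•])\s+/, '').trim())
        .filter(item => item.length > 0);
}

console.log(parseListLines('1) Apple\n2. Banana\n- Cherry\n\n• Date'));
// Four items: Apple, Banana, Cherry, Date

console.log(parseListLines('-5 degrees\n3.14 is pi'));
// Both lines unchanged: no whitespace follows the would-be marker
```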

Step 5: Regex Output Parser

Location: src/output-parsers/regex-parser.js

Uses regex patterns to extract structured data.

What it does:

  • Applies regex to extract groups
  • Maps groups to field names
  • Returns structured object

Use when:

  • Output has predictable patterns
  • Need custom extraction logic
  • Regex is simplest solution

Implementation:

import { BaseOutputParser, OutputParserException } from './base-parser.js';

/**
 * Parser that uses regex to extract structured data
 * 
 * Example:
 *   const parser = new RegexOutputParser({
 *       regex: /Sentiment: (\w+), Confidence: ([\d.]+)/,
 *       outputKeys: ["sentiment", "confidence"]
 *   });
 *   
 *   const result = await parser.parse("Sentiment: positive, Confidence: 0.92");
 *   // { sentiment: "positive", confidence: "0.92" }
 */
export class RegexOutputParser extends BaseOutputParser {
    constructor(options = {}) {
        super();
        this.regex = options.regex;
        this.outputKeys = options.outputKeys || [];
        this.dotAll = options.dotAll ?? false;

        if (this.dotAll) {
            // Add 's' flag for dotAll if not present
            const flags = this.regex.flags.includes('s') 
                ? this.regex.flags 
                : this.regex.flags + 's';
            this.regex = new RegExp(this.regex.source, flags);
        }
    }

    /**
     * Parse using regex
     */
    async parse(text) {
        const match = text.match(this.regex);

        if (!match) {
            throw new OutputParserException(
                `Text does not match regex pattern: ${this.regex}`,
                text
            );
        }

        // If no output keys, return the groups as array
        if (this.outputKeys.length === 0) {
            return match.slice(1); // Exclude full match
        }

        // Map groups to keys
        const result = {};
        for (let i = 0; i < this.outputKeys.length; i++) {
            result[this.outputKeys[i]] = match[i + 1]; // +1 to skip full match
        }

        return result;
    }

    getFormatInstructions() {
        if (this.outputKeys.length > 0) {
            return `Format your response to match: ${this.outputKeys.join(', ')}`;
        }
        return 'Follow the specified format exactly.';
    }
}

Usage:

const parser = new RegexOutputParser({
    regex: /Sentiment: (\w+), Confidence: ([\d.]+)/,
    outputKeys: ["sentiment", "confidence"]
});

const result = await parser.parse("Sentiment: positive, Confidence: 0.92");
// { sentiment: "positive", confidence: "0.92" }
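
Captured groups are always strings, which is why confidence comes back as "0.92" rather than 0.92. A sketch of one way to handle this, using standard named capture groups and an explicit numeric conversion, follows. parseSentimentLine is a hypothetical helper, not part of RegexOutputParser.

```javascript
// Named groups document what each capture means
const SENTIMENT_PATTERN =
    /Sentiment: (?<sentiment>\w+), Confidence: (?<confidence>[\d.]+)/;

// Hypothetical helper: extract the fields and convert the score to a number
function parseSentimentLine(text) {
    const match = text.match(SENTIMENT_PATTERN);
    if (!match) {
        throw new Error(`Text does not match pattern: ${SENTIMENT_PATTERN}`);
    }

    const { sentiment, confidence } = match.groups;

    // [\d.]+ also accepts strings like "1.2.3", so check the conversion
    const score = Number(confidence);
    if (Number.isNaN(score)) {
        throw new Error(`Confidence is not a number: ${confidence}`);
    }

    return { sentiment, confidence: score };
}

const parsedLine = parseSentimentLine('Sentiment: positive, Confidence: 0.92');
console.log(typeof parsedLine.confidence); // number
```

The same conversion could instead be applied after RegexOutputParser runs, for example with the transform pattern shown later in this lesson.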

Output Parsers: Advanced Patterns & Integration

Advanced Parser: Structured Output Parser

Step 6: Structured Output Parser

Location: src/output-parsers/structured-parser.js

The most powerful parser - validates against a full schema with types and descriptions.

What it does:

  • Defines expected schema with types
  • Generates format instructions for LLM
  • Validates all fields and types
  • Provides detailed error messages

Use when:

  • Need complex structured data
  • Want strong type validation
  • Need to generate format instructions automatically

Implementation:

import { BaseOutputParser, OutputParserException } from './base-parser.js';

/**
 * Parser with full schema validation
 * 
 * Example:
 *   const parser = new StructuredOutputParser({
 *       responseSchemas: [
 *           {
 *               name: "sentiment",
 *               type: "string",
 *               description: "The sentiment (positive/negative/neutral)",
 *               enum: ["positive", "negative", "neutral"]
 *           },
 *           {
 *               name: "confidence",
 *               type: "number",
 *               description: "Confidence score between 0 and 1"
 *           }
 *       ]
 *   });
 */
export class StructuredOutputParser extends BaseOutputParser {
    constructor(options = {}) {
        super();
        this.responseSchemas = options.responseSchemas || [];
    }

    /**
     * Parse and validate against schema
     */
    async parse(text) {
        try {
            // Extract JSON
            const jsonText = this._extractJson(text);
            const parsed = JSON.parse(jsonText);

            // Validate against schema
            this._validateAgainstSchema(parsed);

            return parsed;
        } catch (error) {
            throw new OutputParserException(
                `Failed to parse structured output: ${error.message}`,
                text,
                error
            );
        }
    }

    /**
     * Extract JSON from text (same as JsonOutputParser)
     */
    _extractJson(text) {
        try {
            JSON.parse(text.trim());
            return text.trim();
        } catch {}

        const markdownMatch = text.match(/```(?:json)?\s*\n?([\s\S]*?)\n?```/);
        if (markdownMatch) return markdownMatch[1].trim();

        const jsonMatch = text.match(/\{[\s\S]*\}/);
        if (jsonMatch) return jsonMatch[0];

        return text.trim();
    }

    /**
     * Validate parsed data against schema
     */
    _validateAgainstSchema(parsed) {
        for (const schema of this.responseSchemas) {
            const { name, type, enum: enumValues, required = true } = schema;

            // Check required fields
            if (required && !(name in parsed)) {
                throw new Error(`Missing required field: ${name}`);
            }

            if (name in parsed) {
                const value = parsed[name];

                // Check type
                if (!this._checkType(value, type)) {
                    throw new Error(
                        `Field ${name} should be ${type}, got ${typeof value}`
                    );
                }

                // Check enum values
                if (enumValues && !enumValues.includes(value)) {
                    throw new Error(
                        `Field ${name} must be one of: ${enumValues.join(', ')}`
                    );
                }
            }
        }
    }

    /**
     * Check if value matches expected type
     */
    _checkType(value, type) {
        switch (type) {
            case 'string':
                return typeof value === 'string';
            case 'number':
                return typeof value === 'number' && !isNaN(value);
            case 'boolean':
                return typeof value === 'boolean';
            case 'array':
                return Array.isArray(value);
            case 'object':
                return typeof value === 'object' && value !== null && !Array.isArray(value);
            default:
                return true;
        }
    }

    /**
     * Generate format instructions for LLM
     */
    getFormatInstructions() {
        const schemaDescriptions = this.responseSchemas.map(schema => {
            let desc = `"${schema.name}": ${schema.type}`;
            if (schema.description) {
                desc += ` // ${schema.description}`;
            }
            if (schema.enum) {
                desc += ` (one of: ${schema.enum.join(', ')})`;
            }
            return desc;
        });

        return `Respond with valid JSON matching this schema:
{
${schemaDescriptions.map(d => '  ' + d).join(',\n')}
}`;
    }

    /**
     * Static helper to create from simple schema
     */
    static fromNamesAndDescriptions(schemas) {
        const responseSchemas = Object.entries(schemas).map(([name, description]) => ({
            name,
            description,
            type: 'string' // Default type
        }));

        return new StructuredOutputParser({ responseSchemas });
    }
}

Usage:

const parser = new StructuredOutputParser({
    responseSchemas: [
        {
            name: "sentiment",
            type: "string",
            description: "The sentiment of the text",
            enum: ["positive", "negative", "neutral"],
            required: true
        },
        {
            name: "confidence",
            type: "number",
            description: "Confidence score from 0 to 1",
            required: true
        },
        {
            name: "keywords",
            type: "array",
            description: "Key themes in the text",
            required: false
        }
    ]
});

// Get format instructions to add to prompt
const instructions = parser.getFormatInstructions();
console.log(instructions);

// Parse and validate
const result = await parser.parse(`{
    "sentiment": "positive",
    "confidence": 0.92,
    "keywords": ["great", "love", "excellent"]
}`);

Real-World Examples

Example 1: Email Classification with Structured Parser

import { StructuredOutputParser } from './output-parsers/structured-parser.js';
import { PromptTemplate } from './prompts/prompt-template.js';
import { LlamaCppLLM } from './llm/llama-cpp-llm.js';

// Define the output structure
const parser = new StructuredOutputParser({
    responseSchemas: [
        {
            name: "category",
            type: "string",
            description: "Email category",
            enum: ["spam", "invoice", "meeting", "urgent", "personal", "other"]
        },
        {
            name: "confidence",
            type: "number",
            description: "Confidence score (0-1)"
        },
        {
            name: "reason",
            type: "string",
            description: "Brief explanation for classification"
        },
        {
            name: "actionRequired",
            type: "boolean",
            description: "Does email require action?"
        }
    ]
});

// Build prompt with format instructions
const prompt = new PromptTemplate({
    template: `Classify this email.

Email:
From: {from}
Subject: {subject}
Body: {body}

{format_instructions}`,
    inputVariables: ["from", "subject", "body"],
    partialVariables: {
        format_instructions: parser.getFormatInstructions()
    }
});

// Create chain
const llm = new LlamaCppLLM({ modelPath: './model.gguf' });
const chain = prompt.pipe(llm).pipe(parser);

// Use it
const result = await chain.invoke({
    from: "billing@company.com",
    subject: "Invoice #12345",
    body: "Payment due by March 15th"
});

console.log(result);
// {
//   category: "invoice",
//   confidence: 0.98,
//   reason: "Email contains invoice number and payment deadline",
//   actionRequired: true
// }

Example 2: Content Extraction with JSON Parser

import { JsonOutputParser } from './output-parsers/json-parser.js';
import { ChatPromptTemplate } from './prompts/chat-prompt-template.js';

const parser = new JsonOutputParser({
    schema: {
        title: 'string',
        summary: 'string',
        tags: 'object',  // Arrays pass this check: typeof [] is 'object'
        author: 'string'
    }
});

const prompt = ChatPromptTemplate.fromMessages([
    ["system", "Extract article metadata. Respond with JSON."],
    ["human", "Article: {article}"]
]);

const chain = prompt.pipe(llm).pipe(parser);

const result = await chain.invoke({
    article: "Title: AI Revolution\nBy: John Doe\n\nAI is transforming..."
});

// {
//   title: "AI Revolution",
//   summary: "Article discusses AI's transformative impact",
//   tags: ["AI", "technology", "future"],
//   author: "John Doe"
// }
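
The tags: 'object' entry works because typeof reports 'object' for arrays, and also for null, so the simple schema check cannot tell these apart. If that distinction matters, a small helper can refine the type name. This is a sketch: jsonTypeOf is a hypothetical name and is not used by JsonOutputParser as written.

```javascript
// Hypothetical helper: like typeof, but distinguishes arrays and null
function jsonTypeOf(value) {
    if (value === null) return 'null';
    if (Array.isArray(value)) return 'array';
    return typeof value;
}

console.log(typeof ['AI', 'technology']);     // object
console.log(jsonTypeOf(['AI', 'technology'])); // array
console.log(typeof null);                      // object
console.log(jsonTypeOf(null));                 // null
```

With a helper like this in place of typeof, the schema above could declare tags as 'array' and reject a null or a plain object in that field.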

Example 3: List Extraction for Recommendations

import { ListOutputParser } from './output-parsers/list-parser.js';
import { PromptTemplate } from './prompts/prompt-template.js';

const parser = new ListOutputParser();

const prompt = new PromptTemplate({
    template: `Recommend 5 {category} for someone interested in {interest}.

{format_instructions}

List:`,
    inputVariables: ["category", "interest"],
    partialVariables: {
        format_instructions: parser.getFormatInstructions()
    }
});

const chain = prompt.pipe(llm).pipe(parser);

const books = await chain.invoke({
    category: "books",
    interest: "machine learning"
});

console.log(books);
// [
//   "Pattern Recognition and Machine Learning",
//   "Deep Learning by Goodfellow",
//   "Hands-On Machine Learning",
//   "The Hundred-Page Machine Learning Book",
//   "Machine Learning Yearning"
// ]

Example 4: Sentiment Analysis with Retry

import { JsonOutputParser } from './output-parsers/json-parser.js';
import { PromptTemplate } from './prompts/prompt-template.js';

const parser = new JsonOutputParser();

// If parsing fails, retry with clearer instructions
// Assumes `llm` is an already-constructed model instance (as in Example 1)
async function robustSentimentAnalysis(text) {
    const prompt = new PromptTemplate({
        template: `Analyze sentiment of: "{text}"

Respond with ONLY valid JSON:
{{"sentiment": "positive/negative/neutral", "score": 0.0-1.0}}`,
        inputVariables: ["text"]
    });

    const chain = prompt.pipe(llm).pipe(parser);

    try {
        return await chain.invoke({ text });
    } catch (error) {
        console.log(`Parse failed (${error.message}), retrying with stricter prompt...`);
        
        // Retry with more explicit prompt
        const strictPrompt = new PromptTemplate({
            template: `Analyze: "{text}"

IMPORTANT: Respond with ONLY this JSON structure, nothing else:
{{"sentiment": "positive", "score": 0.9}}

Your response:`,
            inputVariables: ["text"]
        });

        const retryChain = strictPrompt.pipe(llm).pipe(parser);
        return await retryChain.invoke({ text });
    }
}

Advanced Patterns

Pattern 1: Fallback Parsing

class FallbackOutputParser extends BaseOutputParser {
    constructor(parsers) {
        super();
        this.parsers = parsers;
    }

    async parse(text) {
        const errors = [];

        for (const parser of this.parsers) {
            try {
                return await parser.parse(text);
            } catch (error) {
                // Store the message: JSON.stringify drops an Error's non-enumerable `message`
                errors.push({ parser: parser.name, error: error.message });
            }
        }

        throw new OutputParserException(
            `All parsers failed. Errors: ${JSON.stringify(errors)}`,
            text
        );
    }
}

// Usage
const parser = new FallbackOutputParser([
    new JsonOutputParser(),      // Try JSON first
    new RegexOutputParser({...}), // Try regex second
    new StringOutputParser()      // Fallback to string
]);
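
A detail to watch when reporting collected failures: JSON.stringify skips an Error's message because it is a non-enumerable property, so a plain Error serializes to an empty object. A small sketch of the effect and one way to format the failures follows; summarizeFailures is a hypothetical helper.

```javascript
const failure = new Error('Unexpected token');

// The message is lost when the Error object itself is serialized
console.log(JSON.stringify({ error: failure })); // {"error":{}}

// Hypothetical helper: build a readable summary from { parser, error } entries
function summarizeFailures(failures) {
    return failures
        .map(({ parser, error }) => {
            // Accept either Error objects or plain message strings
            const message = error instanceof Error ? error.message : String(error);
            return `${parser}: ${message}`;
        })
        .join('; ');
}

console.log(summarizeFailures([
    { parser: 'JsonOutputParser', error: failure },
    { parser: 'RegexOutputParser', error: 'No match' }
]));
// JsonOutputParser: Unexpected token; RegexOutputParser: No match
```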

Pattern 2: Transform After Parse

class TransformOutputParser extends BaseOutputParser {
    constructor(parser, transform) {
        super();
        this.parser = parser;
        this.transform = transform;
    }

    async parse(text) {
        const parsed = await this.parser.parse(text);
        return this.transform(parsed);
    }
}

// Usage: parse JSON then transform values
const parser = new TransformOutputParser(
    new JsonOutputParser(),
    (data) => ({
        ...data,
        confidence: parseFloat(data.confidence),
        timestamp: new Date().toISOString()
    })
);

Pattern 3: Conditional Parsing

class ConditionalOutputParser extends BaseOutputParser {
    constructor(condition, trueParser, falseParser) {
        super();
        this.condition = condition;
        this.trueParser = trueParser;
        this.falseParser = falseParser;
    }

    async parse(text) {
        const useTrue = this.condition(text);
        const parser = useTrue ? this.trueParser : this.falseParser;
        return await parser.parse(text);
    }
}

// Usage: different parsers based on content
const parser = new ConditionalOutputParser(
    (text) => text.includes('{'),  // Has JSON?
    new JsonOutputParser(),
    new ListOutputParser()
);

Pattern 4: Validated Output

class ValidatedOutputParser extends BaseOutputParser {
    constructor(parser, validator) {
        super();
        this.parser = parser;
        this.validator = validator;
    }

    async parse(text) {
        const parsed = await this.parser.parse(text);
        
        const isValid = this.validator(parsed);
        if (!isValid) {
            throw new OutputParserException(
                'Parsed output failed validation',
                text
            );
        }

        return parsed;
    }
}

// Usage: ensure confidence is in range
const parser = new ValidatedOutputParser(
    new JsonOutputParser(),
    (data) => data.confidence >= 0 && data.confidence <= 1
);

Integration with Full Chain

Complete Example: Sentiment Analysis API

import { PromptTemplate } from './prompts/prompt-template.js';
import { LlamaCppLLM } from './llm/llama-cpp-llm.js';
import { StructuredOutputParser } from './output-parsers/structured-parser.js';
import { ConsoleCallback } from './utils/callbacks.js';

// Define output structure
const parser = new StructuredOutputParser({
    responseSchemas: [
        {
            name: "sentiment",
            type: "string",
            enum: ["positive", "negative", "neutral"]
        },
        {
            name: "confidence",
            type: "number"
        },
        {
            name: "emotions",
            type: "array",
            description: "List of detected emotions"
        }
    ]
});

// Build prompt
const prompt = new PromptTemplate({
    template: `Analyze the sentiment of this text:

"{text}"

{format_instructions}`,
    inputVariables: ["text"],
    partialVariables: {
        format_instructions: parser.getFormatInstructions()
    }
});

// Create LLM
const llm = new LlamaCppLLM({
    modelPath: './model.gguf',
    temperature: 0.1  // Low temp for consistent classification
});

// Build chain with logging
const chain = prompt.pipe(llm).pipe(parser);

const logger = new ConsoleCallback();

// Analyze sentiment
async function analyzeSentiment(text) {
    try {
        const result = await chain.invoke(
            { text },
            { callbacks: [logger] }
        );

        return {
            success: true,
            data: result
        };
    } catch (error) {
        return {
            success: false,
            error: error.message,
            rawOutput: error.llmOutput
        };
    }
}

// Use it
const result = await analyzeSentiment("I absolutely love this product! It's amazing!");
console.log(result);
// {
//   success: true,
//   data: {
//     sentiment: "positive",
//     confidence: 0.95,
//     emotions: ["joy", "excitement", "satisfaction"]
//   }
// }

Error Handling

Pattern: Graceful Degradation

async function parseWithFallback(text, primaryParser, fallbackValue) {
    try {
        return await primaryParser.parse(text);
    } catch (error) {
        console.warn('Primary parser failed:', error.message);
        console.warn('Using fallback value:', fallbackValue);
        return fallbackValue;
    }
}

// Usage
const result = await parseWithFallback(
    llmOutput,
    new JsonOutputParser(),
    { error: true, message: "Failed to parse", raw: llmOutput }
);
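The same idea extends to trying several parsers in order before giving up. This is a minimal sketch, assuming each parser exposes an async `parse(text)` method as in this lesson. `parseWithParserChain` and the two stand-in parsers are hypothetical names used only so the example runs on its own.

```javascript
// Stand-in parsers; in the lesson these would be JsonOutputParser
// and StringOutputParser.
const strictJsonParser = {
    async parse(text) {
        return JSON.parse(text); // throws SyntaxError on malformed JSON
    }
};

const plainTextParser = {
    async parse(text) {
        return { text: text.trim() };
    }
};

// Try each parser in order and return the first successful result.
// Error messages from failed parsers are collected for logging.
async function parseWithParserChain(text, parsers, fallbackValue) {
    const errors = [];
    for (const parser of parsers) {
        try {
            return { value: await parser.parse(text), errors };
        } catch (error) {
            errors.push(error.message);
        }
    }
    // Every parser failed: degrade to the caller-supplied value
    return { value: fallbackValue, errors };
}
```

Returning the collected errors alongside the value lets the caller decide whether a degraded result is worth a warning.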

Pattern: Retry with Fix Instructions

// Note: maxRetries counts total parse attempts, including the first one
async function parseWithRetry(text, parser, llm, maxRetries = 2) {
    for (let attempt = 0; attempt < maxRetries; attempt++) {
        try {
            return await parser.parse(text);
        } catch (error) {
            if (attempt === maxRetries - 1) throw error;

            // Ask LLM to fix the output
            const fixPrompt = `The following output is malformed:
${text}

Error: ${error.message}

Please provide the output in correct format:
${parser.getFormatInstructions()}`;

            text = await llm.invoke(fixPrompt);
        }
    }
}
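To try the retry pattern without loading a model, a stub LLM can stand in for the real one. The sketch below is a variant of the loop above that also reports how many parse attempts were needed. `parseWithRetryTracked`, `stubParser`, and `stubLlm` are hypothetical names, and the stub always returns a corrected answer, which a real model will not.

```javascript
// Hypothetical stand-ins: a strict JSON parser and an "LLM" that
// returns valid JSON whenever it is asked to fix an output.
const stubParser = {
    async parse(text) { return JSON.parse(text); },
    getFormatInstructions() { return 'Respond with a single JSON object.'; }
};

const stubLlm = {
    calls: 0,
    async invoke(prompt) {
        this.calls += 1;
        return '{"sentiment": "positive"}';
    }
};

// Same retry loop as above, but returns the attempt count with the value.
async function parseWithRetryTracked(text, parser, llm, maxAttempts = 2) {
    let lastError;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return { value: await parser.parse(text), attempts: attempt };
        } catch (error) {
            lastError = error;
            if (attempt === maxAttempts) break;
            // Ask the LLM to repair its own output before the next attempt
            text = await llm.invoke(
                `The following output is malformed:\n${text}\n\n` +
                `Error: ${error.message}\n\n` +
                `Please provide the output in correct format:\n` +
                parser.getFormatInstructions()
            );
        }
    }
    throw lastError ?? new Error('maxAttempts must be at least 1');
}
```

Calling `parseWithRetryTracked('sentiment: positive', stubParser, stubLlm)` fails on the first parse, calls the stub once, and succeeds on the second attempt.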

Testing Parsers

Unit Tests

import { describe, it, expect } from 'your-test-framework';
import { JsonOutputParser } from './output-parsers/json-parser.js';

describe('JsonOutputParser', () => {
    it('should parse plain JSON', async () => {
        const parser = new JsonOutputParser();
        const result = await parser.parse('{"name": "Alice", "age": 30}');
        
        expect(result.name).toBe('Alice');
        expect(result.age).toBe(30);
    });

    it('should extract JSON from markdown', async () => {
        const parser = new JsonOutputParser();
        const text = '```json\n{"key": "value"}\n```';
        const result = await parser.parse(text);
        
        expect(result.key).toBe('value');
    });

    it('should validate against schema', async () => {
        const parser = new JsonOutputParser({
            schema: { name: 'string', age: 'number' }
        });

        await expect(
            parser.parse('{"name": "Bob", "age": "invalid"}')
        ).rejects.toThrow();
    });

    it('should throw on invalid JSON', async () => {
        const parser = new JsonOutputParser();
        await expect(parser.parse('not json')).rejects.toThrow();
    });
});
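The schema test above passes `{ name: 'string', age: 'number' }` and expects a string `age` to be rejected. How the lesson's JsonOutputParser performs that check is not shown in this section, so the following is only a sketch of one way to do it. `validateShape` is a hypothetical helper, and the `'array'` type is an added assumption.

```javascript
// Hypothetical validator: the schema maps field names to expected types,
// e.g. { name: 'string', age: 'number', tags: 'array' }.
// Returns a list of error messages; an empty list means the value is valid.
function validateShape(value, schema) {
    if (value === null || typeof value !== 'object') {
        return ['Expected an object'];
    }
    const problems = [];
    for (const [field, expectedType] of Object.entries(schema)) {
        const actual = value[field];
        // typeof reports arrays as "object", so check for them explicitly
        const actualType = Array.isArray(actual) ? 'array' : typeof actual;
        if (actual === undefined) {
            problems.push(`Missing field "${field}"`);
        } else if (actualType !== expectedType) {
            problems.push(`Field "${field}" should be ${expectedType}, got ${actualType}`);
        }
    }
    return problems;
}

// The invalid input from the test above: age is a string, not a number
const shapeProblems = validateShape(
    { name: 'Bob', age: 'invalid' },
    { name: 'string', age: 'number' }
);
console.log(shapeProblems.length); // 1
```

Collecting every problem instead of throwing on the first one gives more useful error messages, and those messages can be fed back to the LLM in a retry prompt.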

Best Practices

✅ DO:

1. Include format instructions in prompts

const prompt = new PromptTemplate({
    template: `{task}

{format_instructions}`,
    partialVariables: {
        format_instructions: parser.getFormatInstructions()
    }
});

2. Use schema validation for complex outputs

const parser = new StructuredOutputParser({
    responseSchemas: [
        { name: "field1", type: "string", required: true },
        { name: "field2", type: "number", required: true }
    ]
});

3. Handle parsing errors gracefully

try {
    const parsed = await parser.parse(text);
} catch (error) {
    console.error('Parsing failed:', error.message);
    // Fallback or retry logic
}

4. Test parsers independently

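// Test without an LLM: feed the parser a hand-written response
const result = await parser.parse(mockLLMOutput);
expect(result).toEqual(expectedResult);  // expectedResult: the known-good parsed object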

5. Use low temperature for structured outputs

const llm = new LlamaCppLLM({
    temperature: 0.1  // More consistent formatting
});
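Practice 4 (testing parsers independently) scales well when the raw outputs live in a fixture table, so each new LLM quirk becomes one more row. The sketch below uses a hypothetical stand-in `parseList` function rather than the lesson's ListOutputParser, and plain comparisons instead of a test framework, so it runs on its own.

```javascript
// Hypothetical stand-in for a list parser: splits on newlines and
// strips leading bullets or "1." / "1)" style numbering.
function parseList(text) {
    return text
        .split('\n')
        .map(line => line.replace(/^\s*(?:[-*•]\s*|\d+[.)]\s+)/, '').trim())
        .filter(line => line.length > 0);
}

// Each fixture pairs a raw "LLM output" with the expected parse result.
const fixtures = [
    { name: 'numbered', raw: '1. red\n2. green\n3. blue', expected: ['red', 'green', 'blue'] },
    { name: 'bulleted', raw: '- red\n- green\n- blue', expected: ['red', 'green', 'blue'] },
    { name: 'blank lines', raw: 'red\n\ngreen\n', expected: ['red', 'green'] }
];

// Run every fixture and return the names of the ones that failed.
function runFixtures(parse, cases) {
    const failures = [];
    for (const { name, raw, expected } of cases) {
        const actual = parse(raw);
        if (JSON.stringify(actual) !== JSON.stringify(expected)) {
            failures.push(name);
        }
    }
    return failures;
}

console.log('Failed fixtures:', runFixtures(parseList, fixtures).length);
```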

❌ DON'T:

1. Don't assume perfect LLM formatting

// Bad
const data = JSON.parse(llmOutput);  // Throws when the LLM adds extra text or markdown fences

// Good
const data = await jsonParser.parse(llmOutput);  // Handles variations

2. Don't skip validation

// Bad
const result = await parser.parse(text);
// Use result.field without checking

// Good
const result = await parser.parse(text);
if (result.field && typeof result.field === 'string') {
    // Use result.field
}

3. Don't use structured parsers for plain text

// Bad
const parser = new JsonOutputParser();
const result = await parser.parse(simpleText);

// Good
const parser = new StringOutputParser();
const result = await parser.parse(simpleText);
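For the first point above, the variations a tolerant parser has to handle are mostly markdown code fences and prose wrapped around the JSON. The sketch below shows one way that extraction step could look. `extractJson` is a hypothetical helper, and the lesson's JsonOutputParser may work differently.

```javascript
// Build the fence pattern from a repeated backtick so this example can
// itself sit inside a markdown code fence.
const FENCE = '`'.repeat(3);
const FENCE_PATTERN = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE, 'i');

// Hypothetical helper: pull a JSON object out of a typical LLM response.
function extractJson(text) {
    // 1. Prefer the contents of a fenced block if one is present
    const fenced = text.match(FENCE_PATTERN);
    const candidate = (fenced ? fenced[1] : text).trim();

    // 2. Try the candidate as-is
    try {
        return JSON.parse(candidate);
    } catch {
        // Not clean JSON; fall through to the looser search below
    }

    // 3. Fall back to the span from the first "{" to the last "}"
    const start = candidate.indexOf('{');
    const end = candidate.lastIndexOf('}');
    if (start === -1 || end <= start) {
        throw new Error('No JSON object found in LLM output');
    }
    return JSON.parse(candidate.slice(start, end + 1));
}

const reply = 'Sure! Here is the result: {"sentiment": "positive", "confidence": 0.92} Hope it helps.';
console.log(extractJson(reply).sentiment); // positive
```

The brace-span fallback is deliberately simple: it can still throw if the prose around the JSON contains its own braces, which is where the retry pattern from the Error Handling section comes in.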

Exercises

Practice using output parsers in real-world scenarios, from simple to complex:

Exercise 21: Product Review Analyzer

Extract clean summaries and sentiment from product reviews using StringOutputParser.
Starter code: exercises/21-review-analyzer.js

Exercise 22: Contact Information Extractor

Parse structured contact details and skills from unstructured text using JSON and List parsers.
Starter code: exercises/22-contact-extractor.js

Exercise 23: Article Metadata Extractor

Extract complex metadata with schema validation using StructuredOutputParser.
Starter code: exercises/23-article-metadata.js

Exercise 24: Multi-Parser Content Pipeline

Build production-ready pipelines with multiple parsers, fallback strategies, and content routing.
Starter code: exercises/24-multi-parser-pipeline.js


Summary

You've built a complete output parsing system!

Key Takeaways

  1. BaseOutputParser: Foundation for all parsers
  2. StringOutputParser: Clean text output
  3. JsonOutputParser: Extract and validate JSON
  4. ListOutputParser: Parse lists/arrays
  5. RegexOutputParser: Pattern-based extraction
  6. StructuredOutputParser: Full schema validation

What You Built

A parsing system that:

  • ✅ Extracts structured data reliably
  • ✅ Validates output formats
  • ✅ Handles errors gracefully
  • ✅ Generates format instructions
  • ✅ Works in chains with prompts
  • ✅ Is testable in isolation

Next Steps

Now you can combine prompts + LLMs + parsers into complete chains.

➑️ Next: LLM Chains

Learn how to build complete prompt β†’ LLM β†’ parser pipelines.


Built with ❤️ for learners who want to understand AI frameworks deeply

← Previous: Prompts | Tutorial Index | Next: LLM Chains →