lenzcom committed
Commit e706de2 · verified · 1 parent: b5167d8

Upload folder using huggingface_hub

This view is limited to 50 files because the commit contains more changes.

Files changed (50):
  1. .env.example +1 -0
  2. .gitignore +11 -0
  3. CONTRIBUTING.md +118 -0
  4. DOWNLOAD.md +24 -0
  5. Dockerfile +28 -0
  6. LICENSE.md +21 -0
  7. PROMPTING.md +160 -0
  8. README.md +504 -10
  9. SUMMARY_COMPOSITION.md +26 -0
  10. SUMMARY_FOUNDATION.md +46 -0
  11. SUMMARY_FULL.md +56 -0
  12. examples/01_intro/CODE.md +112 -0
  13. examples/01_intro/CONCEPT.md +175 -0
  14. examples/01_intro/intro.js +36 -0
  15. examples/02_openai-intro/CODE.md +394 -0
  16. examples/02_openai-intro/CONCEPT.md +950 -0
  17. examples/02_openai-intro/openai-intro.js +205 -0
  18. examples/03_translation/CODE.md +231 -0
  19. examples/03_translation/CONCEPT.md +302 -0
  20. examples/03_translation/translation.js +82 -0
  21. examples/04_think/CODE.md +257 -0
  22. examples/04_think/CONCEPT.md +368 -0
  23. examples/04_think/think.js +49 -0
  24. examples/05_batch/CODE.md +323 -0
  25. examples/05_batch/CONCEPT.md +365 -0
  26. examples/05_batch/batch.js +60 -0
  27. examples/06_coding/CODE.md +380 -0
  28. examples/06_coding/CONCEPT.md +400 -0
  29. examples/06_coding/coding.js +47 -0
  30. examples/07_simple-agent/CODE.md +368 -0
  31. examples/07_simple-agent/CONCEPT.md +69 -0
  32. examples/07_simple-agent/simple-agent.js +62 -0
  33. examples/08_simple-agent-with-memory/CODE.md +247 -0
  34. examples/08_simple-agent-with-memory/CONCEPT.md +249 -0
  35. examples/08_simple-agent-with-memory/agent-memory.json +19 -0
  36. examples/08_simple-agent-with-memory/memory-manager.js +137 -0
  37. examples/08_simple-agent-with-memory/simple-agent-with-memory.js +93 -0
  38. examples/09_react-agent/CODE.md +278 -0
  39. examples/09_react-agent/CONCEPT.md +372 -0
  40. examples/09_react-agent/react-agent.js +241 -0
  41. examples/10_aot-agent/CODE.md +178 -0
  42. examples/10_aot-agent/CONCEPT.md +265 -0
  43. examples/10_aot-agent/aot-agent.js +416 -0
  44. helper/json-parser.js +282 -0
  45. helper/prompt-debugger.js +350 -0
  46. logs/.gitkeep +0 -0
  47. package-lock.json +0 -0
  48. package.json +18 -0
  49. run_classifier.js +349 -0
  50. secrets.local.md +22 -0
.env.example ADDED
@@ -0,0 +1 @@
OPENAI_API_KEY=your_api_key_here
.gitignore ADDED
@@ -0,0 +1,11 @@
models
node_modules
.idea
.env
internal
ui
*.txt
node-llama-docs

frontend*
VIDEO_SCRIPT.md
CONTRIBUTING.md ADDED
@@ -0,0 +1,118 @@
# Contributing Guidelines

Thank you for considering contributing to AI Agents from Scratch!

## Project Philosophy

This repository teaches AI agent fundamentals by building from scratch. Every contribution should support this learning mission.

**Core Principles:**
- **Clarity over cleverness** - Code should be easy to understand
- **Fundamentals first** - No black boxes or magic
- **Progressive learning** - Each example builds on the previous
- **Local-first** - No API dependencies

## Types of Contributions

### Bug Reports
Found something broken? Open an issue with:
- Which example (`intro/`, `react-agent/`, etc.)
- What you expected vs. what happened
- Your environment (Node version, OS, model used)
- Steps to reproduce

### Documentation Improvements
- Typos and grammar fixes
- Clearer explanations
- Better code comments
- Additional examples in documentation
- Diagrams and visualizations

### New Examples
Want to add a new agent pattern? Great! Please:
1. **Open an issue first** - let's discuss if it fits
2. Follow the existing structure:
   - `pattern-name/pattern-name.js` - Working code
   - `pattern-name/CODE.md` - Detailed code walkthrough
   - `pattern-name/CONCEPT.md` - Why it matters, use cases
3. Keep it simple and well-commented
4. Test thoroughly with at least one model

### Code Improvements
- Performance optimizations (with benchmarks)
- Better error handling
- Clearer variable names
- More helpful console output

## What We're Not Looking For

- Framework integrations (LangChain, etc.) - this repo teaches what they do
- Cloud API examples - keep it local
- Production features (monitoring, scaling) - this is educational
- Complex abstractions - keep it beginner-friendly

## Contribution Process

1. **Fork** the repository
2. **Create a branch**: `git checkout -b fix/issue-description`
3. **Make changes** and test thoroughly
4. **Commit** with clear messages: `git commit -m "Fix: clarify ReAct loop explanation"`
5. **Push**: `git push origin fix/issue-description`
6. **Open a Pull Request** with:
   - Clear title
   - Description of what changed and why
   - Which issue it addresses (if any)

## Code Standards

- Use clear, descriptive variable names
- Add comments explaining *why*, not just *what*
- Follow existing code style (no linter, just match the patterns)
- Keep examples self-contained (one file when possible)
- Test with Qwen or Llama models before submitting

## Documentation Standards

- Use clear, simple language
- Explain concepts before code
- Include diagrams where helpful (ASCII art is fine!)
- Provide real-world use cases
- Link to related examples

## Example Structure
```
new-pattern/
├── new-pattern.js    # The working code
├── CODE.md           # Line-by-line walkthrough
└── CONCEPT.md        # High-level explanation
```

**CODE.md should include:**
- Prerequisites
- Step-by-step code breakdown
- How to run it
- Expected output

**CONCEPT.md should include:**
- What problem it solves
- Why this pattern matters
- Real-world applications
- Simple diagrams

## Getting Help

- Not sure if your idea fits? **Open an issue to discuss**
- Stuck on implementation? **Ask in the issue**
- Want to pair on something? **Reach out!**

## License

By contributing, you agree that your contributions will be licensed under the same license as the project (MIT).

## Recognition

All contributors will be recognized in the README. Thank you for helping others learn!

---

**Questions?** Open an issue or reach out. Happy to help guide your contribution!
DOWNLOAD.md ADDED
@@ -0,0 +1,24 @@
# Download the models used in this repository

You can adjust the quantization level to balance model precision and file size:
- Use `:Q8_0` for higher precision and better output quality, but note that it requires more memory and storage.
- Use `:Q6_K` for a good balance between size and accuracy (recommended default).
- Use `:Q5_K_S` for a smaller model that loads faster and uses less memory, but with slightly lower precision.

```bash
npx --no node-llama-cpp pull --dir ./models hf:Qwen/Qwen3-1.7B-GGUF:Q8_0 --filename Qwen3-1.7B-Q8_0.gguf
```

```bash
npx --no node-llama-cpp pull --dir ./models hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf
```

```bash
npx --no node-llama-cpp pull --dir ./models hf:unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q6_K --filename DeepSeek-R1-0528-Qwen3-8B-Q6_K.gguf
```

```bash
npx --no node-llama-cpp pull --dir ./models hf:giladgd/Apertus-8B-Instruct-2509-GGUF:Q6_K
```
Dockerfile ADDED
@@ -0,0 +1,28 @@
FROM node:18-slim

# Install dependencies for building node-llama-cpp
RUN apt-get update && apt-get install -y python3 make g++ curl

WORKDIR /app

# Copy package files
COPY package*.json ./

# Install npm dependencies
RUN npm install

# Copy source code
COPY . .

# Create models directory
RUN mkdir -p models

# Download the model during build (so it's baked into the image)
# Using direct download URL for speed if possible, or use node-llama-cpp pull
RUN npx --no node-llama-cpp pull --dir ./models hf:Qwen/Qwen3-1.7B-GGUF:Q8_0 --filename Qwen3-1.7B-Q8_0.gguf

# Expose the port HF expects
EXPOSE 7860

# Start the server
CMD ["node", "server.js"]
LICENSE.md ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 [Your Name]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
PROMPTING.md ADDED
@@ -0,0 +1,160 @@
# Prompt Engineering

Prompt engineering is the quickest and most straightforward way to shape how an agent behaves: its personality, its function, and its choices (such as when it should use tools). Agents operate on two categories of prompt: system-level and user-level.

User-level prompts are the messages people type during a conversation. They vary with each interaction and are outside the developer's control.

System-level prompts contain instructions set by developers that stay constant throughout the dialogue. They define the agent's tone, capabilities, limitations, and guidelines for tool usage.

For real-world reference, look at the system prompts Anthropic publishes:

https://docs.claude.com/en/release-notes/system-prompts#september-29-2025

## Prompt Design

When creating prompts for agents, you need to achieve two things:

1. Make the agent solve problems well

   - Help it complete complex tasks correctly
   - Enable clear, logical thinking
   - Reduce mistakes

2. Keep the agent's personality consistent

   - Define who the agent is and how it speaks
   - Match your brand's voice
   - Respond with appropriate emotion for each situation

Both goals matter equally. An accurate answer delivered rudely hurts the user experience. A friendly answer that doesn't actually help is useless.

## Prompt Strategies

### Agent Roles

Giving the LLM a specific role improves its responses - it naturally adopts that role's vocabulary and expertise. Examples:

- "You are a pediatrician" → Uses medical terms, discusses child development, recommends age-appropriate treatments
- "You are a chef" → Explains cooking techniques, suggests ingredient substitutions, discusses flavor profiles
- "You are a high school math teacher" → Breaks down problems step-by-step, uses simple language, provides practice examples
- "You are a startup founder" → Focuses on growth, uses business metrics, thinks about scalability

Make roles specific:
Instead of: "You are a writer"
Better: "You are a tech blogger who simplifies complex AI concepts for beginners"

Roles work best for specialized questions and should be set in system prompts.
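To make this concrete, here is a minimal sketch of wiring a role into a system prompt. `buildRoleMessages` is an illustrative helper, not part of this repository; it just produces the message list most chat APIs and wrappers accept.

```javascript
// Illustrative helper (not from this repo): combine a role description
// with behavioral guidance into a system + user message list.
function buildRoleMessages(role, userPrompt) {
  const systemPrompt =
    `You are ${role}. Stay in character: use the vocabulary, ` +
    `priorities, and expertise of this role in every answer.`;
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: userPrompt },
  ];
}

const messages = buildRoleMessages(
  "a tech blogger who simplifies complex AI concepts for beginners",
  "Explain what a context window is."
);
console.log(messages[0].content);
```

The same messages would then be passed to whatever chat session or API you use; only the system prompt changes per role.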
### Be Specific, Not Vague

LLMs interpret instructions literally. Vague prompts produce random results; specific prompts produce consistent outputs. Vague vs. specific examples:

❌ Vague: "Write something about dogs"
✅ Specific: "Write a 3-paragraph guide on training a puppy to sit"

❌ Vague: "Make it better"
✅ Specific: "Fix grammar errors and shorten to under 100 words"

❌ Vague: "Be professional"
✅ Specific: "Use formal language, avoid contractions, address the reader as 'you'"

❌ Vague: "Analyze this data"
✅ Specific: "Find the top 3 trends and explain what caused each one"

Why it matters: The LLM has thousands of ways to interpret vague instructions. It will guess what you want, and often guess wrong. Clear instructions eliminate guesswork and give you control over the output.

Rule of thumb: If a human assistant would need to ask clarifying questions, your prompt is too vague.

### Structuring LLM Inputs with JSON

Using JSON to structure your input helps LLMs understand tasks more clearly and makes integration easier. Instead of sending a blob of text, break your request into labeled parts like task, input, constraints, and output_format.

**Benefits**
- Clarity: JSON keys show the model what each part means.
- Reliability: Easier to parse and validate responses.
- Consistency: Reduces random or narrative answers.
- Integration: Works well with APIs and schemas.

**Best Practices**
- Keep it simple and shallow; avoid deep nesting.
- Use descriptive keys ("task", "context", "constraints").
- Tell the model the exact output format (e.g., "Respond with valid JSON only").
- Optionally define a JSON Schema to enforce structure.
- Always validate the response in your code.

**Example**
```json
{
  "task": "summarize",
  "input_text": "Article text here.",
  "constraints": {
    "max_words": 100,
    "audience": "non-technical"
  },
  "output_format": {
    "type": "JSON",
    "schema": {
      "summary": "string",
      "key_points": ["string"]
    }
  }
}
```

This structured format helps the model separate what to do, what data to use, and how to reply, resulting in more consistent, machine-readable outputs.
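The "always validate the response" advice can be sketched in a few lines of JavaScript. `validateSummaryReply` is an illustrative name, and the fields it checks (`summary`, `key_points`) follow the schema used in this section's example.

```javascript
// Illustrative validator: check a model's raw reply against the expected
// shape before using it. Returns {ok, value} or {ok, error} instead of throwing.
function validateSummaryReply(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "reply is not valid JSON" };
  }
  if (typeof parsed.summary !== "string") {
    return { ok: false, error: "missing string field: summary" };
  }
  if (!Array.isArray(parsed.key_points) ||
      !parsed.key_points.every((p) => typeof p === "string")) {
    return { ok: false, error: "key_points must be an array of strings" };
  }
  return { ok: true, value: parsed };
}

console.log(validateSummaryReply('{"summary":"ok","key_points":["a"]}').ok); // true
```

On a failed check you can re-prompt the model with the error message, rather than letting malformed output flow into the rest of your code.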
### Few-Shot Prompting

Few-shot prompting means giving the LLM a few examples of what you want before asking it to do a new task. It's like showing a student two or three solved problems so they understand the pattern.

Example
```
Example 1:
Feedback: "The room was clean and quiet."
Category: Positive

Example 2:
Feedback: "The staff were rude and unhelpful."
Category: Negative

Example 3:
Feedback: "Breakfast was okay, but the coffee was cold."
Category: Neutral

Now categorize this:
Feedback: "The view from the balcony was amazing!"
Category:
```

The model learns from the examples and continues in the same style - here, it would answer:
"Positive"

Few-shot prompts are useful when you want consistent tone, format, or logic without retraining the model.
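In code, assembling such a prompt from stored examples is straightforward. This builder is an illustrative sketch, not a utility from this repository:

```javascript
// Illustrative sketch: build a few-shot classification prompt
// from an array of {feedback, category} examples.
function buildFewShotPrompt(examples, newFeedback) {
  const shots = examples
    .map((ex, i) =>
      `Example ${i + 1}:\nFeedback: "${ex.feedback}"\nCategory: ${ex.category}`)
    .join("\n\n");
  return `${shots}\n\nNow categorize this:\nFeedback: "${newFeedback}"\nCategory:`;
}

const prompt = buildFewShotPrompt(
  [
    { feedback: "The room was clean and quiet.", category: "Positive" },
    { feedback: "The staff were rude and unhelpful.", category: "Negative" },
  ],
  "The view from the balcony was amazing!"
);
console.log(prompt);
```

Keeping examples as data makes it easy to swap in different shot sets per task without rewriting the prompt string by hand.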
### Chain of Thought

Chain of thought means asking the LLM to think step by step instead of jumping straight to the answer. It helps the model reason better, especially for logic, math, or multi-step problems.

Example
```
Question: If 3 apples cost $6, how much do 5 apples cost?
Let's think step by step.

Model reasoning:
3 apples → $6 → each apple costs $2.
5 apples × $2 = $10.

Answer: $10
```

By encouraging step-by-step thinking, you help the model make fewer mistakes and explain its reasoning clearly.
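Two small helpers show how this is often applied in practice: nudge the model to reason, then extract only the final line for downstream use. Both are illustrative sketches, and `extractAnswer` assumes the reply ends with an "Answer:" line as in the example above.

```javascript
// Illustrative: append the classic chain-of-thought trigger.
function withChainOfThought(question) {
  return `${question}\nLet's think step by step.`;
}

// Illustrative: pull the final answer out of a step-by-step reply,
// assuming the model ends with a line like "Answer: $10".
function extractAnswer(modelOutput) {
  const match = modelOutput.match(/Answer:\s*(.+)/);
  return match ? match[1].trim() : modelOutput.trim();
}

const reply = "3 apples → $6 → each apple costs $2.\n5 × $2 = $10.\nAnswer: $10";
console.log(extractAnswer(reply)); // "$10"
```

Separating the reasoning from the extracted answer lets you log the full chain of thought while only acting on the final result.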
README.md CHANGED
@@ -1,10 +1,504 @@

Removed (old Space placeholder):
- ---
- title: Email
- emoji: 🦀
- colorFrom: yellow
- colorTo: gray
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Added:
> **Read the full interactive version:**
> This repository is part of **AI Agents From Scratch** - a hands-on learning series where we build AI agents *step by step*, explain every design decision, and visualize what's happening under the hood.
>
> 👉 **https://agentsfromscratch.com**
>
> If you prefer **long-form explanations, diagrams, and conceptual deep dives**, start there - then come back here to explore the code.


# AI Agents From Scratch

Learn to build AI agents locally without frameworks. Understand what happens under the hood before using production frameworks.

## Purpose

This repository teaches you to build AI agents from first principles using **local LLMs** and **node-llama-cpp**. By working through these examples, you'll understand:

- How LLMs work at a fundamental level
- What agents really are (LLM + tools + patterns)
- How different agent architectures function
- Why frameworks make certain design choices

**Philosophy**: Learn by building. Understand deeply, then use frameworks wisely.

## Related Projects

### [AI Product from Scratch](https://github.com/pguso/ai-product-from-scratch)

[![TypeScript](https://img.shields.io/badge/TypeScript-007ACC?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
[![React](https://img.shields.io/badge/React-20232A?logo=react&logoColor=61DAFB)](https://reactjs.org/)
[![Node.js](https://img.shields.io/badge/Node.js-339933?logo=node.js&logoColor=white)](https://nodejs.org/)

Learn AI product development fundamentals with local LLMs. Covers prompt engineering, structured output, multi-step reasoning, API design, and frontend integration through 10 comprehensive lessons with visual diagrams.

### [AI Agents from Scratch in Python](https://github.com/pguso/agents-from-scratch)

![Python](https://img.shields.io/badge/Python-3776AB?logo=python&logoColor=white)

## Next Phase: Build LangChain & LangGraph Concepts From Scratch

> After mastering the fundamentals, the next stage of this project walks you through **re-implementing the core parts of LangChain and LangGraph** in plain JavaScript using local models.
> This is **not** about building a new framework; it's about understanding *how frameworks work*.

## Phase 1: Agent Fundamentals - From LLMs to ReAct

### Prerequisites
- Node.js 18+
- At least 8GB RAM (16GB recommended)
- Download models and place them in the `./models/` folder; details in [DOWNLOAD.md](DOWNLOAD.md)

### Installation
```bash
npm install
```

### Run Examples
```bash
node examples/01_intro/intro.js
node examples/07_simple-agent/simple-agent.js
node examples/09_react-agent/react-agent.js
```

## Learning Path

Follow these examples in order to build understanding progressively:

### 1. **Introduction** - Basic LLM Interaction
`examples/01_intro/` | [Code](examples/01_intro/intro.js) | [Code Explanation](examples/01_intro/CODE.md) | [Concepts](examples/01_intro/CONCEPT.md)

**What you'll learn:**
- Loading and running a local LLM
- Basic prompt/response cycle

**Key concepts**: Model loading, context, inference pipeline, token generation

---

### 2. (Optional) **OpenAI Intro** - Using Proprietary Models
`examples/02_openai-intro/` | [Code](examples/02_openai-intro/openai-intro.js) | [Code Explanation](examples/02_openai-intro/CODE.md) | [Concepts](examples/02_openai-intro/CONCEPT.md)

**What you'll learn:**
- How to call hosted LLMs (like GPT-4)
- Temperature control
- Token usage

**Key concepts**: Inference endpoints, network latency, cost vs. control, data privacy, vendor dependence

---

### 3. **Translation** - System Prompts & Specialization
`examples/03_translation/` | [Code](examples/03_translation/translation.js) | [Code Explanation](examples/03_translation/CODE.md) | [Concepts](examples/03_translation/CONCEPT.md)

**What you'll learn:**
- Using system prompts to specialize agents
- Output format control
- Role-based behavior
- Chat wrappers for different models

**Key concepts**: System prompts, agent specialization, behavioral constraints, prompt engineering

---

### 4. **Think** - Reasoning & Problem Solving
`examples/04_think/` | [Code](examples/04_think/think.js) | [Code Explanation](examples/04_think/CODE.md) | [Concepts](examples/04_think/CONCEPT.md)

**What you'll learn:**
- Configuring LLMs for logical reasoning
- Complex quantitative problems
- Limitations of pure LLM reasoning
- When to use external tools

**Key concepts**: Reasoning agents, problem decomposition, cognitive tasks, reasoning limitations

---

### 5. **Batch** - Parallel Processing
`examples/05_batch/` | [Code](examples/05_batch/batch.js) | [Code Explanation](examples/05_batch/CODE.md) | [Concepts](examples/05_batch/CONCEPT.md)

**What you'll learn:**
- Processing multiple requests concurrently
- Context sequences for parallelism
- GPU batch processing
- Performance optimization

**Key concepts**: Parallel execution, sequences, batch size, throughput optimization

---

### 6. **Coding** - Streaming & Response Control
`examples/06_coding/` | [Code](examples/06_coding/coding.js) | [Code Explanation](examples/06_coding/CODE.md) | [Concepts](examples/06_coding/CONCEPT.md)

**What you'll learn:**
- Real-time streaming responses
- Token limits and budget management
- Progressive output display
- User experience optimization

**Key concepts**: Streaming, token-by-token generation, response control, real-time feedback

---

### 7. **Simple Agent** - Function Calling (Tools)
`examples/07_simple-agent/` | [Code](examples/07_simple-agent/simple-agent.js) | [Code Explanation](examples/07_simple-agent/CODE.md) | [Concepts](examples/07_simple-agent/CONCEPT.md)

**What you'll learn:**
- Function calling / tool use fundamentals
- Defining tools the LLM can use
- JSON Schema for parameters
- How LLMs decide when to use tools

**Key concepts**: Function calling, tool definitions, agent decision making, action-taking

**This is where text generation becomes agency!**

---

### 8. **Simple Agent with Memory** - Persistent State
`examples/08_simple-agent-with-memory/` | [Code](examples/08_simple-agent-with-memory/simple-agent-with-memory.js) | [Code Explanation](examples/08_simple-agent-with-memory/CODE.md) | [Concepts](examples/08_simple-agent-with-memory/CONCEPT.md)

**What you'll learn:**
- Persisting information across sessions
- Long-term memory management
- Facts and preferences storage
- Memory retrieval strategies

**Key concepts**: Persistent memory, state management, memory systems, context augmentation

---

### 9. **ReAct Agent** - Reasoning + Acting
`examples/09_react-agent/` | [Code](examples/09_react-agent/react-agent.js) | [Code Explanation](examples/09_react-agent/CODE.md) | [Concepts](examples/09_react-agent/CONCEPT.md)

**What you'll learn:**
- ReAct pattern (Reason → Act → Observe)
- Iterative problem solving
- Step-by-step tool use
- Self-correction loops

**Key concepts**: ReAct pattern, iterative reasoning, observation-action cycles, multi-step agents

**This is the foundation of modern agent frameworks!**

---

### 10. **AoT Agent** - Atom of Thought Planning
`examples/10_aot-agent/` | [Code](examples/10_aot-agent/aot-agent.js) | [Code Explanation](examples/10_aot-agent/CODE.md) | [Concepts](examples/10_aot-agent/CONCEPT.md)

**What you'll learn:**
- Atom of Thought methodology
- Atomic planning for multi-step computations
- Dependency management between operations
- Structured JSON output for reasoning plans
- Deterministic execution of plans

**Key concepts**: AoT planning, atomic operations, dependency resolution, plan validation, structured reasoning

---

## Documentation Structure

Each example folder contains:

- **`<name>.js`** - The working code example
- **`CODE.md`** - Step-by-step code explanation
  - Line-by-line breakdowns
  - What each part does
  - How it works
- **`CONCEPT.md`** - High-level concepts
  - Why it matters for agents
  - Architectural patterns
  - Real-world applications
  - Simple diagrams

## Core Concepts

### What is an AI Agent?

```
AI Agent = LLM + System Prompt + Tools + Memory + Reasoning Pattern
           ─┬─   ──────┬──────   ──┬──   ──┬───   ────────┬────────
            │          │           │       │              │
          Brain     Identity     Hands   State         Strategy
```

### Evolution of Capabilities

```
1. intro          → Basic LLM usage
2. translation    → Specialized behavior (system prompts)
3. think          → Reasoning ability
4. batch          → Parallel processing
5. coding         → Streaming & control
6. simple-agent   → Tool use (function calling)
7. memory-agent   → Persistent state
8. react-agent    → Strategic reasoning + tool use
```

### Architecture Patterns

**Simple Agent (Steps 1-5)**
```
User → LLM → Response
```

**Tool-Using Agent (Step 6)**
```
User → LLM ⟷ Tools → Response
```

**Memory Agent (Step 7)**
```
User → LLM ⟷ Tools → Response
        ↕
      Memory
```

**ReAct Agent (Step 8)**
```
User → LLM → Think → Act → Observe
        ↑      ↓      ↓      ↓
        └──────┴──────┴──────┘
         Iterate until solved
```
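Stripped of the LLM itself, the control flow of that loop can be sketched in plain JavaScript with a scripted stand-in for the model. Everything here is illustrative; see `react-agent.js` for the real implementation:

```javascript
// Illustrative ReAct skeleton: the "llm" argument decides the next move
// from the transcript; tools execute actions; observations feed back in.
const tools = {
  add: ({ a, b }) => a + b,
};

function reactLoop(llm, maxSteps = 5) {
  const transcript = [];
  for (let step = 0; step < maxSteps; step++) {
    const move = llm(transcript);                  // Think
    if (move.type === "final") return move.answer; // done
    const result = tools[move.tool](move.args);    // Act
    transcript.push({ tool: move.tool, result });  // Observe
  }
  throw new Error("step limit reached");
}

// Scripted stand-in for a model: call a tool once, then answer
// from the observation it just made.
const scriptedLlm = (transcript) =>
  transcript.length === 0
    ? { type: "tool", tool: "add", args: { a: 2, b: 3 } }
    : { type: "final", answer: `The sum is ${transcript[0].result}` };

console.log(reactLoop(scriptedLlm)); // "The sum is 5"
```

In the real agent, `llm` is a model call that reads the transcript and emits either a tool request or a final answer; the step limit guards against loops that never terminate.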
263
+
264
+ ## ️ Helper Utilities
265
+
266
+ ### PromptDebugger
267
+ `helper/prompt-debugger.js`
268
+
269
+ Utility for debugging prompts sent to the LLM. Shows exactly what the model sees, including:
270
+ - System prompts
271
+ - Function definitions
272
+ - Conversation history
273
+ - Context state
274
+
275
+ Usage example in `simple-agent/simple-agent.js`
276
+
277
+ ## ️ Project Structure - Fundamentals
278
+
279
+ ```
280
+ ai-agents/
281
+ ├── README.md ← You are here
282
+ ├─ examples/
283
+ ├── 01_intro/
284
+ │ ├── intro.js
285
+ │ ├── CODE.md
286
+ │ └── CONCEPT.md
287
+ ├── 02_openai-intro/
288
+ │ ├── openai-intro.js
289
+ │ ├── CODE.md
290
+ │ └── CONCEPT.md
291
+ ├── 03_translation/
292
+ │ ├── translation.js
293
+ │ ├── CODE.md
294
+ │ └── CONCEPT.md
295
+ ├── 04_think/
296
+ │ ├── think.js
297
+ │ ├── CODE.md
298
+ │ └── CONCEPT.md
299
+ ├── 05_batch/
300
+ │ ├── batch.js
301
+ │ ├── CODE.md
302
+ │ └── CONCEPT.md
303
+ ├── 06_coding/
304
+ │ ├── coding.js
305
+ │ ├── CODE.md
306
+ │ └── CONCEPT.md
307
+ ├── 07_simple-agent/
308
+ │ ├── simple-agent.js
309
+ │ ├── CODE.md
310
+ │ └── CONCEPT.md
311
+ ├── 08_simple-agent-with-memory/
312
+ │ ├── simple-agent-with-memory.js
313
+ │ ├── memory-manager.js
314
+ │ ├── CODE.md
315
+ │ └── CONCEPT.md
316
+ ├── 09_react-agent/
317
+ │ ├── react-agent.js
318
+ │ ├── CODE.md
319
+ │ └── CONCEPT.md
320
+ ├── helper/
321
+ │ └── prompt-debugger.js
322
+ ├── models/ ← Place your GGUF models here
323
+ └── logs/ ← Debug outputs
324
+ ```
325
+
326
+ ## Phase 2: Building a Production Framework (Tutorial)
327
+
328
+ After mastering the fundamentals above, **Phase 2** takes you from scratch examples to production-grade framework design. You'll rebuild core concepts from **LangChain** and **LangGraph** to understand how real frameworks work internally.
329
+
330
+ ### What You'll Build
331
+
332
+ A lightweight but complete agent framework with:
333
+ - **Runnable Interface**, The composability pattern that powers everything
334
+ - **Message System**, Typed conversation structures (Human, AI, System, Tool)
335
+ - **Chains**, Composing multiple operations into pipelines
336
+ - **Memory**, Persistent state across conversations
337
+ - **Tools**, Function calling and external integrations
338
+ - **Agents**, Decision-making loops (ReAct, Tool-calling)
339
+ - **Graphs**, State machines for complex workflows (LangGraph concepts)
340
+
### Learning Approach

**Tutorial-first**: Step-by-step lessons with exercises
**Implementation-driven**: Build each component yourself
**Framework-compatible**: Learn patterns used in LangChain.js

### Structure Overview

```
tutorial/
├── 01-foundation/            # 1. Core Abstractions
│   ├── 01-runnable/
│   │   ├── lesson.md         # Why Runnable matters
│   │   ├── exercises/        # Hands-on practice
│   │   └── solutions/        # Reference implementations
│   ├── 02-messages/          # Structuring conversations
│   ├── 03-llm-wrapper/       # Wrapping node-llama-cpp
│   └── 04-context/           # Configuration & callbacks
│
├── 02-composition/           # 2. Building Chains
│   ├── 01-prompts/           # Template system
│   ├── 02-parsers/           # Structured outputs
│   ├── 03-llm-chain/         # Your first chain
│   ├── 04-piping/            # Composition patterns
│   └── 05-memory/            # Conversation state
│
├── 03-agency/                # 3. Tools & Agents
│   ├── 01-tools/             # Function definitions
│   ├── 02-tool-executor/     # Safe execution
│   ├── 03-simple-agent/      # Basic agent loop
│   ├── 04-react-agent/       # Reasoning + Acting
│   └── 05-structured-agent/  # JSON mode
│
└── 04-graphs/                # 4. State Machines
    ├── 01-state-basics/      # Nodes & edges
    ├── 02-channels/          # State management
    ├── 03-conditional-edges/ # Dynamic routing
    ├── 04-executor/          # Running workflows
    ├── 05-checkpointing/     # Persistence
    └── 06-agent-graph/       # Agents as graphs

src/
├── core/                     # Runnable, Messages, Context
├── llm/                      # LlamaCppLLM wrapper
├── prompts/                  # Template system
├── chains/                   # LLMChain, SequentialChain
├── tools/                    # BaseTool, built-in tools
├── agents/                   # AgentExecutor, ReActAgent
├── memory/                   # BufferMemory, WindowMemory
└── graph/                    # StateGraph, CompiledGraph
```

### Why This Matters

**Understanding beats using**: When you know how frameworks work internally, you can:
- Debug issues faster
- Customize behavior confidently
- Make architectural decisions wisely
- Build your own extensions
- Read framework source code fluently

**Learn once, use everywhere**: The patterns you'll learn (Runnable, composition, state machines) apply to:
- LangChain.js: you'll understand their abstractions
- LangGraph.js: you'll grasp state management
- Any agent framework: same core concepts
- Your own projects: build custom solutions

### Getting Started with Phase 2

After completing the fundamentals (intro → react-agent), start the tutorial:

[Overview](tutorial/README.md)

```bash
# Start with the foundation
cd tutorial/01-foundation/01-runnable
cat lesson.md                    # Read the lesson
node exercises/01-*.js           # Complete the exercises
node solutions/01-*-solution.js  # Check your work
```

Each lesson includes:
- **Conceptual explanation**: why it matters
- **Code walkthrough**: how to build it
- **Exercises**: practice implementing
- **Solutions**: reference code
- **Real-world examples**: practical usage

**Time commitment**: ~8 weeks, 3-5 hours/week

### What You'll Achieve

By the end, you'll have:
1. Built a working agent framework from scratch
2. Understood how LangChain/LangGraph work internally
3. Mastered composability patterns
4. Created reusable components (tools, chains, agents)
5. Implemented state machines for complex workflows
6. Gained the confidence to use or extend any framework

**Then**: Use LangChain.js in production, knowing exactly what happens under the hood.

---

## Key Takeaways

### After Phase 1 (Fundamentals), you'll understand:

1. **LLMs are stateless**: Context must be managed explicitly
2. **System prompts shape behavior**: Same model, different roles
3. **Function calling enables agency**: Tools transform text generators into agents
4. **Memory is essential**: Agents need to remember across sessions
5. **Reasoning patterns matter**: ReAct beats simple prompting for complex tasks
6. **Performance matters**: Parallel processing, streaming, token limits
7. **Debugging is crucial**: See exactly what the model receives

### After Phase 2 (Framework Tutorial), you'll master:

1. **The Runnable pattern**: Why everything in frameworks uses one interface
2. **Composition over configuration**: Building complex systems from simple parts
3. **Message-driven architecture**: How frameworks structure conversations
4. **Chain abstraction**: Connecting prompts, LLMs, and parsers seamlessly
5. **Tool orchestration**: Safe execution with timeouts and error handling
6. **Agent execution loops**: The mechanics of decision-making agents
7. **State machines**: Managing complex workflows with graphs
8. **Production patterns**: Error handling, retries, streaming, and debugging

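Tool orchestration with timeouts, for instance, can be sketched with `Promise.race`. This is an illustrative pattern, not the tutorial's exact executor API (the `tool.call` shape is an assumption):

```javascript
// Sketch of safe tool execution with a timeout (illustrative; not the tutorial's exact API).
async function executeTool(tool, input, timeoutMs = 5000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Tool "${tool.name}" timed out`)), timeoutMs);
  });
  try {
    // Whichever settles first wins: the tool call or the timeout
    return await Promise.race([tool.call(input), timeout]);
  } catch (err) {
    return `Error: ${err.message}`; // surface errors as text the agent can read
  } finally {
    clearTimeout(timer);
  }
}

const slowTool = { name: "slow", call: () => new Promise(r => setTimeout(() => r("done"), 50)) };
console.log(await executeTool(slowTool, {}, 1000)); // prints "done"
```

Returning errors as readable text (instead of throwing) lets the agent loop see the failure and try something else — a common design choice in agent executors.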
### What frameworks give you:

Now that you understand the fundamentals, frameworks like LangChain, CrewAI, or AutoGPT provide:
- Pre-built reasoning patterns and agent templates
- Extensive tool libraries and integrations
- Production-ready error handling and retries
- Multi-agent orchestration
- Observability and monitoring
- Community extensions and plugins

**You'll use them better because you know what they're doing under the hood.**

## Additional Resources

- **node-llama-cpp**: [GitHub](https://github.com/withcatai/node-llama-cpp)
- **Model Hub**: [Hugging Face](https://huggingface.co/models?library=gguf)
- **GGUF Format**: Quantized models for local inference

## Contributing

This is a learning resource. Feel free to:
- Suggest improvements to the documentation
- Add more example patterns
- Fix bugs or unclear explanations
- Share what you built!

## License

Educational resource - use and modify as needed for learning.

---

**Built with ❤️ for people who want to truly understand AI agents**

Start with `intro/` and work your way through. Each example builds on the previous one. Read both CODE.md and CONCEPT.md for full understanding.

Happy learning!
SUMMARY_COMPOSITION.md ADDED
@@ -0,0 +1,26 @@
# Knowledge Summary: AI Agents from Scratch - Composition

This document summarizes the concepts behind composing components into more powerful AI systems.

## 1. Prompts
Instead of hardcoding strings, we use **templates** to manage LLM input.

* **PromptTemplate**: Basic template with variable placeholders. Separates code logic from text content.
* **ChatPromptTemplate**: Template specialized for chat models (such as GPT-4, Llama 3).
    * Structures the conversation as a list of messages: `System`, `Human`, `AI`.
    * Supports injecting variables into each message type.
    * The standard for modern AI applications.
* **PipelinePromptTemplate**: Composes several small templates into one larger template, keeping complex prompts manageable.

## 2. Output Parsers
Convert raw LLM text into data structures the application can use (JSON, Object, Array).

* **The problem:** LLM output is often inconsistent and hard to parse with regex.
* **StructuredOutputParser**: The most powerful tool here.
    * **Schema definition**: Explicitly defines fields, types, descriptions, and allowed values (enums).
    * **Format instructions**: The parser auto-generates formatting guidance (e.g. "Respond in JSON format...") to inject into the prompt.
    * **Validation**: Automatically checks whether the returned result matches the schema.
* **Benefit:** Reliability for the system, turning the AI from a "chatbot" into a "data-processing tool".

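The schema → format instructions → validation cycle can be sketched as follows. This is an illustrative toy, not LangChain's actual `StructuredOutputParser` class (the schema shape and method bodies are assumptions):

```javascript
// Illustrative sketch of the StructuredOutputParser pattern (shape/names are assumptions).
class StructuredOutputParser {
  constructor(schema) { this.schema = schema; } // e.g. { sentiment: { type: "string", enum: [...] } }

  // Text injected into the prompt so the model knows the expected shape
  getFormatInstructions() {
    const fields = Object.entries(this.schema)
      .map(([name, f]) => `"${name}": ${f.type}${f.enum ? ` (one of: ${f.enum.join(", ")})` : ""}`)
      .join("\n");
    return `Respond ONLY with JSON matching:\n{\n${fields}\n}`;
  }

  // Parse + validate the raw LLM text against the schema
  parse(text) {
    const obj = JSON.parse(text.match(/\{[\s\S]*\}/)[0]); // tolerate extra prose around the JSON
    for (const [name, f] of Object.entries(this.schema)) {
      if (typeof obj[name] !== f.type) throw new Error(`Field "${name}" must be ${f.type}`);
      if (f.enum && !f.enum.includes(obj[name])) throw new Error(`Invalid value for "${name}"`);
    }
    return obj;
  }
}

const parser = new StructuredOutputParser({ sentiment: { type: "string", enum: ["positive", "negative"] } });
console.log(parser.parse('Sure! {"sentiment": "positive"}')); // { sentiment: 'positive' }
```

The instructions go into the prompt before the call; the validation runs on the response after it — the parser brackets the LLM on both sides.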
---
*This document was generated automatically by the Antigravity IDE after self-study and code analysis.*
SUMMARY_FOUNDATION.md ADDED
@@ -0,0 +1,46 @@
# Knowledge Summary: AI Agents from Scratch - Foundation

This document summarizes the core concepts from the first four lessons in the "AI Agents from Scratch" series.

## 1. Runnable
A **Runnable** is the framework's "LEGO brick", standardizing the interface for every component (LLM, parser, tool).

* **The contract:** Every Runnable must implement the `_call(input, config)` method.
* **Three execution methods:**
    1. `invoke(input)`: Runs once (1 input -> 1 output).
    2. `stream(input)`: Returns the result as real-time chunks.
    3. `batch([inputs])`: Processes a list of inputs in parallel for throughput.
* **Benefit:** Lets you chain different components together easily with `.pipe()`.

## 2. Messages
Instead of plain strings, conversations are structured as objects that are easy to manage and filter.

* **Message types:**
    * `SystemMessage`: System instructions; sets the AI's behavior/persona.
    * `HumanMessage`: A message from the user.
    * `AIMessage`: A response from the AI.
    * `ToolMessage`: The result of a tool call (function calling).
* **Conversation management:** You need a mechanism (such as `ConversationHistory`) to store messages, cap the length (sliding window), and filter messages by type.

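A sliding-window history like the one described above can be sketched in a few lines. The class name matches the summary, but the implementation details are our own illustration:

```javascript
// Sketch of a sliding-window conversation history (illustrative implementation).
class ConversationHistory {
  constructor(maxMessages = 10) {
    this.maxMessages = maxMessages;
    this.messages = [];
  }
  add(role, content) {
    this.messages.push({ role, content });
    // Sliding window: drop the oldest messages once the cap is exceeded
    if (this.messages.length > this.maxMessages) {
      this.messages = this.messages.slice(-this.maxMessages);
    }
  }
  // Filter by type, e.g. only the user's turns
  byRole(role) {
    return this.messages.filter(m => m.role === role);
  }
}

const history = new ConversationHistory(3);
["a", "b", "c", "d"].forEach(text => history.add("human", text));
console.log(history.messages.map(m => m.content)); // [ 'b', 'c', 'd' ]
```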
## 3. LLM Wrapper
An **LLM wrapper** turns a raw LLM library (such as `node-llama-cpp`) into a **Runnable**.

* **Role:** Acts as an adapter.
* **Responsibilities:**
    * Convert the input (a string or a list of Messages) into a format the model understands.
    * Handle calling the model (generate/stream).
    * Return the result as an `AIMessage`.
* **Result:** Models can be swapped easily without touching the rest of the system.

## 4. Context & Configuration
**RunnableConfig** is the mechanism for passing information through the whole pipeline without cluttering the code.

* **Problem it solves:** Avoids threading configuration parameters (such as `userId` or a `debug` flag) through every function by hand.
* **Parts of the config:**
    * `callbacks`: Hooks for observation (logs, metrics) at start/end/error points.
    * `metadata`: Contextual data (user ID, session ID).
    * `configurable`: Runtime overrides, e.g. changing the LLM's `temperature` for a specific request.
* **Applications:** Very useful for A/B testing, centralized logging, and multi-user management.

---
*This document was generated automatically by the Antigravity IDE after self-study and code analysis.*
SUMMARY_FULL.md ADDED
@@ -0,0 +1,56 @@
# Knowledge Summary: AI Agents from Scratch

This document summarizes the core concepts and design patterns learned from the "AI Agents from Scratch" repository.

## PART 1: FOUNDATION

### 1. Runnable
A **Runnable** is the framework's "LEGO brick", standardizing the interface for every component.
* **The contract:** Implement the `_call(input, config)` method.
* **Three modes:** `invoke` (single), `stream` (chunks), `batch` (parallel).
* **Composition:** Chain easily with `.pipe()`.

### 2. Messages
Use object classes instead of bare strings.
* **SystemMessage**: Instructions, persona.
* **HumanMessage**: User input.
* **AIMessage**: Model output.
* **ToolMessage**: Function-call results.

### 3. LLM Wrapper
Wraps a raw model (node-llama-cpp) as a **Runnable** to unify the interface and make models easy to swap.

### 4. Context & Configuration
Pass a `RunnableConfig` through the whole pipeline.
* `callbacks`: Logging, metrics, side effects.
* `metadata`: User/session context.
* `configurable`: Runtime overrides (e.g. changing temperature dynamically).

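The config-threading idea can be sketched as a small helper that fires callbacks around an `invoke` call. The field names mirror the summary above; the function itself is our illustration, not a specific library's API:

```javascript
// Sketch of passing a RunnableConfig with callbacks through a pipeline
// (illustrative; field names mirror the summary, not a specific library).
async function runWithConfig(runnable, input, config = {}) {
  const { callbacks = {}, metadata = {}, configurable = {} } = config;
  callbacks.onStart?.({ input, metadata });
  try {
    const output = await runnable.invoke(input, { metadata, configurable });
    callbacks.onEnd?.({ output });
    return output;
  } catch (err) {
    callbacks.onError?.(err);
    throw err;
  }
}

// A toy runnable that reads a runtime override from `configurable`
const shout = {
  invoke: async (text, { configurable }) =>
    text.toUpperCase() + "!".repeat(configurable.excitement ?? 1),
};
const logs = [];
const result = await runWithConfig(shout, "hi", {
  callbacks: { onStart: () => logs.push("start"), onEnd: () => logs.push("end") },
  metadata: { userId: "u1" },
  configurable: { excitement: 3 }, // runtime override
});
console.log(result, logs); // HI!!! [ 'start', 'end' ]
```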
---

## PART 2: COMPOSITION

### 1. Prompts
Manage LLM input with templates.
* **PromptTemplate**: Separates logic from text; supports variables.
* **ChatPromptTemplate**: Structures multi-turn conversations.

### 2. Output Parsers
Convert raw LLM text into structured data.
* **StructuredOutputParser**: Defines a schema (JSON), auto-generates `format_instructions`, and validates the result. Solves the problem of inconsistent LLM output.

---

## PART 3: PROJECT PATTERNS

From the **Smart Email Classifier** project, a reference architecture for classification/text-processing tasks:

1. **Separation of concerns:**
    * `ParserRunnable`: Only cleans and normalizes the input data.
    * `ClassifierRunnable`: Only calls the LLM and handles the classification logic.
2. **Pipeline:** Connect `Parser -> Classifier`.
3. **Side effects via callbacks:** Use callbacks to log history and compute statistics, keeping the main code clean.
4. **Strict system prompts:** Use a detailed system prompt to define the categories and force JSON output.

---
*This document was generated automatically by the Antigravity IDE.*
examples/01_intro/CODE.md ADDED
@@ -0,0 +1,112 @@
# Code Explanation: intro.js

This file demonstrates the most basic interaction with a local LLM (Large Language Model) using node-llama-cpp.

## Step-by-Step Code Breakdown

### 1. Import Required Modules
```javascript
import {
    getLlama,
    LlamaChatSession,
} from "node-llama-cpp";
import {fileURLToPath} from "url";
import path from "path";
```
- **getLlama**: Main function to initialize the llama.cpp runtime
- **LlamaChatSession**: Class for managing chat conversations with the model
- **fileURLToPath** and **path**: Standard Node.js modules for handling file paths

### 2. Set Up Directory Path
```javascript
const __dirname = path.dirname(fileURLToPath(import.meta.url));
```
- Since ES modules don't have `__dirname` by default, we create it manually
- This gives us the directory path of the current file
- Needed to locate the model file relative to this script

### 3. Initialize Llama Runtime
```javascript
const llama = await getLlama();
```
- Creates the main llama.cpp instance
- This initializes the underlying C++ runtime for model inference
- Must be done before loading any models

### 4. Load the Model
```javascript
const model = await llama.loadModel({
    modelPath: path.join(
        __dirname,
        "..",
        "..",
        "models",
        "Qwen3-1.7B-Q8_0.gguf"
    )
});
```
- Loads a quantized model file (GGUF format)
- **Qwen3-1.7B-Q8_0.gguf**: A 1.7 billion parameter model, quantized to 8-bit
- The model is stored in the `models` folder at the repository root, two levels up from this script
- Loading the model into memory takes a few seconds

### 5. Create a Context
```javascript
const context = await model.createContext();
```
- A **context** represents the model's working memory
- It holds the conversation history and current state
- Has a fixed size limit (default: the model's maximum context size)
- All prompts and responses are stored in this context

### 6. Create a Chat Session
```javascript
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
});
```
- **LlamaChatSession**: High-level API for chat-style interactions
- Uses a sequence from the context to maintain conversation state
- Automatically handles prompt formatting and response parsing

### 7. Define the Prompt
```javascript
const prompt = `do you know node-llama-cpp`;
```
- A simple question to test whether the model knows about the library we're using
- This will be sent to the model for processing

### 8. Send Prompt and Get Response
```javascript
const a1 = await session.prompt(prompt);
console.log("AI: " + a1);
```
- **session.prompt()**: Sends the prompt to the model and waits for completion
- The model generates a response based on its training
- We log the response to the console with an "AI:" prefix

### 9. Clean Up Resources
```javascript
session.dispose()
context.dispose()
model.dispose()
llama.dispose()
```
- **Important**: Always dispose of resources when done
- Frees up memory and GPU resources
- Prevents memory leaks in long-running applications
- Must be done in this order (session → context → model → llama)

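To guarantee cleanup even when a prompt throws, the disposal calls can be wrapped in `try`/`finally`. This is a hardening pattern we add for illustration; it is not part of intro.js itself:

```javascript
// Sketch: guaranteeing cleanup with try/finally (not part of intro.js itself).
// Assumes llama, model, context, and session were created as in the steps above.
async function runPrompt(session, context, model, llama, prompt) {
  try {
    return await session.prompt(prompt);
  } finally {
    // Dispose in reverse order of creation: session → context → model → llama
    session.dispose();
    context.dispose();
    model.dispose();
    llama.dispose();
  }
}
```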
## Key Concepts Demonstrated

1. **Basic LLM initialization**: Loading a model and creating an inference context
2. **Simple prompting**: Sending a question and receiving a response
3. **Resource management**: Proper cleanup of allocated resources

## Expected Output

When you run this script, you should see output like:
```
AI: Yes, I'm familiar with node-llama-cpp. It's a Node.js binding for llama.cpp...
```

The exact response will vary based on the model's training data and generation parameters.
examples/01_intro/CONCEPT.md ADDED
@@ -0,0 +1,175 @@
# Concept: Basic LLM Interaction

## Overview

This example introduces the fundamental concepts of working with a Large Language Model (LLM) running locally on your machine. It demonstrates the simplest possible interaction: loading a model and asking it a question.

## What is a Local LLM?

A **Local LLM** is an AI language model that runs entirely on your own computer, without requiring internet connectivity or external API calls. Key benefits:

- **Privacy**: Your data never leaves your machine
- **Cost**: No per-token API charges
- **Control**: Full control over model selection and parameters
- **Offline**: Works without an internet connection

## Core Components

### 1. Model Files (GGUF Format)

```
┌─────────────────────────────┐
│   Qwen3-1.7B-Q8_0.gguf      │
│   (Model Weights File)      │
│                             │
│ • Stores learned patterns   │
│ • Quantized for efficiency  │
│ • Loaded into RAM/VRAM      │
└─────────────────────────────┘
```

- **GGUF**: File format optimized for llama.cpp
- **Quantization**: Reduces model size (e.g., 8-bit instead of 16-bit)
- **Trade-off**: Smaller size and faster speed vs. slight quality loss

### 2. The Inference Pipeline

```
User Input → Model → Generation → Response
    ↓          ↓         ↓            ↓
 "Hello"    Context   Sampling   "Hi there!"
```

**Flow Diagram:**
```
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Prompt  │ --> │ Context  │ --> │  Model   │ --> │ Response │
│          │     │ (Memory) │     │ (Weights)│     │  (Text)  │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
```

### 3. Context Window

The **context** is the model's working memory:

```
┌─────────────────────────────────────────┐
│            Context Window               │
│   ┌─────────────────────────────────┐   │
│   │ System Prompt (if any)          │   │
│   ├─────────────────────────────────┤   │
│   │ User: "do you know node-llama?" │   │
│   ├─────────────────────────────────┤   │
│   │ AI: "Yes, I'm familiar..."      │   │
│   ├─────────────────────────────────┤   │
│   │ (Space for more conversation)   │   │
│   └─────────────────────────────────┘   │
└─────────────────────────────────────────┘
```

- Limited size (e.g., 2048, 4096, or 8192 tokens)
- When full, old messages must be removed
- All previous messages influence the next response

## How LLMs Generate Responses

### Token-by-Token Generation

LLMs don't generate entire sentences at once. They predict one **token** (word piece) at a time:

```
Prompt: "What is AI?"

Generation Process:
"What is AI?"            → [Model] → "AI"
"What is AI? AI"         → [Model] → "is"
"What is AI? AI is"      → [Model] → "a"
"What is AI? AI is a"    → [Model] → "field"
... continues until a stop condition
```

**Visualization:**
```
Input Prompt
     ↓
┌────────────┐
│   Model    │ → Token 1: "AI"
│ Processes  │ → Token 2: "is"
│ & Predicts │ → Token 3: "a"
└────────────┘ → Token 4: "field"
               → ...
```

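The loop above can be sketched with a toy "model" that simply looks up the next token. This illustrates autoregressive decoding; it is not how node-llama-cpp works internally:

```javascript
// Toy sketch of autoregressive (token-by-token) generation.
// `toyModel` is a stand-in for a real LLM's next-token prediction.
const toyModel = (text) => {
  const continuations = { "What is AI?": "AI", "What is AI? AI": "is", "What is AI? AI is": "a" };
  return continuations[text] ?? null; // null acts as the stop condition
};

function generate(prompt) {
  let text = prompt;
  const tokens = [];
  // Each step feeds the whole text so far back into the model
  for (let next = toyModel(text); next !== null; next = toyModel(text)) {
    tokens.push(next);
    text += " " + next;
  }
  return tokens;
}

console.log(generate("What is AI?")); // [ 'AI', 'is', 'a' ]
```

The key property to notice: every generated token becomes part of the input for the next prediction, which is why long outputs consume context-window space.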
## Key Concepts for AI Agents

### 1. Stateless Processing
- Each prompt is independent unless you maintain context
- The model has no memory between different script runs
- To build an "agent", you need to:
  - Keep the context alive between prompts
  - Maintain conversation history
  - Add tools/functions (covered in later examples)

### 2. Prompt Engineering Basics
The way you phrase questions affects the response:

```
❌ Poor:   "node-llama-cpp"
✅ Better: "do you know node-llama-cpp"
✅ Best:   "Explain what node-llama-cpp is and how it works"
```

### 3. Resource Management
LLMs consume significant resources:

```
   Model Loading
        ↓
┌─────────────────┐
│  RAM/VRAM Usage │ ← Models need gigabytes
│  CPU/GPU Time   │ ← Inference takes time
│  Memory Leaks?  │ ← Must clean up properly
└─────────────────┘
        ↓
  Proper Disposal
```

## Why This Matters for Agents

This basic example establishes the foundation for AI agents:

1. **Agents need LLMs to "think"**: The model processes information and generates responses
2. **Agents need context**: To maintain state across interactions
3. **Agents need structure**: Later examples add tools, memory, and reasoning loops

## Next Steps

After understanding basic prompting, explore:
- **System prompts**: Giving the model a specific role or behavior
- **Function calling**: Allowing the model to use tools
- **Memory**: Persisting information across sessions
- **Reasoning patterns**: Like ReAct (Reasoning + Acting)

## Diagram: Complete Architecture

```
┌──────────────────────────────────────────────────┐
│                Your Application                  │
│  ┌────────────────────────────────────────────┐  │
│  │          node-llama-cpp Library            │  │
│  │  ┌──────────────────────────────────────┐  │  │
│  │  │       llama.cpp (C++ Runtime)        │  │  │
│  │  │  ┌────────────────────────────────┐  │  │  │
│  │  │  │      Model File (GGUF)         │  │  │  │
│  │  │  │  • Qwen3-1.7B-Q8_0.gguf        │  │  │  │
│  │  │  └────────────────────────────────┘  │  │  │
│  │  └──────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────┘
                       ↓
               ┌──────────────┐
               │  CPU / GPU   │
               └──────────────┘
```

This layered architecture allows you to build sophisticated AI agents on top of basic LLM interactions.
examples/01_intro/intro.js ADDED
@@ -0,0 +1,36 @@
import {
    getLlama,
    LlamaChatSession,
} from "node-llama-cpp";
import {fileURLToPath} from "url";
import path from "path";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(
        __dirname,
        '..',
        '..',
        'models',
        'Qwen3-1.7B-Q8_0.gguf'
    )
});

const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
});

const prompt = `do you know node-llama-cpp`;

const a1 = await session.prompt(prompt);
console.log("AI: " + a1);

session.dispose();
context.dispose();
model.dispose();
llama.dispose();
examples/02_openai-intro/CODE.md ADDED
@@ -0,0 +1,394 @@
# Code Explanation: OpenAI Intro

This guide walks through each example in `openai-intro.js`, explaining how to work with OpenAI's API from the ground up.

## Requirements

Before running this example, you'll need an OpenAI account, an API key, and a valid billing method.

### Get an API Key

https://platform.openai.com/api-keys

### Add a Billing Method

https://platform.openai.com/settings/organization/billing/overview

### Configure Environment Variables

```bash
cp .env.example .env
```
Then edit `.env` and add your actual API key.

## Setup and Initialization

```javascript
import OpenAI from 'openai';
import 'dotenv/config';

const client = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
});
```

**What's happening:**
- `import OpenAI from 'openai'` - Import the official OpenAI SDK for Node.js
- `import 'dotenv/config'` - Load environment variables from the `.env` file
- `new OpenAI({...})` - Create a client instance that handles API authentication and requests
- `process.env.OPENAI_API_KEY` - Your API key from platform.openai.com (never hardcode this!)

**Why it matters:** The client object is your interface to OpenAI's models. All API calls go through this client.

---

## Example 1: Basic Chat Completion

```javascript
const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
        { role: 'user', content: 'What is node-llama-cpp?' }
    ],
});

console.log(response.choices[0].message.content);
```

**What's happening:**
- `chat.completions.create()` - The primary method for sending messages to ChatGPT models
- `model: 'gpt-4o'` - Specifies which model to use (gpt-4o is a recent, highly capable model)
- `messages` array - Contains the conversation history
- `role: 'user'` - Indicates this message comes from the user (you)
- `response.choices[0]` - The API returns an array of possible responses; we take the first one
- `message.content` - The actual text response from the AI

**Response structure:**
```javascript
{
    id: 'chatcmpl-...',
    object: 'chat.completion',
    created: 1234567890,
    model: 'gpt-4o',
    choices: [
        {
            index: 0,
            message: {
                role: 'assistant',
                content: 'node-llama-cpp is a...'
            },
            finish_reason: 'stop'
        }
    ],
    usage: {
        prompt_tokens: 10,
        completion_tokens: 50,
        total_tokens: 60
    }
}
```

---

## Example 2: System Prompts

```javascript
const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
        { role: 'system', content: 'You are a coding assistant that talks like a pirate.' },
        { role: 'user', content: 'Explain what async/await does in JavaScript.' }
    ],
});
```

**What's happening:**
- `role: 'system'` - A special message type that sets the AI's behavior and personality
- System messages are processed first and influence all subsequent responses
- The model will maintain this behavior throughout the conversation

**Why it matters:** System prompts are how you specialize AI behavior. They're the foundation of creating focused agents with specific roles (translator, coder, analyst, etc.).

**Key insight:** Same model + different system prompts = completely different agents!

---

## Example 3: Temperature Control

```javascript
// Focused response
const focusedResponse = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
    temperature: 0.2,
});

// Creative response
const creativeResponse = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
    temperature: 1.5,
});
```

**What's happening:**
- `temperature` - Controls randomness in the output (range: 0.0 to 2.0)
- **Low temperature (0.0 - 0.3):**
  - More focused and deterministic
  - Same input → similar output
  - Best for: factual answers, code generation, data extraction
- **Medium temperature (0.7 - 1.0):**
  - Balanced creativity and coherence
  - Default for most use cases
- **High temperature (1.2 - 2.0):**
  - More creative and varied
  - Same input → very different outputs
  - Best for: creative writing, brainstorming, story generation

**Real-world usage:**
- Code completion: temperature 0.2
- Customer support: temperature 0.5
- Creative content: temperature 1.2

---

## Example 4: Conversation Context

```javascript
const messages = [
    { role: 'system', content: 'You are a helpful coding tutor.' },
    { role: 'user', content: 'What is a Promise in JavaScript?' },
];

const response1 = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: messages,
});

// Add AI response to history
messages.push(response1.choices[0].message);

// Add follow-up question
messages.push({ role: 'user', content: 'Can you show me a simple example?' });

// Second request with full context
const response2 = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: messages,
});
```

**What's happening:**
- OpenAI models are **stateless** - they don't remember previous conversations
- We maintain context by sending the entire conversation history with each request
- Each request is independent; you must include all relevant messages

**Message order in the array:**
1. System prompt (optional, but recommended first)
2. Previous user message
3. Previous assistant response
4. Current user message

**Why it matters:** This is how chatbots remember context. The full conversation is sent every time.

**Performance consideration:**
- More messages = more tokens = higher cost
- Longer conversations eventually hit token limits
- Real applications need conversation trimming or summarization strategies

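One simple trimming strategy keeps the system prompt plus only the most recent turns. This is an illustrative sketch (real applications may summarize older turns instead of dropping them):

```javascript
// Sketch of a simple trimming strategy: keep the system prompt plus the
// most recent turns (illustrative; summarization is a common alternative).
function trimMessages(messages, maxTurns = 6) {
  const system = messages.filter(m => m.role === 'system');
  const rest = messages.filter(m => m.role !== 'system');
  return [...system, ...rest.slice(-maxTurns)];
}

const history = [
  { role: 'system', content: 'You are a helpful coding tutor.' },
  ...Array.from({ length: 10 }, (_, i) => ({ role: i % 2 ? 'assistant' : 'user', content: `turn ${i}` })),
];
console.log(trimMessages(history, 4).length); // prints 5 (system + last 4 turns)
```

Pinning the system prompt matters: if trimming ever drops it, the model silently loses its role.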
199
+ ---
200
+
201
+ ## Example 5: Streaming Responses
202
+
203
+ ```javascript
204
+ const stream = await client.chat.completions.create({
205
+ model: 'gpt-4o',
206
+ messages: [
207
+ { role: 'user', content: 'Write a haiku about programming.' }
208
+ ],
209
+ stream: true, // Enable streaming
210
+ });
211
+
212
+ for await (const chunk of stream) {
213
+ const content = chunk.choices[0]?.delta?.content || '';
214
+ process.stdout.write(content);
215
+ }
216
+ ```
217
+
218
+ **What's happening:**
219
+ - `stream: true` - Instead of waiting for the complete response, receive it token-by-token
220
+ - `for await...of` - Iterate over the stream as chunks arrive
221
+ - `delta.content` - Each chunk contains a small piece of text (often just a word or partial word)
222
+ - `process.stdout.write()` - Write without newline to display text progressively
223
+
224
+ **Streaming vs. Non-streaming:**
225
+
226
+ **Non-streaming (default):**
227
+ ```
228
+ [Request sent]
229
+ [Wait 5 seconds...]
230
+ [Full response arrives]
231
+ ```
232
+
233
+ **Streaming:**
234
+ ```
235
+ [Request sent]
236
+ Once [chunk arrives: "Once"]
237
+ upon [chunk arrives: " upon"]
238
+ a [chunk arrives: " a"]
239
+ time [chunk arrives: " time"]
240
+ ...
241
+ ```
242
+
243
+ **Why it matters:**
244
+ - Better user experience (immediate feedback)
245
+ - Appears faster even though total time is similar
246
+ - Essential for real-time chat interfaces
247
+ - Allows early processing/display of partial results
248
+
249
+ **When to use streaming:**
250
+ - Interactive chat applications
251
+ - Long-form content generation
252
+ - When user experience matters more than simplicity
253
+
254
+ **When to NOT use streaming:**
255
+ - Simple scripts or automation
256
+ - When you need the complete response before processing
257
+ - Batch processing
258
+
259
+ ---
260
+
261
+ ## Example 6: Token Usage
262
+
263
+ ```javascript
264
+ const response = await client.chat.completions.create({
265
+ model: 'gpt-4o',
266
+ messages: [
267
+ { role: 'user', content: 'Explain recursion in 3 sentences.' }
268
+ ],
269
+ max_tokens: 100,
270
+ });
271
+
272
+ console.log("Token usage:");
273
+ console.log("- Prompt tokens: " + response.usage.prompt_tokens);
274
+ console.log("- Completion tokens: " + response.usage.completion_tokens);
275
+ console.log("- Total tokens: " + response.usage.total_tokens);
276
+ ```
277
+
278
+ **What's happening:**
279
+ - `max_tokens` - Limits the length of the AI's response
280
+ - `response.usage` - Contains token consumption details
281
+ - **Prompt tokens:** Your input (messages you sent)
282
+ - **Completion tokens:** AI's output (the response)
283
+ - **Total tokens:** Sum of both (what you're billed for)
284
+
285
+ **Understanding tokens:**
286
+ - Tokens ≠ words
287
+ - 1 token ≈ 0.75 words (in English)
288
+ - "hello" = 1 token
289
+ - "chatbot" = 2 tokens ("chat" + "bot")
290
+ - Punctuation counts toward your token total; spaces are usually folded into an adjacent word's token
291
+
292
+ **Why it matters:**
293
+ 1. **Cost control:** You pay per token
294
+ 2. **Context limits:** Models have maximum token limits (e.g., gpt-4o: 128,000 tokens)
295
+ 3. **Response control:** Use `max_tokens` to prevent overly long responses
296
+
297
+ **Practical limits:**
298
+ ```javascript
299
+ // Prevent runaway responses
300
+ max_tokens: 150, // ~100 words
301
+
302
+ // Brief responses
303
+ max_tokens: 50, // ~35 words
304
+
305
+ // Longer content
306
+ max_tokens: 1000, // ~750 words
307
+ ```
308
+
309
+ **Cost estimation (approximate):**
310
+ - GPT-4o: $5 per 1M input tokens, $15 per 1M output tokens
311
+ - GPT-3.5-turbo: $0.50 per 1M input tokens, $1.50 per 1M output tokens
312
+
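Those rates turn a `usage` object into a dollar estimate. A sketch (the rates are the approximations listed above and will drift as pricing changes):

```javascript
// Estimate the dollar cost of a single request from its usage object,
// using approximate per-million-token rates.
function estimateCost(usage, inputPerMillion = 5, outputPerMillion = 15) {
  const inputCost = (usage.prompt_tokens / 1_000_000) * inputPerMillion;
  const outputCost = (usage.completion_tokens / 1_000_000) * outputPerMillion;
  return inputCost + outputCost;
}
```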
313
+ ---
314
+
315
+ ## Example 7: Model Comparison
316
+
317
+ ```javascript
318
+ // GPT-4o - Most capable
319
+ const gpt4Response = await client.chat.completions.create({
320
+ model: 'gpt-4o',
321
+ messages: [{ role: 'user', content: prompt }],
322
+ });
323
+
324
+ // GPT-3.5-turbo - Faster and cheaper
325
+ const gpt35Response = await client.chat.completions.create({
326
+ model: 'gpt-3.5-turbo',
327
+ messages: [{ role: 'user', content: prompt }],
328
+ });
329
+ ```
330
+
331
+ **Available models:**
332
+
333
+ | Model | Best For | Speed | Cost | Context Window |
334
+ |-------|----------|-------|------|----------------|
335
+ | `gpt-4o` | Complex tasks, reasoning, accuracy | Medium | $$$ | 128K tokens |
336
+ | `gpt-4o-mini` | Balanced performance/cost | Fast | $$ | 128K tokens |
337
+ | `gpt-3.5-turbo` | Simple tasks, high volume | Very Fast | $ | 16K tokens |
338
+
339
+ **Choosing the right model:**
340
+ - **Use GPT-4o when:**
341
+ - Complex reasoning required
342
+ - High accuracy is critical
343
+ - Working with code or technical content
344
+ - Quality > speed/cost
345
+
346
+ - **Use GPT-4o-mini when:**
347
+ - Need good performance at lower cost
348
+ - Most general-purpose tasks
349
+
350
+ - **Use GPT-3.5-turbo when:**
351
+ - Simple classification or extraction
352
+ - High-volume, low-complexity tasks
353
+ - Speed is critical
354
+ - Budget constraints
355
+
356
+ **Pro tip:** Start with gpt-4o for development, then evaluate if cheaper models work for your use case.
357
+
358
+ ---
359
+
360
+ ## Error Handling
361
+
362
+ ```javascript
363
+ try {
364
+ await basicCompletion();
365
+ } catch (error) {
366
+ console.error("Error:", error.message);
367
+ if (error.message.includes('API key')) {
368
+ console.error("\nMake sure to set your OPENAI_API_KEY in a .env file");
369
+ }
370
+ }
371
+ ```
372
+
373
+ **Common errors:**
374
+ - `401 Unauthorized` - Invalid or missing API key
375
+ - `429 Too Many Requests` - Rate limit exceeded
376
+ - `500 Internal Server Error` - OpenAI service issue
377
+ - `Context length exceeded` - Too many tokens in conversation
378
+
379
+ **Best practices:**
380
+ - Always use try-catch with async calls
381
+ - Check error types and provide helpful messages
382
+ - Implement retry logic for transient failures
383
+ - Monitor token usage to avoid limit errors
384
+
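The practices above can be combined into a small status-dispatch helper. A sketch, assuming the SDK error exposes an HTTP `status` property (openai-node API errors do); the messages themselves are illustrative:

```javascript
// Map common API error statuses to actionable messages.
function describeApiError(error) {
  switch (error.status) {
    case 401: return 'Invalid or missing API key - check OPENAI_API_KEY';
    case 429: return 'Rate limit exceeded - retry with exponential backoff';
    case 500: return 'OpenAI service issue - retry later';
    default:  return `Unexpected error: ${error.message}`;
  }
}
```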
385
+ ---
386
+
387
+ ## Key Takeaways
388
+
389
+ 1. **Stateless Nature:** Models don't remember. You send full context each time.
390
+ 2. **Message Roles:** `system` (behavior), `user` (input), `assistant` (AI response)
391
+ 3. **Temperature:** Controls creativity (0 = focused, 2 = creative)
392
+ 4. **Streaming:** Better UX for real-time applications
393
+ 5. **Token Management:** Monitor usage for cost and limits
394
+ 6. **Model Selection:** Choose based on task complexity and budget
examples/02_openai-intro/CONCEPT.md ADDED
@@ -0,0 +1,950 @@
1
+ # Concepts: Understanding OpenAI APIs
2
+
3
+ This guide explains the fundamental concepts behind working with OpenAI's language models, which form the foundation for building AI agents.
4
+
5
+ ## What is the OpenAI API?
6
+
7
+ The OpenAI API provides programmatic access to powerful language models like GPT-4o and GPT-3.5-turbo. Instead of running models locally, you send requests to OpenAI's servers and receive responses.
8
+
9
+ **Key characteristics:**
10
+ - **Cloud-based:** Models run on OpenAI's infrastructure
11
+ - **Pay-per-use:** Charged by token consumption
12
+ - **Production-ready:** Enterprise-grade reliability and performance
13
+ - **Latest models:** Immediate access to newest model releases
14
+
15
+ **Comparison with Local LLMs (like node-llama-cpp):**
16
+
17
+ | Aspect | OpenAI API | Local LLMs |
18
+ |--------|------------|------------|
19
+ | **Setup** | API key only | Download models, need GPU/RAM |
20
+ | **Cost** | Pay per token | Free after initial setup |
21
+ | **Performance** | Consistent, high-quality | Depends on your hardware |
22
+ | **Privacy** | Data sent to OpenAI | Completely local/private |
23
+ | **Scalability** | Unlimited (with payment) | Limited by your hardware |
24
+
25
+ ---
26
+
27
+ ## The Chat Completions API
28
+
29
+ ### Request-Response Cycle
30
+
31
+ ```
32
+ You (Client) OpenAI (Server)
33
+ | |
34
+ | POST /v1/chat/completions |
35
+ | { |
36
+ | model: "gpt-4o", |
37
+ | messages: [...] |
38
+ | } |
39
+ |------------------------------->|
40
+ | |
41
+ | [Processing...] |
42
+ | [Model inference] |
43
+ | [Generate response] |
44
+ | |
45
+ | Response |
46
+ | { |
47
+ | choices: [{ |
48
+ | message: { |
49
+ | content: "..." |
50
+ | } |
51
+ | }] |
52
+ | } |
53
+ |<-------------------------------|
54
+ | |
55
+ ```
56
+
57
+ **Key point:** Each request is independent. The API doesn't store conversation history.
58
+
59
+ ---
60
+
61
+ ## Message Roles: The Conversation Structure
62
+
63
+ Every message has a `role` that determines its purpose:
64
+
65
+ ### 1. System Messages
66
+
67
+ ```javascript
68
+ { role: 'system', content: 'You are a helpful Python tutor.' }
69
+ ```
70
+
71
+ **Purpose:** Define the AI's behavior, personality, and capabilities
72
+
73
+ **Think of it as:**
74
+ - The AI's "job description"
75
+ - Invisible to the end user
76
+ - Sets constraints and guidelines
77
+
78
+ **Examples:**
79
+ ```javascript
80
+ // Specialist agent
81
+ "You are an expert SQL database administrator."
82
+
83
+ // Tone and style
84
+ "You are a friendly customer support agent. Be warm and empathetic."
85
+
86
+ // Output format control
87
+ "You are a JSON API. Always respond with valid JSON, never plain text."
88
+
89
+ // Behavioral constraints
90
+ "You are a code reviewer. Be constructive and focus on best practices."
91
+ ```
92
+
93
+ **Best practices:**
94
+ - Keep it concise but specific
95
+ - Place at the beginning of the messages array
96
+ - Update it to change agent behavior
97
+ - Use for ethical guidelines and output formatting
98
+
99
+ ### 2. User Messages
100
+
101
+ ```javascript
102
+ { role: 'user', content: 'How do I use async/await?' }
103
+ ```
104
+
105
+ **Purpose:** Represent the human's input or questions
106
+
107
+ **Think of it as:**
108
+ - What you're asking the AI
109
+ - The prompt or query
110
+ - The instruction to follow
111
+
112
+ ### 3. Assistant Messages
113
+
114
+ ```javascript
115
+ { role: 'assistant', content: 'Async/await is a way to handle promises...' }
116
+ ```
117
+
118
+ **Purpose:** Represent the AI's previous responses
119
+
120
+ **Think of it as:**
121
+ - The AI's conversation history
122
+ - Context for follow-up questions
123
+ - What the AI has already said
124
+
125
+ ### Conversation Flow Example
126
+
127
+ ```javascript
128
+ [
129
+ { role: 'system', content: 'You are a math tutor.' },
130
+
131
+ // First exchange
132
+ { role: 'user', content: 'What is 15 * 24?' },
133
+ { role: 'assistant', content: '15 * 24 = 360' },
134
+
135
+ // Follow-up (knows context)
136
+ { role: 'user', content: 'What about dividing that by 3?' },
137
+ { role: 'assistant', content: '360 ÷ 3 = 120' },
138
+ ]
139
+ ```
140
+
141
+ **Why this matters:** The role structure enables:
142
+ 1. **Context awareness:** AI understands conversation history
143
+ 2. **Behavior control:** System prompts shape responses
144
+ 3. **Multi-turn conversations:** Natural back-and-forth dialogue
145
+
146
+ ---
147
+
148
+ ## Statelessness: A Critical Concept
149
+
150
+ **Most important principle:** OpenAI's API is stateless.
151
+
152
+ ### What does stateless mean?
153
+
154
+ Each API call is independent. The model doesn't remember previous requests.
155
+
156
+ ```
157
+ Request 1: "My name is Alice"
158
+ Response 1: "Hello Alice!"
159
+
160
+ Request 2: "What's my name?"
161
+ Response 2: "I don't know your name." ← No memory!
162
+ ```
163
+
164
+ ### How to maintain context
165
+
166
+ **You must send the full conversation history:**
167
+
168
+ ```javascript
169
+ const messages = [];
170
+
171
+ // First turn
172
+ messages.push({ role: 'user', content: 'My name is Alice' });
173
+ const response1 = await client.chat.completions.create({
174
+ model: 'gpt-4o',
175
+ messages: messages // ["My name is Alice"]
176
+ });
177
+ messages.push(response1.choices[0].message);
178
+
179
+ // Second turn - include full history
180
+ messages.push({ role: 'user', content: "What's my name?" });
181
+ const response2 = await client.chat.completions.create({
182
+ model: 'gpt-4o',
183
+ messages: messages // Full conversation!
184
+ });
185
+ ```
186
+
187
+ ### Implications
188
+
189
+ **Benefits:**
190
+ - ✅ Simple architecture (no server-side state)
191
+ - ✅ Easy to scale (any server can handle any request)
192
+ - ✅ Full control over context (you decide what to include)
193
+
194
+ **Challenges:**
195
+ - ❌ You manage conversation history
196
+ - ❌ Token costs increase with conversation length
197
+ - ❌ Must implement your own memory/persistence
198
+ - ❌ Context window limits eventually hit
199
+
200
+ **Real-world solutions:**
201
+ ```javascript
202
+ // Trim old messages when too long
203
+ if (messages.length > 20) {
204
+ messages = [messages[0], ...messages.slice(-10)]; // Keep system + last 10
205
+ }
206
+
207
+ // Summarize old context
208
+ if (totalTokens > 10000) {
209
+ const summary = await summarizeConversation(messages);
210
+ messages = [systemMessage, summary, ...recentMessages];
211
+ }
212
+ ```
213
+
214
+ ---
215
+
216
+ ## Temperature: Controlling Randomness
217
+
218
+ Temperature controls how "creative" or "random" the model's output is.
219
+
220
+ ### How it works technically
221
+
222
+ When generating each token, the model assigns probabilities to possible next tokens:
223
+
224
+ ```
225
+ Input: "The sky is"
226
+ Possible next tokens:
227
+ - "blue" → 70% probability
228
+ - "clear" → 15% probability
229
+ - "dark" → 10% probability
230
+ - "purple" → 5% probability
231
+ ```
232
+
233
+ **Temperature modifies these probabilities:**
234
+
235
+ **Temperature = 0.0 (Deterministic)**
236
+ ```
237
+ Always pick the highest probability token
238
+ "The sky is blue" ← Same output every time
239
+ ```
240
+
241
+ **Temperature = 0.7 (Balanced)**
242
+ ```
243
+ Sample probabilistically with slight randomness
244
+ "The sky is blue" or "The sky is clear"
245
+ ```
246
+
247
+ **Temperature = 1.5 (Creative)**
248
+ ```
249
+ Flatten probabilities, allow unlikely choices
250
+ "The sky is purple" or "The sky is dancing" ← More surprising!
251
+ ```
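The effect can be sketched numerically: raising each probability to the power 1/T and renormalizing is mathematically the same as dividing the logits by T before the softmax. A toy illustration, not the model's actual sampling code:

```javascript
// Rescale a token probability distribution with temperature T.
// T < 1 sharpens toward the top token; T > 1 flattens the distribution.
function applyTemperature(probs, T) {
  const scaled = probs.map(p => Math.pow(p, 1 / T));
  const sum = scaled.reduce((a, b) => a + b, 0);
  return scaled.map(s => s / sum);
}
```

For the example distribution above, `[0.7, 0.15, 0.1, 0.05]`, T = 0.5 pushes the top token to roughly 0.93, while T = 2 pulls it down to roughly 0.47.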
252
+
253
+ ### Practical Guidelines
254
+
255
+ **Temperature 0.0 - 0.3: Focused Tasks**
256
+ - Code generation
257
+ - Data extraction
258
+ - Factual Q&A
259
+ - Classification
260
+ - Translation
261
+
262
+ Example:
263
+ ```javascript
264
+ // Extract JSON from text - needs consistency
265
+ temperature: 0.1
266
+ ```
267
+
268
+ **Temperature 0.5 - 0.9: Balanced Tasks**
269
+ - General conversation
270
+ - Customer support
271
+ - Content summarization
272
+ - Educational content
273
+
274
+ Example:
275
+ ```javascript
276
+ // Friendly chatbot
277
+ temperature: 0.7
278
+ ```
279
+
280
+ **Temperature 1.0 - 2.0: Creative Tasks**
281
+ - Story writing
282
+ - Brainstorming
283
+ - Poetry/creative content
284
+ - Generating variations
285
+
286
+ Example:
287
+ ```javascript
288
+ // Generate 10 different marketing taglines
289
+ temperature: 1.3
290
+ ```
291
+
292
+ ---
293
+
294
+ ## Streaming: Real-time Responses
295
+
296
+ ### Non-Streaming (Default)
297
+
298
+ ```
299
+ User: "Tell me a story"
300
+ [Wait...]
301
+ [Wait...]
302
+ [Wait...]
303
+ Response: "Once upon a time, there was a..." (all at once)
304
+ ```
305
+
306
+ **Pros:**
307
+ - Simple to implement
308
+ - Easy to handle errors
309
+ - Get complete response before processing
310
+
311
+ **Cons:**
312
+ - Appears slow for long responses
313
+ - No feedback during generation
314
+ - Poor user experience for chat
315
+
316
+ ### Streaming
317
+
318
+ ```
319
+ User: "Tell me a story"
320
+ "Once"
321
+ "Once upon"
322
+ "Once upon a"
323
+ "Once upon a time"
324
+ "Once upon a time there"
325
+ ...
326
+ ```
327
+
328
+ **Pros:**
329
+ - Immediate feedback
330
+ - Appears faster
331
+ - Better user experience
332
+ - Can process tokens as they arrive
333
+
334
+ **Cons:**
335
+ - More complex code
336
+ - Harder error handling
337
+ - Can't see full response before displaying
338
+
339
+ ### When to Use Each
340
+
341
+ **Use Non-Streaming:**
342
+ - Batch processing scripts
343
+ - When you need to analyze the full response
344
+ - Simple command-line tools
345
+ - API endpoints that return complete results
346
+
347
+ **Use Streaming:**
348
+ - Chat interfaces
349
+ - Interactive applications
350
+ - Long-form content generation
351
+ - Any user-facing application where UX matters
352
+
353
+ ---
354
+
355
+ ## Tokens: The Currency of LLMs
356
+
357
+ ### What are tokens?
358
+
359
+ Tokens are the fundamental units that language models process. They're not exactly words, but pieces of text.
360
+
361
+ **Tokenization examples:**
362
+ ```
363
+ "Hello world" → ["Hello", " world"] = 2 tokens
364
+ "coding" → ["coding"] = 1 token
365
+ "uncoded" → ["un", "coded"] = 2 tokens
366
+ ```
367
+
368
+ ### Why tokens matter
369
+
370
+ **1. Cost**
371
+ You pay per token (input + output):
372
+ ```
373
+ Request: 100 tokens
374
+ Response: 150 tokens
375
+ Total billed: 250 tokens
376
+ ```
377
+
378
+ **2. Context Limits**
379
+ Each model has a maximum token limit:
380
+ ```
381
+ gpt-4o: 128,000 tokens (≈96,000 words)
382
+ gpt-3.5-turbo: 16,384 tokens (≈12,000 words)
383
+ ```
384
+
385
+ **3. Performance**
386
+ More tokens = longer processing time and higher cost
387
+
388
+ ### Managing Token Usage
389
+
390
+ **Monitor usage:**
391
+ ```javascript
392
+ console.log(response.usage.total_tokens);
393
+ // Track cumulative usage for budgeting
394
+ ```
395
+
396
+ **Limit response length:**
397
+ ```javascript
398
+ max_tokens: 150 // Cap the response
399
+ ```
400
+
401
+ **Trim conversation history:**
402
+ ```javascript
403
+ // Keep only recent messages
404
+ if (messages.length > 20) {
405
+ messages = messages.slice(-20);
406
+ }
407
+ ```
408
+
409
+ **Estimate before sending:**
410
+ ```javascript
411
+ import { encode } from 'gpt-tokenizer';
412
+
413
+ const text = "Your message here";
414
+ const tokens = encode(text).length;
415
+ console.log(`Estimated tokens: ${tokens}`);
416
+ ```
417
+
418
+ ---
419
+
420
+ ## Model Selection: Choosing the Right Tool
421
+
422
+ ### GPT-4o: The Powerhouse
423
+
424
+ **Best for:**
425
+ - Complex reasoning tasks
426
+ - Code generation and debugging
427
+ - Technical content
428
+ - Tasks requiring high accuracy
429
+ - Working with structured data
430
+
431
+ **Characteristics:**
432
+ - Most capable model
433
+ - Higher cost
434
+ - Slower than GPT-3.5
435
+ - Best for quality-critical applications
436
+
437
+ **Example use cases:**
438
+ - Legal document analysis
439
+ - Complex code refactoring
440
+ - Research and analysis
441
+ - Educational tutoring
442
+
443
+ ### GPT-4o-mini: The Balanced Choice
444
+
445
+ **Best for:**
446
+ - General-purpose applications
447
+ - Good balance of cost and performance
448
+ - Most everyday tasks
449
+
450
+ **Characteristics:**
451
+ - Good performance
452
+ - Moderate cost
453
+ - Fast response times
454
+ - Sweet spot for many applications
455
+
456
+ **Example use cases:**
457
+ - Customer support chatbots
458
+ - Content summarization
459
+ - General Q&A
460
+ - Moderate complexity tasks
461
+
462
+ ### GPT-3.5-turbo: The Speed Demon
463
+
464
+ **Best for:**
465
+ - High-volume, simple tasks
466
+ - Speed-critical applications
467
+ - Budget-conscious projects
468
+ - Classification and extraction
469
+
470
+ **Characteristics:**
471
+ - Very fast
472
+ - Lowest cost
473
+ - Good for simple tasks
474
+ - Less capable reasoning
475
+
476
+ **Example use cases:**
477
+ - Sentiment analysis
478
+ - Text classification
479
+ - Simple formatting
480
+ - High-throughput processing
481
+
482
+ ### Decision Framework
483
+
484
+ ```
485
+ Is task critical and complex?
486
+ ├─ YES → GPT-4o
487
+ └─ NO
488
+ └─ Is speed important and task simple?
489
+ ├─ YES → GPT-3.5-turbo
490
+ └─ NO → GPT-4o-mini
491
+ ```
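The tree above translates directly into a helper (the boolean flag names are illustrative assumptions):

```javascript
// A literal encoding of the decision tree above.
function chooseModel({ criticalAndComplex = false, speedCriticalAndSimple = false } = {}) {
  if (criticalAndComplex) return 'gpt-4o';
  if (speedCriticalAndSimple) return 'gpt-3.5-turbo';
  return 'gpt-4o-mini';
}
```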
492
+
493
+ ---
494
+
495
+ ## Error Handling and Resilience
496
+
497
+ ### Common Error Scenarios
498
+
499
+ **1. Authentication Errors (401)**
500
+ ```javascript
501
+ // Invalid API key
502
+ Error: Incorrect API key provided
503
+ ```
504
+
505
+ **2. Rate Limiting (429)**
506
+ ```javascript
507
+ // Too many requests
508
+ Error: Rate limit exceeded
509
+ ```
510
+
511
+ **3. Token Limits (400)**
512
+ ```javascript
513
+ // Context too long
514
+ Error: This model's maximum context length is 16385 tokens
515
+ ```
516
+
517
+ **4. Service Errors (500)**
518
+ ```javascript
519
+ // OpenAI service issue
520
+ Error: The server had an error processing your request
521
+ ```
522
+
523
+ ### Best Practices
524
+
525
+ **1. Always use try-catch:**
526
+ ```javascript
527
+ try {
528
+ const response = await client.chat.completions.create({...});
529
+ } catch (error) {
530
+ if (error.status === 429) {
531
+ // Implement backoff and retry
532
+ } else if (error.status === 500) {
533
+ // Retry with exponential backoff
534
+ } else {
535
+ // Log and handle appropriately
536
+ }
537
+ }
538
+ ```
539
+
540
+ **2. Implement retry logic:**
541
+ ```javascript
542
+ async function retryWithBackoff(fn, maxRetries = 3) {
543
+ for (let i = 0; i < maxRetries; i++) {
544
+ try {
545
+ return await fn();
546
+ } catch (error) {
547
+ if (i === maxRetries - 1) throw error;
548
+ await new Promise(resolve => setTimeout(resolve, 2 ** i * 1000)); // Exponential backoff
549
+ }
550
+ }
551
+ }
552
+ ```
553
+
554
+ **3. Monitor token usage:**
555
+ ```javascript
556
+ let totalTokens = 0;
557
+ totalTokens += response.usage.total_tokens;
558
+
559
+ if (totalTokens > MONTHLY_BUDGET_TOKENS) {
560
+ throw new Error('Monthly token budget exceeded');
561
+ }
562
+ ```
563
+
564
+ ---
565
+
566
+ ## Architectural Patterns
567
+
568
+ ### Pattern 1: Simple Request-Response
569
+
570
+ **Use case:** One-off queries, simple automation
571
+
572
+ ```javascript
573
+ const response = await client.chat.completions.create({
574
+ model: 'gpt-4o',
575
+ messages: [{ role: 'user', content: query }]
576
+ });
577
+ ```
578
+
579
+ **Pros:** Simple, easy to understand
580
+ **Cons:** No context, no memory
581
+
582
+ ### Pattern 2: Stateful Conversation
583
+
584
+ **Use case:** Chat applications, tutoring, customer support
585
+
586
+ ```javascript
587
+ class Conversation {
588
+ constructor() {
589
+ this.messages = [
590
+ { role: 'system', content: 'Your behavior' }
591
+ ];
592
+ }
593
+
594
+ async ask(userMessage) {
595
+ this.messages.push({ role: 'user', content: userMessage });
596
+
597
+ const response = await client.chat.completions.create({
598
+ model: 'gpt-4o',
599
+ messages: this.messages
600
+ });
601
+
602
+ this.messages.push(response.choices[0].message);
603
+ return response.choices[0].message.content;
604
+ }
605
+ }
606
+ ```
607
+
608
+ **Pros:** Maintains context, natural conversation
609
+ **Cons:** Token costs grow, needs management
610
+
611
+ ### Pattern 3: Specialized Agents
612
+
613
+ **Use case:** Domain-specific applications
614
+
615
+ ```javascript
616
+ class PythonTutor {
617
+ async help(question) {
618
+ return await client.chat.completions.create({
619
+ model: 'gpt-4o',
620
+ messages: [
621
+ {
622
+ role: 'system',
623
+ content: 'You are an expert Python tutor. Explain concepts clearly with code examples.'
624
+ },
625
+ { role: 'user', content: question }
626
+ ],
627
+ temperature: 0.3 // Focused responses
628
+ });
629
+ }
630
+ }
631
+ ```
632
+
633
+ **Pros:** Consistent behavior, optimized for domain
634
+ **Cons:** Less flexible
635
+
636
+ ---
637
+
638
+ ## Hybrid Approach: Combining Proprietary and Open Source Models
639
+
640
+ In real-world projects, the best solution often isn't choosing between OpenAI and local LLMs - it's using **both strategically**.
641
+
642
+ ### Why Use a Hybrid Approach?
643
+
644
+ **Cost optimization:** Use expensive models only when necessary
645
+ **Privacy compliance:** Keep sensitive data local while leveraging cloud for general tasks
646
+ **Performance balance:** Fast local models for simple tasks, powerful cloud models for complex ones
647
+ **Reliability:** Fallback options when one service is down
648
+ **Flexibility:** Match the right tool to each specific task
649
+
650
+ ### Common Hybrid Architectures
651
+
652
+ #### Pattern 1: Tiered Processing
653
+
654
+ ```
655
+ Simple tasks → Local LLM (fast, free, private)
656
+ ↓ If complex
657
+ Complex tasks → OpenAI API (powerful, accurate)
658
+ ```
659
+
660
+ **Example workflow:**
661
+ ```javascript
662
+ async function processQuery(query) {
663
+ const complexity = await assessComplexity(query);
664
+
665
+ if (complexity < 0.5) {
666
+ // Use local model for simple queries
667
+ return await localLLM.generate(query);
668
+ } else {
669
+ // Use OpenAI for complex reasoning
670
+ return await openai.chat.completions.create({
671
+ model: 'gpt-4o',
672
+ messages: [{ role: 'user', content: query }]
673
+ });
674
+ }
675
+ }
676
+ ```
677
+
678
+ **Use cases:**
679
+ - Customer support: Local model for FAQs, GPT-4 for complex issues
680
+ - Code generation: Local for simple scripts, GPT-4 for architecture
681
+ - Content moderation: Local for obvious cases, cloud for edge cases
682
+
683
+ #### Pattern 2: Privacy-Based Routing
684
+
685
+ ```
686
+ Public data → OpenAI (best quality)
687
+ Sensitive data → Local LLM (private, secure)
688
+ ```
689
+
690
+ **Example:**
691
+ ```javascript
692
+ async function handleRequest(data, containsSensitiveInfo) {
693
+ if (containsSensitiveInfo) {
694
+ // Process locally - data never leaves your infrastructure
695
+ return await localLLM.generate(data, {
696
+ systemPrompt: "You are a HIPAA-compliant assistant"
697
+ });
698
+ } else {
699
+ // Use cloud for better quality
700
+ return await openai.chat.completions.create({
701
+ model: 'gpt-4o',
702
+ messages: [{ role: 'user', content: data }]
703
+ });
704
+ }
705
+ }
706
+ ```
707
+
708
+ **Use cases:**
709
+ - Healthcare: Patient data → Local, General medical info → OpenAI
710
+ - Finance: Transaction details → Local, Market analysis → OpenAI
711
+ - Legal: Client communications → Local, Legal research → OpenAI
712
+
713
+ #### Pattern 3: Specialized Agent Ecosystem
714
+
715
+ ```
716
+ Agent 1 (Local): Fast classifier
717
+ ↓ Routes to
718
+ Agent 2 (OpenAI): Deep analyzer
719
+ ↓ Routes to
720
+ Agent 3 (Local): Action executor
721
+ ```
722
+
723
+ **Example:**
724
+ ```javascript
725
+ class MultiModelAgent {
726
+ async process(input) {
727
+ // Step 1: Local model classifies intent (fast, cheap)
728
+ const intent = await localLLM.classify(input);
729
+
730
+ // Step 2: Route to appropriate handler
731
+ if (intent.requiresReasoning) {
732
+ // Complex reasoning with GPT-4
733
+ const analysis = await openai.chat.completions.create({
734
+ model: 'gpt-4o',
735
+ messages: [{ role: 'user', content: input }]
736
+ });
737
+ return analysis.choices[0].message.content;
738
+ } else {
739
+ // Simple response with local model
740
+ return await localLLM.generate(input);
741
+ }
742
+ }
743
+ }
744
+ ```
745
+
746
+ **Use cases:**
747
+ - Multi-stage pipelines with different complexity levels
748
+ - Agent systems where each agent has specialized capabilities
749
+ - Workflows requiring both speed and intelligence
750
+
751
+ #### Pattern 4: Development vs Production
752
+
753
+ ```
754
+ Development → OpenAI (fast iteration, best results)
755
+ ↓ Optimize
756
+ Production → Local LLM (cost-effective, private)
757
+ ```
758
+
759
+ **Workflow:**
760
+ ```javascript
761
+ const MODEL_PROVIDER = process.env.NODE_ENV === 'production'
762
+ ? 'local'
763
+ : 'openai';
764
+
765
+ async function generateResponse(prompt) {
766
+ if (MODEL_PROVIDER === 'local') {
767
+ return await localLLM.generate(prompt);
768
+ } else {
769
+ return await openai.chat.completions.create({
770
+ model: 'gpt-4o',
771
+ messages: [{ role: 'user', content: prompt }]
772
+ });
773
+ }
774
+ }
775
+ ```
776
+
777
+ **Strategy:**
778
+ 1. Develop with GPT-4 to get best results quickly
779
+ 2. Fine-tune prompts and test thoroughly
780
+ 3. Switch to local model for production
781
+ 4. Fall back to OpenAI for edge cases
782
+
783
+ #### Pattern 5: Ensemble Approach
784
+
785
+ ```
786
+ Query → [Local Model, OpenAI, Another API]
787
+ ↓ ↓ ↓
788
+ Response Response Response
789
+ ↓ ↓ ↓
790
+ Aggregator / Validator
791
+
792
+ Best Response
793
+ ```
794
+
795
+ **Example:**
796
+ ```javascript
797
+ async function ensembleGenerate(prompt) {
798
+ // Get responses from multiple sources
799
+ const [local, openai, backup] = await Promise.allSettled([
800
+ localLLM.generate(prompt),
801
+ openaiClient.chat.completions.create({
802
+ model: 'gpt-4o',
803
+ messages: [{ role: 'user', content: prompt }]
804
+ }),
805
+ backupAPI.generate(prompt)
806
+ ]);
807
+
808
+ // Use validator to pick best or combine
809
+ return validator.selectBest([local, openai, backup]);
810
+ }
811
+ ```
812
+
813
+ **Use cases:**
814
+ - Critical applications requiring high confidence
815
+ - Fact-checking and verification
816
+ - Reducing hallucinations through consensus
817
+
818
+ ### Cost-Benefit Analysis
819
+
820
+ #### Scenario: Customer Support Chatbot (10,000 queries/day)
821
+
822
+ **Option A: OpenAI Only**
823
+ ```
824
+ 10,000 queries × 500 tokens avg = 5M tokens/day
825
+ Cost: ~$25-50/day = ~$750-1500/month
826
+ Pros: Highest quality, zero infrastructure
827
+ Cons: Expensive at scale, privacy concerns
828
+ ```
829
+
830
+ **Option B: Local LLM Only**
831
+ ```
832
+ Infrastructure: $100-500/month (server/GPU)
833
+ Cost: $100-500/month
834
+ Pros: Predictable costs, private, unlimited usage
835
+ Cons: Setup complexity, maintenance, lower quality
836
+ ```
837
+
838
+ **Option C: Hybrid (80% local, 20% OpenAI)**
839
+ ```
840
+ 8,000 simple queries → Local LLM (free after setup)
841
+ 2,000 complex queries → OpenAI (~$5-10/day)
842
+ Infrastructure: $100-500/month
843
+ API costs: $150-300/month
844
+ Total: $250-800/month
845
+ Pros: Cost-effective, high quality when needed, flexible
846
+ Cons: More complex architecture
847
+ ```
848
+
849
+ **Winner for most projects: Hybrid approach** ✓
850
+
851
+ ### Decision Framework
852
+
853
+ ```
854
+ START: New query arrives
855
+
856
+ Is data sensitive/regulated?
857
+ ├─ YES → Use local model (privacy first)
858
+ └─ NO → Continue
859
+
860
+ Is task simple/repetitive?
861
+ ├─ YES → Use local model (cost-effective)
862
+ └─ NO → Continue
863
+
864
+ Is high accuracy critical?
865
+ ├─ YES → Use OpenAI (quality first)
866
+ └─ NO → Continue
867
+
868
+ Is it high volume?
869
+ ├─ YES → Use local model (cost at scale)
870
+ └─ NO → Use OpenAI (simplicity)
871
+ ```
872
+
873
+ ### The Future: Intelligent Model Selection
874
+
875
+ Advanced systems will automatically choose models based on real-time factors:
876
+
877
+ ```javascript
878
+ class IntelligentModelSelector {
879
+ async selectModel(query, context) {
880
+ const factors = {
881
+ complexity: await this.analyzeComplexity(query),
882
+ latency: context.userTolerance,
883
+ budget: context.remainingBudget,
884
+ accuracy: context.requiredConfidence,
885
+ privacy: context.dataClassification
886
+ };
887
+
888
+ // ML model predicts best provider
889
+ const selection = await this.mlSelector.predict(factors);
890
+
891
+ return {
892
+ provider: selection.provider, // 'local' | 'openai-mini' | 'openai-4'
893
+ confidence: selection.confidence,
894
+ reasoning: selection.reasoning
895
+ };
896
+ }
897
+ }
898
+ ```
899
+
900
+ ### Key Takeaway
901
+
902
+ **You don't have to choose.** Modern AI applications benefit from using the right model for each task:
903
+ - **Cloud for capability:** Complex reasoning, critical accuracy, rapid development (OpenAI, Claude, or self-hosted large open-source models)
904
+ - **Local for scale:** Privacy, cost control, high volume, offline operation
905
+ - **Both for success:** Cost-effective, flexible, reliable production systems
906
+
907
+ The best architecture leverages the strengths of each approach while mitigating their weaknesses.
908
+
909
+ ---
910
+
911
+ ## Preparing for Agents
912
+
913
+ The concepts covered here are **foundational** for building AI agents:
914
+
915
+ ### You now understand:
916
+
917
+ - **How to communicate with LLMs** (API basics)
918
+ - **How to shape behavior** (system prompts)
919
+ - **How to maintain context** (message history)
920
+ - **How to control output** (temperature, tokens)
921
+ - **How to handle responses** (streaming, errors)
922
+
923
+ ### What's next for agents:
924
+
925
+ - **Function calling / Tool use** - Let the AI take actions
926
+ - **Memory systems** - Persistent state across sessions
927
+ - **ReAct patterns** - Iterative reasoning and observation
928
+
929
+ **Bottom line:** You can't build good agents without mastering these fundamentals. Every agent pattern builds on this foundation.
930
+
931
+ ---
932
+
933
+ ## Key Insights
934
+
935
+ 1. **Statelessness is power and burden:** You control context, but you must manage it
936
+ 2. **System prompts are your secret weapon:** Same model → different behaviors
937
+ 3. **Temperature changes everything:** Match it to your task type
938
+ 4. **Tokens are the real currency:** Monitor and optimize usage
939
+ 5. **Model choice matters:** Don't use a sledgehammer for a nail
940
+ 6. **Streaming improves UX:** Use it for user-facing applications
941
+ 7. **Error handling is not optional:** The network will fail, plan for it
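Insight 7 is easy to apply with a small wrapper. A minimal retry-with-exponential-backoff sketch (the attempt count and delays are illustrative; tune them for your workload):

```javascript
// Retry an async call with exponential backoff before giving up.
async function withRetry(fn, attempts = 3, baseDelayMs = 500) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err; // out of retries
      // Wait 500ms, 1000ms, 2000ms, ... between attempts
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
}
```

Wrap any API call, e.g. `withRetry(() => client.chat.completions.create({ ... }))`.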
942
+
943
+ ---
944
+
945
+ ## Further Reading
946
+
947
+ - [OpenAI API Documentation](https://platform.openai.com/docs/api-reference)
948
+ - [OpenAI Cookbook](https://cookbook.openai.com/)
949
+ - [Best Practices for Prompt Engineering](https://platform.openai.com/docs/guides/prompt-engineering)
950
+ - [Token Counting](https://platform.openai.com/tokenizer)
examples/02_openai-intro/openai-intro.js ADDED
@@ -0,0 +1,205 @@
1
+ import OpenAI from 'openai';
2
+ import 'dotenv/config';
3
+
4
+ // Initialize OpenAI client
5
+ // Create an API key at https://platform.openai.com/api-keys
6
+ const client = new OpenAI({
7
+ apiKey: process.env.OPENAI_API_KEY,
8
+ });
9
+
10
+ console.log("=== OpenAI Intro: Understanding the Basics ===\n");
11
+
12
+ // ============================================
13
+ // EXAMPLE 1: Basic Chat Completion
14
+ // ============================================
15
+ async function basicCompletion() {
16
+ console.log("--- Example 1: Basic Chat Completion ---");
17
+
18
+ const response = await client.chat.completions.create({
19
+ model: 'gpt-4o',
20
+ messages: [
21
+ { role: 'user', content: 'What is node-llama-cpp?' }
22
+ ],
23
+ });
24
+
25
+ console.log("AI: " + response.choices[0].message.content);
26
+ console.log("\n");
27
+ }
28
+
29
+ // ============================================
30
+ // EXAMPLE 2: Using System Prompts
31
+ // ============================================
32
+ async function systemPromptExample() {
33
+ console.log("--- Example 2: System Prompts (Behavioral Control) ---");
34
+
35
+ const response = await client.chat.completions.create({
36
+ model: 'gpt-4o',
37
+ messages: [
38
+ { role: 'system', content: 'You are a coding assistant that talks like a pirate.' },
39
+ { role: 'user', content: 'Explain what async/await does in JavaScript.' }
40
+ ],
41
+ });
42
+
43
+ console.log("AI: " + response.choices[0].message.content);
44
+ console.log("\n");
45
+ }
46
+
47
+ // ============================================
48
+ // EXAMPLE 3: Temperature and Creativity
49
+ // ============================================
50
+ async function temperatureExample() {
51
+ console.log("--- Example 3: Temperature Control ---");
52
+
53
+ const prompt = "Write a one-sentence tagline for a coffee shop.";
54
+
55
+ // Low temperature = more focused and deterministic
56
+ const focusedResponse = await client.chat.completions.create({
57
+ model: 'gpt-4o',
58
+ messages: [{ role: 'user', content: prompt }],
59
+ temperature: 0.2,
60
+ });
61
+
62
+ // High temperature = more creative and varied
63
+ const creativeResponse = await client.chat.completions.create({
64
+ model: 'gpt-4o',
65
+ messages: [{ role: 'user', content: prompt }],
66
+ temperature: 1.5,
67
+ });
68
+
69
+ console.log("Low temp (0.2): " + focusedResponse.choices[0].message.content);
70
+ console.log("High temp (1.5): " + creativeResponse.choices[0].message.content);
71
+ console.log("\n");
72
+ }
73
+
74
+ // ============================================
75
+ // EXAMPLE 4: Conversation with Context
76
+ // ============================================
77
+ async function conversationContext() {
78
+ console.log("--- Example 4: Multi-turn Conversation ---");
79
+
80
+ // Build conversation history
81
+ const messages = [
82
+ { role: 'system', content: 'You are a helpful coding tutor.' },
83
+ { role: 'user', content: 'What is a Promise in JavaScript?' },
84
+ ];
85
+
86
+ // First response
87
+ const response1 = await client.chat.completions.create({
88
+ model: 'gpt-4o',
89
+ messages: messages,
90
+ max_tokens: 150,
91
+ });
92
+
93
+ console.log("User: What is a Promise in JavaScript?");
94
+ console.log("AI: " + response1.choices[0].message.content);
95
+
96
+ // Add AI response to history
97
+ messages.push(response1.choices[0].message);
98
+
99
+ // Add follow-up question
100
+ messages.push({ role: 'user', content: 'Can you show me a simple example?' });
101
+
102
+ // Second response (with context)
103
+ const response2 = await client.chat.completions.create({
104
+ model: 'gpt-4o',
105
+ messages: messages,
106
+ });
107
+
108
+ console.log("\nUser: Can you show me a simple example?");
109
+ console.log("AI: " + response2.choices[0].message.content);
110
+ console.log("\n");
111
+ }
112
+
113
+ // ============================================
114
+ // EXAMPLE 5: Streaming Responses
115
+ // ============================================
116
+ async function streamingExample() {
117
+ console.log("--- Example 5: Streaming Response ---");
118
+ console.log("AI: ");
119
+
120
+ const stream = await client.chat.completions.create({
121
+ model: 'gpt-4o',
122
+ messages: [
123
+ { role: 'user', content: 'Write a haiku about programming.' }
124
+ ],
125
+ stream: true,
126
+ });
127
+
128
+ for await (const chunk of stream) {
129
+ const content = chunk.choices[0]?.delta?.content || '';
130
+ process.stdout.write(content);
131
+ }
132
+
133
+ console.log("\n\n");
134
+ }
135
+
136
+ // ============================================
137
+ // EXAMPLE 6: Token Usage and Limits
138
+ // ============================================
139
+ async function tokenUsageExample() {
140
+ console.log("--- Example 6: Understanding Token Usage ---");
141
+
142
+ const response = await client.chat.completions.create({
143
+ model: 'gpt-4o',
144
+ messages: [
145
+ { role: 'user', content: 'Explain recursion in 3 sentences.' }
146
+ ],
147
+ max_tokens: 100,
148
+ });
149
+
150
+ console.log("AI: " + response.choices[0].message.content);
151
+ console.log("\nToken usage:");
152
+ console.log("- Prompt tokens: " + response.usage.prompt_tokens);
153
+ console.log("- Completion tokens: " + response.usage.completion_tokens);
154
+ console.log("- Total tokens: " + response.usage.total_tokens);
155
+ console.log("\n");
156
+ }
157
+
158
+ // ============================================
159
+ // EXAMPLE 7: Model Comparison
160
+ // ============================================
161
+ async function modelComparison() {
162
+ console.log("--- Example 7: Different Models ---");
163
+
164
+ const prompt = "What's 25 * 47?";
165
+
166
+ // GPT-4o - Most capable
167
+ const gpt4Response = await client.chat.completions.create({
168
+ model: 'gpt-4o',
169
+ messages: [{ role: 'user', content: prompt }],
170
+ });
171
+
172
+ // GPT-3.5-turbo - Faster and cheaper
173
+ const gpt35Response = await client.chat.completions.create({
174
+ model: 'gpt-3.5-turbo',
175
+ messages: [{ role: 'user', content: prompt }],
176
+ });
177
+
178
+ console.log("GPT-4o: " + gpt4Response.choices[0].message.content);
179
+ console.log("GPT-3.5-turbo: " + gpt35Response.choices[0].message.content);
180
+ console.log("\n");
181
+ }
182
+
183
+ // ============================================
184
+ // Run all examples
185
+ // ============================================
186
+ async function main() {
187
+ try {
188
+ await basicCompletion();
189
+ await systemPromptExample();
190
+ await temperatureExample();
191
+ await conversationContext();
192
+ await streamingExample();
193
+ await tokenUsageExample();
194
+ await modelComparison();
195
+
196
+ console.log("=== All examples completed! ===");
197
+ } catch (error) {
198
+ console.error("Error:", error.message);
199
+ if (error.message.includes('API key')) {
200
+ console.error("\nMake sure to set your OPENAI_API_KEY in a .env file");
201
+ }
202
+ }
203
+ }
204
+
205
+ main();
examples/03_translation/CODE.md ADDED
@@ -0,0 +1,231 @@
1
+ # Code Explanation: translation.js
2
+
3
+ This file demonstrates how to use **system prompts** to specialize an AI agent for a specific task - in this case, professional German translation.
4
+
5
+ ## Step-by-Step Code Breakdown
6
+
7
+ ### 1. Import Required Modules
8
+ ```javascript
9
+ import {
10
+ getLlama, LlamaChatSession,
11
+ } from "node-llama-cpp";
12
+ import {fileURLToPath} from "url";
13
+ import path from "path";
14
+ ```
15
+ - Imports are the same as the intro example
16
+
17
+ ### 2. Initialize and Load Model
18
+ ```javascript
19
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
20
+
21
+ const llama = await getLlama();
22
+ const model = await llama.loadModel({
23
+ modelPath: path.join(
24
+ __dirname,
25
+ "../",
26
+ "models",
27
+ "hf_giladgd_Apertus-8B-Instruct-2509.Q6_K.gguf"
28
+ )
29
+ });
30
+ ```
31
+
32
+ #### Why Apertus-8B?
33
+ Apertus-8B is a multilingual language model specifically trained to support over 1,000 languages, with 40% of its training data in non-English languages. This makes it an excellent choice for translation tasks because:
34
+
35
+ 1. **Massive Multilingual Coverage**: The model was trained on 15 trillion tokens across 1,811 natively supported languages, including underrepresented languages like Swiss German and Romansh
36
+ 2. **Larger Size**: With 8 billion parameters, it's larger than the intro.js example, providing better understanding and output quality
37
+ 3. **Translation-Focused Training**: The model was explicitly designed for applications including translation systems
38
+ 4. **Q6_K Quantization**: 6-bit quantization provides a good balance between quality and file size
39
+
40
+ **Experiment suggestion**: Try swapping this model with others to compare translation quality! For example:
41
+ - Use a smaller 3B model to see how size affects translation accuracy
42
+ - Use a monolingual model to demonstrate why multilingual training matters
43
+ - Use a general-purpose model without translation-specific training
44
+
45
+ Read more about Apertus on [arXiv](https://arxiv.org/abs/2509.14233).
46
+
47
+ ### 3. Create Context and Chat Session with System Prompt
48
+ ```javascript
49
+ const context = await model.createContext();
50
+ const session = new LlamaChatSession({
51
+ contextSequence: context.getSequence(),
52
+ systemPrompt: `Du bist ein erfahrener wissenschaftlicher Übersetzer...`
53
+ });
54
+ ```
55
+
56
+ **Key difference from intro.js**: The **systemPrompt**!
57
+
58
+ #### What is a System Prompt?
59
+ The system prompt defines the agent's role, behavior, and rules. It's like giving the AI a job description:
60
+
61
+ ```
62
+ ┌─────────────────────────────────────┐
63
+ │ System Prompt │
64
+ │ "You are a professional translator"│
65
+ │ + Detailed instructions │
66
+ │ + Rules to follow │
67
+ └─────────────────────────────────────┘
68
+
69
+ Affects every response
70
+ ```
71
+
72
+ ### 4. The System Prompt Breakdown
73
+
74
+ The system prompt (in German) tells the model:
75
+
76
+ **Role:**
77
+ ```
78
+ "Du bist ein erfahrener wissenschaftlicher Übersetzer für technische Texte
79
+ aus dem Englischen ins Deutsche."
80
+ ```
81
+ Translation: "You are an experienced scientific translator for technical texts from English to German."
82
+
83
+ **Task:**
84
+ ```
85
+ "Deine Aufgabe: Erstelle eine inhaltlich exakte Übersetzung..."
86
+ ```
87
+ Translation: "Your task: Create a content-accurate translation that maintains full meaning and technical precision."
88
+
89
+ **Rules (Lines 33-41):**
90
+ 1. Preserve every technical statement exactly
91
+ 2. Use idiomatic, fluent German
92
+ 3. Avoid literal sentence structures
93
+ 4. Use correct terminology (e.g., "Multi-Agenten-System")
94
+ 5. Use German typography for numbers (e.g., "54 %")
95
+ 6. Adapt compound terms to German grammar
96
+ 7. Shorten overly complex sentences while preserving meaning
97
+ 8. Use neutral, scientific style
98
+
99
+ **Critical Instruction (Line 48):**
100
+ ```
101
+ "DO NOT add any additional text or explanation. ONLY respond with the translated text"
102
+ ```
103
+ - Forces the model to return ONLY the translation
104
+ - No "Here's the translation:" prefix
105
+ - No explanations or commentary
106
+
107
+ ### 5. The Translation Query
108
+ ```javascript
109
+ const q1 = `Translate this text into german:
110
+
111
+ We address the long-horizon gap in large language model (LLM) agents by en-
112
+ abling them to sustain coherent strategies in adversarial, stochastic environments.
113
+ ...
114
+ `;
115
+ ```
116
+ - Contains a scientific abstract about LLM agents (HexMachina paper)
117
+ - Complex technical content with specialized terms
118
+ - Tests the model's ability to:
119
+ - Understand technical AI/ML concepts
120
+ - Translate accurately
121
+ - Follow the detailed system prompt rules
122
+
123
+ ### 6. Execute Translation
124
+ ```javascript
125
+ const a1 = await session.prompt(q1);
126
+ console.log("AI: " + a1);
127
+ ```
128
+ - Sends the translation request to the model
129
+ - The model will:
130
+ 1. Read the system prompt (its "role")
131
+ 2. Read the user's request
132
+ 3. Apply all the rules from the system prompt
133
+ 4. Generate a German translation
134
+
135
+ ### 7. Cleanup
136
+ ```javascript
137
+ session.dispose()
138
+ context.dispose()
139
+ model.dispose()
140
+ llama.dispose()
141
+ ```
142
+ - Same cleanup as intro.js
143
+ - Always dispose resources when done
144
+
145
+ ## Key Concepts Demonstrated
146
+
147
+ ### 1. System Prompts for Specialization
148
+ System prompts transform a general-purpose LLM into a specialized agent:
149
+
150
+ ```
151
+ General LLM + System Prompt = Specialized Agent
152
+ (Translator, Coder, Analyst, etc.)
153
+ ```
154
+
155
+ ### 2. Detailed Instructions Matter
156
+ Compare these approaches:
157
+
158
+ **❌ Minimal approach:**
159
+ ```javascript
160
+ systemPrompt: "Translate to German"
161
+ ```
162
+
163
+ **✅ This example (detailed):**
164
+ ```javascript
165
+ systemPrompt: `
166
+ You are a professional translator
167
+ Follow these rules:
168
+ - Rule 1
169
+ - Rule 2
170
+ - Rule 3
171
+ ...
172
+ `
173
+ ```
174
+
175
+ The detailed approach gives much better, more consistent results.
176
+
177
+ ### 3. Constraining Output Format
178
+ The line "DO NOT add any additional text" demonstrates output control:
179
+
180
+ **Without constraint:**
181
+ ```
182
+ AI: Here's the translation of the text you provided:
183
+
184
+ [German text]
185
+
186
+ I hope this helps! Let me know if you need anything else.
187
+ ```
188
+
189
+ **With constraint:**
190
+ ```
191
+ AI: [German text only]
192
+ ```
193
+
194
+ ## What Makes This an "Agent"?
195
+
196
+ This is a **specialized agent** because:
197
+
198
+ 1. **Specific Role**: Has a defined purpose (translation)
199
+ 2. **Constrained Behavior**: Follows specific rules and guidelines
200
+ 3. **Consistent Output**: Produces predictable, formatted results
201
+ 4. **Domain Expertise**: Optimized for scientific/technical content
202
+
203
+ ## Expected Output
204
+
205
+ When run, you'll see a German translation of the English abstract, following all the rules:
206
+ - Proper German scientific style
207
+ - Correct technical terminology
208
+ - German number formatting (e.g., "54 %")
209
+ - No extra commentary
210
+
211
+ The quality depends on the model's training and size.
212
+
213
+ ## Experimentation Ideas
214
+
215
+ 1. **Try different models**:
216
+ - Swap Apertus-8B with a smaller model (3B) to see size impact
217
+ - Try a monolingual English model to demonstrate the importance of multilingual training
218
+ - Use models with different quantization levels (Q4, Q6, Q8) to compare quality vs. size
219
+
220
+ 2. **Modify the system prompt**:
221
+ - Remove specific rules one by one to see their impact
222
+ - Change the translation target language
223
+ - Adjust the style (formal vs. casual)
224
+
225
+ 3. **Test with different content**:
226
+ - Technical documentation
227
+ - Creative writing
228
+ - Business communications
229
+ - Simple vs. complex sentences
230
+
231
+ Each experiment will help you understand how system prompts, model selection, and prompt engineering work together to create effective AI agents.
examples/03_translation/CONCEPT.md ADDED
@@ -0,0 +1,302 @@
1
+ # Concept: System Prompts & Agent Specialization
2
+
3
+ ## Overview
4
+
5
+ This example demonstrates how to transform a general-purpose LLM into a **specialized agent** using **system prompts**. The key insight: you don't need different models for different tasks—you need different instructions.
6
+
7
+ ## What is a System Prompt?
8
+
9
+ A **system prompt** is a persistent instruction that shapes the AI's behavior for an entire conversation session.
10
+
11
+ ### Analogy
12
+ Think of hiring someone for a job:
13
+
14
+ ```
15
+ Without System Prompt With System Prompt
16
+ ───────────────────── ──────────────────────
17
+ "Hi, I'm an AI." "I'm a professional translator
18
+ with expertise in scientific
19
+ What do you want?" German. I follow strict quality
20
+ guidelines and output format."
21
+ ```
22
+
23
+ ## How System Prompts Work
24
+
25
+ ### The Context Structure
26
+
27
+ ```
28
+ ┌─────────────────────────────────────────────┐
29
+ │ CONTEXT WINDOW │
30
+ │ │
31
+ │ ┌───────────────────────────────────────┐ │
32
+ │ │ SYSTEM PROMPT (Always present) │ │
33
+ │ │ "You are a professional translator..." │
34
+ │ │ "Follow these rules..." │ │
35
+ │ └───────────────────────────────────────┘ │
36
+ │ ↓ │
37
+ │ ┌───────────────────────────────────────┐ │
38
+ │ │ USER MESSAGES │ │
39
+ │ │ "Translate this text..." │ │
40
+ │ └───────────────────────────────────────┘ │
41
+ │ ↓ │
42
+ │ ┌───────────────────────────────────────┐ │
43
+ │ │ AI RESPONSES │ │
44
+ │ │ (Shaped by system prompt) │ │
45
+ │ └───────────────────────────────────────┘ │
46
+ └─────────────────────────────────────────────┘
47
+ ```
48
+
49
+ The system prompt sits at the top of the context and influences **every** response.
50
+
51
+ ## Agent Specialization Pattern
52
+
53
+ ### Transformation Flow
54
+
55
+ ```
56
+ ┌──────────────────┐ ┌─────────────────┐ ┌──────────────────┐
57
+ │ General Model │ + │ System Prompt │ = │ Specialized Agent│
58
+ │ │ │ │ │ │
59
+ │ • Knows many │ │ • Define role │ │ • Translation │
60
+ │ things │ │ • Set rules │ │ Agent │
61
+ │ • No specific │ │ • Constrain │ │ • Coding Agent │
62
+ │ role │ │ output │ │ • Analysis Agent │
63
+ └──────────────────┘ └─────────────────┘ └──────────────────┘
64
+ ```
65
+
66
+ ### Example Specializations
67
+
68
+ **Translation Agent (this example):**
69
+ ```
70
+ System Prompt = Role + Rules + Output Format
71
+ ```
72
+
73
+ **Code Assistant:**
74
+ ```javascript
75
+ systemPrompt: "You are an expert programmer.
76
+ Always provide working code with comments.
77
+ Explain complex logic."
78
+ ```
79
+
80
+ **Data Analyst:**
81
+ ```javascript
82
+ systemPrompt: "You are a data analyst.
83
+ Always show your calculations step-by-step.
84
+ Cite data sources when available."
85
+ ```
86
+
87
+ ## Anatomy of an Effective System Prompt
88
+
89
+ ### The 5 Components
90
+
91
+ ```
92
+ ┌─────────────────────────────────────────┐
93
+ │ 1. ROLE DEFINITION │
94
+ │ "You are a [specific role]..." │
95
+ ├─────────────────────────────────────────┤
96
+ │ 2. TASK DESCRIPTION │
97
+ │ "Your goal is to..." │
98
+ ├─────────────────────────────────────────┤
99
+ │ 3. BEHAVIORAL RULES │
100
+ │ "Always do X, Never do Y..." │
101
+ ├─────────────────────────────────────────┤
102
+ │ 4. OUTPUT FORMAT │
103
+ │ "Format your response as..." │
104
+ ├─────────────────────────────────────────┤
105
+ │ 5. CONSTRAINTS │
106
+ │ "Do NOT include..." │
107
+ └─────────────────────────────────────────┘
108
+ ```
109
+
110
+ ### This Example's Structure
111
+
112
+ ```
113
+ Role: "Professional scientific translator"
114
+ Task: "Translate English to German with precision"
115
+ Rules: 8 specific translation guidelines
116
+ Format: Idiomatic German, scientific style
117
+ Constraints: "ONLY translated text, no explanation"
118
+ ```
119
+
120
+ ## Why Detailed System Prompts Matter
121
+
122
+ ### Comparison Study
123
+
124
+ **Minimal System Prompt:**
125
+ ```javascript
126
+ systemPrompt: "Translate to German"
127
+ ```
128
+
129
+ **Result:**
130
+ - May add unnecessary explanations
131
+ - Inconsistent terminology
132
+ - Mixed formality levels
133
+ - Extra conversational text
134
+
135
+ **Detailed System Prompt (this example):**
136
+ ```javascript
137
+ systemPrompt: `You are a professional translator...
138
+ - Rule 1: Preserve technical accuracy
139
+ - Rule 2: Use idiomatic German
140
+ - Rule 3: Follow scientific conventions
141
+ ...
142
+ DO NOT add any explanations`
143
+ ```
144
+
145
+ **Result:**
146
+ - ✅ Consistent quality
147
+ - ✅ Correct terminology
148
+ - ✅ Proper formatting
149
+ - ✅ Only translation output
150
+
151
+ ### Quality Impact
152
+
153
+ ```
154
+ Detail Level Output Quality
155
+ ─────────── ─────────────────
156
+ Very minimal → Unpredictable
157
+ Basic role → Somewhat consistent
158
+ Detailed → Highly consistent ⭐
159
+ Over-detailed → May confuse model
160
+ ```
161
+
162
+ ## System Prompt Design Patterns
163
+
164
+ ### Pattern 1: Role-Playing
165
+ ```
166
+ "You are a [profession] with expertise in [domain]..."
167
+ ```
168
+ Makes the model adopt that perspective.
169
+
170
+ ### Pattern 2: Rule-Based
171
+ ```
172
+ "Follow these rules:
173
+ 1. Always...
174
+ 2. Never...
175
+ 3. When X, do Y..."
176
+ ```
177
+ Explicit constraints lead to predictable behavior.
178
+
179
+ ### Pattern 3: Output Formatting
180
+ ```
181
+ "Format your response as:
182
+ - JSON
183
+ - Markdown
184
+ - Plain text only
185
+ - Step-by-step list"
186
+ ```
187
+ Controls the structure of responses.
188
+
189
+ ### Pattern 4: Contextual Awareness
190
+ ```
191
+ "You remember: [previous facts]
192
+ You know that: [domain knowledge]
193
+ Current situation: [context]"
194
+ ```
195
+ Primes the model with relevant information.
196
+
197
+ ## How This Relates to AI Agents
198
+
199
+ ### Agent = Model + System Prompt + Tools
200
+
201
+ ```
202
+ ┌────────────────────────────────────────────┐
203
+ │ AI Agent │
204
+ │ │
205
+ │ ┌──────────────────────────────────────┐ │
206
+ │ │ System Prompt (Agent's "Identity") │ │
207
+ │ └──────────────────────────────────────┘ │
208
+ │ ↓ │
209
+ │ ┌──────────────────────────────────────┐ │
210
+ │ │ LLM (Agent's "Brain") │ │
211
+ │ └──────────────────────────────────────┘ │
212
+ │ ↓ │
213
+ │ ┌──────────────────────────────────────┐ │
214
+ │ │ Tools (Agent's "Hands") [Optional] │ │
215
+ │ └──────────────────────────────────────┘ │
216
+ └────────────────────────────────────────────┘
217
+ ```
218
+
219
+ **In this example:**
220
+ - System Prompt: "You are a translator..."
221
+ - LLM: Apertus-8B model
222
+ - Tools: None (translation is done by the model itself)
223
+
224
+ **In more complex agents:**
225
+ - System Prompt: "You are a research assistant..."
226
+ - LLM: Any model
227
+ - Tools: Web search, calculator, file access, etc.
228
+
229
+ ## Practical Applications
230
+
231
+ ### 1. Domain Specialization
232
+ ```
233
+ Medical → "You are a medical professional..."
234
+ Legal → "You are a legal expert..."
235
+ Technical → "You are an engineer..."
236
+ ```
237
+
238
+ ### 2. Output Control
239
+ ```
240
+ JSON API → "Always respond in valid JSON"
241
+ Markdown → "Format all responses as markdown"
242
+ Code → "Only output executable code"
243
+ ```
244
+
245
+ ### 3. Behavioral Constraints
246
+ ```
247
+ Concise → "Use maximum 2 sentences"
248
+ Detailed → "Explain thoroughly with examples"
249
+ Neutral → "Avoid opinions, state only facts"
250
+ ```
251
+
252
+ ### 4. Multi-Language Support
253
+ ```
254
+ systemPrompt: `You are a multilingual assistant.
255
+ Respond in the same language as the input.`
256
+ ```
257
+
258
+ ## Chat Wrappers Explained
259
+
260
+ Different models need different conversation formats:
261
+
262
+ ```
263
+ Model Type Format Needed Wrapper
264
+ ────────────── ─────────────────── ─────────────────
265
+ Llama 2/3 Llama format LlamaChatWrapper
266
+ GPT-style ChatML format ChatMLWrapper
267
+ Harmony models Harmony format HarmonyChatWrapper
268
+ ```
269
+
270
+ **What they do:**
271
+ ```
272
+ Your Message → [Chat Wrapper] → Formatted Prompt → Model
273
+
274
+ Adds special tokens:
275
+ <|system|>, <|user|>, <|assistant|>
276
+ ```
277
+
278
+ The wrapper ensures the model understands which part is the system prompt, which is the user message, etc.
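As a rough illustration of what such a wrapper produces, here is a ChatML-style formatter (the exact special tokens vary per model; node-llama-cpp picks and applies the right wrapper for you):

```javascript
// Illustrative only: roughly what a ChatML-style wrapper does to your messages.
// Real wrappers also handle model-specific tokenization details.
function toChatML(messages) {
  const formatted = messages
    .map(m => `<|im_start|>${m.role}\n${m.content}<|im_end|>`)
    .join('\n');
  // Leave the prompt open for the assistant's reply
  return formatted + '\n<|im_start|>assistant\n';
}
```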
279
+
280
+ ## Key Takeaways
281
+
282
+ 1. **System prompts are powerful**: They fundamentally change how the model behaves
283
+ 2. **Detailed is better**: More specific instructions = more consistent results
284
+ 3. **Structure matters**: Role + Rules + Format + Constraints
285
+ 4. **No retraining needed**: Same model, different behaviors
286
+ 5. **Foundation for agents**: System prompts are the first step in building specialized agents
287
+
288
+ ## Evolution Path
289
+
290
+ ```
291
+ 1. Basic Prompting (intro.js)
292
+
293
+ 2. System Prompts (translation.js) ← You are here
294
+
295
+ 3. System Prompts + Tools (simple-agent.js)
296
+
297
+ 4. Multi-turn reasoning (react-agent.js)
298
+
299
+ 5. Full Agent Systems
300
+ ```
301
+
302
+ This example bridges the gap between basic LLM usage and true agent behavior by showing how to specialize through instructions.
examples/03_translation/translation.js ADDED
@@ -0,0 +1,82 @@
1
+ import {
2
+ getLlama,
3
+ LlamaChatSession,
4
+ } from "node-llama-cpp";
5
+ import {fileURLToPath} from "url";
6
+ import path from "path";
7
+
8
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
9
+
10
+ const llama = await getLlama({
11
+ logLevel: 'error'
12
+ });
13
+ const model = await llama.loadModel({
14
+ modelPath: path.join(
15
+ __dirname,
16
+ '..',
17
+ '..',
18
+ 'models',
19
+ 'hf_giladgd_Apertus-8B-Instruct-2509.Q6_K.gguf'
20
+ )
21
+ });
22
+
23
+ const context = await model.createContext();
24
+ const session = new LlamaChatSession({
25
+ contextSequence: context.getSequence(),
26
+ systemPrompt: `Du bist ein erfahrener wissenschaftlicher Übersetzer für technische Texte aus dem Englischen ins
27
+ Deutsche.
28
+
29
+ Deine Aufgabe: Erstelle eine inhaltlich exakte Übersetzung, die den vollen Sinn und die technische Präzision
30
+ des Originaltexts erhält.
31
+
32
+ Gleichzeitig soll die Übersetzung klar, natürlich und leicht lesbar auf Deutsch klingen – also so, wie ein
33
+ deutscher Wissenschaftler oder Ingenieur denselben Text schreiben würde.
34
+
35
+ Befolge diese Regeln:
36
+ Bewahre jede fachliche Aussage und Nuance exakt. Kein Inhalt darf verloren gehen oder verändert werden.
37
+ Verwende idiomatisches, flüssiges Deutsch, wie es in wissenschaftlichen Abstracts (z. B. NeurIPS, ICLR, AAAI) üblich ist.
38
+ Vermeide wörtliche Satzstrukturen. Formuliere so, wie ein deutscher Wissenschaftler denselben Inhalt selbst schreiben würde.
39
+ Verwende korrekte Terminologie (z. B. Multi-Agenten-System, Adapterlayer, Baseline, Strategieverbesserung).
40
+ Verwende bei Zahlen, Einheiten und Prozentangaben deutsche Typografie (z. B. „54 %“, „3 m“, „2 000“).
41
+ Passe zusammengesetzte Begriffe an die deutsche Grammatik an (z. B. „kontinuierlich lernendes System“ statt „kontinuierliches Lernen System“).
42
+ Kürze lange oder verschachtelte Sätze behutsam, ohne Bedeutung zu verändern, um Lesbarkeit zu verbessern.
43
+ Verwende einen neutralen, wissenschaftlichen Stil, ohne Werbesprache oder unnötige Ausschmückung.
44
+
45
+ Zusatzinstruktion:
46
+ Wenn der Originaltext englische Satzlogik enthält, restrukturiere den Satz so, dass er auf Deutsch elegant und klar klingt, aber denselben Inhalt vermittelt.
47
+
48
+ Zielqualität: Eine Übersetzung, die sich wie ein Originaltext liest – technisch präzise, flüssig und grammatikalisch einwandfrei.
49
+
50
+ DO NOT add any additional text or explanation. ONLY respond with the translated text.
51
+ `
52
+ });
53
+
54
+ const q1 = `Translate this text into German:
55
+
56
+ We address the long-horizon gap in large language model (LLM) agents by en-
57
+ abling them to sustain coherent strategies in adversarial, stochastic environments.
58
+ Settlers of Catan provides a challenging benchmark: success depends on balanc-
59
+ ing short- and long-term goals amid randomness, trading, expansion, and block-
60
+ ing. Prompt-centric LLM agents (e.g., ReAct, Reflexion) must re-interpret large,
61
+ evolving game states each turn, quickly saturating context windows and losing
62
+ strategic consistency. We propose HexMachina, a continual learning multi-agent
63
+ system that separates environment discovery (inducing an adapter layer without
64
+ documentation) from strategy improvement (evolving a compiled player through
65
+ code refinement and simulation). This design preserves executable artifacts, al-
66
+ lowing the LLM to focus on high-level strategy rather than per-turn reasoning. In
67
+ controlled Catanatron experiments, HexMachina learns from scratch and evolves
68
+ players that outperform the strongest human-crafted baseline (AlphaBeta), achiev-
69
+ ing a 54% win rate and surpassing prompt-driven and no-discovery baselines. Ab-
70
+ lations confirm that isolating pure strategy learning improves performance. Over-
71
+ all, artifact-centric continual learning transforms LLMs from brittle stepwise de-
72
+ ciders into stable strategy designers, advancing long-horizon autonomy.
73
+ `;
74
+
75
+ console.log('Translation started...')
76
+ const a1 = await session.prompt(q1);
77
+ console.log("AI: " + a1);
78
+
79
+ session.dispose()
80
+ context.dispose()
81
+ model.dispose()
82
+ llama.dispose()
examples/04_think/CODE.md ADDED
@@ -0,0 +1,257 @@
1
+ # Code Explanation: think.js
2
+
3
+ This file demonstrates using system prompts for **logical reasoning** and **quantitative problem-solving**, showing how to configure an LLM as a specialized reasoning agent.
4
+
5
+ ## Step-by-Step Code Breakdown
6
+
7
+ ### 1. Import and Setup (Lines 1-8)
8
+ ```javascript
9
+ import {
10
+ getLlama,
11
+ LlamaChatSession,
12
+ } from "node-llama-cpp";
13
+ import {fileURLToPath} from "url";
14
+ import path from "path";
15
+
16
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
17
+ ```
18
+ - Standard imports for LLM interaction
19
+ - Path setup for locating the model file
20
+
21
+ ### 2. Initialize and Load Model (Lines 10-18)
22
+ ```javascript
23
+ const llama = await getLlama();
24
+ const model = await llama.loadModel({
25
+ modelPath: path.join(
26
+ __dirname,
27
+ "..",
28
+ "..",
+ "models",
29
+ "Qwen3-1.7B-Q8_0.gguf"
30
+ )
31
+ });
32
+ ```
33
+ - Uses **Qwen3-1.7B-Q8_0**: A 1.7B parameter model with 8-bit quantization
34
+ - Smaller than the translation example (1.7B vs 8B parameters)
35
+ - Q8_0 quantization preserves nearly full quality at a modest size cost
36
+
37
+ ### 3. Define the System Prompt (Lines 19-24)
38
+ ```javascript
39
+ const systemPrompt = `You are an expert logical and quantitative reasoner.
40
+ Your goal is to analyze real-world word problems involving families, quantities, averages, and relationships
41
+ between entities, and compute the exact numeric answer.
42
+
43
+ Goal: Return the correct final number as a single value — no explanation, no reasoning steps, just the answer.
44
+ `
45
+ ```
46
+
47
+ **Key elements:**
48
+
49
+ 1. **Role**: "expert logical and quantitative reasoner"
50
+ - Sets expectations for mathematical/analytical thinking
51
+
52
+ 2. **Task Scope**: "real-world word problems involving families, quantities, averages, and relationships"
53
+ - Tells the model what type of problems to expect
54
+ - Primes it for complex counting and calculation tasks
55
+
56
+ 3. **Output Constraint**: "Return the correct final number as a single value — no explanation"
57
+ - Forces concise output
58
+ - Just the answer, not the work
59
+
60
+ ### Why This System Prompt Design?
61
+
62
+ The prompt is designed for the specific problem type:
63
+ - Word problems with complex family relationships
64
+ - Multiple nested conditions
65
+ - Requires careful tracking of people and quantities
66
+ - Needs arithmetic calculation
67
+
68
+ ### 4. Create Context and Session (Lines 25-29)
69
+ ```javascript
70
+ const context = await model.createContext();
71
+ const session = new LlamaChatSession({
72
+ contextSequence: context.getSequence(),
73
+ systemPrompt
74
+ });
75
+ ```
76
+ - Creates context for the conversation
77
+ - Initializes session with the reasoning system prompt
78
+ - No chat wrapper needed (using model's default format)
79
+
80
+ ### 5. The Complex Word Problem (Lines 31-40)
81
+ ```javascript
82
+ const prompt = `My family reunion is this week, and I was assigned the mashed potatoes to bring.
83
+ The attendees include my married mother and father, my twin brother and his family, my aunt and her family, my grandma
84
+ and her brother, her brother's daughter, and his daughter's family. All the adults but me have been married, and no one
85
+ is divorced or remarried, but my grandpa and my grandma's sister-in-law passed away last year. All living spouses are attending.
86
+ My brother has two children that are still kids, my aunt has one six-year-old, and my grandma's brother's daughter has
87
+ three kids under 12. I figure each adult will eat about 1.5 potatoes and each kid will eat about 1/2 a potato, except my
88
+ second cousins don't eat carbs. The average potato is about half a pound, and potatoes are sold in 5-pound bags.
89
+
90
+ How many whole bags of potatoes do I need?
91
+ `;
92
+ ```
93
+
94
+ **This is intentionally complex to test reasoning:**
95
+
96
+ **People to count:**
97
+ - Speaker (1)
98
+ - Mother and father (2)
99
+ - Twin brother + spouse (2)
100
+ - Brother's 2 kids (2)
101
+ - Aunt + spouse (2)
102
+ - Aunt's 1 kid (1)
103
+ - Grandma (1)
104
+ - Grandma's brother (1; his wife, grandma's sister-in-law, passed away)
105
+ - Brother's daughter + spouse (2)
106
+ - Their 3 kids (3, but don't eat carbs)
107
+
108
+ **Calculations needed:**
109
+ 1. Count total adults
110
+ 2. Count total kids
111
+ 3. Subtract non-eating kids
112
+ 4. Calculate potato needs: (adults × 1.5) + (eating kids × 0.5)
113
+ 5. Convert to pounds: total potatoes × 0.5 lbs
114
+ 6. Convert to bags: pounds ÷ 5, round up
115
+
116
+ **The complexity:**
117
+ - Family relationships (who's married to whom)
118
+ - Deceased people (subtract from count)
119
+ - Special dietary needs (second cousins don't eat carbs)
120
+ - Unit conversions (potatoes → pounds → bags)
121
+
122
+ ### 6. Execute and Display (Lines 42-43)
123
+ ```javascript
124
+ const answer = await session.prompt(prompt);
125
+ console.log(`AI: ${answer}`);
126
+ ```
127
+ - Sends the complex problem to the model
128
+ - The model uses its reasoning abilities to work through the problem
129
+ - Outputs just the final number (based on system prompt)
130
+
131
+ ### 7. Cleanup (Lines 45-48)
132
+ ```javascript
133
+ session.dispose()
134
+ context.dispose()
135
+ model.dispose()
136
+ llama.dispose()
137
+ ```
138
+ - Standard resource cleanup
139
+
140
+ ## Key Concepts Demonstrated
141
+
142
+ ### 1. Reasoning Agent Configuration
143
+ This shows how to configure an LLM for analytical thinking:
144
+
145
+ ```
146
+ System Prompt → LLM becomes a "reasoning engine"
147
+ ```
148
+
149
+ Instead of conversational AI, we get:
150
+ - Focused analytical processing
151
+ - Mathematical computation
152
+ - Logical deduction
153
+
154
+ ### 2. Output Format Control
155
+ Compare these approaches:
156
+
157
+ **Without constraint:**
158
+ ```
159
+ AI: Let me work through this step by step.
160
+ First, I'll count the adults...
161
+ [lengthy explanation]
162
+ So the answer is 2 bags.
163
+ ```
164
+
165
+ **With constraint (this example):**
166
+ ```
167
+ AI: 2
168
+ ```
169
+
170
+ ### 3. Problem Complexity Testing
171
+ This example tests the model's ability to:
172
+ - Parse complex natural language
173
+ - Track multiple entities and relationships
174
+ - Apply arithmetic operations
175
+ - Handle edge cases (deceased people, dietary restrictions)
176
+ - Perform unit conversions
177
+
178
+ ### 4. Specialized Task Agents
179
+ This demonstrates creating task-specific agents:
180
+
181
+ ```
182
+ General LLM + "Reasoning Agent" System Prompt = Math Problem Solver
183
+ ```
184
+
185
+ Same pattern works for:
186
+ - Logic puzzles
187
+ - Data analysis
188
+ - Scientific calculations
189
+ - Statistical reasoning
190
+
191
+ ## Challenges & Limitations
192
+
193
+ ### 1. Model Size Matters
194
+ The 1.7B parameter model may struggle with:
195
+ - Very complex counting problems
196
+ - Multi-step reasoning requiring working memory
197
+ - Edge cases in the problem
198
+
199
+ Larger models (7B, 13B+) generally perform better on reasoning tasks.
200
+
201
+ ### 2. Hidden Reasoning
202
+ The system prompt asks for "just the answer," so we don't see:
203
+ - The model's reasoning process
204
+ - Where it might have made mistakes
205
+ - Its confidence level
206
+
207
+ ### 3. No Tool Use
208
+ The model must do all calculations "in its head" without:
209
+ - A calculator
210
+ - Note-taking
211
+ - Step-by-step verification
212
+
213
+ Later examples (like react-agent) address this by giving the model tools.
214
+
215
+ ## Why This Matters for AI Agents
216
+
217
+ ### Reasoning is Fundamental
218
+ All useful agents need reasoning capabilities:
219
+ - **Planning agents**: Reason about sequences of actions
220
+ - **Research agents**: Analyze and synthesize information
221
+ - **Decision agents**: Evaluate options and consequences
222
+
223
+ ### System Prompt Shapes Behavior
224
+ This example shows that the same model can behave differently based on instructions:
225
+ - Translator agent (previous example)
226
+ - Reasoning agent (this example)
227
+ - Code agent (later examples)
228
+
229
+ ### Foundation for Complex Agents
230
+ Understanding how to prompt for reasoning is essential before adding:
231
+ - Tools (giving the model a calculator)
232
+ - Memory (remembering previous calculations)
233
+ - Multi-step processes (ReAct pattern)
234
+
235
+ ## Expected Output
236
+
237
+ Running this script should output something like:
238
+ ```
239
+ AI: 2
240
+ ```
241
+
242
+ The exact answer depends on the model's ability to:
243
+ - Correctly count all family members
244
+ - Apply the eating rates
245
+ - Convert units
246
+ - Round up for whole bags
247
+
248
+ ## Improving This Approach
249
+
250
+ To get better reasoning:
251
+ 1. **Use larger models**: 7B+ parameters
252
+ 2. **Add step-by-step prompting**: "Show your work"
253
+ 3. **Provide tools**: Give the model a calculator
254
+ 4. **Use chain-of-thought**: Encourage explicit reasoning
255
+ 5. **Verify answers**: Run multiple times or use multiple models
256
+
257
+ The react-agent example demonstrates some of these improvements.
examples/04_think/CONCEPT.md ADDED
@@ -0,0 +1,368 @@
1
+ # Concept: Reasoning & Problem-Solving Agents
2
+
3
+ ## Overview
4
+
5
+ This example demonstrates how to configure an LLM as a **reasoning agent** capable of analytical thinking and quantitative problem-solving. It shows the bridge between simple text generation and complex cognitive tasks.
6
+
7
+ ## What is a Reasoning Agent?
8
+
9
+ A **reasoning agent** is an LLM configured to perform logical analysis, mathematical computation, and multi-step problem-solving through careful system prompt design.
10
+
11
+ ### Human Analogy
12
+
13
+ ```
14
+ Regular Chat Reasoning Agent
15
+ ───────────── ──────────────────
16
+ "Can you help me?" "I am a mathematician.
17
+ "Sure! What do you need?" I analyze problems methodically
18
+ and compute exact answers."
19
+ ```
20
+
21
+ ## The Reasoning Challenge
22
+
23
+ ### Why Reasoning is Hard for LLMs
24
+
25
+ LLMs are trained on text prediction, not explicit reasoning:
26
+
27
+ ```
28
+ ┌───────────────────────────────────────┐
29
+ │ LLM Training │
30
+ │ "Predict next word in text" │
31
+ │ │
32
+ │ NOT explicitly trained for: │
33
+ │ • Step-by-step logic │
34
+ │ • Arithmetic computation │
35
+ │ • Tracking multiple variables │
36
+ │ • Systematic problem decomposition │
37
+ └───────────────────────────────────────┘
38
+ ```
39
+
40
+ However, they can learn reasoning patterns from training data and be guided by system prompts.
41
+
42
+ ## Reasoning Through System Prompts
43
+
44
+ ### Configuration Pattern
45
+
46
+ ```
47
+ ┌─────────────────────────────────────────┐
48
+ │ System Prompt Components │
49
+ ├─────────────────────────────────────────┤
50
+ │ 1. Role: "Expert reasoner" │
51
+ │ 2. Task: "Analyze and solve problems" │
52
+ │ 3. Method: "Compute exact answers" │
53
+ │ 4. Output: "Single numeric value" │
54
+ └─────────────────────────────────────────┘
55
+
56
+ Reasoning Behavior
57
+ ```
58
+
59
+ ### Types of Reasoning Tasks
60
+
61
+ **Quantitative Reasoning (this example):**
62
+ ```
63
+ Problem → Count entities → Calculate → Convert units → Answer
64
+ ```
65
+
66
+ **Logical Reasoning:**
67
+ ```
68
+ Premises → Apply rules → Deduce conclusions → Answer
69
+ ```
70
+
71
+ **Analytical Reasoning:**
72
+ ```
73
+ Data → Identify patterns → Form hypothesis → Conclude
74
+ ```
75
+
76
+ ## How LLMs "Reason"
77
+
78
+ ### Pattern Matching vs. True Reasoning
79
+
80
+ LLMs don't reason like humans, but they can:
81
+
82
+ ```
83
+ ┌─────────────────────────────────────────────┐
84
+ │ What LLMs Actually Do │
85
+ │ │
86
+ │ 1. Pattern Recognition │
87
+ │ "This looks like a counting problem" │
88
+ │ │
89
+ │ 2. Template Application │
90
+ │ "Similar problems follow this pattern" │
91
+ │ │
92
+ │ 3. Statistical Inference │
93
+ │ "These numbers likely combine this way" │
94
+ │ │
95
+ │ 4. Learned Procedures │
96
+ │ "I've seen this type of calculation" │
97
+ └─────────────────────────────────────────────┘
98
+ ```
99
+
100
+ ### The Reasoning Process
101
+
102
+ ```
103
+ Input: Complex Word Problem
104
+
105
+ ┌────────────┐
106
+ │ Parse │ Identify entities and relationships
107
+ └────────────┘
108
+
109
+ ┌────────────┐
110
+ │ Decompose │ Break into sub-problems
111
+ └────────────┘
112
+
113
+ ┌────────────┐
114
+ │ Calculate │ Apply arithmetic operations
115
+ └────────────┘
116
+
117
+ ┌────────────┐
118
+ │ Synthesize│ Combine results
119
+ └────────────┘
120
+
121
+ Final Answer
122
+ ```
123
+
124
+ ## Problem Complexity Hierarchy
125
+
126
+ ### Levels of Reasoning Difficulty
127
+
128
+ ```
129
+ Easy Hard
130
+ │ │
131
+ │ Simple Multi-step Nested Implicit │
132
+ │ Arithmetic Logic Conditions Reasoning│
133
+ │ │
134
+ └─────────────────────────────────────────────┘
135
+
136
+ Examples:
137
+ Easy: "What is 5 + 3?"
138
+ Medium: "If 3 apples cost $2 each, what's the total?"
139
+ Hard: "Count family members with complex relationships"
140
+ ```
141
+
142
+ ### This Example's Complexity
143
+
144
+ The potato problem is **highly complex**:
145
+
146
+ ```
147
+ ┌─────────────────────────────────────────┐
148
+ │ Complexity Factors │
149
+ ├─────────────────────────────────────────┤
150
+ │ ✓ Multiple entities (15+ people) │
151
+ │ ✓ Relationship reasoning (family tree)│
152
+ │ ✓ Conditional logic (if married then..)│
153
+ │ ✓ Negative conditions (deceased people)│
154
+ │ ✓ Special cases (dietary restrictions)│
155
+ │ ✓ Multiple calculations │
156
+ │ ✓ Unit conversions │
157
+ └─────────────────────────────────────────┘
158
+ ```
159
+
160
+ ## Limitations of Pure LLM Reasoning
161
+
162
+ ### Why This Approach Has Issues
163
+
164
+ ```
165
+ ┌────────────────────────────────────┐
166
+ │ Problem: No External Tools │
167
+ │ │
168
+ │ LLM must hold everything in │
169
+ │ "mental" context: │
170
+ │ • All entity counts │
171
+ │ • Intermediate calculations │
172
+ │ • Conversion factors │
173
+ │ • Final arithmetic │
174
+ │ │
175
+ │ Result: Prone to errors │
176
+ └────────────────────────────────────┘
177
+ ```
178
+
179
+ ### Common Failure Modes
180
+
181
+ **1. Counting Errors:**
182
+ ```
183
+ Problem: "Count 15 people with complex relationships"
184
+ LLM: "14" or "16" (off by one)
185
+ ```
186
+
187
+ **2. Arithmetic Mistakes:**
188
+ ```
189
+ Problem: "13 adults × 1.5 + 3 kids × 0.5"
190
+ LLM: May get intermediate steps wrong
191
+ ```
192
+
193
+ **3. Lost Context:**
194
+ ```
195
+ Problem: Multi-step with many facts
196
+ LLM: Forgets earlier information
197
+ ```
198
+
199
+ ## Improving Reasoning: Evolution Path
200
+
201
+ ### Level 1: Pure Prompting (This Example)
202
+ ```
203
+ User → LLM → Answer
204
+
205
+ System Prompt
206
+ ```
207
+
208
+ **Limitations:**
209
+ - All reasoning internal to LLM
210
+ - No verification
211
+ - No tools
212
+ - Hidden process
213
+
214
+ ### Level 2: Chain-of-Thought
215
+ ```
216
+ User → LLM → Show Work → Answer
217
+
218
+ "Explain your reasoning"
219
+ ```
220
+
221
+ **Improvements:**
222
+ - Visible reasoning steps
223
+ - Can catch some errors
224
+ - Still no tools
225
+
226
+ ### Level 3: Tool-Augmented (simple-agent)
227
+ ```
228
+ User → LLM ⟷ Tools → Answer
229
+ ↑ (Calculator)
230
+ System Prompt
231
+ ```
232
+
233
+ **Improvements:**
234
+ - External computation
235
+ - Reduced errors
236
+ - Verifiable steps
237
+
238
+ ### Level 4: ReAct Pattern (react-agent)
239
+ ```
240
+ User → LLM → Think → Act → Observe
241
+ ↑ ↓ ↓ ↓
242
+ System Reason Tool Result
243
+ Prompt Use
244
+ ↑ ↓ ↓
245
+ └───────────Iterate──┘
246
+ ```
247
+
248
+ **Best approach:**
249
+ - Explicit reasoning loop
250
+ - Tool use at each step
251
+ - Self-correction possible
252
+
253
+ ## System Prompt Design for Reasoning
254
+
255
+ ### Key Elements
256
+
257
+ **1. Role Definition:**
258
+ ```
259
+ "You are an expert logical and quantitative reasoner"
260
+ ```
261
+ Sets the mental framework.
262
+
263
+ **2. Task Specification:**
264
+ ```
265
+ "Analyze real-world word problems involving..."
266
+ ```
267
+ Defines the problem domain.
268
+
269
+ **3. Output Format:**
270
+ ```
271
+ "Return the correct final number as a single value"
272
+ ```
273
+ Controls response structure.
274
+
275
+ ### Design Patterns
276
+
277
+ **Pattern A: Direct Answer (This Example)**
278
+ ```
279
+ Prompt: [Problem]
280
+ Output: [Number]
281
+ ```
282
+ Pros: Concise, fast
283
+ Cons: No insight into reasoning
284
+
285
+ **Pattern B: Show Work**
286
+ ```
287
+ Prompt: [Problem] "Show your steps"
288
+ Output: Step 1: ... Step 2: ... Answer: [Number]
289
+ ```
290
+ Pros: Transparent, debuggable
291
+ Cons: Longer, may still have errors
292
+
293
+ **Pattern C: Self-Verification**
294
+ ```
295
+ Prompt: [Problem] "Solve, then verify"
296
+ Output: Solution + Verification + Final Answer
297
+ ```
298
+ Pros: More reliable
299
+ Cons: Slower, uses more tokens
300
+
301
+ ## Real-World Applications
302
+
303
+ ### Use Cases for Reasoning Agents
304
+
305
+ **1. Data Analysis:**
306
+ ```
307
+ Input: Dataset summary
308
+ Task: Compute statistics, identify trends
309
+ Output: Numerical insights
310
+ ```
311
+
312
+ **2. Planning:**
313
+ ```
314
+ Input: Goal + constraints
315
+ Task: Reason about optimal sequence
316
+ Output: Action plan
317
+ ```
318
+
319
+ **3. Decision Support:**
320
+ ```
321
+ Input: Options + criteria
322
+ Task: Evaluate and compare
323
+ Output: Recommended choice
324
+ ```
325
+
326
+ **4. Problem Solving:**
327
+ ```
328
+ Input: Complex scenario
329
+ Task: Break down and solve
330
+ Output: Solution
331
+ ```
332
+
333
+ ## Comparison: Different Agent Types
334
+
335
+ ```
336
+ Reasoning Tools Memory Multi-turn
337
+ ───────── ───── ────── ──────────
338
+ intro.js ✗ ✗ ✗ ✗
339
+ translation.js ~ ✗ ✗ ✗
340
+ think.js (here) ✓ ✗ ✗ ✗
341
+ simple-agent.js ✓ ✓ ✗ ~
342
+ memory-agent.js ✓ ✓ ✓ ✓
343
+ react-agent.js ✓✓ ✓ ~ ✓
344
+ ```
345
+
346
+ Legend:
347
+ - ✗ = Not present
348
+ - ~ = Limited/implicit
349
+ - ✓ = Present
350
+ - ✓✓ = Advanced/explicit
351
+
352
+ ## Key Takeaways
353
+
354
+ 1. **System prompts enable reasoning**: Proper configuration transforms an LLM into a reasoning agent
355
+ 2. **Limitations exist**: Pure LLM reasoning is prone to errors on complex problems
356
+ 3. **Tools help**: External computation (calculators, etc.) improves accuracy
357
+ 4. **Iteration matters**: Multi-step reasoning patterns (like ReAct) work better
358
+ 5. **Transparency is valuable**: Seeing the reasoning process helps debug and verify
359
+
360
+ ## Next Steps
361
+
362
+ After understanding basic reasoning:
363
+ - **Add tools**: Let the agent use calculators, databases, APIs
364
+ - **Implement verification**: Check answers, retry on errors
365
+ - **Use chain-of-thought**: Make reasoning explicit
366
+ - **Apply ReAct pattern**: Combine reasoning and tool use systematically
367
+
368
+ This example is the foundation for more sophisticated agent architectures that combine reasoning with external capabilities.
examples/04_think/think.js ADDED
@@ -0,0 +1,49 @@
1
+ import {
2
+ getLlama,
3
+ LlamaChatSession,
4
+ } from "node-llama-cpp";
5
+ import {fileURLToPath} from "url";
6
+ import path from "path";
7
+
8
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
9
+
10
+ const llama = await getLlama();
11
+ const model = await llama.loadModel({
12
+ modelPath: path.join(
13
+ __dirname,
14
+ '..',
15
+ '..',
16
+ 'models',
17
+ 'Qwen3-1.7B-Q8_0.gguf'
18
+ )
19
+ });
20
+ const systemPrompt = `You are an expert logical and quantitative reasoner.
21
+ Your goal is to analyze real-world word problems involving families, quantities, averages, and relationships
22
+ between entities, and compute the exact numeric answer.
23
+
24
+ Goal: Return the correct final number as a single value — no explanation, no reasoning steps, just the answer.
25
+ `
26
+ const context = await model.createContext();
27
+ const session = new LlamaChatSession({
28
+ contextSequence: context.getSequence(),
29
+ systemPrompt
30
+ });
31
+
32
+ const prompt = `My family reunion is this week, and I was assigned the mashed potatoes to bring.
33
+ The attendees include my married mother and father, my twin brother and his family, my aunt and her family, my grandma
34
+ and her brother, her brother's daughter, and his daughter's family. All the adults but me have been married, and no one
35
+ is divorced or remarried, but my grandpa and my grandma's sister-in-law passed away last year. All living spouses are attending.
36
+ My brother has two children that are still kids, my aunt has one six-year-old, and my grandma's brother's daughter has
37
+ three kids under 12. I figure each adult will eat about 1.5 potatoes and each kid will eat about 1/2 a potato, except my
38
+ second cousins don't eat carbs. The average potato is about half a pound, and potatoes are sold in 5-pound bags.
39
+
40
+ How many whole bags of potatoes do I need?
41
+ `;
42
+
43
+ const answer = await session.prompt(prompt);
44
+ console.log(`AI: ${answer}`);
45
+
46
+ session.dispose()
47
+ context.dispose()
48
+ model.dispose()
49
+ llama.dispose()
examples/05_batch/CODE.md ADDED
@@ -0,0 +1,323 @@
1
+ # Code Explanation: batch.js
2
+
3
+ This file demonstrates **parallel execution** of multiple LLM prompts using separate context sequences, enabling concurrent processing for better performance.
4
+
5
+ ## Step-by-Step Code Breakdown
6
+
7
+ ### 1. Import and Setup (Lines 1-10)
8
+ ```javascript
9
+ import {getLlama, LlamaChatSession} from "node-llama-cpp";
10
+ import path from "path";
11
+ import {fileURLToPath} from "url";
12
+
13
+ /**
14
+ * Asynchronous execution improves performance in GAIA benchmarks,
15
+ * multi-agent applications, and other high-throughput scenarios.
16
+ */
17
+
18
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
19
+ ```
20
+ - Standard imports for LLM interaction
21
+ - Comment explains the performance benefit
22
+ - **GAIA benchmark**: A standard for testing AI agent performance
23
+ - Useful for multi-agent systems that need to handle many requests
24
+
25
+ ### 2. Model Path Configuration (Lines 11-16)
26
+ ```javascript
27
+ const modelPath = path.join(
28
+ __dirname,
29
+ "../",
30
+ "models",
31
+ "DeepSeek-R1-0528-Qwen3-8B-Q6_K.gguf"
32
+ )
33
+ ```
34
+ - Uses **DeepSeek-R1**: An 8B parameter model optimized for reasoning
35
+ - **Q6_K quantization**: Balance between quality and size
36
+ - Model is loaded once and shared between sequences
37
+
38
+ ### 3. Initialize Llama and Load Model (Lines 18-19)
39
+ ```javascript
40
+ const llama = await getLlama();
41
+ const model = await llama.loadModel({modelPath});
42
+ ```
43
+ - Standard initialization
44
+ - Model is loaded into memory once
45
+ - Will be used by multiple sequences simultaneously
46
+
47
+ ### 4. Create Context with Multiple Sequences (Lines 20-23)
48
+ ```javascript
49
+ const context = await model.createContext({
50
+ sequences: 2,
51
+ batchSize: 1024 // The number of tokens that can be processed at once by the GPU.
52
+ });
53
+ ```
54
+
55
+ **Key parameters:**
56
+
57
+ - **sequences: 2**: Creates 2 independent conversation sequences
58
+ - Each sequence has its own conversation history
59
+ - Both share the same model and context memory pool
60
+ - Can be processed in parallel
61
+
62
+ - **batchSize: 1024**: Maximum tokens processed per GPU batch
63
+ - Larger = better GPU utilization
64
+ - Smaller = lower memory usage
65
+ - 1024 is a good balance for most GPUs
66
+
67
+ ### Why Multiple Sequences?
68
+
69
+ ```
70
+ Single Sequence (Sequential) Multiple Sequences (Parallel)
71
+ ───────────────────────── ──────────────────────────────
72
+ Process Prompt 1 → Response 1 Process Prompt 1 ──┐
73
+ Wait... ├→ Both responses
74
+ Process Prompt 2 → Response 2 Process Prompt 2 ──┘ in parallel!
75
+
76
+ Total Time: T1 + T2 Total Time: max(T1, T2)
77
+ ```
78
+
79
+ ### 5. Get Individual Sequences (Lines 25-26)
80
+ ```javascript
81
+ const sequence1 = context.getSequence();
82
+ const sequence2 = context.getSequence();
83
+ ```
84
+ - Retrieves two separate sequence objects from the context
85
+ - Each sequence maintains its own state
86
+ - They can be used independently for different conversations
87
+
88
+ ### 6. Create Separate Sessions (Lines 28-33)
89
+ ```javascript
90
+ const session1 = new LlamaChatSession({
91
+ contextSequence: sequence1
92
+ });
93
+ const session2 = new LlamaChatSession({
94
+ contextSequence: sequence2
95
+ });
96
+ ```
97
+ - Creates a chat session for each sequence
98
+ - Each session has its own conversation history
99
+ - Sessions are completely independent
100
+ - No system prompts in this example (could be added)
101
+
102
+ ### 7. Define Questions (Lines 35-36)
103
+ ```javascript
104
+ const q1 = "Hi there, how are you?";
105
+ const q2 = "How much is 6+6?";
106
+ ```
107
+ - Two completely different questions
108
+ - Will be processed simultaneously
109
+ - Different types: conversational vs. computational
110
+
111
+ ### 8. Parallel Execution with Promise.all (Lines 38-44)
112
+ ```javascript
113
+ const [
114
+ a1,
115
+ a2
116
+ ] = await Promise.all([
117
+ session1.prompt(q1),
118
+ session2.prompt(q2)
119
+ ]);
120
+ ```
121
+
122
+ **How this works:**
123
+
124
+ 1. `session1.prompt(q1)` starts asynchronously
125
+ 2. `session2.prompt(q2)` starts asynchronously (doesn't wait for #1)
126
+ 3. `Promise.all()` waits for BOTH to complete
127
+ 4. Returns results in array: [response1, response2]
128
+ 5. Destructures into `a1` and `a2`
129
+
130
+ **Key benefit**: Both prompts are processed at the same time, not one after another!
131
+
132
+ ### 9. Display Results (Lines 46-50)
133
+ ```javascript
134
+ console.log("User: " + q1);
135
+ console.log("AI: " + a1);
136
+
137
+ console.log("User: " + q2);
138
+ console.log("AI: " + a2);
139
+ ```
140
+ - Outputs both question-answer pairs
141
+ - Results appear in order despite parallel processing
142
+
143
+ ## Key Concepts Demonstrated
144
+
145
+ ### 1. Parallel Processing
146
+ Instead of:
147
+ ```javascript
148
+ // Sequential (slow)
149
+ const a1 = await session1.prompt(q1); // Wait
150
+ const a2 = await session2.prompt(q2); // Wait again
151
+ ```
152
+
153
+ We use:
154
+ ```javascript
155
+ // Parallel (fast)
156
+ const [a1, a2] = await Promise.all([
157
+ session1.prompt(q1),
158
+ session2.prompt(q2)
159
+ ]);
160
+ ```
161
+
162
+ ### 2. Context Sequences
163
+ A context can hold multiple independent sequences:
164
+
165
+ ```
166
+ ┌─────────────────────────────────────┐
167
+ │ Context (Shared) │
168
+ │ ┌───────────────────────────────┐ │
169
+ │ │ Model Weights (8B params) │ │
170
+ │ └───────────────────────────────┘ │
171
+ │ │
172
+ │ ┌─────────────┐ ┌─────────────┐ │
173
+ │ │ Sequence 1 │ │ Sequence 2 │ │
174
+ │ │ "Hi there" │ │ "6+6?" │ │
175
+ │ │ History... │ │ History... │ │
176
+ │ └─────────────┘ └─────────────┘ │
177
+ └─────────────────────────────────────┘
178
+ ```
179
+
180
+ ## Performance Comparison
181
+
182
+ ### Sequential Execution
183
+ ```
184
+ Request 1: 2 seconds
185
+ Request 2: 2 seconds
186
+ Total: 4 seconds
187
+ ```
188
+
189
+ ### Parallel Execution (This Example)
190
+ ```
191
+ Request 1: 2 seconds ──┐
192
+ Request 2: 2 seconds ──┤ Both running
193
+ Total: ~2 seconds └─ simultaneously
194
+ ```
195
+
196
+ **Speedup**: ~2x for 2 sequences, scales with more sequences
197
+
198
+ ## Use Cases
199
+
200
+ ### 1. Multi-User Applications
201
+ ```javascript
202
+ // Handle multiple users simultaneously
203
+ const [user1Response, user2Response, user3Response] = await Promise.all([
204
+ session1.prompt(user1Query),
205
+ session2.prompt(user2Query),
206
+ session3.prompt(user3Query)
207
+ ]);
208
+ ```
209
+
210
+ ### 2. Multi-Agent Systems
211
+ ```javascript
212
+ // Multiple agents working on different tasks
213
+ const [
214
+ plannerResponse,
215
+ analyzerResponse,
216
+ executorResponse
217
+ ] = await Promise.all([
218
+ plannerSession.prompt("Plan the task"),
219
+ analyzerSession.prompt("Analyze the data"),
220
+ executorSession.prompt("Execute step 1")
221
+ ]);
222
+ ```
223
+
224
+ ### 3. Benchmarking
225
+ ```javascript
226
+ // Test multiple prompts in parallel (one session per context sequence)
227
+ const results = await Promise.all(
228
+ testPrompts.map((prompt, i) => sessions[i].prompt(prompt))
229
+ );
230
+ ```
231
+
232
+ ### 4. A/B Testing
233
+ ```javascript
234
+ // Test different system prompts
235
+ const [responseA, responseB] = await Promise.all([
236
+ sessionWithPromptA.prompt(query),
237
+ sessionWithPromptB.prompt(query)
238
+ ]);
239
+ ```
240
+
241
+ ## Resource Considerations
242
+
243
+ ### Memory Usage
244
+ Each sequence needs memory for:
245
+ - Conversation history
246
+ - Intermediate computations
247
+ - KV cache (key-value cache for transformer attention)
248
+
249
+ **Rule of thumb**: More sequences = more memory needed
250
+
251
+ ### GPU Utilization
252
+ - **Single sequence**: May underutilize GPU
253
+ - **Multiple sequences**: Better GPU utilization
254
+ - **Too many sequences**: May exceed VRAM, causing slowdown
255
+
256
+ ### Optimal Number of Sequences
257
+ Depends on:
258
+ - Available VRAM
259
+ - Model size
260
+ - Context length
261
+ - Batch size
262
+
263
+ **Typical**: 2-8 sequences for consumer GPUs
264
+
265
+ ## Limitations & Considerations
266
+
267
+ ### 1. Shared Context Limit
268
+ All sequences share the same context memory pool:
269
+ ```
270
+ Total context size: 8192 tokens
271
+ Sequence 1: 4096 tokens
272
+ Sequence 2: 4096 tokens
273
+ Context fully allocated!
274
+ ```
275
+
276
+ ### 2. Not True Parallelism for CPU
277
+ On CPU-only systems, sequences are interleaved rather than truly parallel, but batching still improves overall throughput.
278
+
279
+ ### 3. Model Loading Overhead
280
+ The model is loaded once and shared, which is efficient. But initial loading still takes time.
281
+
282
+ ## Why This Matters for AI Agents
283
+
284
+ ### Efficiency in Production
285
+ Real-world agent systems need to:
286
+ - Handle multiple requests concurrently
287
+ - Respond quickly to users
288
+ - Make efficient use of hardware
289
+
290
+ ### Multi-Agent Architectures
291
+ Complex agent systems often have:
292
+ - **Planner agent**: Thinks about strategy
293
+ - **Executor agent**: Takes actions
294
+ - **Critic agent**: Evaluates results
295
+
296
+ These can run in parallel using separate sequences.
297
+
298
+ ### Scalability
299
+ This pattern is the foundation for:
300
+ - Web services with multiple users
301
+ - Batch processing of data
302
+ - Distributed agent systems
303
+
304
+ ## Best Practices
305
+
306
+ 1. **Match sequences to workload**: Don't create more than you need
307
+ 2. **Monitor memory usage**: Each sequence consumes VRAM
308
+ 3. **Use appropriate batch size**: Balance speed vs. memory
309
+ 4. **Clean up resources**: Always dispose when done
310
+ 5. **Handle errors**: Wrap Promise.all in try-catch
311
+
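For point 5, `Promise.allSettled` is often a better fit than a try-catch around `Promise.all`, because one failed sequence does not discard the other results. A self-contained sketch, with stub promises standing in for `session.prompt()` calls:

```javascript
// Stand-ins for sessions[i].prompt(q): one succeeds, one rejects
const tasks = [
  Promise.resolve("12"),
  Promise.reject(new Error("out of memory"))
];

// allSettled never throws: each entry reports "fulfilled" or "rejected"
const settled = await Promise.allSettled(tasks);

const answers = settled
  .filter((r) => r.status === "fulfilled")
  .map((r) => r.value);
const failures = settled
  .filter((r) => r.status === "rejected")
  .map((r) => r.reason.message);

console.log(answers);  // ["12"]
console.log(failures); // ["out of memory"]
```

With `Promise.all`, the single rejection would have thrown away the successful answer as well.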
312
+ ## Expected Output
313
+
314
+ Running this script should output something like:
315
+ ```
316
+ User: Hi there, how are you?
317
+ AI: Hello! I'm doing well, thank you for asking...
318
+
319
+ User: How much is 6+6?
320
+ AI: 12
321
+ ```
322
+
323
+ Both responses appear quickly because they were processed simultaneously!
examples/05_batch/CONCEPT.md ADDED
@@ -0,0 +1,365 @@
1
+ # Concept: Parallel Processing & Performance Optimization
2
+
3
+ ## Overview
4
+
5
+ This example demonstrates **concurrent execution** of multiple LLM requests using separate context sequences, a critical technique for building scalable AI agent systems.
6
+
7
+ ## The Performance Problem
8
+
9
+ ### Sequential Processing (Slow)
10
+
11
+ Traditional approach processes one request at a time:
12
+
13
+ ```
14
+ Request 1 ────────→ Response 1 (2s)
15
+
16
+ Request 2 ────────→ Response 2 (2s)
17
+
18
+ Total: 4 seconds
19
+ ```
20
+
21
+ ### Parallel Processing (Fast)
22
+
23
+ This example processes multiple requests simultaneously:
24
+
25
+ ```
26
+ Request 1 ────────→ Response 1 (2s) ──┐
27
+ ├→ Total: 2 seconds
28
+ Request 2 ────────→ Response 2 (2s) ──┘
29
+ (Both running at the same time)
30
+ ```
31
+
32
+ **Performance gain: 2x speedup!**
33
+
34
+ ## Core Concept: Context Sequences
35
+
36
+ ### Single vs. Multiple Sequences
37
+
38
+ ```
39
+ ┌────────────────────────────────────────────────┐
40
+ │ Model (Loaded Once) │
41
+ ├────────────────────────────────────────────────┤
42
+ │ Context │
43
+ │ ┌──────────────┐ ┌──────────────┐ │
44
+ │ │ Sequence 1 │ │ Sequence 2 │ │
45
+ │ │ │ │ │ │
46
+ │ │ Conversation │ │ Conversation │ │
47
+ │ │ History A │ │ History B │ │
48
+ │ └──────────────┘ └──────────────┘ │
49
+ └────────────────────────────────────────────────┘
50
+ ```
51
+
52
+ **Key insights:**
53
+ - Model weights are shared (memory efficient)
54
+ - Each sequence has independent history
55
+ - Sequences can process in parallel
56
+ - Both use the same underlying model
57
+
58
+ ## How Parallel Processing Works
59
+
60
+ ### Promise.all Pattern
61
+
62
+ JavaScript's `Promise.all()` enables concurrent execution:
63
+
64
+ ```
65
+ Sequential:
66
+ ────────────────────────────────────
67
+ await fn1(); // Wait 2s
68
+ await fn2(); // Wait 2s more
69
+ Total: 4s
70
+
71
+ Parallel:
72
+ ────────────────────────────────────
73
+ await Promise.all([
74
+ fn1(), // Start immediately
75
+ fn2() // Start immediately (don't wait!)
76
+ ]);
77
+ Total: 2s (whichever finishes last)
78
+ ```
79
+
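The same contrast can be demonstrated with plain timers, independent of any model (the 200 ms delays stand in for LLM calls):

```javascript
// Fake "model calls" that each resolve after ~200 ms
const fakePrompt = (answer) =>
  new Promise((resolve) => setTimeout(() => resolve(answer), 200));

const start = Date.now();
const [a1, a2] = await Promise.all([
  fakePrompt("answer 1"), // starts immediately
  fakePrompt("answer 2")  // starts immediately, does not wait for the first
]);
const elapsed = Date.now() - start;

console.log(a1, a2, `${elapsed}ms`); // elapsed is ~200 ms, not ~400 ms
```

Swap `fakePrompt` for `session.prompt` on separate sequences and the timing behavior is the same.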
80
+ ### Execution Timeline
81
+
82
+ ```
83
+ Time → 0s 1s 2s 3s 4s
84
+ │ │ │ │ │
85
+ Seq 1: ├───────Processing───────┤
86
+ │ └─ Response 1
87
+
88
+ Seq 2: ├───────Processing───────┤
89
+ └─ Response 2
90
+
91
+ Both complete at ~2s instead of 4s!
92
+ ```
93
+
94
+ ## GPU Batch Processing
95
+
96
+ ### Why Batching Matters
97
+
98
+ Modern GPUs process multiple operations efficiently:
99
+
100
+ ```
101
+ Without Batching (Inefficient)
102
+ ──────────────────────────────
103
+ GPU: [Token 1] ... wait ...
104
+ GPU: [Token 2] ... wait ...
105
+ GPU: [Token 3] ... wait ...
106
+ └─ GPU underutilized
107
+
108
+ With Batching (Efficient)
109
+ ─────────────────────────
110
+ GPU: [Tokens 1-1024] ← Full batch
111
+ └─ GPU fully utilized!
112
+ ```
113
+
114
+ **batchSize parameter**: Controls how many tokens are processed together.
115
+
116
+ ### Trade-offs
117
+
118
+ ```
119
+ Small Batch (e.g., 128) Large Batch (e.g., 2048)
120
+ ─────────────────────── ────────────────────────
121
+ ✓ Lower memory ✓ Better GPU utilization
122
+ ✓ More flexible ✓ Faster throughput
123
+ ✗ Slower throughput ✗ Higher memory usage
124
+ ✗ GPU underutilized ✗ May exceed VRAM
125
+ ```
126
+
127
+ **Sweet spot**: Usually 512-1024 for consumer GPUs.
128
+
129
+ ## Architecture Patterns
130
+
131
+ ### Pattern 1: Multi-User Service
132
+
133
+ ```
134
+ ┌─────────┐ ┌─────────┐ ┌─────────┐
135
+ │ User A │ │ User B │ │ User C │
136
+ └────┬────┘ └────┬────┘ └────┬────┘
137
+ │ │ │
138
+ └────────────┼────────────┘
139
+
140
+ ┌────────────────┐
141
+ │ Load Balancer │
142
+ └────────────────┘
143
+
144
+ ┌────────────┼────────────┐
145
+ ↓ ↓ ↓
146
+ ┌─────────┐ ┌─────────┐ ┌─────────┐
147
+ │ Seq 1 │ │ Seq 2 │ │ Seq 3 │
148
+ └─────────┘ └─────────┘ └─────────┘
149
+ └────────────┼────────────┘
150
+
151
+ ┌────────────────┐
152
+ │ Shared Model │
153
+ └────────────────┘
154
+ ```
155
+
156
+ ### Pattern 2: Multi-Agent System
157
+
158
+ ```
159
+ ┌──────────────┐
160
+ │ Task │
161
+ └──────┬───────┘
162
+
163
+ ┌────────┼────────┐
164
+ ↓ ↓ ↓
165
+ ┌────────┐ ┌──────┐ ┌──────────┐
166
+ │Planner │ │Critic│ │ Executor │
167
+ │ Agent │ │Agent │ │ Agent │
168
+ └───┬────┘ └──┬───┘ └────┬─────┘
169
+ │ │ │
170
+ └─────────┼──────────┘
171
+
172
+ (All run in parallel)
173
+ ```
174
+
175
+ ### Pattern 3: Pipeline Processing
176
+
177
+ ```
178
+ Input Queue: [Task1, Task2, Task3, ...]
179
+
180
+ ┌───────────────┐
181
+ │ Dispatcher │
182
+ └───────────────┘
183
+
184
+ ┌───────────┼───────────┐
185
+ ↓ ↓ ↓
186
+ Sequence 1 Sequence 2 Sequence 3
187
+ ↓ ↓ ↓
188
+ └───────────┼───────────┘
189
+
190
+ Output: [R1, R2, R3]
191
+ ```
192
+
193
+ ## Resource Management
194
+
195
+ ### Memory Allocation
196
+
197
+ Each sequence consumes memory:
198
+
199
+ ```
200
+ ┌──────────────────────────────────┐
201
+ │ Total VRAM: 8GB │
202
+ ├──────────────────────────────────┤
203
+ │ Model Weights: 4.0 GB │
204
+ │ Context Base: 1.0 GB │
205
+ │ Sequence 1 (KV Cache): 0.8 GB │
206
+ │ Sequence 2 (KV Cache): 0.8 GB │
207
+ │ Sequence 3 (KV Cache): 0.8 GB │
208
+ │ Overhead: 0.6 GB │
209
+ ├──────────────────────────────────┤
210
+ │ Total Used: 8.0 GB │
211
+ │ Remaining: 0.0 GB │
212
+ └──────────────────────────────────┘
213
+ Maximum capacity!
214
+ ```
215
+
216
+ **Formula**:
217
+ ```
218
+ Required VRAM = Model + Context + (NumSequences × KVCache)
219
+ ```
220
+
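The formula, filled in with the numbers from the table above (overhead added as its own term):

```javascript
// VRAM budget from the table, in GB
const modelWeights = 4.0;
const contextBase = 1.0;
const numSequences = 3;
const kvCachePerSequence = 0.8;
const overhead = 0.6;

// Required VRAM = Model + Context + (NumSequences × KVCache) + Overhead
const requiredVram =
  modelWeights + contextBase + numSequences * kvCachePerSequence + overhead;

console.log(requiredVram.toFixed(1)); // "8.0" — exactly the 8 GB card
```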
221
+ ### Finding Optimal Sequence Count
222
+
223
+ ```
224
+ Too Few (1-2) Optimal (4-8) Too Many (16+)
225
+ ───────────── ───────────── ──────────────
226
+ GPU underutilized Balanced use Memory overflow
227
+ ↓ ↓ ↓
228
+ Slow throughput Best performance Thrashing/crashes
229
+ ```
230
+
231
+ **Test your system**:
232
+ 1. Start with 2 sequences
233
+ 2. Monitor VRAM usage
234
+ 3. Increase until performance plateaus
235
+ 4. Back off if memory issues occur
236
+
237
+ ## Real-World Scenarios
238
+
239
+ ### Scenario 1: Chatbot Service
240
+
241
+ ```
242
+ Challenge: 100 users, each waiting 2s per response
243
+ Sequential: 100 × 2s = 200s (3.3 minutes!)
244
+ Parallel (10 seq): 10 batches × 2s = 20s
245
+ 10x speedup!
246
+ ```
247
+
248
+ ### Scenario 2: Batch Analysis
249
+
250
+ ```
251
+ Task: Analyze 1000 documents
252
+ Sequential: 1000 × 3s = 50 minutes
253
+ Parallel (8 seq): 125 batches × 3s = 6.25 minutes
254
+ 8x speedup!
255
+ ```
256
+
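The arithmetic behind the batch-analysis scenario, as a quick sanity check:

```javascript
// 1000 documents, 8 parallel sequences, 3 seconds per document
const documents = 1000;
const sequences = 8;
const secondsPerDocument = 3;

const batches = Math.ceil(documents / sequences);                 // 125
const parallelMinutes = (batches * secondsPerDocument) / 60;      // 6.25
const sequentialMinutes = (documents * secondsPerDocument) / 60;  // 50

console.log(sequentialMinutes / parallelMinutes); // 8 — the claimed speedup
```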
257
+ ### Scenario 3: Multi-Agent Collaboration
258
+
259
+ ```
260
+ Agents: Planner, Analyzer, Executor (all needed)
261
+ Sequential: Wait for each → Slow pipeline
262
+ Parallel: All work together → Fast decision-making
263
+ ```
264
+
265
+ ## Limitations & Considerations
266
+
267
+ ### 1. Context Capacity Sharing
268
+
269
+ ```
270
+ Problem: Sequences share total context space
271
+ ───────────────────────────────────────────
272
+ Total context: 4096 tokens
273
+ 2 sequences: Each gets ~2048 tokens max
274
+ 4 sequences: Each gets ~1024 tokens max
275
+
276
+ More sequences = Less history per sequence!
277
+ ```
278
+
279
+ ### 2. CPU vs GPU Parallelism
280
+
281
+ ```
282
+ With GPU: CPU Only:
283
+ True parallel processing Interleaved processing
284
+ Multiple CUDA streams Single thread context-switching
285
+ (Still helps throughput!)
286
+ ```
287
+
288
+ ### 3. Not Always Faster
289
+
290
+ ```
291
+ When parallel helps: When it doesn't:
292
+ • Independent requests • Dependent requests (must wait)
293
+ • I/O-bound operations • Very short prompts (overhead)
294
+ • Multiple users • Single sequential conversation
295
+ ```
296
+
297
+ ## Best Practices
298
+
299
+ ### 1. Design for Independence
300
+ ```
301
+ ✓ Good: Separate user conversations
302
+ ✓ Good: Independent analysis tasks
303
+ ✗ Bad: Sequential reasoning steps (use ReAct instead)
304
+ ```
305
+
306
+ ### 2. Monitor Resources
307
+ ```
308
+ Track:
309
+ • VRAM usage per sequence
310
+ • Processing time per request
311
+ • Queue depths
312
+ • Error rates
313
+ ```
314
+
315
+ ### 3. Implement Graceful Degradation
316
+ ```
317
+ if (vramExceeded) {
318
+ reduceSequenceCount();
319
+ // or queue requests instead
320
+ }
321
+ ```
322
+
323
+ ### 4. Handle Errors Properly
324
+ ```javascript
325
+ try {
326
+ const results = await Promise.all([...]);
327
+ } catch (error) {
328
+ // One failure doesn't crash all sequences
329
+ handlePartialResults();
330
+ }
331
+ ```
332
+
333
+ ## Comparison: Evolution of Performance
334
+
335
+ ```
336
+ Stage Requests/Min Pattern
337
+ ───────────────── ───────────── ───────────────
338
+ 1. Basic (intro) 30 Sequential
339
+ 2. Batch (this) 120 4 sequences
340
+ 3. Load balanced 240 8 sequences + queue
341
+ 4. Distributed 1000+ Multiple machines
342
+ ```
343
+
344
+ ## Key Takeaways
345
+
346
+ 1. **Parallelism is essential** for production AI agent systems
347
+ 2. **Sequences share model** but maintain independent state
348
+ 3. **Promise.all** enables concurrent JavaScript execution
349
+ 4. **Batch size** affects GPU utilization and throughput
350
+ 5. **Memory is the limit** - more sequences need more VRAM
351
+ 6. **Not magic** - only helps with independent tasks
352
+
353
+ ## Practical Formula
354
+
355
+ ```
356
+ Speedup = min(
357
+ Number_of_Sequences,
358
+ Available_VRAM / Memory_Per_Sequence,
359
+ GPU_Compute_Limit
360
+ )
361
+ ```
362
+
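Plugged in with hypothetical numbers (all three inputs are assumptions you would measure on your own hardware):

```javascript
const numberOfSequences = 4;
const availableVramGb = 8;        // assumed card size
const memoryPerSequenceGb = 1.5;  // assumed KV-cache cost per sequence
const gpuComputeLimit = 6;        // assumed concurrent-stream ceiling

const speedup = Math.min(
  numberOfSequences,
  Math.floor(availableVramGb / memoryPerSequenceGb),
  gpuComputeLimit
);

console.log(speedup); // 4 — bounded by the sequence count in this case
```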
363
+ Typically: 2-10x speedup for well-designed systems.
364
+
365
+ This technique is foundational for building scalable agent architectures that can handle real-world workloads efficiently.
examples/05_batch/batch.js ADDED
@@ -0,0 +1,60 @@
1
+ import {getLlama, LlamaChatSession} from "node-llama-cpp";
2
+ import path from "path";
3
+ import {fileURLToPath} from "url";
4
+
5
+ /**
6
+ * Asynchronous execution improves performance in GAIA benchmarks,
7
+ * multi-agent applications, and other high-throughput scenarios.
8
+ */
9
+
10
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
11
+ const modelPath = path.join(
12
+ __dirname,
13
+ '..',
14
+ '..',
15
+ 'models',
16
+ 'DeepSeek-R1-0528-Qwen3-8B-Q6_K.gguf'
17
+ )
18
+
19
+ const llama = await getLlama({
20
+ logLevel: 'error'
21
+ });
22
+ const model = await llama.loadModel({modelPath});
23
+ const context = await model.createContext({
24
+ sequences: 2,
25
+ batchSize: 1024 // The number of tokens that can be processed at once by the GPU.
26
+ });
27
+
28
+ const sequence1 = context.getSequence();
29
+ const sequence2 = context.getSequence();
30
+
31
+ const session1 = new LlamaChatSession({
32
+ contextSequence: sequence1
33
+ });
34
+ const session2 = new LlamaChatSession({
35
+ contextSequence: sequence2
36
+ });
37
+
38
+ const q1 = "Hi there, how are you?";
39
+ const q2 = "How much is 6+6?";
40
+
41
+ console.log('Batching started...')
42
+ const [
43
+ a1,
44
+ a2
45
+ ] = await Promise.all([
46
+ session1.prompt(q1),
47
+ session2.prompt(q2)
48
+ ]);
49
+
50
+ console.log("User: " + q1);
51
+ console.log("AI: " + a1);
52
+
53
+ console.log("User: " + q2);
54
+ console.log("AI: " + a2);
55
+
56
+ session1.dispose();
57
+ session2.dispose();
58
+ context.dispose();
59
+ model.dispose();
60
+ llama.dispose();
examples/06_coding/CODE.md ADDED
@@ -0,0 +1,380 @@
1
+ # Code Explanation: coding.js
2
+
3
+ This file demonstrates **streaming responses** with token limits and real-time output, showing how to get immediate feedback from the LLM as it generates text.
4
+
5
+ ## Step-by-Step Code Breakdown
6
+
7
+ ### 1. Import and Setup (Lines 1-8)
8
+ ```javascript
9
+ import {
10
+ getLlama,
11
+ HarmonyChatWrapper,
12
+ LlamaChatSession,
13
+ } from "node-llama-cpp";
14
+ import {fileURLToPath} from "url";
15
+ import path from "path";
16
+
17
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
18
+ ```
19
+ - Standard setup for LLM interaction
20
+ - **HarmonyChatWrapper**: A chat format wrapper for models that use the Harmony format (more on this below)
21
+
22
+ ### 2. Understanding the Harmony Chat Format
23
+
24
+ #### What is Harmony?
25
+ Harmony is a structured message format used for multi-role chat interactions designed by OpenAI for their gpt-oss models. It's not just a prompt format - it's a complete rethinking of how models should structure their outputs, especially for complex reasoning and tool use.
26
+
27
+ #### Harmony Format Structure
28
+
29
+ The format uses special tokens and syntax to define roles such as `system`, `developer`, `user`, `assistant`, and `tool`, as well as output "channels" (`analysis`, `commentary`, `final`) that let the model reason internally, call tools, and produce clean user-facing responses.
30
+
31
+ **Basic message structure:**
32
+ ```
33
+ <|start|>ROLE<|message|>CONTENT<|end|>
34
+ <|start|>assistant<|channel|>CHANNEL<|message|>CONTENT<|end|>
35
+ ```
36
+
37
+ **The five roles in hierarchy order** (system > developer > user > assistant > tool):
38
+
39
+ 1. **system**: Global identity, guardrails, and model configuration
40
+ 2. **developer**: Product policy and style instructions (what you typically think of as "system prompt")
41
+ 3. **user**: User messages and queries
42
+ 4. **assistant**: Model responses
43
+ 5. **tool**: Tool execution results
44
+
45
+ **The three output channels:**
46
+
47
+ 1. **analysis**: Private chain-of-thought reasoning not shown to users
48
+ 2. **commentary**: Tool calling preambles and process updates
49
+ 3. **final**: Clean user-facing responses
50
+
51
+ **Example of Harmony in action:**
52
+ ```
53
+ <|start|>system<|message|>You are a helpful assistant.<|end|>
54
+ <|start|>developer<|message|>Always be concise.<|end|>
55
+ <|start|>user<|message|>What time is it?<|end|>
56
+ <|start|>assistant<|channel|>commentary<|message|>{"tool_use": {"name": "get_current_time", "arguments": {}}}<|end|>
57
+ <|start|>tool<|message|>{"time": "2025-10-25T13:47:00Z"}<|end|>
58
+ <|start|>assistant<|channel|>final<|message|>The current time is 1:47 PM UTC.<|end|>
59
+ ```
60
+
61
+ #### Why Use Harmony?
62
+
63
+ Harmony separates how the model thinks, what actions it takes, and what finally goes to the user, resulting in cleaner tool use, safer defaults for UI, and better observability. For this coding example:
64
+
65
+ - The `final` channel keeps the user-facing answer separate from internal reasoning
66
+ - The structured format helps the model follow instructions more reliably
67
+ - The role hierarchy prevents instruction conflicts
68
+
69
+ **Important Note**: Models need to be specifically trained or fine-tuned to produce Harmony output correctly. You can't just apply this format to any model: models not trained on Harmony (such as Apertus) may be confused by the structure. The HarmonyChatWrapper in node-llama-cpp applies the formatting automatically, but only models trained on it, like gpt-oss, will follow it reliably.
70
+
71
+
72
+ ### 3. Load Model (Lines 10-18)
73
+ ```javascript
74
+ const llama = await getLlama();
75
+ const model = await llama.loadModel({
76
+ modelPath: path.join(
77
+ __dirname,
78
+ "../",
79
+ "models",
80
+ "hf_giladgd_gpt-oss-20b.MXFP4.gguf"
81
+ )
82
+ });
83
+ ```
84
+ - Uses **gpt-oss-20b**: A 20 billion parameter model
85
+ - **MXFP4**: Mixed precision 4-bit quantization for smaller size
86
+ - Larger model = better code explanations
87
+
88
+ ### 4. Create Context and Session (Lines 19-22)
89
+ ```javascript
90
+ const context = await model.createContext();
91
+ const session = new LlamaChatSession({
92
+ chatWrapper: new HarmonyChatWrapper(),
93
+ contextSequence: context.getSequence(),
94
+ });
95
+ ```
96
+ Session setup with the HarmonyChatWrapper, which formats all messages in the Harmony structure the gpt-oss model expects; no custom system prompt is provided.
97
+
98
+ ### 5. Define the Question (Line 24)
99
+ ```javascript
100
+ const q1 = `What is hoisting in JavaScript? Explain with examples.`;
101
+ ```
102
+ A technical programming question that requires detailed explanation.
103
+
104
+ ### 6. Display Context Size (Line 26)
105
+ ```javascript
106
+ console.log('context.contextSize', context.contextSize)
107
+ ```
108
+ - Shows the maximum context window size
109
+ - Helps understand memory limitations
110
+ - Useful for debugging
111
+
112
+ ### 7. Streaming Prompt Execution (Lines 28-36)
113
+ ```javascript
114
+ const a1 = await session.prompt(q1, {
115
+ // Tip: let the lib choose or cap reasonably; using the whole context size can be wasteful
116
+ maxTokens: 2000,
117
+
118
+ // Fires as soon as the first characters arrive
119
+ onTextChunk: (text) => {
120
+ process.stdout.write(text); // optional: live print
121
+ },
122
+ });
123
+ ```
124
+
125
+ **Key parameters:**
126
+
127
+ **maxTokens: 2000**
128
+ - Limits response length to 2000 tokens (~1500 words)
129
+ - Prevents runaway generation
130
+ - Saves time and compute
131
+ - Without limit: model uses entire context
132
+
133
+ **onTextChunk callback**
134
+ - Fires **as each token is generated**
135
+ - Receives text as it's produced
136
+ - `process.stdout.write()`: Prints without newlines
137
+ - Creates real-time "typing" effect
138
+
139
+ ### How Streaming Works
140
+
141
+ ```
142
+ Without streaming:
143
+ User → [Wait 10 seconds...] → Complete response appears
144
+
145
+ With streaming:
146
+ User → [Token 1] → [Token 2] → [Token 3] → ... → Complete
147
+ "What" "is" "hoisting"
148
+ (Immediate feedback!)
149
+ ```
150
+
151
+ ### 8. Display Final Answer (Line 38)
152
+ ```javascript
153
+ console.log("\n\nFinal answer:\n", a1);
154
+ ```
155
+ - Prints the complete response again
156
+ - Useful for logging or verification
157
+ - Shows full text after streaming
158
+
159
+ ### 9. Cleanup (Lines 41-44)
160
+ ```javascript
161
+ session.dispose()
162
+ context.dispose()
163
+ model.dispose()
164
+ llama.dispose()
165
+ ```
166
+ Standard resource cleanup.
167
+
168
+ ## Key Concepts Demonstrated
169
+
170
+ ### 1. Streaming Responses
171
+
172
+ **Why streaming matters:**
173
+ - **Better UX**: Users see progress immediately
174
+ - **Early termination**: Can stop if response is off-track
175
+ - **Perceived speed**: Feels faster than waiting
176
+ - **Debugging**: See generation in real-time
177
+
178
+ **Comparison:**
179
+ ```
180
+ Non-streaming: Streaming:
181
+ ═══════════════ ═══════════════
182
+ Request sent Request sent
183
+ [10s wait...] "What" (0.1s)
184
+ Complete response "is" (0.2s)
185
+ "hoisting" (0.3s)
186
+ ... continues
187
+ (Same total time, better experience!)
188
+ ```
189
+
190
+ ### 2. Token Limits
191
+
192
+ **maxTokens controls generation length:**
193
+
194
+ ```
195
+ No limit: With limit (2000):
196
+ ───────── ─────────────────
197
+ May generate forever Stops at 2000 tokens
198
+ Uses entire context Saves computation
199
+ Unpredictable cost Predictable cost
200
+ ```
201
+
202
+ **Token approximation:**
203
+ - 1 token ≈ 0.75 words (English)
204
+ - 2000 tokens ≈ 1500 words
205
+ - 4-5 paragraphs of detailed explanation
206
+
207
+ ### 3. Real-Time Feedback Pattern
208
+
209
+ The `onTextChunk` callback enables:
210
+ ```javascript
211
+ onTextChunk: (text) => {
212
+ // Do anything with each chunk:
213
+ process.stdout.write(text); // Console output
214
+ // socket.emit('chunk', text); // WebSocket to client
215
+ // buffer += text; // Accumulate for processing
216
+ // analyzePartial(text); // Real-time analysis
217
+ }
218
+ ```
219
+
220
+ ### 4. Context Size Awareness
221
+
222
+ ```javascript
223
+ console.log('context.contextSize', context.contextSize)
224
+ ```
225
+
226
+ Shows model's memory capacity:
227
+ - Small models: 2048-4096 tokens
228
+ - Medium models: 8192-16384 tokens
229
+ - Large models: 32768+ tokens
230
+
231
+ **Why it matters:**
232
+ ```
233
+ Context Size: 4096 tokens
234
+ Prompt: 100 tokens
235
+ Max response: 2000 tokens
236
+ History: Up to 1996 tokens
237
+ ```
238
+
239
+ ## Use Cases
240
+
241
+ ### 1. Code Explanations (This Example)
242
+ ```javascript
243
+ prompt: "Explain hoisting in JavaScript"
244
+ → Streams detailed explanation with examples
245
+ ```
246
+
247
+ ### 2. Long-Form Content Generation
248
+ ```javascript
249
+ prompt: "Write a blog post about AI agents"
250
+ maxTokens: 3000
251
+ → Streams article as it's written
252
+ ```
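A minimal sketch of the SSE direction. Only the `data: ...\n\n` frame shape is fixed by the server-sent-events format; the Express-style wiring in the comments is illustrative, not part of any library shown here:

```javascript
// SSE frames are "data: <payload>\n\n"; JSON-encoding each chunk keeps
// newlines inside tokens from prematurely terminating a frame
const sseFrame = (text) => `data: ${JSON.stringify(text)}\n\n`;

// Hypothetical wiring inside an HTTP route handler:
//   res.setHeader("Content-Type", "text/event-stream");
//   await session.prompt(question, {
//     onTextChunk: (text) => res.write(sseFrame(text))
//   });
//   res.end();

console.log(sseFrame("Hoisting is\n")); // data: "Hoisting is\n"
```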
253
+
254
+ ### 3. Interactive Tutoring
255
+ ```javascript
256
+ // User sees explanation being built
257
+ prompt: "Teach me about closures"
258
+ onTextChunk: (text) => displayToUser(text)
259
+ ```
260
+
261
+ ### 4. Web Applications
262
+ ```javascript
263
+ // Server-Sent Events or WebSocket
264
+ onTextChunk: (text) => {
265
+ websocket.send(text); // Send to browser
266
+ }
267
+ ```
268
+
269
+ ## Performance Considerations
270
+
271
+ ### Token Generation Speed
272
+
273
+ Depends on:
274
+ - **Model size**: Larger = slower per token
275
+ - **Hardware**: GPU > CPU
276
+ - **Quantization**: Lower bits = faster
277
+ - **Context length**: Longer context = slower
278
+
279
+ **Typical speeds:**
280
+ ```
281
+ Model Size GPU (RTX 4090) CPU (M2 Max)
282
+ ────────── ────────────── ────────────
283
+ 1.7B 50-80 tok/s 15-25 tok/s
284
+ 8B 20-35 tok/s 5-10 tok/s
285
+ 20B 10-15 tok/s 2-4 tok/s
286
+ ```
287
+
288
+ ### When to Use maxTokens
289
+
290
+ ```
291
+ ✓ Use maxTokens when:
292
+ • Response length is predictable
293
+ • You want to save computation
294
+ • Testing/debugging
295
+ • API rate limiting
296
+
297
+ ✗ Don't limit when:
298
+ • Need complete answer
299
+ • Length varies greatly
300
+ • Using stop sequences instead
301
+ ```
302
+
303
+ ## Advanced Streaming Patterns
304
+
305
+ ### Pattern 1: Progressive Enhancement
306
+ ```javascript
307
+ let buffer = '';
308
+ onTextChunk: (text) => {
309
+ buffer += text;
310
+ if (buffer.includes('\n\n')) {
311
+ // Complete paragraph ready
312
+ processParagraph(buffer);
313
+ buffer = '';
314
+ }
315
+ }
316
+ ```
317
+
318
+ ### Pattern 2: Early Stopping
319
+ ```javascript
320
+ let isRelevant = true;
321
+ onTextChunk: (text) => {
322
+ if (text.includes('irrelevant_keyword')) {
323
+ isRelevant = false;
324
+ // Stop generation (would need additional API)
325
+ }
326
+ }
327
+ ```
328
+
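The "additional API" the comment alludes to is an abort signal. Here is a self-contained sketch of the early-stopping pattern against a fake token stream; with node-llama-cpp you would pass the same signal to `session.prompt()` via its abort-signal option (check your version's docs for the exact option name):

```javascript
const controller = new AbortController();

// Fake streaming generator standing in for model token generation
async function fakeStream(tokens, { signal, onTextChunk }) {
  const produced = [];
  for (const token of tokens) {
    if (signal.aborted) break; // generation halts here once aborted
    onTextChunk(token);
    produced.push(token);
  }
  return produced.join(" ");
}

const answer = await fakeStream(
  ["Closures", "are", "irrelevant_keyword", "never-emitted"],
  {
    signal: controller.signal,
    onTextChunk: (text) => {
      if (text.includes("irrelevant_keyword")) controller.abort();
    }
  }
);

console.log(answer); // "Closures are irrelevant_keyword"
```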
329
+ ### Pattern 3: Multi-Consumer
330
+ ```javascript
331
+ onTextChunk: (text) => {
332
+ console.log(text); // Console
333
+ logFile.write(text); // File
334
+ websocket.send(text); // Client
335
+ analyzer.process(text); // Analysis
336
+ }
337
+ ```
338
+
339
+ ## Expected Output
340
+
341
+ When run, you'll see:
342
+ 1. Context size logged (e.g., "context.contextSize 32768")
343
+ 2. Streaming response appearing token-by-token
344
+ 3. Complete final answer printed again
345
+
346
+ Example output flow:
347
+ ```
348
+ context.contextSize 32768
349
+ Hoisting is a JavaScript mechanism where variables and function
350
+ declarations are moved to the top of their scope before code
351
+ execution. For example:
352
+
353
+ console.log(x); // undefined (not an error!)
354
+ var x = 5;
355
+
356
+ This works because...
357
+ [continues streaming...]
358
+
359
+ Final answer:
360
+ [Complete response printed again]
361
+ ```
362
+
363
+ ## Why This Matters for AI Agents
364
+
365
+ ### User Experience
366
+ - Real-time agents feel more responsive
367
+ - Users can interrupt if going wrong direction
368
+ - Better for conversational interfaces
369
+
370
+ ### Resource Management
371
+ - Token limits prevent runaway generation
372
+ - Predictable costs and timing
373
+ - Can cancel expensive operations early
374
+
375
+ ### Integration Patterns
376
+ - Web UIs show "typing" effect
377
+ - CLIs display progressive output
378
+ - APIs stream to clients efficiently
379
+
380
+ This pattern is essential for production agent systems where user experience and resource control matter.
examples/06_coding/CONCEPT.md ADDED
@@ -0,0 +1,400 @@
1
+ # Concept: Streaming & Response Control
2
+
3
+ ## Overview
4
+
5
+ This example demonstrates **streaming responses** and **token limits**, two essential techniques for building responsive AI agents with controlled output.
6
+
7
+ ## The Streaming Problem
8
+
9
+ ### Traditional (Non-Streaming) Approach
10
+
11
+ ```
12
+ User sends prompt
13
+
14
+ [Wait 10 seconds...]
15
+
16
+ Complete response appears all at once
17
+ ```
18
+
19
+ **Problems:**
20
+ - Poor user experience (long wait)
21
+ - No progress indication
22
+ - Can't interrupt bad responses
23
+ - Feels unresponsive
24
+
25
+ ### Streaming Approach (This Example)
26
+
27
+ ```
28
+ User sends prompt
29
+
30
+ "Hoisting" (0.1s) → User sees first word!
31
+
32
+ "is a" (0.2s) → More text appears
33
+
34
+ "JavaScript" (0.3s) → Continuous feedback
35
+
36
+ [Continues token by token...]
37
+ ```
38
+
39
+ **Benefits:**
40
+ - Immediate feedback
41
+ - Progress visible
42
+ - Can interrupt early
43
+ - Feels interactive
44
+
45
+ ## How Streaming Works
46
+
47
+ ### Token-by-Token Generation
48
+
49
+ LLMs generate one token at a time internally. Streaming exposes this:
50
+
51
+ ```
52
+ Internal LLM Process:
53
+ ┌─────────────────────────────────────┐
54
+ │ Token 1: "Hoisting" │
55
+ │ Token 2: "is" │
56
+ │ Token 3: "a" │
57
+ │ Token 4: "JavaScript" │
58
+ │ Token 5: "mechanism" │
59
+ │ ... │
60
+ └─────────────────────────────────────┘
61
+
62
+ Without Streaming: With Streaming:
63
+ Wait for all tokens Emit each token immediately
64
+ └─→ Buffer → Return └─→ Callback → Display
65
+ ```
66
+
67
+ ### The onTextChunk Callback
68
+
69
+ ```
70
+ ┌────────────────────────────────────┐
71
+ │ Model Generation │
72
+ └────────────┬───────────────────────┘
73
+
74
+ ┌────────┴─────────┐
75
+ │ Each new token │
76
+ └────────┬─────────┘
77
+
78
+ ┌────────────────────┐
79
+ │ onTextChunk(text) │ ← Your callback
80
+ └────────┬───────────┘
81
+
82
+ Your code processes it:
83
+ • Display to user
84
+ • Send over network
85
+ • Log to file
86
+ • Analyze content
87
+ ```
88
+
89
+ ## Token Limits: maxTokens
90
+
91
+ ### Why Limit Output?
92
+
93
+ Without limits, models might generate:
94
+ ```
95
+ User: "Explain hoisting"
96
+ Model: [Generates 10,000 words including:
97
+ - Complete JavaScript history
98
+ - Every edge case
99
+ - Unrelated examples
100
+ - Never stops...]
101
+ ```
102
+
103
+ With limits:
104
+ ```
105
+ User: "Explain hoisting"
106
+ Model: [Generates ~1500 words
107
+ - Core concept
108
+ - Key examples
109
+ - Stops at 2000 tokens]
110
+ ```
111
+
112
+ ### Token Budgeting
113
+
114
+ ```
115
+ Context Window: 4096 tokens
116
+ ├─ System Prompt: 200 tokens
117
+ ├─ User Message: 100 tokens
118
+ ├─ Response (maxTokens): 2000 tokens
119
+ └─ Remaining for history: 1796 tokens
120
+
121
+ Total used: 2300 tokens
122
+ Available: 1796 tokens for future conversation
123
+ ```
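The same budgeting is plain arithmetic and can be sketched in a few lines of JavaScript (token counts are hard-coded here to mirror the diagram; in a real app you would measure each part with the model's tokenizer):

```javascript
// How much of the context window remains for future turns once the
// prompt and the response cap (maxTokens) are accounted for.
function remainingForHistory({contextSize, systemTokens, userTokens, maxTokens}) {
    return contextSize - (systemTokens + userTokens + maxTokens);
}

const remaining = remainingForHistory({
    contextSize: 4096,
    systemTokens: 200,
    userTokens: 100,
    maxTokens: 2000
});
console.log(remaining); // 1796
```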
124
+
125
+ ### Cost vs Quality
126
+
127
+ ```
128
+ Token Limit Output Quality Use Case
129
+ ─────────── ─────────────── ─────────────────
130
+ 100 Brief, may be cut Quick answers
131
+ 500 Concise but complete Short explanations
132
+ 2000 (example) Detailed Full explanations
133
+ No limit Risk of rambling When length unknown
134
+ ```
135
+
136
+ ## Real-Time Applications
137
+
138
+ ### Pattern 1: Interactive CLI
139
+
140
+ ```
141
+ User: "Explain closures"
142
+
143
+ Terminal: "A closure is a function..."
144
+ (Appears word by word, like typing)
145
+
146
+ User sees progress, knows it's working
147
+ ```
148
+
149
+ ### Pattern 2: Web Application
150
+
151
+ ```
152
+ Browser Server
153
+ │ │
154
+ ├─── Send prompt ────────→│
155
+ │ │
156
+ │←── Chunk 1: "Closures"──┤
157
+ │ (Display immediately) │
158
+ │ │
159
+ │←── Chunk 2: "are"───────┤
160
+ │ (Append to display) │
161
+ │ │
162
+ │←── Chunk 3: "functions"─┤
163
+ │ (Keep appending...) │
164
+ ```
165
+
166
+ Implementation:
167
+ - Server-Sent Events (SSE)
168
+ - WebSockets
169
+ - HTTP streaming
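A minimal Server-Sent Events sketch using only Node's built-in `http` module; the `session.prompt(...)` wiring is shown as a comment because it assumes a configured node-llama-cpp session, and the three hard-coded chunks stand in for streamed output:

```javascript
import http from "node:http";

// SSE frames each message as "data: <payload>\n\n"
function toSseEvent(chunk) {
    return `data: ${JSON.stringify(chunk)}\n\n`;
}

const server = http.createServer((req, res) => {
    res.writeHead(200, {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        Connection: "keep-alive"
    });
    // Real wiring would forward each streamed chunk:
    // await session.prompt(query, {onTextChunk: (text) => res.write(toSseEvent(text))});
    for (const chunk of ["Closures", " are", " functions"]) {
        res.write(toSseEvent(chunk));
    }
    res.end();
});

// server.listen(3000); // browser side: new EventSource("/").onmessage = ...
```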
170
+
171
+ ### Pattern 3: Multi-Consumer
172
+
173
+ ```
174
+ onTextChunk(text)
175
+
176
+ ┌───────┼───────┐
177
+ ↓ ↓ ↓
178
+ Console WebSocket Log File
179
+ Display → Client → Storage
180
+ ```
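The fan-out above can be sketched as a tiny combinator; the WebSocket sink is left as a comment since it assumes a connected client:

```javascript
// Fan one onTextChunk stream out to several consumers.
function fanOut(...consumers) {
    return (chunk) => consumers.forEach((consume) => consume(chunk));
}

const logBuffer = [];
const onTextChunk = fanOut(
    (chunk) => process.stdout.write(chunk), // console display
    (chunk) => logBuffer.push(chunk)        // log storage
    // (chunk) => websocket.send(chunk)     // hypothetical network sink
);

onTextChunk("Hello, ");
onTextChunk("world");
console.log("\nlogged:", logBuffer.join("")); // logged: Hello, world
```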
181
+
182
+ ## Performance Characteristics
183
+
184
+ ### Latency vs Throughput
185
+
186
+ ```
187
+ Time to First Token (TTFT):
188
+ ├─ Small model (1.7B): ~100ms
189
+ ├─ Medium model (8B): ~200ms
190
+ └─ Large model (20B): ~500ms
191
+
192
+ Tokens Per Second:
193
+ ├─ Small model: 50-80 tok/s
194
+ ├─ Medium model: 20-35 tok/s
195
+ └─ Large model: 10-15 tok/s
196
+
197
+ User Experience:
198
+ TTFT < 500ms → Feels instant
199
+ Tok/s > 20 → Reads naturally
200
+ ```
201
+
202
+ ### Resource Trade-offs
203
+
204
+ ```
205
+ Model Size Memory Speed Quality
206
+ ────────── ──────── ───── ───────
207
+ 1.7B ~2GB Fast Good
208
+ 8B ~6GB Medium Better
209
+ 20B ~12GB Slower Best
210
+ ```
211
+
212
+ ## Advanced Concepts
213
+
214
+ ### Buffering Strategies
215
+
216
+ **No Buffer (Immediate)**
217
+ ```
218
+ Every token → callback → display
219
+ └─ Smoothest UX but more overhead
220
+ ```
221
+
222
+ **Line Buffer**
223
+ ```
224
+ Accumulate until newline → flush
225
+ └─ Better for paragraph-based output
226
+ ```
227
+
228
+ **Time Buffer**
229
+ ```
230
+ Accumulate for 50ms → flush batch
231
+ └─ Reduces callback frequency
232
+ ```
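A minimal time-buffer sketch (the 50ms interval is illustrative; tune it to your UI). It cuts callback frequency without changing the final text:

```javascript
// Accumulate chunks and flush at most once per interval.
function createTimeBuffer(flush, intervalMs = 50) {
    let pending = "";
    let timer = null;
    return {
        push(chunk) {
            pending += chunk;
            if (timer === null) {
                timer = setTimeout(() => {
                    flush(pending);
                    pending = "";
                    timer = null;
                }, intervalMs);
            }
        },
        end() { // call after generation finishes to flush any trailing text
            if (timer !== null) clearTimeout(timer);
            if (pending !== "") flush(pending);
            pending = "";
            timer = null;
        }
    };
}

// Usage: pass buffer.push as the onTextChunk callback
const chunks = [];
const buffer = createTimeBuffer((text) => chunks.push(text));
buffer.push("Hoisting ");
buffer.push("is a mechanism");
buffer.end();
console.log(chunks.join("")); // Hoisting is a mechanism
```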
233
+
234
+ ### Early Stopping
235
+
236
+ ```
237
+ Generation in progress:
238
+ "The answer is clearly... wait, actually..."
239
+
240
+ onTextChunk detects issue
241
+
242
+ Stop generation
243
+
244
+ "Let me reconsider"
245
+ ```
246
+
247
+ Useful for:
248
+ - Detecting off-topic responses
249
+ - Safety filters
250
+ - Relevance checking
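One way to sketch this: run a cheap filter over the accumulated partial response and abort when it trips. The commented wiring assumes node-llama-cpp's `signal`/`stopOnAbortSignal` prompt options; verify them against the library version you use:

```javascript
// Stand-in relevance filter; swap in whatever safety/relevance check you need.
const offTopicMarkers = ["wait, actually", "unrelated note"];

function shouldStop(partialResponse) {
    const lower = partialResponse.toLowerCase();
    return offTopicMarkers.some((marker) => lower.includes(marker));
}

// const controller = new AbortController();
// let partial = "";
// const answer = await session.prompt(query, {
//     signal: controller.signal,
//     stopOnAbortSignal: true, // return partial text instead of throwing
//     onTextChunk: (text) => {
//         partial += text;
//         if (shouldStop(partial)) controller.abort();
//     }
// });

console.log(shouldStop("The answer is clearly... wait, actually...")); // true
```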
251
+
252
+ ### Progressive Enhancement
253
+
254
+ ```
255
+ Partial Response Analysis:
256
+ ┌─────────────────────────────────┐
257
+ │ "To implement this feature..." │
258
+ │ │
259
+ │ ← Already useful information │
260
+ │ │
261
+ │ "...you'll need: 1) Node.js" │
262
+ │ │
263
+ │ ← Can start acting on this │
264
+ │ │
265
+ │ "2) Express framework" │
266
+ └─────────────────────────────────┘
267
+
268
+ Agent can begin working before response completes!
269
+ ```
270
+
271
+ ## Context Size Awareness
272
+
273
+ ### Why It Matters
274
+
275
+ ```
276
+ ┌────────────────────────────────┐
277
+ │ Context Window (4096) │
278
+ ├────────────────────────────────┤
279
+ │ System Prompt 200 tokens │
280
+ │ Conversation History 1000 │
281
+ │ Current Prompt 100 │
282
+ │ Response Space 2796 │
283
+ └────────────────────────────────┘
284
+
285
+ If maxTokens > 2796:
286
+ └─→ Error or truncation!
287
+ ```
288
+
289
+ ### Dynamic Adjustment
290
+
291
+ ```
292
+ Available = contextSize - (prompt + history)
293
+
294
+ if (maxTokens > available) {
295
+ maxTokens = available;
296
+ // or clear old history
297
+ }
298
+ ```
299
+
300
+ ## Streaming in Agent Architectures
301
+
302
+ ### Simple Agent
303
+
304
+ ```
305
+ User → LLM (streaming) → Display
306
+ └─ onTextChunk shows progress
307
+ ```
308
+
309
+ ### Multi-Step Agent
310
+
311
+ ```
312
+ Step 1: Plan (stream) → Show thinking
313
+ Step 2: Act (stream) → Show action
314
+ Step 3: Result (stream) → Show outcome
315
+ └─ User sees agent's process
316
+ ```
317
+
318
+ ### Collaborative Agents
319
+
320
+ ```
321
+ Agent A (streaming) ──┐
322
+ ├─→ Coordinator → User
323
+ Agent B (streaming) ──┘
324
+ └─ Both stream simultaneously
325
+ ```
326
+
327
+ ## Best Practices
328
+
329
+ ### 1. Always Set maxTokens
330
+
331
+ ```
332
+ ✓ Good:
333
+ session.prompt(query, { maxTokens: 2000 })
334
+
335
+ ✗ Risky:
336
+ session.prompt(query)
337
+ └─ May use entire context!
338
+ ```
339
+
340
+ ### 2. Handle Partial Updates
341
+
342
+ ```
343
+ let fullResponse = '';
344
+ onTextChunk: (chunk) => {
345
+ fullResponse += chunk;
346
+ display(chunk); // Show immediately
347
+ // response still streaming; not yet complete
348
+ }
349
+ // After completion:
350
+ saveToDatabase(fullResponse);
351
+ ```
352
+
353
+ ### 3. Provide Feedback
354
+
355
+ ```
356
+ onTextChunk: (chunk) => {
357
+ if (firstChunk) {
358
+ showLoadingDone();
359
+ firstChunk = false;
360
+ }
361
+ appendToDisplay(chunk);
362
+ }
363
+ ```
364
+
365
+ ### 4. Monitor Performance
366
+
367
+ ```
368
+ const startTime = Date.now();
369
+ let tokenCount = 0;
370
+
371
+ onTextChunk: (chunk) => {
372
+ tokenCount += estimateTokens(chunk);
373
+ const elapsed = (Date.now() - startTime) / 1000;
374
+ const tokensPerSecond = tokenCount / elapsed;
375
+ updateMetrics(tokensPerSecond);
376
+ }
377
+ ```
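The `estimateTokens` helper above is left undefined; a common heuristic (roughly 4 characters per token for English text) is good enough for dashboards, though tokenizing with the model itself gives exact counts:

```javascript
// Rough token estimate: ~4 characters per token for English text.
// For exact counts, use the model tokenizer (e.g. model.tokenize(text).length).
function estimateTokens(text) {
    return Math.ceil(text.length / 4);
}

console.log(estimateTokens("Hoisting is a JavaScript mechanism")); // 9
```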
378
+
379
+ ## Key Takeaways
380
+
381
+ 1. **Streaming improves UX**: Users see progress immediately
382
+ 2. **maxTokens controls cost**: Prevents runaway generation
383
+ 3. **Token-by-token generation**: LLMs produce one token at a time
384
+ 4. **onTextChunk callback**: Your hook into the generation process
385
+ 5. **Context awareness matters**: Monitor available space
386
+ 6. **Essential for production**: Real-time systems need streaming
387
+
388
+ ## Comparison
389
+
390
+ ```
391
+ Feature intro.js coding.js (this)
392
+ ──────────────── ───────── ─────────────────
393
+ Streaming ✗ ✓
394
+ Token limit ✗ ✓ (2000)
395
+ Real-time output ✗ ✓
396
+ Progress visible ✗ ✓
397
+ User control ✗ ✓
398
+ ```
399
+
400
+ This pattern is foundational for building responsive, user-friendly AI agent interfaces.
examples/06_coding/coding.js ADDED
@@ -0,0 +1,47 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import {
2
+ getLlama,
3
+ HarmonyChatWrapper,
4
+ LlamaChatSession,
5
+ } from "node-llama-cpp";
6
+ import {fileURLToPath} from "url";
7
+ import path from "path";
8
+
9
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
10
+
11
+ const llama = await getLlama();
12
+ const model = await llama.loadModel({
13
+ modelPath: path.join(
14
+ __dirname,
15
+ '..',
16
+ '..',
17
+ 'models',
18
+ 'hf_giladgd_gpt-oss-20b.MXFP4.gguf'
19
+ )
20
+ });
21
+ const context = await model.createContext();
22
+ const session = new LlamaChatSession({
23
+ chatWrapper: new HarmonyChatWrapper(),
24
+ contextSequence: context.getSequence(),
25
+ });
26
+
27
+ const q1 = `What is hoisting in JavaScript? Explain with examples.`;
28
+
29
+ console.log('context.contextSize', context.contextSize)
30
+
31
+ const a1 = await session.prompt(q1, {
32
+ // Tip: let the lib choose or cap reasonably; using the whole context size can be wasteful
33
+ maxTokens: 2000,
34
+
35
+ // Fires as soon as the first characters arrive
36
+ onTextChunk: (text) => {
37
+ process.stdout.write(text); // optional: live print
38
+ },
39
+ });
40
+
41
+ console.log("\n\nFinal answer:\n", a1);
42
+
43
+
44
+ session.dispose()
45
+ context.dispose()
46
+ model.dispose()
47
+ llama.dispose()
examples/07_simple-agent/CODE.md ADDED
@@ -0,0 +1,368 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Code Explanation: simple-agent.js
2
+
3
+ This file demonstrates **function calling** - the core feature that transforms an LLM from a text generator into an agent that can take actions using tools.
4
+
5
+ ## Step-by-Step Code Breakdown
6
+
7
+ ### 1. Import and Setup (Lines 1-7)
8
+ ```javascript
9
+ import {defineChatSessionFunction, getLlama, LlamaChatSession} from "node-llama-cpp";
10
+ import {fileURLToPath} from "url";
11
+ import path from "path";
12
+ import {PromptDebugger} from "../../helper/prompt-debugger.js";
13
+
14
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
15
+ const debug = false;
16
+ ```
17
+ - **defineChatSessionFunction**: Key import for creating callable functions
18
+ - **PromptDebugger**: Helper for debugging prompts (covered at the end)
19
+ - **debug**: Controls verbose logging
20
+
21
+ ### 2. Initialize and Load Model (Lines 9-19)
22
+ ```javascript
23
+ const llama = await getLlama({debug});
24
+ const model = await llama.loadModel({
25
+ modelPath: path.join(
26
+ __dirname,
27
+ "..", "..",
28
+ "models",
29
+ "Qwen3-1.7B-Q8_0.gguf"
30
+ )
31
+ });
32
+ const context = await model.createContext({contextSize: 2000});
33
+ ```
34
+ - Uses Qwen3-1.7B model (good for function calling)
35
+ - Sets context size to 2000 tokens explicitly
36
+
37
+ ### 3. System Prompt for Time Conversion (Lines 21-24)
38
+ ```javascript
39
+ const systemPrompt = `You are a professional chronologist who standardizes time representations across different systems.
40
+
41
+ Always convert times from 12-hour format (e.g., "1:46:36 PM") to 24-hour format (e.g., "13:46") without seconds
42
+ before returning them.`;
43
+ ```
44
+
45
+ **Purpose:**
46
+ - Defines agent's role and behavior
47
+ - Instructs on output format (24-hour, no seconds)
48
+ - Ensures consistency in time representation
49
+
50
+ ### 4. Create Session (Lines 26-29)
51
+ ```javascript
52
+ const session = new LlamaChatSession({
53
+ contextSequence: context.getSequence(),
54
+ systemPrompt,
55
+ });
56
+ ```
57
+ Standard session with system prompt.
58
+
59
+ ### 5. Define a Tool Function (Lines 31-40)
60
+ ```javascript
61
+ const getCurrentTime = defineChatSessionFunction({
62
+ description: "Get the current time",
63
+ params: {
64
+ type: "object",
65
+ properties: {}
66
+ },
67
+ async handler() {
68
+ return new Date().toLocaleTimeString();
69
+ }
70
+ });
71
+ ```
72
+
73
+ **Breaking it down:**
74
+
75
+ **description:**
76
+ - Tells the LLM what this function does
77
+ - LLM reads this to decide when to call it
78
+
79
+ **params:**
80
+ - Defines function parameters (JSON Schema format)
81
+ - Empty `properties: {}` means no parameters needed
82
+ - Type must be "object" even if no properties
83
+
84
+ **handler:**
85
+ - The actual JavaScript function that executes
86
+ - Returns current time as string (e.g., "1:46:36 PM")
87
+ - Can be async (use await inside)
88
+
89
+ ### How Function Calling Works
90
+
91
+ ```
92
+ 1. User asks: "What time is it?"
93
+ 2. LLM reads:
94
+ - System prompt
95
+ - Available functions (getCurrentTime)
96
+ - Function description
97
+ 3. LLM decides: "I should call getCurrentTime()"
98
+ 4. Library executes: handler()
99
+ 5. Handler returns: "1:46:36 PM"
100
+ 6. LLM receives result as "tool output"
101
+ 7. LLM processes: Converts to 24-hour format per system prompt
102
+ 8. LLM responds: "13:46"
103
+ ```
104
+
105
+ ### 6. Register Functions (Line 42)
106
+ ```javascript
107
+ const functions = {getCurrentTime};
108
+ ```
109
+ - Creates object with all available functions
110
+ - Multiple functions: `{getCurrentTime, getWeather, calculate, ...}`
111
+ - LLM can choose which function(s) to call
112
+
113
+ ### 7. Define User Prompt (Line 43)
114
+ ```javascript
115
+ const prompt = `What time is it right now?`;
116
+ ```
117
+ A question that requires using the tool.
118
+
119
+ ### 8. Execute with Functions (Lines 46-47)
120
+ ```javascript
121
+ const a1 = await session.prompt(prompt, {functions});
122
+ console.log("AI: " + a1);
123
+ ```
124
+ - **{functions}** makes tools available to the LLM
125
+ - LLM will automatically call getCurrentTime if needed
126
+ - Response includes tool result processed by LLM
127
+
128
+ ### 9. Debug Prompt Context (Lines 50-56)
129
+ ```javascript
130
+ const promptDebugger = new PromptDebugger({
131
+ outputDir: './logs',
132
+ filename: 'qwen_prompts.txt',
133
+ includeTimestamp: true,
134
+ appendMode: false
135
+ });
136
+ await promptDebugger.debugContextState({session, model});
137
+ ```
138
+
139
+ **What this does:**
140
+ - Saves the entire prompt sent to the model
141
+ - Shows exactly what the LLM sees (including function definitions)
142
+ - Useful for debugging why model does/doesn't call functions
143
+ - Writes to `./logs/qwen_prompts_[timestamp].txt`
144
+
145
+ ### 10. Cleanup (Lines 59-62)
146
+ ```javascript
147
+ session.dispose()
148
+ context.dispose()
149
+ model.dispose()
150
+ llama.dispose()
151
+ ```
152
+ Standard cleanup.
153
+
154
+ ## Key Concepts Demonstrated
155
+
156
+ ### 1. Function Calling (Tool Use)
157
+
158
+ This is what makes it an "agent":
159
+ ```
160
+ Without tools: With tools:
161
+ LLM → Text only LLM → Can take actions
162
+
163
+ Call functions
164
+ Access data
165
+ Execute code
166
+ ```
167
+
168
+ ### 2. Function Definition Pattern
169
+
170
+ ```javascript
171
+ defineChatSessionFunction({
172
+ description: "What the function does", // LLM reads this
173
+ params: { // Expected parameters
174
+ type: "object",
175
+ properties: {
176
+ paramName: {
177
+ type: "string",
178
+ description: "What this param is for"
179
+ }
180
+ },
181
+ required: ["paramName"]
182
+ },
183
+ handler: async (params) => { // Your code
184
+ // Do something with params
185
+ return result;
186
+ }
187
+ });
188
+ ```
189
+
190
+ ### 3. JSON Schema for Parameters
191
+
192
+ Uses standard JSON Schema:
193
+ ```javascript
194
+ // No parameters
195
+ properties: {}
196
+
197
+ // One string parameter
198
+ properties: {
199
+ city: {
200
+ type: "string",
201
+ description: "City name"
202
+ }
203
+ }
204
+
205
+ // Multiple parameters
206
+ properties: {
207
+ a: { type: "number" },
208
+ b: { type: "number" }
209
+ },
210
+ required: ["a", "b"]
211
+ ```
212
+
213
+ ### 4. Agent Decision Making
214
+
215
+ ```
216
+ User: "What time is it?"
217
+
218
+ LLM thinks:
219
+ "I need current time"
220
+ "I see function: getCurrentTime"
221
+ "Description matches what I need"
222
+
223
+ LLM outputs special format:
224
+ {function_call: "getCurrentTime"}
225
+
226
+ Library intercepts and runs handler()
227
+
228
+ Handler returns: "1:46:36 PM"
229
+
230
+ LLM receives: Tool result
231
+
232
+ LLM applies system prompt:
233
+ Convert to 24-hour format
234
+
235
+ Final answer: "13:46"
236
+ ```
237
+
238
+ ## Use Cases
239
+
240
+ ### 1. Information Retrieval
241
+ ```javascript
242
+ const getWeather = defineChatSessionFunction({
243
+ description: "Get weather for a city",
244
+ params: {
245
+ type: "object",
246
+ properties: {
247
+ city: { type: "string" }
248
+ }
249
+ },
250
+ handler: async ({city}) => {
251
+ return await fetchWeather(city);
252
+ }
253
+ });
254
+ ```
255
+
256
+ ### 2. Calculations
257
+ ```javascript
258
+ const calculate = defineChatSessionFunction({
259
+ description: "Perform arithmetic calculation",
260
+ params: {
261
+ type: "object",
262
+ properties: {
263
+ expression: { type: "string" }
264
+ }
265
+ },
266
+ handler: async ({expression}) => {
267
+ return eval(expression); // Unsafe on untrusted input; prefer a math-expression parser
268
+ }
269
+ });
270
+ ```
271
+
272
+ ### 3. Data Access
273
+ ```javascript
274
+ const queryDatabase = defineChatSessionFunction({
275
+ description: "Query user database",
276
+ params: {
277
+ type: "object",
278
+ properties: {
279
+ userId: { type: "string" }
280
+ }
281
+ },
282
+ handler: async ({userId}) => {
283
+ return await db.users.findById(userId);
284
+ }
285
+ });
286
+ ```
287
+
288
+ ### 4. External APIs
289
+ ```javascript
290
+ const searchWeb = defineChatSessionFunction({
291
+ description: "Search the web",
292
+ params: {
293
+ type: "object",
294
+ properties: {
295
+ query: { type: "string" }
296
+ }
297
+ },
298
+ handler: async ({query}) => {
299
+ return await googleSearch(query);
300
+ }
301
+ });
302
+ ```
303
+
304
+ ## Expected Output
305
+
306
+ When run:
307
+ ```
308
+ AI: 13:46
309
+ ```
310
+
311
+ The LLM:
312
+ 1. Called getCurrentTime() internally
313
+ 2. Got "1:46:36 PM"
314
+ 3. Converted to 24-hour format
315
+ 4. Removed seconds
316
+ 5. Returned "13:46"
317
+
318
+ ## Debugging with PromptDebugger
319
+
320
+ The debug output shows the full prompt including function schemas:
321
+ ```
322
+ System: You are a professional chronologist...
323
+
324
+ Functions available:
325
+ - getCurrentTime: Get the current time
326
+ Parameters: (none)
327
+
328
+ User: What time is it right now?
329
+ ```
330
+
331
+ This helps debug:
332
+ - Did the model see the function?
333
+ - Was the description clear?
334
+ - Did parameters match expectations?
335
+
336
+ ## Why This Matters for AI Agents
337
+
338
+ ### Agents = LLMs + Tools
339
+
340
+ ```
341
+ LLM alone: LLM + Tools:
342
+ ├─ Generate text ├─ Generate text
343
+ └─ That's it ├─ Access real data
344
+ ├─ Perform calculations
345
+ ├─ Call APIs
346
+ ├─ Execute actions
347
+ └─ Interact with world
348
+ ```
349
+
350
+ ### Foundation for Complex Agents
351
+
352
+ This simple example is the foundation for:
353
+ - **Research agents**: Search web, read documents
354
+ - **Coding agents**: Run code, check errors
355
+ - **Personal assistants**: Calendar, email, reminders
356
+ - **Analysis agents**: Query databases, compute statistics
357
+
358
+ All start with basic function calling!
359
+
360
+ ## Best Practices
361
+
362
+ 1. **Clear descriptions**: LLM uses these to decide when to call
363
+ 2. **Type safety**: Use JSON Schema properly
364
+ 3. **Error handling**: Handler should catch errors
365
+ 4. **Return strings**: LLM processes text best
366
+ 5. **Keep functions focused**: One clear purpose per function
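Practice #3 (error handling) in isolation: `fetchWeather` is a hypothetical stub standing in for any external call that may fail, and `weatherHandler` has the same shape as a `defineChatSessionFunction` handler:

```javascript
// Stub external call; in a real tool this would hit an API and might throw.
async function fetchWeather(city) {
    throw new Error(`no weather service configured for ${city}`);
}

// Catch failures and return a readable message, so the LLM can explain the
// problem to the user instead of the function call crashing the session.
async function weatherHandler({city}) {
    try {
        return await fetchWeather(city);
    } catch (error) {
        return `Weather lookup failed: ${error.message}`;
    }
}

console.log(await weatherHandler({city: "Paris"}));
// Weather lookup failed: no weather service configured for Paris
```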
367
+
368
+ This is the minimum viable agent: one LLM + one tool + proper configuration.
examples/07_simple-agent/CONCEPT.md ADDED
@@ -0,0 +1,69 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Concept: Function Calling & Tool Use
2
+
3
+ ## Overview
4
+
5
+ Function calling transforms LLMs from text generators into agents that can take actions and interact with the world.
6
+
7
+ ## What Makes an Agent?
8
+
9
+ ```
10
+ Text Generator Agent
11
+ ────────────── ──────
12
+ LLM → Text only LLM + Tools → Can act
13
+ ```
14
+
15
+ **Function calling** lets the LLM invoke predefined functions to access data or perform actions it cannot do alone.
16
+
17
+ ## The Core Idea
18
+
19
+ ```
20
+ User: "What time is it?"
21
+
22
+ LLM thinks: "I need current time"
23
+
24
+ LLM calls: getCurrentTime()
25
+
26
+ Tool returns: "1:46:36 PM"
27
+
28
+ LLM responds: "It's 13:46"
29
+ ```
30
+
31
+ This is agency - the ability to DO, not just SAY.
32
+
33
+ ## How It Works
34
+
35
+ ### 1. Function Definition
36
+ ```javascript
37
+ getCurrentTime = {
38
+ description: "Get the current time",
39
+ handler: () => new Date().toLocaleTimeString()
40
+ }
41
+ ```
42
+
43
+ ### 2. LLM Sees Available Tools
44
+ ```
45
+ Available functions:
46
+ - getCurrentTime: "Get the current time"
47
+ - getWeather: "Get weather for a city"
48
+ - calculate: "Perform math"
49
+ ```
50
+
51
+ ### 3. LLM Decides When to Use
52
+ ```
53
+ "What time?" → getCurrentTime() ✓
54
+ "What's 5+5?" → calculate() ✓
55
+ "Tell a joke" → No tool needed
56
+ ```
57
+
58
+ ## Real-World Applications
59
+
60
+ **Personal Assistant**: Calendar, email, reminders
61
+ **Research Agent**: Web search, document reading
62
+ **Coding Assistant**: File operations, code execution
63
+ **Data Analyst**: Database queries, calculations
64
+
65
+ ## Key Takeaway
66
+
67
+ Function calling is THE feature that enables AI agents. Without it, LLMs can only talk. With it, they can act.
68
+
69
+ This is the foundation of all modern agent systems.
examples/07_simple-agent/simple-agent.js ADDED
@@ -0,0 +1,62 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import {defineChatSessionFunction, getLlama, LlamaChatSession} from "node-llama-cpp";
2
+ import {fileURLToPath} from "url";
3
+ import path from "path";
4
+ import {PromptDebugger} from "../../helper/prompt-debugger.js";
5
+
6
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
7
+ const debug = false;
8
+
9
+ const llama = await getLlama({debug});
10
+ const model = await llama.loadModel({
11
+ modelPath: path.join(
12
+ __dirname,
13
+ '..',
14
+ '..',
15
+ 'models',
16
+ 'Qwen3-1.7B-Q8_0.gguf'
17
+ )
18
+ });
19
+ const context = await model.createContext({contextSize: 2000});
20
+
21
+ const systemPrompt = `You are a professional chronologist who standardizes time representations across different systems.
22
+
23
+ Always convert times from 12-hour format (e.g., "1:46:36 PM") to 24-hour format (e.g., "13:46") without seconds
24
+ before returning them.`;
25
+
26
+ const session = new LlamaChatSession({
27
+ contextSequence: context.getSequence(),
28
+ systemPrompt,
29
+ });
30
+
31
+ const getCurrentTime = defineChatSessionFunction({
32
+ description: "Get the current time",
33
+ params: {
34
+ type: "object",
35
+ properties: {}
36
+ },
37
+ async handler() {
38
+ return new Date().toLocaleTimeString();
39
+ }
40
+ });
41
+
42
+ const functions = {getCurrentTime};
43
+ const prompt = `What time is it right now?`;
44
+
45
+ // Execute the prompt
46
+ const a1 = await session.prompt(prompt, {functions});
47
+ console.log("AI: " + a1);
48
+
49
+ // Debug after the prompt execution
50
+ const promptDebugger = new PromptDebugger({
51
+ outputDir: './logs',
52
+ filename: 'qwen_prompts.txt',
53
+ includeTimestamp: true, // adds timestamp to filename
54
+ appendMode: false // overwrites file each time
55
+ });
56
+ await promptDebugger.debugContextState({session, model});
57
+
58
+ // Clean up
59
+ session.dispose()
60
+ context.dispose()
61
+ model.dispose()
62
+ llama.dispose()
examples/08_simple-agent-with-memory/CODE.md ADDED
@@ -0,0 +1,247 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Code Explanation: simple-agent-with-memory.js
2
+
3
+ This example extends the simple agent with **persistent memory**, enabling it to remember information across sessions while intelligently avoiding duplicate saves.
4
+
5
+ ## Key Components
6
+
7
+ ### 1. MemoryManager Import
8
+ ```javascript
9
+ import {MemoryManager} from "./memory-manager.js";
10
+ ```
11
+ Custom class for persisting agent memories to JSON files with unified memory storage.
12
+
13
+ ### 2. Initialize Memory Manager
14
+ ```javascript
15
+ const memoryManager = new MemoryManager('./agent-memory.json');
16
+ const memorySummary = await memoryManager.getMemorySummary();
17
+ ```
18
+ - Loads existing memories from file
19
+ - Generates formatted summary for system prompt
20
+ - Handles migration from old memory schemas
21
+
22
+ ### 3. Memory-Aware System Prompt with Reasoning
23
+ ```javascript
24
+ const systemPrompt = `
25
+ You are a helpful assistant with long-term memory.
26
+
27
+ Before calling any function, always follow this reasoning process:
28
+
29
+ 1. **Compare** new user statements against existing memories below.
30
+ 2. **If the same key and value already exist**, do NOT call saveMemory again.
31
+ - Instead, simply acknowledge the known information.
32
+ - Example: if the user says "My name is Malua" and memory already says "user_name: Malua", reply "Yes, I remember your name is Malua."
33
+ 3. **If the user provides an updated value** (e.g., "I actually prefer sushi now"),
34
+ then call saveMemory once to update the value.
35
+ 4. **Only call saveMemory for genuinely new information.**
36
+
37
+ When saving new data, call saveMemory with structured fields:
38
+ - type: "fact" or "preference"
39
+ - key: short descriptive identifier (e.g., "user_name", "favorite_food")
40
+ - value: the specific information (e.g., "Malua", "chinua")
41
+
42
+ Examples:
43
+ saveMemory({ type: "fact", key: "user_name", value: "Malua" })
44
+ saveMemory({ type: "preference", key: "favorite_food", value: "chinua" })
45
+
46
+ ${memorySummary}
47
+ `;
48
+ ```
49
+
50
+ **What this does:**
51
+ - Includes existing memories in the prompt
52
+ - Provides explicit reasoning guidelines to prevent duplicate saves
53
+ - Teaches the agent to compare before saving
54
+ - Instructs when to update vs. acknowledge existing data
55
+
56
+ ### 4. saveMemory Function
57
+ ```javascript
58
+ const saveMemory = defineChatSessionFunction({
59
+ description: "Save important information to long-term memory (user preferences, facts, personal details)",
60
+ params: {
61
+ type: "object",
62
+ properties: {
63
+ type: {
64
+ type: "string",
65
+ enum: ["fact", "preference"]
66
+ },
67
+ key: { type: "string" },
68
+ value: { type: "string" }
69
+ },
70
+ required: ["type", "key", "value"]
71
+ },
72
+ async handler({ type, key, value }) {
73
+ await memoryManager.addMemory({ type, key, value });
74
+ return `Memory saved: ${key} = ${value}`;
75
+ }
76
+ });
77
+ ```
78
+
79
+ **What it does:**
80
+ - Uses structured key-value format for all memories
81
+ - Saves both facts and preferences with the same method
82
+ - Automatically handles duplicates (updates if value changes)
83
+ - Persists to JSON file
84
+ - Returns confirmation message
85
+
86
+ **Parameter Structure:**
87
+ - `type`: Either "fact" or "preference"
88
+ - `key`: Short identifier (e.g., "user_name", "favorite_food")
89
+ - `value`: The actual information (e.g., "Alex", "pizza")
90
+
91
+ ### 5. Example Conversation
92
+ ```javascript
93
+ const prompt1 = "Hi! My name is Alex and I love pizza.";
94
+ const response1 = await session.prompt(prompt1, {functions});
95
+ // Agent calls saveMemory twice:
96
+ // - saveMemory({ type: "fact", key: "user_name", value: "Alex" })
97
+ // - saveMemory({ type: "preference", key: "favorite_food", value: "pizza" })
98
+
99
+ const prompt2 = "What's my favorite food?";
100
+ const response2 = await session.prompt(prompt2, {functions});
101
+ // Agent recalls from memory: "Pizza"
102
+ ```
103
+
104
+ ## How Memory Works
105
+
106
+ ### Flow Diagram
107
+ ```
108
+ Session 1:
109
+ User: "My name is Alex and I love pizza"
110
+
111
+ Agent calls: saveMemory({ type: "fact", key: "user_name", value: "Alex" })
112
+ Agent calls: saveMemory({ type: "preference", key: "favorite_food", value: "pizza" })
113
+
114
+ Saved to: agent-memory.json
115
+
116
+ Session 2 (after restart):
117
+ 1. Load memories from agent-memory.json
118
+ 2. Add to system prompt
119
+ 3. Agent sees: "user_name: Alex" and "favorite_food: pizza"
120
+ 4. Can use this information in responses
121
+
122
+ Session 3:
123
+ User: "My name is Alex"
124
+
125
+ Agent compares: user_name already = "Alex"
126
+
127
+ No function call! Just acknowledges: "Yes, I remember your name is Alex."
128
+ ```
129
+
130
+ ## The MemoryManager Class
131
+
132
+ Located in `memory-manager.js`:
133
+ ```javascript
134
+ class MemoryManager {
135
+ async loadMemories() // Load from JSON (handles schema migration)
136
+ async saveMemories() // Write to JSON
137
+ async addMemory() // Unified method for all memory types
138
+ async getMemorySummary() // Format memories for system prompt
139
+ extractKey() // Helper for migration
140
+ extractValue() // Helper for migration
141
+ }
142
+ ```
143
+
144
+ **Benefits:**
145
+ - Single unified method for all memory types
146
+ - Automatic duplicate detection and prevention
147
+ - Automatic value updates when information changes
148
+
149
+ ## Key Concepts
150
+
151
+ ### 1. Structured Memory Format
152
+ All memories now use a consistent structure:
153
+ ```javascript
154
+ {
155
+ type: "fact" | "preference",
156
+ key: "user_name", // Identifier
157
+ value: "Alex", // The actual data
158
+ source: "user", // Where it came from
159
+ timestamp: "2025-10-29..." // When it was saved/updated
160
+ }
161
+ ```
162
+
163
+ ### 2. Intelligent Duplicate Prevention
164
+ The agent is trained to:
165
+ - **Compare** before saving
166
+ - **Skip** if data is identical
167
+ - **Update** if value changed
168
+ - **Acknowledge** existing memories instead of re-saving
169
+
170
+ ### 3. Persistent State
171
+ - Memories survive script restarts
172
+ - Stored in JSON file with metadata
173
+ - Loaded at startup and injected into prompt
174
+
175
+ ### 4. Memory Integration in System Prompt
176
+ Memories are automatically formatted and injected:
177
+ ```
178
+ === LONG-TERM MEMORY ===
179
+
180
+ Known Facts:
181
+ - user_name: Alex
182
+ - location: Paris
183
+
184
+ User Preferences:
185
+ - favorite_food: pizza
186
+ - preferred_language: French
187
+ ```
188
+
189
+ ## Why This Matters
190
+
191
+ **Without memory:** Agent starts fresh every time, asks same questions repeatedly
192
+
193
+ **With basic memory:** Agent remembers, but may save duplicates wastefully
194
+
195
+ **With smart memory:** Agent remembers AND avoids redundant saves by reasoning first
196
+
197
+ This enables:
198
+ - **Personalized responses** based on user history
199
+ - **Efficient memory usage** (no duplicate entries)
200
+ - **Natural conversations** that feel continuous
201
+ - **Stateful agents** that maintain context
202
+ - **Automatic updates** when information changes
203
+
204
+ ## Expected Output
205
+
206
+ **First run:**
207
+ ```
208
+ User: "Hi! My name is Alex and I love pizza."
209
+ AI: "Nice to meet you, Alex! I've noted that you love pizza."
210
+ [Calls saveMemory twice - new information saved]
211
+ ```
212
+
213
+ **Second run (after restart):**
214
+ ```
215
+ User: "What's my favorite food?"
216
+ AI: "Your favorite food is pizza! You mentioned that you love it."
217
+ [No function calls - recalls from loaded memory]
218
+ ```
219
+
220
+ **Third run (duplicate statement):**
221
+ ```
222
+ User: "My name is Alex."
223
+ AI: "Yes, I remember your name is Alex!"
224
+ [No function call - recognizes duplicate, just acknowledges]
225
+ ```
226
+
227
+ **Fourth run (updated information):**
228
+ ```
229
+ User: "I actually prefer sushi now."
230
+ AI: "Got it! I've updated your favorite food to sushi."
231
+ [Calls saveMemory once - updates existing value]
232
+ ```
233
+
234
+ ## Reasoning Process
235
+
236
+ The system prompt explicitly guides the agent through this decision tree:
237
+ ```
238
+ New user statement
239
+
240
+ Compare to existing memories
241
+
242
+ ├─→ Exact match? → Acknowledge only (no save)
243
+ ├─→ Updated value? → Save to update
244
+ └─→ New information? → Save as new
245
+ ```
246
+
247
+ This reasoning-first approach makes the agent more intelligent and efficient with memory operations!
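The branches of this decision tree can be sketched as a small standalone helper. This is a hypothetical illustration (in the example itself the equivalent logic lives in `MemoryManager.addMemory`):

```javascript
// Sketch of the compare-before-save decision tree. `memories` is an array
// shaped like the entries in agent-memory.json ({type, key, value, ...}).
function decideMemoryAction(memories, {type, key, value}) {
    const existing = memories.find(m => m.type === type && m.key === key);
    if (!existing) return "save-new";                    // new information → save
    if (existing.value !== value) return "save-update";  // changed value → update
    return "acknowledge";                                // exact match → no save
}

const memories = [{type: "fact", key: "user_name", value: "Alex"}];
console.log(decideMemoryAction(memories, {type: "fact", key: "user_name", value: "Alex"}));  // acknowledge
console.log(decideMemoryAction(memories, {type: "fact", key: "user_name", value: "Bob"}));   // save-update
console.log(decideMemoryAction(memories, {type: "fact", key: "location", value: "Paris"}));  // save-new
```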
examples/08_simple-agent-with-memory/CONCEPT.md ADDED
@@ -0,0 +1,249 @@
+ # Concept: Persistent Memory & State Management
+
+ ## Overview
+
+ Adding persistent memory transforms agents from stateless responders into systems that can maintain context and relationships across sessions.
+
+ ## The Memory Problem
+
+ ```
+ Without Memory             With Memory
+ ──────────────             ─────────────
+ Session 1:                 Session 1:
+ "I'm Alex"                 "I'm Alex" → Saved
+ "I love pizza"             "I love pizza" → Saved
+
+ Session 2:                 Session 2:
+ "What's my name?"          "What's my name?"
+ "I don't know"             "Alex!" ✓
+ ```
+
+ ## Architecture
+
+ ```
+ ┌─────────────────────────────────┐
+ │          Agent Session          │
+ ├─────────────────────────────────┤
+ │  System Prompt                  │
+ │  + Loaded Memories              │
+ │  + saveMemory Tool              │
+ └────────┬────────────────────────┘
+          │
+          ▼
+ ┌─────────────────────────────────┐
+ │         Memory Manager          │
+ ├─────────────────────────────────┤
+ │  • Load from storage            │
+ │  • Save to storage              │
+ │  • Format for prompt            │
+ └────────┬────────────────────────┘
+          │
+          ▼
+ ┌─────────────────────────────────┐
+ │       Persistent Storage        │
+ │       (agent-memory.json)       │
+ └─────────────────────────────────┘
+ ```
+
+ ## How It Works
+
+ ### 1. Startup
+ ```
+ 1. Load agent-memory.json
+ 2. Extract facts and preferences
+ 3. Add to system prompt
+ 4. Agent "remembers" past information
+ ```
+
+ ### 2. During Conversation
+ ```
+ User shares information
+         ↓
+ Agent recognizes important fact
+         ↓
+ Agent calls saveMemory()
+         ↓
+ Saved to JSON file
+         ↓
+ Available in future sessions
+ ```
+
+ ### 3. Memory Types
+
+ **Facts**: General information
+ ```json
+ {
+   "memories": [
+     {
+       "type": "fact",
+       "key": "user_name",
+       "value": "Alex",
+       "source": "user",
+       "timestamp": "2025-10-29T11:22:57.372Z"
+     }
+   ]
+ }
+ ```
+
+ **Preferences**:
+ ```json
+ {
+   "memories": [
+     {
+       "type": "preference",
+       "key": "favorite_food",
+       "value": "pizza",
+       "source": "user",
+       "timestamp": "2025-10-29T11:22:58.022Z"
+     }
+   ]
+ }
+ ```
+
+ ## Memory Integration Pattern
+
+ ### System Prompt Enhancement
+ ```
+ Base Prompt:
+ "You are a helpful assistant."
+
+ Enhanced with Memory:
+ "You are a helpful assistant with long-term memory.
+
+ === LONG-TERM MEMORY ===
+ Known Facts:
+ - User's name is Alex
+ - User loves pizza"
+ ```
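A minimal sketch of this enhancement step (hypothetical helper; the example's `MemoryManager.getMemorySummary` produces the same kind of summary):

```javascript
// Build an enhanced system prompt from a base prompt plus stored memories.
function buildSystemPrompt(basePrompt, memories) {
    const facts = memories.filter(m => m.type === "fact");
    const prefs = memories.filter(m => m.type === "preference");

    let summary = "\n=== LONG-TERM MEMORY ===\n";
    if (facts.length > 0)
        summary += "\nKnown Facts:\n" + facts.map(f => `- ${f.key}: ${f.value}`).join("\n") + "\n";
    if (prefs.length > 0)
        summary += "\nUser Preferences:\n" + prefs.map(p => `- ${p.key}: ${p.value}`).join("\n") + "\n";

    return basePrompt + "\n" + summary;
}

const prompt = buildSystemPrompt("You are a helpful assistant.", [
    {type: "fact", key: "user_name", value: "Alex"},
    {type: "preference", key: "favorite_food", value: "pizza"}
]);
console.log(prompt);
```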
+
+ ### Tool-Assisted Saving
+ ```
+ Agent decides when to save:
+ User: "My favorite color is blue"
+         ↓
+ Agent: "I should remember this"
+         ↓
+ Calls: saveMemory(type="preference", key="color", content="blue")
+ ```
+
+ ## Real-World Applications
+
+ **Personal Assistant**
+ - Remember appointments, preferences, contacts
+ - Personalized responses based on history
+
+ **Customer Service**
+ - Past interactions and issues
+ - Customer preferences and context
+
+ **Learning Tutor**
+ - Student progress and weak areas
+ - Adapted teaching based on history
+
+ **Healthcare Assistant**
+ - Medical history
+ - Medication reminders
+ - Health tracking
+
+ ## Memory Strategies
+
+ ### 1. Episodic Memory
+ Store specific events and conversations:
+ ```
+ - "On 2025-01-15, user asked about Python"
+ - "User struggled with async concepts"
+ ```
+
+ ### 2. Semantic Memory
+ Store facts and knowledge:
+ ```
+ - "User is a software engineer"
+ - "User prefers TypeScript over JavaScript"
+ ```
+
+ ### 3. Procedural Memory
+ Store how-to information:
+ ```
+ - "User's workflow: design → code → test"
+ - "User's preferred tools: VS Code, Git"
+ ```
+
+ ## Challenges & Solutions
+
+ ### Challenge 1: Memory Bloat
+ **Problem**: Too many memories slow down the agent
+ **Solution**:
+ - Importance scoring
+ - Periodic cleanup
+ - Summary compression
+
+ ### Challenge 2: Conflicting Information
+ **Problem**: "User likes pizza" vs "User is vegan"
+ **Solution**:
+ - Timestamps for recency
+ - Explicit updates
+ - Conflict resolution logic
+
+ ### Challenge 3: Privacy
+ **Problem**: Sensitive information in memory
+ **Solution**:
+ - Encryption at rest
+ - Access controls
+ - Expiration policies
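The recency and expiration ideas above can be combined in a single pass. This is a hypothetical sketch, not part of the example code:

```javascript
// Keep only the newest entry per type:key, dropping entries older than maxAgeMs.
function resolveMemories(memories, maxAgeMs = Infinity, now = Date.now()) {
    const latest = new Map();
    for (const m of memories) {
        const ts = Date.parse(m.timestamp);
        if (now - ts > maxAgeMs) continue;  // expired → drop
        const id = `${m.type}:${m.key}`;
        const prev = latest.get(id);
        if (!prev || ts > Date.parse(prev.timestamp)) latest.set(id, m);  // newer entry wins
    }
    return [...latest.values()];
}

const resolved = resolveMemories([
    {type: "preference", key: "favorite_food", value: "pizza", timestamp: "2025-01-01T00:00:00Z"},
    {type: "preference", key: "favorite_food", value: "sushi", timestamp: "2025-06-01T00:00:00Z"}
]);
console.log(resolved[0].value); // sushi
```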
+
+ ## Key Concepts
+
+ ### 1. Persistence
+ Memory survives:
+ - Application restarts
+ - System reboots
+ - Time gaps
+
+ ### 2. Context Augmentation
+ Memories enhance the system prompt:
+ ```
+ Prompt = Base + Memories + User Input
+ ```
+
+ ### 3. Agent-Driven Storage
+ The agent decides what to remember:
+ ```
+ Important? → Save
+ Trivial?   → Ignore
+ ```
+
+ ## Evolution Path
+
+ ```
+ 1. Stateless          → Each interaction independent
+ 2. Session memory     → Remember during conversation
+ 3. Persistent memory  → Remember across sessions
+ 4. Distributed memory → Share across instances
+ 5. Semantic search    → Find relevant memories
+ ```
+
+ ## Best Practices
+
+ 1. **Structure memory**: Use types (facts, preferences, events)
+ 2. **Add timestamps**: Know when information was saved
+ 3. **Enable updates**: Allow overwriting old information
+ 4. **Implement search**: Find relevant memories efficiently
+ 5. **Monitor size**: Prevent unbounded growth
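For the last point, a simple cap on entry count keeps growth bounded by dropping the oldest entries first (a hypothetical sketch; the example code does not implement this):

```javascript
// Keep only the `maxEntries` most recent memories (newest first).
function capMemories(memories, maxEntries) {
    return [...memories]
        .sort((a, b) => Date.parse(b.timestamp) - Date.parse(a.timestamp))
        .slice(0, maxEntries);
}

const capped = capMemories([
    {key: "a", timestamp: "2025-01-01T00:00:00Z"},
    {key: "b", timestamp: "2025-03-01T00:00:00Z"},
    {key: "c", timestamp: "2025-02-01T00:00:00Z"}
], 2);
console.log(capped.map(m => m.key)); // [ 'b', 'c' ]
```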
+
+ ## Comparison
+
+ ```
+ Feature                Simple Agent    Memory Agent
+ ───────────────────    ─────────────   ──────────────
+ Remembers names        ✗               ✓
+ Recalls preferences    ✗               ✓
+ Personalization        ✗               ✓
+ Context continuity     ✗               ✓
+ Cross-session state    ✗               ✓
+ ```
+
+ ## Key Takeaway
+
+ Memory transforms agents from tools into assistants. They can build relationships, provide personalized experiences, and maintain context over time.
+
+ This is essential for production AI agent systems.
examples/08_simple-agent-with-memory/agent-memory.json ADDED
@@ -0,0 +1,19 @@
+ {
+   "memories": [
+     {
+       "type": "fact",
+       "key": "user_name",
+       "value": "Alex",
+       "source": "user",
+       "timestamp": "2025-11-05T20:24:58.220Z"
+     },
+     {
+       "type": "preference",
+       "key": "favorite_food",
+       "value": "pizza",
+       "source": "user",
+       "timestamp": "2025-11-05T20:24:58.848Z"
+     }
+   ],
+   "conversationHistory": []
+ }
examples/08_simple-agent-with-memory/memory-manager.js ADDED
@@ -0,0 +1,137 @@
+ import fs from 'fs/promises';
+ import path from 'path';
+ import {fileURLToPath} from 'url';
+
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
+
+ export class MemoryManager {
+     constructor(memoryFileName = './memory.json') {
+         this.memoryFilePath = path.resolve(__dirname, memoryFileName);
+     }
+
+     async loadMemories() {
+         try {
+             const data = await fs.readFile(this.memoryFilePath, 'utf-8');
+             const json = JSON.parse(data);
+
+             // 🔧 Migrate old schema if needed
+             if (!json.memories) {
+                 const upgraded = {memories: [], conversationHistory: []};
+
+                 if (Array.isArray(json.facts)) {
+                     for (const f of json.facts) {
+                         upgraded.memories.push({
+                             type: 'fact',
+                             key: this.extractKey(f.content),
+                             value: this.extractValue(f.content),
+                             source: 'migration',
+                             timestamp: f.timestamp || new Date().toISOString()
+                         });
+                     }
+                 }
+
+                 if (json.preferences && typeof json.preferences === 'object') {
+                     for (const [key, val] of Object.entries(json.preferences)) {
+                         upgraded.memories.push({
+                             type: 'preference',
+                             key,
+                             value: this.extractValue(val),
+                             source: 'migration',
+                             timestamp: new Date().toISOString()
+                         });
+                     }
+                 }
+
+                 await this.saveMemories(upgraded);
+                 return upgraded;
+             }
+
+             if (!Array.isArray(json.memories)) json.memories = [];
+             if (!Array.isArray(json.conversationHistory)) json.conversationHistory = [];
+
+             return json;
+         } catch {
+             return {memories: [], conversationHistory: []};
+         }
+     }
+
+     async saveMemories(memories) {
+         await fs.writeFile(this.memoryFilePath, JSON.stringify(memories, null, 2));
+     }
+
+     // Add or update memory without duplicates
+     async addMemory({type, key, value, source = 'user'}) {
+         const data = await this.loadMemories();
+
+         // Normalize for comparison
+         const normType = type.trim().toLowerCase();
+         const normKey = key.trim().toLowerCase();
+         const normValue = value.trim();
+
+         // Check if same key+type already exists
+         const existingIndex = data.memories.findIndex(
+             m => m.type === normType && m.key.toLowerCase() === normKey
+         );
+
+         if (existingIndex >= 0) {
+             const existing = data.memories[existingIndex];
+             // Update value if changed
+             if (existing.value !== normValue) {
+                 existing.value = normValue;
+                 existing.timestamp = new Date().toISOString();
+                 existing.source = source;
+                 console.log(`Updated memory: ${normKey} → ${normValue}`);
+             } else {
+                 console.log(`Skipped duplicate memory: ${normKey}`);
+             }
+         } else {
+             // Add new memory
+             data.memories.push({
+                 type: normType,
+                 key: normKey,
+                 value: normValue,
+                 source,
+                 timestamp: new Date().toISOString()
+             });
+             console.log(`Added memory: ${normKey} = ${normValue}`);
+         }
+
+         await this.saveMemories(data);
+     }
+
+     async getMemorySummary() {
+         const data = await this.loadMemories();
+         const facts = Array.isArray(data.memories)
+             ? data.memories.filter(m => m.type === 'fact')
+             : [];
+         const prefs = Array.isArray(data.memories)
+             ? data.memories.filter(m => m.type === 'preference')
+             : [];
+
+         let summary = "\n=== LONG-TERM MEMORY ===\n";
+
+         if (facts.length > 0) {
+             summary += "\nKnown Facts:\n";
+             for (const f of facts) summary += `- ${f.key}: ${f.value}\n`;
+         }
+
+         if (prefs.length > 0) {
+             summary += "\nUser Preferences:\n";
+             for (const p of prefs) summary += `- ${p.key}: ${p.value}\n`;
+         }
+
+         return summary;
+     }
+
+     extractKey(content) {
+         if (typeof content !== 'string') return 'unknown';
+         const [key] = content.split(':').map(s => s.trim());
+         return key || 'unknown';
+     }
+
+     extractValue(content) {
+         if (typeof content !== 'string') return '';
+         const parts = content.split(':').map(s => s.trim());
+         return parts.length > 1 ? parts.slice(1).join(':') : content;
+     }
+ }
examples/08_simple-agent-with-memory/simple-agent-with-memory.js ADDED
@@ -0,0 +1,93 @@
+ import {defineChatSessionFunction, getLlama, LlamaChatSession} from "node-llama-cpp";
+ import {fileURLToPath} from "url";
+ import path from "path";
+ import {MemoryManager} from "./memory-manager.js";
+
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
+
+ const llama = await getLlama({debug: false});
+ const model = await llama.loadModel({
+     modelPath: path.join(
+         __dirname,
+         '..',
+         '..',
+         'models',
+         'Qwen3-1.7B-Q8_0.gguf'
+     )
+ });
+ const context = await model.createContext({contextSize: 2000});
+
+ // Initialize memory manager
+ const memoryManager = new MemoryManager('./agent-memory.json');
+
+ // Load existing memories and add to system prompt
+ const memorySummary = await memoryManager.getMemorySummary();
+
+ const systemPrompt = `
+ You are a helpful assistant with long-term memory.
+
+ Before calling any function, always follow this reasoning process:
+
+ 1. **Compare** new user statements against existing memories below.
+ 2. **If the same key and value already exist**, do NOT call saveMemory again.
+    - Instead, simply acknowledge the known information.
+    - Example: if the user says "My name is Malua" and memory already says "user_name: Malua", reply "Yes, I remember your name is Malua."
+ 3. **If the user provides an updated value** (e.g., "I actually prefer sushi now"),
+    then call saveMemory once to update the value.
+ 4. **Only call saveMemory for genuinely new information.**
+
+ When saving new data, call saveMemory with structured fields:
+ - type: "fact" or "preference"
+ - key: short descriptive identifier (e.g., "user_name", "favorite_food")
+ - value: the specific information (e.g., "Malua", "chinua")
+
+ Examples:
+ saveMemory({ type: "fact", key: "user_name", value: "Malua" })
+ saveMemory({ type: "preference", key: "favorite_food", value: "chinua" })
+
+ ${memorySummary}
+ `;
+
+ const session = new LlamaChatSession({
+     contextSequence: context.getSequence(),
+     systemPrompt,
+ });
+
+ // Function to save memories
+ const saveMemory = defineChatSessionFunction({
+     description: "Save important information to long-term memory (user preferences, facts, personal details)",
+     params: {
+         type: "object",
+         properties: {
+             type: {
+                 type: "string",
+                 enum: ["fact", "preference"]
+             },
+             key: {type: "string"},
+             value: {type: "string"}
+         },
+         required: ["type", "key", "value"]
+     },
+     async handler({type, key, value}) {
+         await memoryManager.addMemory({type, key, value});
+         return `Memory saved: ${key} = ${value}`;
+     }
+ });
+
+ const functions = {saveMemory};
+
+ // Example conversation
+ const prompt1 = "Hi! My name is Alex and I love pizza.";
+ const response1 = await session.prompt(prompt1, {functions});
+ console.log("AI: " + response1);
+
+ // Later conversation (even after restarting the script)
+ const prompt2 = "What's my favorite food?";
+ const response2 = await session.prompt(prompt2, {functions});
+ console.log("AI: " + response2);
+
+ // Clean up
+ session.dispose();
+ context.dispose();
+ model.dispose();
+ llama.dispose();
examples/09_react-agent/CODE.md ADDED
@@ -0,0 +1,278 @@
+ # Code Explanation: react-agent.js
+
+ This example implements the **ReAct pattern** (Reasoning + Acting), a powerful approach for multi-step problem-solving with tools.
+
+ ## What is ReAct?
+
+ ReAct = **Rea**soning + **Act**ing
+
+ The agent alternates between:
+ 1. **Thinking** (reasoning about what to do)
+ 2. **Acting** (using tools)
+ 3. **Observing** (seeing tool results)
+ 4. Repeat until the problem is solved
+
+ ## Key Components
+
+ ### 1. ReAct System Prompt (Lines 20-52)
+ ```javascript
+ const systemPrompt = `You are a mathematical assistant that uses the ReAct approach.
+
+ CRITICAL: You must follow this EXACT pattern:
+
+ Thought: [Explain what calculation you need]
+ Action: [Call ONE tool]
+ Observation: [Wait for result]
+ Thought: [Analyze result]
+ Action: [Call another tool if needed]
+ ...
+ Thought: [Once you have all information]
+ Answer: [Final answer and STOP]
+ ```
+
+ **Key instructions:**
+ - Explicit step-by-step pattern
+ - One tool call at a time
+ - Continue until final answer
+ - Stop after "Answer:"
+
+ ### 2. Calculator Tools (Lines 60-159)
+
+ Four basic math operations:
+ ```javascript
+ const add = defineChatSessionFunction({...});
+ const multiply = defineChatSessionFunction({...});
+ const subtract = defineChatSessionFunction({...});
+ const divide = defineChatSessionFunction({...});
+ ```
+
+ Each tool:
+ - Takes two numbers (a, b)
+ - Performs the operation
+ - Logs the call
+ - Returns the result as a string
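The elided `{...}` for one of these tools looks roughly like the following sketch. It mirrors the `defineChatSessionFunction` shape used elsewhere in this repo, shown here as a plain object so the handler can run standalone:

```javascript
// Shape of one calculator tool. In react-agent.js this object is passed to
// defineChatSessionFunction; the handler logic is the same either way.
const multiply = {
    description: "Multiply two numbers",
    params: {
        type: "object",
        properties: {
            a: {type: "number"},
            b: {type: "number"}
        }
    },
    async handler({a, b}) {
        console.log(`🔧 TOOL CALLED: multiply(${a}, ${b})`);
        const result = a * b;
        console.log(`📊 RESULT: ${result}`);
        return String(result); // the result goes back to the model as a string
    }
};

console.log(await multiply.handler({a: 15, b: 8})); // 120
```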
+
+ ### 3. ReAct Agent Loop (Lines 164-212)
+
+ ```javascript
+ async function reactAgent(userPrompt, maxIterations = 10) {
+     let iteration = 0;
+     let fullResponse = "";
+
+     while (iteration < maxIterations) {
+         iteration++;
+         let currentChunk = "";
+
+         // Prompt the LLM
+         const response = await session.prompt(
+             iteration === 1 ? userPrompt : "Continue your reasoning.",
+             {
+                 functions,
+                 maxTokens: 300,
+                 onTextChunk: (chunk) => {
+                     process.stdout.write(chunk); // Stream output
+                     currentChunk += chunk;
+                 }
+             }
+         );
+
+         fullResponse += currentChunk;
+
+         // Check if final answer reached
+         if (response.toLowerCase().includes("answer:")) {
+             return fullResponse;
+         }
+     }
+ }
+ ```
+
+ **How it works:**
+ 1. Loop up to maxIterations times
+ 2. On the first iteration: send the user's question
+ 3. On subsequent iterations: ask to continue
+ 4. Stream output in real time
+ 5. Stop when "Answer:" appears
+ 6. Return the full reasoning trace
+
+ ### 4. Example Query (Lines 215-220)
+
+ ```javascript
+ const queries = [
+     `A store sells 15 items Monday at $8 each, 20 items Tuesday at $8 each,
+     10 items Wednesday at $8 each. What's the average items per day and total revenue?`
+ ];
+ ```
+
+ Complex problem requiring multiple calculations:
+ - 15 × 8
+ - 20 × 8
+ - 10 × 8
+ - Sum results
+ - Calculate average
+ - Format answer
+
+ ## The ReAct Flow
+
+ ### Example Execution
+
+ ```
+ USER: "A store sells 15 items at $8 each and 20 items at $8 each. Total revenue?"
+
+ Iteration 1:
+ Thought: First I need to calculate 15 × 8
+ Action: multiply(15, 8)
+ Observation: 120
+
+ Iteration 2:
+ Thought: Now I need to calculate 20 × 8
+ Action: multiply(20, 8)
+ Observation: 160
+
+ Iteration 3:
+ Thought: Now I need to add both results
+ Action: add(120, 160)
+ Observation: 280
+
+ Iteration 4:
+ Thought: I have the total revenue
+ Answer: The total revenue is $280
+ ```
+
+ **The loop stops** because "Answer:" was detected.
+
+ ## Why ReAct Works
+
+ ### Traditional Approach (Fails)
+ ```
+ User: "Complex math problem"
+ LLM: [Tries to calculate in its head]
+ → Often wrong due to arithmetic errors
+ ```
+
+ ### ReAct Approach (Succeeds)
+ ```
+ User: "Complex math problem"
+ LLM: "I need to calculate X"
+ → Calls calculator tool
+ → Gets accurate result
+ → Uses result for next step
+ → Continues until solved
+ ```
+
+ ## Key Concepts
+
+ ### 1. Explicit Reasoning
+ The agent must "show its work":
+ ```
+ Thought: What do I need to do?
+ Action: Do it
+ Observation: What happened?
+ ```
+
+ ### 2. Tool Use at Each Step
+ ```
+ Don't calculate: 15 × 8 = 120 (may be wrong)
+ Do calculate: multiply(15, 8) → 120 (always correct)
+ ```
+
+ ### 3. Iterative Problem Solving
+ ```
+ Complex Problem → Break into steps → Solve each step → Combine results
+ ```
+
+ ### 4. Self-Correction
+ The agent can observe bad results and try again:
+ ```
+ Thought: That doesn't look right
+ Action: Let me recalculate
+ ```
+
+ ## Debug Output
+
+ The code includes PromptDebugger (lines 228-234):
+ ```javascript
+ const promptDebugger = new PromptDebugger({
+     outputDir: './logs',
+     filename: 'react_calculator.txt',
+     includeTimestamp: true
+ });
+ await promptDebugger.debugContextState({session, model});
+ ```
+
+ Saves the complete prompt history to logs for debugging.
+
+ ## Expected Output
+
+ ```
+ ========================================================
+ USER QUESTION: [Problem statement]
+ ========================================================
+
+ --- Iteration 1 ---
+ Thought: First I need to multiply 15 by 8
+ Action: multiply(15, 8)
+
+ 🔧 TOOL CALLED: multiply(15, 8)
+ 📊 RESULT: 120
+
+ Observation: 120
+
+ --- Iteration 2 ---
+ Thought: Now I need to multiply 20 by 8
+ Action: multiply(20, 8)
+
+ 🔧 TOOL CALLED: multiply(20, 8)
+ 📊 RESULT: 160
+
+ ... continues ...
+
+ --- Iteration N ---
+ Thought: I have all the information
+ Answer: [Final answer]
+
+ ========================================================
+ FINAL ANSWER REACHED
+ ========================================================
+ ```
+
+ ## Why This Matters
+
+ ### Enables Complex Tasks
+ - Multi-step reasoning
+ - Accurate calculations
+ - Self-correction
+ - Transparent process
+
+ ### Foundation of Modern Agents
+ This pattern powers:
+ - LangChain agents
+ - AutoGPT
+ - BabyAGI
+ - Most production agent frameworks
+
+ ### Observable Reasoning
+ Unlike "black box" LLMs, you see:
+ - What the agent is thinking
+ - Which tools it uses
+ - Why it makes decisions
+ - Where it might fail
+
+ ## Best Practices
+
+ 1. **Clear system prompt**: Define the exact pattern
+ 2. **One tool per action**: Don't combine operations
+ 3. **Limit iterations**: Prevent infinite loops
+ 4. **Stream output**: Show progress
+ 5. **Debug thoroughly**: Use PromptDebugger
+
+ ## Comparison
+
+ ```
+ Simple Agent             vs   ReAct Agent
+ ─────────────────────────────────────────
+ Single prompt/response        Multi-step iteration
+ One tool call (maybe)         Multiple tool calls
+ No visible reasoning          Explicit reasoning
+ Works for simple tasks        Handles complex problems
+ ```
+
+ This is the state-of-the-art pattern for building capable AI agents!
examples/09_react-agent/CONCEPT.md ADDED
@@ -0,0 +1,372 @@
+ # Concept: ReAct Pattern for AI Agents
+
+ ## What is ReAct?
+
+ **ReAct** (Reasoning + Acting) is a framework that combines:
+ - **Reasoning**: Thinking through problems step-by-step
+ - **Acting**: Using tools to accomplish subtasks
+ - **Observing**: Learning from tool results
+
+ This creates agents that can solve complex, multi-step problems reliably.
+
+ ## The Core Pattern
+
+ ```
+         ┌─────────────┐
+         │   Problem   │
+         └──────┬──────┘
+                │
+                ▼
+ ┌─────────────────────────────────────┐
+ │             ReAct Loop              │
+ │                                     │
+ │  ┌──────────────────────────────┐   │
+ │  │ 1. THOUGHT                   │   │
+ │  │ "What do I need to do?"      │   │
+ │  └─────────────┬────────────────┘   │
+ │                ▼                    │
+ │  ┌──────────────────────────────┐   │
+ │  │ 2. ACTION                    │   │
+ │  │ Call tool with parameters    │   │
+ │  └─────────────┬────────────────┘   │
+ │                ▼                    │
+ │  ┌──────────────────────────────┐   │
+ │  │ 3. OBSERVATION               │   │
+ │  │ Receive tool result          │   │
+ │  └─────────────┬────────────────┘   │
+ │                │                    │
+ │                └──► Repeat or       │
+ │                     Final Answer    │
+ └─────────────────────────────────────┘
+ ```
+
+ ## Why ReAct Matters
+
+ ### Traditional LLMs Struggle With:
+ 1. **Complex calculations** - arithmetic errors
+ 2. **Multi-step problems** - lose track of progress
+ 3. **Using tools** - don't know when/how
+ 4. **Explaining decisions** - black box reasoning
+
+ ### ReAct Solves This:
+ 1. **Reliable calculations** - delegates to tools
+ 2. **Structured progress** - explicit steps
+ 3. **Tool orchestration** - knows when to use what
+ 4. **Transparent reasoning** - visible thought process
+
+ ## The Three Components
+
+ ### 1. Thought (Reasoning)
+
+ The agent reasons about:
+ - What information is needed
+ - Which tool to use
+ - Whether the result makes sense
+ - What to do next
+
+ Example:
+ ```
+ Thought: I need to calculate 15 × 8 to find revenue
+ ```
+
+ ### 2. Action (Tool Use)
+
+ The agent calls a tool with specific parameters:
+
+ Example:
+ ```
+ Action: multiply(15, 8)
+ ```
+
+ ### 3. Observation (Learning)
+
+ The agent receives and interprets the tool result:
+
+ Example:
+ ```
+ Observation: 120
+ ```
+
+ ## Complete Example
+
+ ```
+ Problem: "If 15 items cost $8 each and 20 items cost $8 each,
+ what's the total revenue?"
+
+ Thought: First I need to calculate revenue from 15 items
+ Action: multiply(15, 8)
+ Observation: 120
+
+ Thought: Now I need revenue from 20 items
+ Action: multiply(20, 8)
+ Observation: 160
+
+ Thought: Now I add both revenues
+ Action: add(120, 160)
+ Observation: 280
+
+ Thought: I have the final answer
+ Answer: The total revenue is $280
+ ```
+
+ ## Key Benefits
+
+ ### 1. Reliability
+ - Tools provide accurate results
+ - No arithmetic mistakes
+ - Verifiable calculations
+
+ ### 2. Transparency
+ - See each reasoning step
+ - Understand decision-making
+ - Debug easily
+
+ ### 3. Scalability
+ - Handle complex problems
+ - Break into manageable steps
+ - Add more tools as needed
+
+ ### 4. Flexibility
+ - Works with any tools
+ - Adapts to problem complexity
+ - Self-corrects when needed
+
+ ## Comparison with Other Approaches
+
+ ### Zero-Shot Prompting
+ ```
+ User: "Calculate 15×8 + 20×8"
+ LLM: "The answer is 279" ❌ Wrong!
+ ```
+ **Problem**: The LLM calculates in its head and makes errors
+
+ ### Chain-of-Thought
+ ```
+ User: "Calculate 15×8 + 20×8"
+ LLM: "Let me think step by step:
+ 15×8 = 120
+ 20×8 = 160
+ 120+160 = 279" ❌ Still wrong!
+ ```
+ **Problem**: Shows work but still miscalculates
+
+ ### ReAct (This Implementation)
+ ```
+ User: "Calculate 15×8 + 20×8"
+ Agent:
+ Thought: Calculate 15×8
+ Action: multiply(15, 8)
+ Observation: 120
+
+ Thought: Calculate 20×8
+ Action: multiply(20, 8)
+ Observation: 160
+
+ Thought: Add results
+ Action: add(120, 160)
+ Observation: 280
+
+ Answer: 280 ✅ Correct!
+ ```
+ **Success**: Uses tools, gets accurate results
+
+ ## Architecture Diagram
+
+ ```
+ ┌──────────────────────────────────────┐
+ │            User Question             │
+ └──────────────┬───────────────────────┘
+                │
+                ▼
+ ┌──────────────────────────────────────┐
+ │        LLM with ReAct Prompt         │
+ │                                      │
+ │    "Think, Act, Observe pattern"     │
+ └──────┬───────────────────────────────┘
+        │
+        ├──► Generates: "Thought: ..."
+        │
+        ├──► Generates: "Action: tool(params)"
+        │         │
+        │         ▼
+        │    ┌─────────────────┐
+        │    │  Tool Executor  │
+        │    │                 │
+        │    │  - multiply()   │
+        │    │  - add()        │
+        │    │  - divide()     │
+        │    │  - subtract()   │
+        │    └─────────┬───────┘
+        │              │
+        │              ▼
+        └───────── "Observation: result"
+                       │
+                       ├──► Next iteration or Final Answer
+                       ▼
+ ┌──────────────────────────────────────┐
+ │             Final Answer             │
+ └──────────────────────────────────────┘
+ ```
+
+ ## Implementation Strategies
+
+ ### 1. Explicit Pattern Enforcement
+
+ Force the LLM to follow the structure:
+ ```javascript
+ systemPrompt: `CRITICAL: Follow this EXACT pattern:
+ Thought: [reasoning]
+ Action: [tool call]
+ Observation: [result]
+ ...
+ Answer: [final answer]`
+ ```
+
+ ### 2. Iteration Control
+
+ Prevent infinite loops:
+ ```javascript
+ maxIterations = 10 // Safety limit
+ ```
+
+ ### 3. Streaming Output
+
+ Show progress in real time:
+ ```javascript
+ onTextChunk: (chunk) => {
+     process.stdout.write(chunk);
+ }
+ ```
+
+ ### 4. Answer Detection
+
+ Know when to stop:
+ ```javascript
+ if (response.includes("Answer:")) {
+     return fullResponse; // Done!
+ }
+ ```
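Putting the four strategies together, the control flow can be exercised without a model at all. The sketch below is hypothetical: it replaces the LLM (in the real example, a LlamaChatSession streaming responses) with scripted turns so the iterate-until-"Answer:" loop is runnable anywhere:

```javascript
// Scripted "model" output, one string per iteration.
const scriptedTurns = [
    "Thought: Calculate 15 × 8\nAction: multiply(15, 8)\nObservation: 120",
    "Thought: Calculate 20 × 8\nAction: multiply(20, 8)\nObservation: 160",
    "Thought: Add the results\nAction: add(120, 160)\nObservation: 280",
    "Thought: I have all the information\nAnswer: The total revenue is $280"
];

async function reactLoop(nextTurn, maxIterations = 10) {
    let fullResponse = "";
    for (let i = 0; i < maxIterations; i++) {
        const response = await nextTurn(i);  // stands in for session.prompt(...)
        if (response == null) break;         // model produced nothing
        fullResponse += response + "\n";
        if (response.toLowerCase().includes("answer:")) break;  // final answer reached
    }
    return fullResponse;
}

const trace = await reactLoop(async (i) => scriptedTurns[i]);
console.log(trace.trim().split("\n").pop()); // Answer: The total revenue is $280
```

The same loop works with a real model by swapping the `nextTurn` callback for a `session.prompt` call, exactly as the agent loop in react-agent.js does.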
250
+
251
+ ## Real-World Applications
252
+
253
+ ### 1. Math & Science
254
+ - Complex calculations
255
+ - Multi-step derivations
256
+ - Unit conversions
257
+
258
+ ### 2. Data Analysis
259
+ - Query databases
260
+ - Process results
261
+ - Generate reports
262
+
263
+ ### 3. Research Assistants
264
+ - Search multiple sources
265
+ - Synthesize information
266
+ - Cite sources
267
+
268
+ ### 4. Coding Agents
269
+ - Read code
270
+ - Run tests
271
+ - Fix bugs
272
+ - Refactor
273
+
274
+ ### 5. Customer Support
275
+ - Query knowledge base
276
+ - Check order status
277
+ - Process refunds
278
+ - Escalate issues
279
+
280
+ ## Limitations & Considerations
281
+
282
+ ### 1. Iteration Cost
283
+ Each thought/action/observation cycle costs tokens and time.
284
+
285
+ **Solution**: Use efficient models, limit iterations
286
+
287
+ ### 2. Tool Quality
288
+ ReAct is only as good as its tools.
289
+
290
+ **Solution**: Build robust, well-tested tools
291
+
292
+ ### 3. Prompt Engineering
293
+ System prompt must be very clear.
294
+
295
+ **Solution**: Test extensively, iterate on prompt
296
+
297
+ ### 4. Error Handling
298
+ Tools can fail or return unexpected results.
299
+
300
+ **Solution**: Add error handling, validation
301
+
302
+ ## Advanced Patterns
303
+
304
+ ### Self-Correction
305
+ ```
306
+ Thought: That result seems wrong
307
+ Action: verify(previous_result)
308
+ Observation: Error detected
309
+ Thought: Let me recalculate
310
+ Action: multiply(15, 8) # Try again
311
+ ```
312
+

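The self-correction transcript above can be expressed as a verify-and-retry wrapper. This is a sketch: `flakyMultiply` is a contrived tool that returns a wrong result on its first call, purely to trigger the correction path.

```javascript
// Self-correction sketch: verify a tool result and redo the step on failure.
let attempts = 0;
const flakyMultiply = (a, b) => {
    attempts++;
    return attempts === 1 ? a + b : a * b; // first call returns a wrong result
};
const verify = (a, b, result) => result === a * b;

function multiplyWithSelfCheck(a, b, maxRetries = 2) {
    for (let i = 0; i <= maxRetries; i++) {
        const result = flakyMultiply(a, b);       // Action
        if (verify(a, b, result)) return result;  // Observation: looks right
        console.log("Thought: that result seems wrong, recalculating");
    }
    throw new Error("Verification kept failing");
}

const checked = multiplyWithSelfCheck(15, 8);
console.log(checked); // 120
```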
313
+ ### Meta-Reasoning
314
+ ```
315
+ Thought: I've used 5 iterations, I should finish soon
316
+ Action: summarize_progress()
317
+ Observation: Still need to add final numbers
318
+ Thought: One more step should do it
319
+ ```
320
+
321
+ ### Dynamic Tool Selection
322
+ ```
323
+ Thought: This is a division problem
324
+ Action: divide(10, 2) # Chooses right tool
325
+
326
+ Thought: Now I need to add
327
+ Action: add(5, 3) # Switches tools
328
+ ```
329
+
330
+ ## Research Origins
331
+
332
+ ReAct was introduced in:
333
+ > **"ReAct: Synergizing Reasoning and Acting in Language Models"**
334
+ > Yao et al., 2022
335
+ > Paper: https://arxiv.org/abs/2210.03629
336
+
337
+ Key insight: Combining reasoning traces with task-specific actions creates more powerful agents than either alone.
338
+
339
+ ## Modern Frameworks Using ReAct
340
+
341
+ 1. **LangChain** - AgentExecutor with ReAct
342
+ 2. **AutoGPT** - Autonomous task execution
343
+ 3. **BabyAGI** - Task management system
344
+ 4. **GPT Engineer** - Code generation
345
+ 5. **ChatGPT Plugins** - Tool-using chatbots
346
+
347
+ ## Why Learn This Pattern?
348
+
349
+ ### 1. Foundation of Modern Agents
350
+ Nearly all production agent systems use ReAct or similar patterns.
351
+
352
+ ### 2. Understandable AI
353
+ Unlike black-box models, you see exactly what's happening.
354
+
355
+ ### 3. Extensible
356
+ Easy to add new tools and capabilities.
357
+
358
+ ### 4. Debuggable
359
+ When things go wrong, you can see where and why.
360
+
361
+ ### 5. Production-Ready
362
+ This pattern scales from demos to real applications.
363
+
364
+ ## Summary
365
+
366
+ ReAct transforms LLMs from:
367
+ - **Brittle calculators** → Reliable problem solvers
368
+ - **Black boxes** → Transparent reasoners
369
+ - **Single-shot answerers** → Iterative thinkers
370
+ - **Isolated models** → Tool-using agents
371
+
372
+ It's the bridge between language models and autonomous agents that can actually accomplish complex tasks reliably.
examples/09_react-agent/react-agent.js ADDED
@@ -0,0 +1,241 @@
1
+ import {defineChatSessionFunction, getLlama, LlamaChatSession} from "node-llama-cpp";
2
+ import {fileURLToPath} from "url";
3
+ import path from "path";
4
+ import {PromptDebugger} from "../../helper/prompt-debugger.js";
5
+
6
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
7
+ const debug = false;
8
+
9
+ const llama = await getLlama({debug});
10
+ const model = await llama.loadModel({
11
+ modelPath: path.join(
12
+ __dirname,
13
+ '..',
14
+ '..',
15
+ 'models',
16
+ 'hf_giladgd_gpt-oss-20b.MXFP4.gguf'
17
+ )
18
+ });
19
+ const context = await model.createContext({contextSize: 2000});
20
+
21
+ // ReAct-style system prompt for mathematical reasoning
22
+ const systemPrompt = `You are a mathematical assistant that uses the ReAct (Reasoning + Acting) approach.
23
+
24
+ CRITICAL: You must follow this EXACT pattern for every problem:
25
+
26
+ Thought: [Explain what calculation you need to do next and why]
27
+ Action: [Call ONE tool with specific numbers]
28
+ Observation: [Wait for the tool result]
29
+ Thought: [Analyze the result and decide next step]
30
+ Action: [Call another tool if needed]
31
+ Observation: [Wait for the tool result]
32
+ ... (repeat as many times as needed)
33
+ Thought: [Once you have ALL the information needed to answer the question]
34
+ Answer: [Give the final answer and STOP]
35
+
36
+ RULES:
37
+ 1. Only write "Answer:" when you have the complete final answer to the user's question
38
+ 2. After writing "Answer:", DO NOT continue calculating or thinking
39
+ 3. Break complex problems into the smallest possible steps
40
+ 4. Use tools for ALL calculations - never calculate in your head
41
+ 5. Each Action should call exactly ONE tool
42
+
43
+ EXAMPLE:
44
+ User: "What is 5 + 3, then multiply that by 2?"
45
+
46
+ Thought: First I need to add 5 and 3
47
+ Action: add(5, 3)
48
+ Observation: 8
49
+ Thought: Now I need to multiply that result by 2
50
+ Action: multiply(8, 2)
51
+ Observation: 16
52
+ Thought: I now have the final result
53
+ Answer: 16`;
54
+
55
+ const session = new LlamaChatSession({
56
+ contextSequence: context.getSequence(),
57
+ systemPrompt,
58
+ });
59
+
60
+ // Simple calculator tools that force step-by-step reasoning
61
+ const add = defineChatSessionFunction({
62
+ description: "Add two numbers together",
63
+ params: {
64
+ type: "object",
65
+ properties: {
66
+ a: {
67
+ type: "number",
68
+ description: "First number"
69
+ },
70
+ b: {
71
+ type: "number",
72
+ description: "Second number"
73
+ }
74
+ },
75
+ required: ["a", "b"]
76
+ },
77
+ async handler(params) {
78
+ const result = params.a + params.b;
79
+ console.log(`\n 🔧 TOOL CALLED: add(${params.a}, ${params.b})`);
80
+ console.log(` 📊 RESULT: ${result}\n`);
81
+ return result.toString();
82
+ }
83
+ });
84
+
85
+ const multiply = defineChatSessionFunction({
86
+ description: "Multiply two numbers together",
87
+ params: {
88
+ type: "object",
89
+ properties: {
90
+ a: {
91
+ type: "number",
92
+ description: "First number"
93
+ },
94
+ b: {
95
+ type: "number",
96
+ description: "Second number"
97
+ }
98
+ },
99
+ required: ["a", "b"]
100
+ },
101
+ async handler(params) {
102
+ const result = params.a * params.b;
103
+ console.log(`\n 🔧 TOOL CALLED: multiply(${params.a}, ${params.b})`);
104
+ console.log(` 📊 RESULT: ${result}\n`);
105
+ return result.toString();
106
+ }
107
+ });
108
+
109
+ const subtract = defineChatSessionFunction({
110
+ description: "Subtract second number from first number",
111
+ params: {
112
+ type: "object",
113
+ properties: {
114
+ a: {
115
+ type: "number",
116
+ description: "Number to subtract from"
117
+ },
118
+ b: {
119
+ type: "number",
120
+ description: "Number to subtract"
121
+ }
122
+ },
123
+ required: ["a", "b"]
124
+ },
125
+ async handler(params) {
126
+ const result = params.a - params.b;
127
+ console.log(`\n 🔧 TOOL CALLED: subtract(${params.a}, ${params.b})`);
128
+ console.log(` 📊 RESULT: ${result}\n`);
129
+ return result.toString();
130
+ }
131
+ });
132
+
133
+ const divide = defineChatSessionFunction({
134
+ description: "Divide first number by second number",
135
+ params: {
136
+ type: "object",
137
+ properties: {
138
+ a: {
139
+ type: "number",
140
+ description: "Dividend (number to be divided)"
141
+ },
142
+ b: {
143
+ type: "number",
144
+ description: "Divisor (number to divide by)"
145
+ }
146
+ },
147
+ required: ["a", "b"]
148
+ },
149
+ async handler(params) {
150
+ if (params.b === 0) {
151
+ console.log(`\n 🔧 TOOL CALLED: divide(${params.a}, ${params.b})`);
152
+ console.log(` ❌ ERROR: Division by zero\n`);
153
+ return "Error: Cannot divide by zero";
154
+ }
155
+ const result = params.a / params.b;
156
+ console.log(`\n 🔧 TOOL CALLED: divide(${params.a}, ${params.b})`);
157
+ console.log(` 📊 RESULT: ${result}\n`);
158
+ return result.toString();
159
+ }
160
+ });
161
+
162
+ const functions = {add, multiply, subtract, divide};
163
+
164
+ // ReAct Agent execution loop with proper output handling
165
+ async function reactAgent(userPrompt, maxIterations = 10) {
166
+ console.log("\n" + "=".repeat(70));
167
+ console.log("USER QUESTION:", userPrompt);
168
+ console.log("=".repeat(70) + "\n");
169
+
170
+ let iteration = 0;
171
+ let fullResponse = "";
172
+
173
+ while (iteration < maxIterations) {
174
+ iteration++;
175
+ console.log(`--- Iteration ${iteration} ---`);
176
+
177
+ // Prompt with onTextChunk to capture streaming output
178
+ let currentChunk = "";
179
+ const response = await session.prompt(
180
+ iteration === 1 ? userPrompt : "Continue your reasoning. What's the next step?",
181
+ {
182
+ functions,
183
+ maxTokens: 300,
184
+ onTextChunk: (chunk) => {
185
+ // Print each chunk as it arrives
186
+ process.stdout.write(chunk);
187
+ currentChunk += chunk;
188
+ }
189
+ }
190
+ );
191
+
192
+ console.log(); // New line after streaming
193
+
194
+ fullResponse += currentChunk;
195
+
196
+ // If no output was generated in this iteration, something's wrong
197
+ if (!currentChunk.trim() && !response.trim()) {
198
+ console.log(" (No output generated this iteration)\n");
199
+ }
200
+
201
+ // Check if we have a final answer
202
+ if (response.toLowerCase().includes("answer:") ||
203
+ fullResponse.toLowerCase().includes("answer:")) {
204
+ console.log("\n" + "=".repeat(70));
205
+ console.log("FINAL ANSWER REACHED");
206
+ console.log("=".repeat(70));
207
+ return fullResponse;
208
+ }
209
+ }
210
+
211
+ console.log("\n⚠️ Max iterations reached without final answer");
212
+ return fullResponse || "Could not complete reasoning within iteration limit.";
213
+ }
214
+
215
+ // Test queries that require multi-step reasoning
216
+ const queries = [
217
+ // "If I buy 3 apples at $2 each and 4 oranges at $3 each, how much do I spend in total?",
218
+ // "Calculate: (15 + 7) × 3 - 10",
219
+ //"A pizza costs $20. If 4 friends split it equally, how much does each person pay?",
220
+ "A store sells 15 items on Monday at $8 each, 20 items on Tuesday at $8 each, and 10 items on Wednesday at $8 each. What's the average number of items sold per day, and what's the total revenue?",
221
+ ];
222
+
223
+ for (const query of queries) {
224
+ await reactAgent(query, 3);
225
+ console.log("\n");
226
+ }
227
+
228
+ // Debug
229
+ const promptDebugger = new PromptDebugger({
230
+ outputDir: './logs',
231
+ filename: 'react_calculator.txt',
232
+ includeTimestamp: true,
233
+ appendMode: false
234
+ });
235
+ await promptDebugger.debugContextState({session, model});
236
+
237
+ // Clean up
238
+ session.dispose();
239
+ context.dispose();
240
+ model.dispose();
241
+ llama.dispose();
examples/10_aot-agent/CODE.md ADDED
@@ -0,0 +1,178 @@
1
+ # Code Explanation: aot-agent.js
2
+
3
+ This example demonstrates the **Atom of Thought** prompting pattern using a mathematical calculator as the domain.
4
+
5
+ ## Three-Phase Architecture
6
+
7
+ ### Phase 1: Planning (LLM)
8
+ ```javascript
9
+ async function generatePlan(userPrompt) {
10
+ const grammar = await llama.createGrammarForJsonSchema(planSchema);
11
+ const planText = await session.prompt(userPrompt, { grammar });
12
+ return grammar.parse(planText);
13
+ }
14
+ ```
15
+
16
+ **Key points:**
17
+ - LLM outputs **structured JSON** (enforced by grammar)
18
+ - LLM does NOT execute calculations
19
+ - Each atom represents one operation
20
+ - Dependencies are explicit (`dependsOn` array)
21
+
22
+ **Example output:**
23
+ ```json
24
+ {
25
+ "atoms": [
26
+ {"id": 1, "kind": "tool", "name": "add", "input": {"a": 15, "b": 7}},
27
+ {"id": 2, "kind": "tool", "name": "multiply", "input": {"a": "<result_of_1>", "b": 3}},
28
+ {"id": 3, "kind": "tool", "name": "subtract", "input": {"a": "<result_of_2>", "b": 10}},
29
+ {"id": 4, "kind": "final", "name": "report", "dependsOn": [3]}
30
+ ]
31
+ }
32
+ ```
33
+
34
+ ### Phase 2: Validation (System)
35
+ ```javascript
36
+ function validatePlan(plan) {
37
+ const allowedTools = new Set(Object.keys(tools));
38
+
39
+ for (const atom of plan.atoms) {
40
+ if (ids.has(atom.id)) throw new Error(`Duplicate ID`);
41
+ if (atom.kind === "tool" && !allowedTools.has(atom.name)) {
42
+ throw new Error(`Unknown tool: ${atom.name}`);
43
+ }
44
+ }
45
+ }
46
+ ```
47
+
48
+ **Validates:**
49
+ - No duplicate atom IDs
50
+ - Only allowed tools are referenced
51
+ - Dependencies make sense
52
+ - JSON structure is correct
53
+
54
+ ### Phase 3: Execution (System)
55
+ ```javascript
56
+ function executePlan(plan) {
57
+ const state = {};
58
+
59
+ for (const atom of sortedAtoms) {
60
+ // Resolve dependencies
61
+ let resolvedInput = {};
62
+ for (const [key, value] of Object.entries(atom.input)) {
63
+ if (value.startsWith('<result_of_')) {
64
+ const refId = parseInt(value.match(/\d+/)[0]);
65
+ resolvedInput[key] = state[refId];
66
+ }
67
+ }
68
+
69
+ // Execute
70
+ state[atom.id] = tools[atom.name](resolvedInput.a, resolvedInput.b);
71
+ }
72
+ }
73
+ ```
74
+
75
+ **Key behaviors:**
76
+ - Executes atoms in order (sorted by ID)
77
+ - Resolves `<result_of_N>` references from state
78
+ - Each atom stores its result in `state[atom.id]`
79
+ - Execution is **deterministic** (same plan + same state = same result)
80
+
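The behaviors above can be demonstrated end-to-end without an LLM. The following sketch uses mock tool implementations and the same `<result_of_N>` reference convention as the example plan; it is a standalone illustration, not the project's executor.

```javascript
// Self-contained executor sketch: runs an AoT plan deterministically.
const tools = {
    add: (a, b) => a + b,
    multiply: (a, b) => a * b,
    subtract: (a, b) => a - b
};

const plan = {
    atoms: [
        {id: 1, kind: "tool", name: "add", input: {a: 15, b: 7}, dependsOn: []},
        {id: 2, kind: "tool", name: "multiply", input: {a: "<result_of_1>", b: 3}, dependsOn: [1]},
        {id: 3, kind: "tool", name: "subtract", input: {a: "<result_of_2>", b: 10}, dependsOn: [2]}
    ]
};

function executePlan(plan) {
    const state = {};
    for (const atom of [...plan.atoms].sort((x, y) => x.id - y.id)) {
        const resolved = {};
        for (const [key, value] of Object.entries(atom.input)) {
            resolved[key] = (typeof value === "string" && value.startsWith("<result_of_"))
                ? state[parseInt(value.match(/\d+/)[0])] // reference → prior result
                : value;                                 // literal number
        }
        state[atom.id] = tools[atom.name](resolved.a, resolved.b);
    }
    return state;
}

const state = executePlan(plan);
console.log(state); // { '1': 22, '2': 66, '3': 56 }
```

Running the same plan twice always yields the same state map, which is exactly the determinism property claimed above.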
81
+ ## Why This Matters
82
+
83
+ ### Comparison with ReAct
84
+
85
+ | Aspect | ReAct | Atom of Thought |
86
+ |--------|-------|-----------------|
87
+ | **Planning** | Implicit (in LLM reasoning) | Explicit (JSON structure) |
88
+ | **Execution** | LLM decides next step | System follows plan |
89
+ | **Validation** | None | Before execution |
90
+ | **Debugging** | Hard (trace through text) | Easy (inspect atoms) |
91
+ | **Testing** | Hard (mock LLM) | Easy (test executor) |
92
+ | **Failures** | May hallucinate | Fail at specific atom |
93
+
94
+ ### Benefits
95
+
96
+ 1. **No hidden reasoning**: Every operation is an explicit atom
97
+ 2. **Testable**: Execute plan without LLM involvement
98
+ 3. **Debuggable**: Know exactly which atom failed
99
+ 4. **Auditable**: Plan is a data structure, not text
100
+ 5. **Deterministic**: Same input = same output (given same plan)
101
+
102
+ ## Tool Implementation
103
+
104
+ Tools are **pure functions** with no side effects:
105
+ ```javascript
106
+ const tools = {
107
+ add: (a, b) => {
108
+ const result = a + b;
109
+ console.log(`EXECUTING: add(${a}, ${b}) = ${result}`);
110
+ return result;
111
+ },
112
+ // ... more tools
113
+ };
114
+ ```
115
+
116
+ **Why pure functions?**
117
+ - Easy to test
118
+ - Easy to replay
119
+ - No hidden state
120
+ - Composable
121
+
122
+ ## State Flow
123
+ ```
124
+ User Question
125
+
126
+ [LLM generates plan]
127
+
128
+ {atoms: [...]} ← JSON plan
129
+
130
+ [System validates]
131
+
132
+ Plan valid
133
+
134
+ [System executes atom 1] → state[1] = result
135
+
136
+ [System executes atom 2] → state[2] = result (uses state[1])
137
+
138
+ [System executes atom 3] → state[3] = result (uses state[2])
139
+
140
+ Final Answer
141
+ ```
142
+
143
+ ## Error Handling
144
+ ```javascript
145
+ // Atom validation fails → re-prompt LLM
146
+ validatePlan(plan); // throws if invalid
147
+
148
+ // Tool execution fails → stop at that atom
149
+ if (b === 0) throw new Error("Division by zero");
150
+
151
+ // Dependency missing → clear error message
152
+ if (!(depId in state)) {
153
+ throw new Error(`Atom ${atom.id} depends on incomplete atom ${depId}`);
154
+ }
155
+ ```
156
+
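The "dependency missing" case can be checked up front, before any tool runs. A minimal sketch (atom shape as in the examples above; atoms assumed listed in execution order):

```javascript
// Validation sketch: every dependency must reference an earlier, existing atom.
function validateDependencies(plan) {
    const seen = new Set();
    for (const atom of plan.atoms) {
        for (const depId of atom.dependsOn ?? []) {
            if (!seen.has(depId)) {
                throw new Error(`Atom ${atom.id} depends on unknown or later atom ${depId}`);
            }
        }
        seen.add(atom.id);
    }
    return true;
}

const good = {atoms: [
    {id: 1, dependsOn: []},
    {id: 2, dependsOn: [1]}
]};
const bad = {atoms: [
    {id: 1, dependsOn: [99]} // refers to an atom that never exists
]};

console.log(validateDependencies(good)); // true
let error = null;
try { validateDependencies(bad); } catch (e) { error = e.message; }
console.log(error); // Atom 1 depends on unknown or later atom 99
```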
157
+ ## When to Use AoT
158
+
159
+ ✅ **Use AoT when:**
160
+ - Execution must be auditable
161
+ - Failures must be recoverable
162
+ - Multiple steps with dependencies
163
+ - Testing is important
164
+ - Compliance matters
165
+
166
+ ❌ **Don't use AoT when:**
167
+ - Single-step tasks
168
+ - Creative/exploratory tasks
169
+ - Brainstorming
170
+ - Natural conversation
171
+
172
+ ## Extension Ideas
173
+
174
+ 1. **Add compensation atoms** for rollback
175
+ 2. **Add retry logic** per atom
176
+ 3. **Parallelize independent atoms** (atoms with no shared dependencies)
177
+ 4. **Persist plan** for debugging
178
+ 5. **Visualize atom graph** (dependency tree)
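Extension idea 2 (per-atom retry) can be sketched as a thin wrapper around any tool function. The retry count and the contrived `flakyAdd` tool here are illustrative only:

```javascript
// Per-atom retry sketch: re-run a failing tool a bounded number of times.
function withRetry(fn, maxRetries = 2) {
    return (...args) => {
        let lastError;
        for (let attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return fn(...args);
            } catch (e) {
                lastError = e;
                console.log(`Attempt ${attempt + 1} failed: ${e.message}`);
            }
        }
        throw lastError; // retries exhausted → fail cleanly at this atom
    };
}

// A contrived flaky tool that succeeds on its second call.
let calls = 0;
const flakyAdd = (a, b) => {
    calls++;
    if (calls < 2) throw new Error("transient failure");
    return a + b;
};

const reliableAdd = withRetry(flakyAdd);
const sum = reliableAdd(15, 7);
console.log(sum); // 22
```

Because tools are pure functions, wrapping them this way changes no other behavior: a retried atom still produces the same result it would have on a clean first run.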
examples/10_aot-agent/CONCEPT.md ADDED
@@ -0,0 +1,265 @@
1
+ # Concept: Atom of Thought (AoT) Pattern for AI Agents
2
+
3
+ ## The Core Idea
4
+
5
+ **Atom of Thought = "SQL for Reasoning"**
6
+
7
+ Just as SQL breaks complex data operations into atomic, composable statements, AoT breaks reasoning into minimal, executable steps.
8
+
9
+ ## What is an Atom?
10
+
11
+ An atom is the **smallest unit of reasoning** that:
12
+ 1. Expresses exactly **one** idea
13
+ 2. Can be **validated independently**
14
+ 3. Can be **executed deterministically**
15
+ 4. **Cannot hide** a mistake
16
+
17
+ ### Examples
18
+
19
+ ❌ **Not atomic** (compound statement):
20
+ ```
21
+ "Search for rooms in Graz and filter by capacity"
22
+ ```
23
+
24
+ ✅ **Atomic** (separate steps):
25
+ ```
26
+ 1. Search for rooms in Graz
27
+ 2. Filter rooms by minimum capacity of 30
28
+ ```
29
+
30
+ ## The Three Layers
31
+ ```
32
+ ┌─────────────────────────────────┐
33
+ │ LLM (Planning Layer) │
34
+ │ - Proposes atomic plan │
35
+ │ - Does NOT execute │
36
+ └─────────────────────────────────┘
37
+
38
+ ┌─────────────────────────────────┐
39
+ │ Validator (Safety Layer) │
40
+ │ - Checks plan structure │
41
+ │ - Validates dependencies │
42
+ └─────────────────────────────────┘
43
+
44
+ ┌─────────────────────────────────┐
45
+ │ Executor (Execution Layer) │
46
+ │ - Runs atoms deterministically│
47
+ │ - Manages state │
48
+ └─────────────────────────────────┘
49
+ ```
50
+
51
+ ## Why Separation Matters
52
+
53
+ ### Traditional LLM Approach (ReAct)
54
+ ```
55
+ LLM thinks → LLM acts → LLM thinks → LLM acts
56
+ ```
57
+ **Problem:** Execution logic lives inside the model (black box)
58
+
59
+ ### Atom of Thought Approach
60
+ ```
61
+ LLM plans → System validates → System executes
62
+ ```
63
+ **Benefit:** Execution logic lives in code (white box)
64
+
65
+ ## Mental Model
66
+
67
+ Think of AoT as the difference between:
68
+
69
+ | Cooking | Programming |
70
+ |---------|------------|
71
+ | **Recipe** (AoT plan) | **Algorithm** |
72
+ | "Boil water" | `boilWater()` |
73
+ | "Add pasta" | `addPasta()` |
74
+ | "Cook 8 minutes" | `cook(8)` |
75
+
76
+ vs.
77
+
78
+ | Improvising | Natural Language |
79
+ |-------------|------------------|
80
+ | "Make dinner" | "Figure it out" |
81
+ | (figure it out) | (hallucinate) |
82
+
83
+ ## The Atom Structure
84
+ ```javascript
85
+ {
86
+ "id": 2,
87
+ "kind": "tool", // tool | decision | final
88
+ "name": "multiply", // operation name
89
+ "input": { // explicit inputs
90
+ "a": "<result_of_1>", // reference to previous result
91
+ "b": 3
92
+ },
93
+ "dependsOn": [1] // must wait for atom 1
94
+ }
95
+ ```
96
+
97
+ **Why this structure?**
98
+ - `id`: Establishes order
99
+ - `kind`: Categorizes operation type
100
+ - `name`: References executable function
101
+ - `input`: Makes data flow explicit
102
+ - `dependsOn`: Declares dependencies
103
+
104
+ ## Dependency Graph
105
+
106
+ Atoms form a **directed acyclic graph (DAG)**:
107
+ ```
108
+ ┌─────┐
109
+ │ 1 │ add(15, 7)
110
+ └──┬──┘
111
+
112
+ ┌──▼──┐
113
+ │ 2 │ multiply(result_1, 3)
114
+ └──┬──┘
115
+
116
+ ┌──▼──┐
117
+ │ 3 │ subtract(result_2, 10)
118
+ └──┬──┘
119
+
120
+ ┌──▼──┐
121
+ │ 4 │ final
122
+ └─────┘
123
+ ```
124
+
125
+ **Properties:**
126
+ - Can be executed in topological order
127
+ - Can parallelize independent branches
128
+ - Failures stop at failed node
129
+ - Easy to visualize and debug
130
+
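A topological order for such a DAG can be computed with Kahn's algorithm: repeatedly take the atoms whose dependencies are all satisfied. This sketch assumes the atom shape used above and breaks ties by `id`:

```javascript
// Topological ordering of atoms (Kahn's algorithm). Atoms with no
// unsatisfied dependencies become "ready"; a stuck state means a cycle.
function topologicalOrder(atoms) {
    const pending = new Map(atoms.map(a => [a.id, new Set(a.dependsOn ?? [])]));
    const order = [];
    while (pending.size > 0) {
        const ready = [...pending.entries()]
            .filter(([, deps]) => deps.size === 0)
            .map(([id]) => id)
            .sort((a, b) => a - b);
        if (ready.length === 0) throw new Error("Cycle detected in atom graph");
        for (const id of ready) {
            order.push(id);
            pending.delete(id);
            for (const deps of pending.values()) deps.delete(id); // mark satisfied
        }
    }
    return order;
}

const atoms = [
    {id: 4, dependsOn: [2, 3]},
    {id: 2, dependsOn: [1]},
    {id: 3, dependsOn: [1]},
    {id: 1, dependsOn: []}
];
console.log(topologicalOrder(atoms)); // [ 1, 2, 3, 4 ]
```

Note that atoms 2 and 3 become ready in the same round: those are exactly the independent branches that could run in parallel.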
131
+ ## State Management
132
+ ```javascript
133
+ const state = {};
134
+
135
+ // After atom 1
136
+ state[1] = 22; // result of add(15, 7)
137
+
138
+ // After atom 2
139
+ state[2] = 66; // result of multiply(22, 3)
140
+
141
+ // After atom 3
142
+ state[3] = 56; // result of subtract(66, 10)
143
+ ```
144
+
145
+ **State is:**
146
+ - Explicit (key-value map)
147
+ - Immutable per atom (no overwrites)
148
+ - Traceable (full history)
149
+ - Inspectable (debugging)
150
+
151
+ ## Comparison: AoT vs ReAct
152
+
153
+ ### Question: "What is (15 + 7) × 3 - 10?"
154
+
155
+ #### ReAct Output (text):
156
+ ```
157
+ Thought: I need to add 15 and 7 first
158
+ Action: add(15, 7)
159
+ Observation: 22
160
+ Thought: Now multiply by 3
161
+ Action: multiply(22, 3)
162
+ Observation: 66
163
+ Thought: Finally subtract 10
164
+ Action: subtract(66, 10)
165
+ Observation: 56
166
+ Answer: 56
167
+ ```
168
+
169
+ #### AoT Output (JSON):
170
+ ```json
171
+ {
172
+ "atoms": [
173
+ {"id": 1, "kind": "tool", "name": "add", "input": {"a": 15, "b": 7}},
174
+ {"id": 2, "kind": "tool", "name": "multiply", "input": {"a": "<result_of_1>", "b": 3}, "dependsOn": [1]},
175
+ {"id": 3, "kind": "tool", "name": "subtract", "input": {"a": "<result_of_2>", "b": 10}, "dependsOn": [2]},
176
+ {"id": 4, "kind": "final", "name": "report", "dependsOn": [3]}
177
+ ]
178
+ }
179
+ ```
180
+
181
+ ### Key Differences
182
+
183
+ | Aspect | ReAct | AoT |
184
+ |--------|-------|-----|
185
+ | **Format** | Natural language | Structured data |
186
+ | **Validation** | Impossible | Before execution |
187
+ | **Testing** | Mock entire LLM | Test executor independently |
188
+ | **Debugging** | Read through text | Inspect atom N |
189
+ | **Replay** | Re-run entire conversation | Re-run from any atom |
190
+ | **Audit trail** | Conversational history | Data structure |
191
+
192
+ ## When AoT Shines
193
+
194
+ ### ✅ Perfect for:
195
+ - **Multi-step workflows** (booking, pipelines)
196
+ - **API orchestration** (call A, then B with A's result)
197
+ - **Financial transactions** (auditable, reversible)
198
+ - **Compliance-sensitive systems** (every step logged)
199
+ - **Production agents** (failures must be clean)
200
+
201
+ ### ❌ Not ideal for:
202
+ - **Creative writing**
203
+ - **Open-ended exploration**
204
+ - **Brainstorming**
205
+ - **Single-step queries**
206
+
207
+ ## Real-World Analogy
208
+
209
+ **ReAct is like a chef improvising:**
210
+ - Flexible
211
+ - Creative
212
+ - Hard to replicate exactly
213
+ - Mistakes hidden in process
214
+
215
+ **AoT is like following a recipe:**
216
+ - Repeatable
217
+ - Testable
218
+ - Step X failed? Start from step X-1
219
+ - Every ingredient and action is explicit
220
+
221
+ ## The Hidden Benefit: Debuggability
222
+
223
+ When something goes wrong:
224
+
225
+ **ReAct:**
226
+ ```
227
+ "The model said something weird in iteration 7"
228
+ → Re-read entire conversation
229
+ → Guess where it went wrong
230
+ → Hope it doesn't happen again
231
+ ```
232
+
233
+ **AoT:**
234
+ ```
235
+ "Atom 3 failed with 'Division by zero'"
236
+ → Look at atom 3's inputs
237
+ → Check where those inputs came from (atom 1, 2)
238
+ → Fix tool or add validation
239
+ → Re-run from atom 3
240
+ ```
241
+
242
+ ## Implementation Checklist
243
+
244
+ ✅ **LLM side:**
245
+ - [ ] System prompt enforces JSON output
246
+ - [ ] Grammar constrains to valid schema
247
+ - [ ] Atoms are minimal (one operation each)
248
+ - [ ] Dependencies are explicit
249
+
250
+ ✅ **System side:**
251
+ - [ ] Validator checks tool names
252
+ - [ ] Validator checks dependencies
253
+ - [ ] Executor resolves references
254
+ - [ ] Executor is deterministic
255
+ - [ ] State is immutable
256
+
257
+ ## The Bottom Line
258
+
259
+ **ReAct asks:**
260
+ "What would an intelligent agent say next?"
261
+
262
+ **AoT asks:**
263
+ "What is the minimal, executable plan?"
264
+
265
+ For production systems, you want the second question.
examples/10_aot-agent/aot-agent.js ADDED
@@ -0,0 +1,416 @@
1
+ import { getLlama, LlamaChatSession } from "node-llama-cpp";
2
+ import { fileURLToPath } from "url";
3
+ import path from "path";
4
+ import { PromptDebugger } from "../../helper/prompt-debugger.js";
5
+ import { JsonParser } from "../../helper/json-parser.js";
6
+
7
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
8
+ const debug = false;
9
+
10
+ const llama = await getLlama({ debug });
11
+ const model = await llama.loadModel({
12
+ modelPath: path.join(
13
+ __dirname,
14
+ '..',
15
+ '..',
16
+ 'models',
17
+ 'Qwen3-1.7B-Q8_0.gguf'
18
+ )
19
+ });
20
+ const context = await model.createContext({ contextSize: 2000 });
21
+
22
+ // Atom of Thought system prompt - LLM only plans, doesn't execute
23
+ const systemPrompt = `You are a mathematical planning assistant using Atom of Thought methodology.
24
+
25
+ CRITICAL RULES:
26
+ 1. Extract every number from the user's question and put it in the "input" field.
27
+ 2. Each atom expresses EXACTLY ONE operation: add, subtract, multiply, divide.
28
+ 3. NEVER combine operations in one atom. For example, "(5 + 3) × 2" → must be TWO atoms: one for add, one for multiply.
29
+ 4. The "final" atom reports only the result of the last computational atom; it must NOT have its own input. Do not include an "input" field in final atoms.
30
+ 5. Use "<result_of_N>" to reference previous atom results; never invent calculations in the final atom.
31
+ 6. Output ONLY valid JSON matching the schema, with no explanation or extra text.
32
+
33
+ CORRECT EXAMPLE for "What is (15 + 7) × 3 - 10?":
34
+ {
35
+ "atoms": [
36
+ {"id": 1, "kind": "tool", "name": "add", "input": {"a": 15, "b": 7}, "dependsOn": []},
37
+ {"id": 2, "kind": "tool", "name": "multiply", "input": {"a": "<result_of_1>", "b": 3}, "dependsOn": [1]},
38
+ {"id": 3, "kind": "tool", "name": "subtract", "input": {"a": "<result_of_2>", "b": 10}, "dependsOn": [2]},
39
+ {"id": 4, "kind": "final", "name": "report", "dependsOn": [3]}
40
+ ]
41
+ }
42
+
43
+ WRONG EXAMPLES:
44
+ - Empty input: {"input": {}}
45
+ - Missing numbers: {"input": {"a": "<result_of_1>"}}
46
+ - Combined operations: "add then multiply" → must be TWO atoms
47
+ - Final atom with input: {"kind": "final", "input": {"a": 5}} is INVALID
48
+
49
+ Available tools: add, subtract, multiply, divide
50
+ - Each tool requires: {"a": <number or reference>, "b": <number or reference>}
51
+ - kind options: "tool", "decision", "final"
52
+ - dependsOn: array of atom IDs that must complete first
53
+
54
+ Always extract the actual numbers from the question and put them in the input fields! Never combine operations or invent calculations in final atoms.`;
55
+
56
+ // Define JSON schema for plan validation
57
+ const planSchema = {
58
+ type: "object",
59
+ properties: {
60
+ atoms: {
61
+ type: "array",
62
+ items: {
63
+ type: "object",
64
+ properties: {
65
+ id: { type: "number" },
66
+ kind: { enum: ["tool", "decision", "final"] },
67
+ name: { type: "string" },
68
+ input: {
69
+ type: "object",
70
+ properties: {
71
+ a: {
72
+ oneOf: [
73
+ { type: "number" },
74
+ { type: "string", pattern: "^<result_of_\\d+>$" }
75
+ ]
76
+ },
77
+ b: {
78
+ oneOf: [
79
+ { type: "number" },
80
+ { type: "string", pattern: "^<result_of_\\d+>$" }
81
+ ]
82
+ }
83
+ }
84
+ },
85
+ dependsOn: {
86
+ type: "array",
87
+ items: { type: "number" }
88
+ }
89
+ },
90
+ required: ["id", "kind", "name"]
91
+ }
92
+ }
93
+ },
94
+ required: ["atoms"]
95
+ };
96
+
97
+ const session = new LlamaChatSession({
98
+ contextSequence: context.getSequence(),
99
+ systemPrompt,
100
+ });
101
+
102
+ // Tool implementations (pure functions, deterministic)
103
+ const tools = {
104
+ add: (a, b) => {
105
+ const result = a + b;
106
+ console.log(`EXECUTING: add(${a}, ${b}) = ${result}`);
107
+ return result;
108
+ },
109
+
110
+ subtract: (a, b) => {
111
+ const result = a - b;
112
+ console.log(`EXECUTING: subtract(${a}, ${b}) = ${result}`);
113
+ return result;
114
+ },
115
+
116
+ multiply: (a, b) => {
117
+ const result = a * b;
118
+ console.log(`EXECUTING: multiply(${a}, ${b}) = ${result}`);
119
+ return result;
120
+ },
121
+
122
+ divide: (a, b) => {
123
+ if (b === 0) {
124
+ console.log(`ERROR: divide(${a}, ${b}) - Division by zero`);
125
+ throw new Error("Division by zero");
126
+ }
127
+ const result = a / b;
128
+ console.log(`EXECUTING: divide(${a}, ${b}) = ${result}`);
129
+ return result;
130
+ }
131
+ };
132
+
133
+ // Decision handlers (for complex logic)
134
+ const decisions = {
135
+ average: (values) => {
136
+ const sum = values.reduce((acc, v) => acc + v, 0);
137
+ const avg = sum / values.length;
138
+ console.log(`DECISION: average([${values}]) = ${avg}`);
139
+ return avg;
140
+ },
141
+
142
+ chooseCheapest: (values) => {
143
+ const min = Math.min(...values);
144
+ console.log(`DECISION: chooseCheapest([${values}]) = ${min}`);
145
+ return min;
146
+ }
147
+ };
148
+
149
+ // Phase 1: LLM generates atomic plan
150
+ async function generatePlan(userPrompt) {
151
+ console.log("\n" + "=".repeat(70));
152
+ console.log("PHASE 1: PLANNING (LLM generates atomic plan)");
153
+ console.log("=".repeat(70));
154
+ console.log("USER QUESTION:", userPrompt);
155
+ console.log("-".repeat(70) + "\n");
156
+
157
+ const grammar = await llama.createGrammarForJsonSchema(planSchema);
158
+
159
+ // Add reminder about extracting numbers
160
+ const enhancedPrompt = `${userPrompt}
161
+
162
+ Remember: Extract the actual numbers from this question and put them in the input fields!`;
163
+
164
+ const planText = await session.prompt(enhancedPrompt, {
165
+ grammar,
166
+ maxTokens: 1000
167
+ });
168
+
169
+ let plan;
170
+ try {
171
+ // Use the robust JSON parser
172
+ plan = JsonParser.parse(planText, {
173
+ debug: debug,
174
+ expectObject: true,
175
+ repairAttempts: true
176
+ });
177
+
178
+ // Validate the plan structure
179
+ JsonParser.validatePlan(plan, debug);
180
+
181
+ // Pretty print the plan
182
+ if (debug) {
183
+ JsonParser.prettyPrint(plan);
184
+ } else {
185
+ console.log("GENERATED PLAN:");
186
+ console.log(JSON.stringify(plan, null, 2));
187
+ console.log();
188
+ }
189
+ } catch (error) {
190
+ console.error("Failed to parse plan:", error.message);
191
+ console.log("\nRaw LLM output:");
192
+ console.log(planText);
193
+ throw error;
194
+ }
195
+
196
+ return plan;
197
+ }
198
+
199
+ // Phase 2: System validates plan
200
+ function validatePlan(plan) {
201
+ console.log("\n" + "=".repeat(70));
202
+ console.log("PHASE 2: VALIDATION (System checks plan)");
203
+ console.log("=".repeat(70) + "\n");
204
+
205
+ const allowedTools = new Set(Object.keys(tools));
206
+ const allowedDecisions = new Set(Object.keys(decisions));
207
+ const ids = new Set();
208
+
209
+ for (const atom of plan.atoms) {
210
+ // Check for duplicate IDs
211
+ if (ids.has(atom.id)) {
212
+ throw new Error(`Validation failed: Duplicate atom ID ${atom.id}`);
213
+ }
214
+ ids.add(atom.id);
215
+
216
+ // Check tool names
217
+ if (atom.kind === "tool" && !allowedTools.has(atom.name)) {
218
+ throw new Error(`Validation failed: Unknown tool "${atom.name}" in atom ${atom.id}`);
219
+ }
220
+
221
+ // Check decision names
222
+ if (atom.kind === "decision" && !allowedDecisions.has(atom.name)) {
223
+ throw new Error(`Validation failed: Unknown decision "${atom.name}" in atom ${atom.id}`);
224
+ }
225
+
226
+ // NEW: Validate tool inputs have actual values
227
+ if (atom.kind === "tool") {
228
+ if (!atom.input || typeof atom.input !== 'object') {
229
+ throw new Error(
230
+ `Validation failed: Tool atom ${atom.id} (${atom.name}) must have an input object\n` +
231
+ ` Current: ${JSON.stringify(atom.input)}`
232
+ );
233
+ }
234
+
235
+ // Check if a and b are present
236
+ if (atom.input.a === undefined || atom.input.b === undefined) {
237
+ throw new Error(
238
+ `Validation failed: Tool atom ${atom.id} (${atom.name}) missing required parameters\n` +
239
+ ` Expected: {"a": <number or reference>, "b": <number or reference>}\n` +
240
+ ` Current: ${JSON.stringify(atom.input)}\n` +
241
+ ` Tip: The LLM must extract numbers from the user's question`
242
+ );
243
+ }
244
+
245
+ // For first operations, ensure we have concrete numbers (not references)
246
+ if (atom.dependsOn.length === 0) {
247
+ const hasConcreteNumbers =
248
+ (typeof atom.input.a === 'number') &&
249
+ (typeof atom.input.b === 'number');
250
+
251
+ if (!hasConcreteNumbers) {
252
+ throw new Error(
253
+ `Validation failed: First atom ${atom.id} must have concrete numbers\n` +
254
+ ` Expected: {"a": <number>, "b": <number>}\n` +
255
+ ` Current: ${JSON.stringify(atom.input)}\n` +
256
+ ` The LLM failed to extract numbers from the question`
257
+ );
258
+ }
259
+ }
260
+ }
261
+
262
+ // Check dependencies exist
263
+ if (atom.dependsOn) {
264
+ for (const depId of atom.dependsOn) {
265
+ if (!ids.has(depId) && depId < atom.id) {
266
+ console.warn(`Warning: atom ${atom.id} depends on ${depId} which hasn't been validated yet`);
267
+ }
268
+ }
269
+ }
270
+
271
+ console.log(`Atom ${atom.id} (${atom.kind}:${atom.name}) validated`);
272
+ }
273
+
274
+ console.log("\nPlan validation successful\n");
275
+ return true;
276
+ }
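The duplicate-ID and allowed-name checks above can be exercised in isolation; a minimal standalone sketch (the tool names here are illustrative, not the agent's real registry):

```javascript
// Standalone sketch of the duplicate-ID and unknown-tool checks in validatePlan.
const allowedTools = new Set(["add", "multiply", "divide"]); // illustrative names
const ids = new Set();
const atoms = [
  { id: 1, kind: "tool", name: "add" },
  { id: 1, kind: "tool", name: "fly" } // duplicate id AND unknown tool
];
const errors = [];
for (const atom of atoms) {
  if (ids.has(atom.id)) errors.push(`Duplicate atom ID ${atom.id}`);
  ids.add(atom.id);
  if (atom.kind === "tool" && !allowedTools.has(atom.name)) {
    errors.push(`Unknown tool "${atom.name}" in atom ${atom.id}`);
  }
}
console.log(errors.length); // 2
```

The real `validatePlan` throws on the first violation instead of collecting all of them, which keeps a bad plan from reaching the execution phase.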
277
+
278
+ // Phase 3: System executes plan deterministically
279
+ function executePlan(plan) {
280
+ console.log("\n" + "=".repeat(70));
281
+ console.log("PHASE 3: EXECUTION (System runs atoms)");
282
+ console.log("=".repeat(70) + "\n");
283
+
284
+ const state = {};
285
+ const sortedAtoms = [...plan.atoms].sort((a, b) => a.id - b.id);
286
+
287
+ for (const atom of sortedAtoms) {
288
+ console.log(`\nExecuting atom ${atom.id} (${atom.kind}:${atom.name})`);
289
+
290
+ // Check dependencies
291
+ if (atom.dependsOn && atom.dependsOn.length > 0) {
292
+ const missingDeps = atom.dependsOn.filter(id => !(id in state));
293
+ if (missingDeps.length > 0) {
294
+ throw new Error(`Atom ${atom.id} depends on incomplete atoms: ${missingDeps}`);
295
+ }
296
+ console.log(`Dependencies satisfied: ${atom.dependsOn.join(', ')}`);
297
+ }
298
+
299
+ // Resolve input values (replace <result_of_N> references)
300
+ let resolvedInput = { a: undefined, b: undefined };
301
+ if (atom.input) {
302
+ // Deep clone to avoid mutations
303
+ resolvedInput = JSON.parse(JSON.stringify(atom.input));
304
+
305
+ for (const [key, value] of Object.entries(resolvedInput)) {
306
+ if (typeof value === 'string' && value.startsWith('<result_of_')) {
307
+ const refId = parseInt(value.match(/\d+/)[0]);
308
+
309
+ if (!(refId in state)) {
310
+ throw new Error(
311
+ `Atom ${atom.id} references <result_of_${refId}> but atom ${refId} hasn't executed yet`
312
+ );
313
+ }
314
+
315
+ resolvedInput[key] = state[refId];
316
+ console.log(`Resolved ${key}: ${value} → ${state[refId]}`);
317
+ }
318
+ }
319
+ }
320
+
321
+ // Execute based on kind
322
+ if (atom.kind === "tool") {
323
+ const tool = tools[atom.name];
324
+ if (!tool) {
325
+ throw new Error(`Tool not found: ${atom.name}`);
326
+ }
327
+
328
+ // Show input before execution
329
+ console.log(`Input: a=${resolvedInput.a}, b=${resolvedInput.b}`);
330
+
331
+ // Safety check
332
+ if (resolvedInput.a === undefined || resolvedInput.b === undefined) {
333
+ throw new Error(
334
+ `Cannot execute ${atom.name}: undefined input values\n` +
335
+ ` This means the LLM didn't extract numbers from your question.\n` +
336
+ ` Original input: ${JSON.stringify(atom.input)}`
337
+ );
338
+ }
339
+
340
+ state[atom.id] = tool(resolvedInput.a, resolvedInput.b);
341
+ }
342
+ else if (atom.kind === "decision") {
343
+ const decision = decisions[atom.name];
344
+ if (!decision) {
345
+ throw new Error(`Decision not found: ${atom.name}`);
346
+ }
347
+
348
+ // Collect results from dependencies
349
+ const depResults = atom.dependsOn.map(id => state[id]);
350
+ state[atom.id] = decision(depResults);
351
+ }
352
+ else if (atom.kind === "final") {
353
+ const finalValue = state[atom.dependsOn[0]];
354
+ console.log(`\n FINAL RESULT: ${finalValue}`);
355
+ state[atom.id] = finalValue;
356
+ }
357
+ }
358
+
359
+ return state;
360
+ }
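The `<result_of_N>` reference resolution inside `executePlan` can be sketched on its own:

```javascript
// Standalone sketch of the <result_of_N> resolution step in executePlan:
const state = { 1: 20 };                        // atom 1 already produced 20
const input = { a: "<result_of_1>", b: 3 };
const resolved = { ...input };
for (const [key, value] of Object.entries(resolved)) {
  if (typeof value === "string" && value.startsWith("<result_of_")) {
    const refId = parseInt(value.match(/\d+/)[0]); // pull the atom id out of the placeholder
    resolved[key] = state[refId];                  // substitute the stored result
  }
}
console.log(resolved); // { a: 20, b: 3 }
```

Because atoms execute in id order and dependencies are checked first, every placeholder is guaranteed to find its value in `state` by the time it is resolved.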
361
+
362
+ // Main AoT Agent execution
363
+ async function aotAgent(userPrompt) {
364
+ try {
365
+ // Phase 1: Plan
366
+ const plan = await generatePlan(userPrompt);
367
+
368
+ // Phase 2: Validate
369
+ validatePlan(plan);
370
+
371
+ // Phase 3: Execute
372
+ const result = executePlan(plan);
373
+
374
+ console.log("\n" + "=".repeat(70));
375
+ console.log("EXECUTION COMPLETE");
376
+ console.log("=".repeat(70));
377
+
378
+ // Find final atom
379
+ const finalAtom = plan.atoms.find(a => a.kind === "final");
380
+ if (finalAtom) {
381
+ console.log(`\nANSWER: ${result[finalAtom.id]}\n`);
382
+ }
383
+
384
+ return result;
385
+ } catch (error) {
386
+ console.error("\nEXECUTION FAILED:", error.message);
387
+ throw error;
388
+ }
389
+ }
390
+
391
+ // Test queries
392
+ const queries = [
393
+ // "What is (15 + 7) multiplied by 3 minus 10?",
394
+ // "A pizza costs 20 dollars. If 4 friends split it equally, how much does each person pay?",
395
+ "Calculate: 100 divided by 5, then add 3, then multiply by 2",
396
+ ];
397
+
398
+ for (const query of queries) {
399
+ await aotAgent(query);
400
+ console.log("\n");
401
+ }
402
+
403
+ // Debug
404
+ const promptDebugger = new PromptDebugger({
405
+ outputDir: './logs',
406
+ filename: 'aot_calculator.txt',
407
+ includeTimestamp: true,
408
+ appendMode: false
409
+ });
410
+ await promptDebugger.debugContextState({ session, model });
411
+
412
+ // Clean up
413
+ session.dispose();
414
+ context.dispose();
415
+ model.dispose();
416
+ llama.dispose();
helper/json-parser.js ADDED
@@ -0,0 +1,282 @@
1
+ /**
2
+ * Robust JSON parser for LLM outputs
3
+ * Handles common issues like:
4
+ * - Missing opening/closing braces
5
+ * - Markdown code blocks
6
+ * - Extra text before/after JSON
7
+ * - Escaped quotes
8
+ * - Trailing commas
9
+ */
10
+
11
+ export class JsonParser {
12
+ /**
13
+ * Extract and parse JSON from potentially messy LLM output
14
+ * @param {string} text - Raw text from LLM
15
+ * @param {object} options - Parsing options
16
+ * @returns {object} Parsed JSON object
17
+ */
18
+ static parse(text, options = {}) {
19
+ const {
20
+ debug = false,
21
+ expectArray = false,
22
+ expectObject = true,
23
+ repairAttempts = true
24
+ } = options;
25
+
26
+ if (debug) {
27
+ console.log("\nRAW LLM OUTPUT:");
28
+ console.log("-".repeat(70));
29
+ console.log(text);
30
+ console.log("-".repeat(70) + "\n");
31
+ }
32
+
33
+ // Step 1: Clean the text
34
+ let cleaned = this.cleanText(text, debug);
35
+
36
+ // Step 2: Extract JSON
37
+ let extracted = this.extractJson(cleaned, expectArray, expectObject, debug);
38
+
39
+ // Step 3: Attempt to parse
40
+ try {
41
+ const parsed = JSON.parse(extracted);
42
+ if (debug) console.log("Successfully parsed JSON\n");
43
+ return parsed;
44
+ } catch (firstError) {
45
+ if (debug) {
46
+ console.log("First parse attempt failed:", firstError.message);
47
+ }
48
+
49
+ if (!repairAttempts) {
50
+ throw new Error(`JSON parse failed: ${firstError.message}\n\nExtracted text:\n${extracted}`);
51
+ }
52
+
53
+ // Step 4: Attempt repairs
54
+ return this.attemptRepairs(extracted, debug);
55
+ }
56
+ }
57
+
58
+ /**
59
+ * Clean text from common LLM artifacts
60
+ */
61
+ static cleanText(text, debug = false) {
62
+ let cleaned = text;
63
+
64
+ // Remove markdown code blocks
65
+ cleaned = cleaned.replace(/```json\s*/gi, '');
66
+ cleaned = cleaned.replace(/```\s*/g, '');
67
+
68
+ // Remove common prefixes
69
+ cleaned = cleaned.replace(/^(Here's the plan:|JSON output:|Plan:|Output:)\s*/i, '');
70
+
71
+ // Trim whitespace
72
+ cleaned = cleaned.trim();
73
+
74
+ if (debug && cleaned !== text) {
75
+ console.log("Cleaned text (removed markdown/prefixes)\n");
76
+ }
77
+
78
+ return cleaned;
79
+ }
80
+
81
+ /**
82
+ * Extract JSON from text (handles text before/after JSON)
83
+ */
84
+ static extractJson(text, expectArray = false, expectObject = true, debug = false) {
85
+ // Try to find JSON boundaries
86
+ const startChar = expectArray ? '[' : '{';
87
+ const endChar = expectArray ? ']' : '}';
88
+
89
+ const startIdx = text.indexOf(startChar);
90
+ const lastIdx = text.lastIndexOf(endChar);
91
+
92
+ if (startIdx === -1 || lastIdx === -1 || startIdx >= lastIdx) {
93
+ if (debug) {
94
+ console.log(`Could not find valid ${startChar}...${endChar} boundaries`);
95
+ console.log(`Start index: ${startIdx}, End index: ${lastIdx}`);
96
+ }
97
+
98
+ // Maybe it's missing braces - try to add them
99
+ if (expectObject && !text.trim().startsWith('{')) {
100
+ const withBraces = '{' + text.trim() + '}';
101
+ if (debug) console.log("Added missing opening brace");
102
+ return withBraces;
103
+ }
104
+
105
+ return text;
106
+ }
107
+
108
+ const extracted = text.substring(startIdx, lastIdx + 1);
109
+
110
+ if (debug && extracted !== text) {
111
+ console.log("Extracted JSON from surrounding text:");
112
+ console.log(extracted.substring(0, 100) + (extracted.length > 100 ? '...' : ''));
113
+ console.log();
114
+ }
115
+
116
+ return extracted;
117
+ }
118
+
119
+ /**
120
+ * Attempt various repair strategies
121
+ */
122
+ static attemptRepairs(jsonString, debug = false) {
123
+ const repairs = [
124
+ // Repair 1: Remove trailing commas
125
+ (str) => {
126
+ const fixed = str.replace(/,(\s*[}\]])/g, '$1');
127
+ if (debug && fixed !== str) console.log("Repair 1: Removed trailing commas");
128
+ return fixed;
129
+ },
130
+
131
+ // Repair 2: Fix missing quotes around property names
132
+ (str) => {
133
+ const fixed = str.replace(/([{,]\s*)([a-zA-Z_][a-zA-Z0-9_]*)\s*:/g, '$1"$2":');
134
+ if (debug && fixed !== str) console.log("Repair 2: Added quotes around property names");
135
+ return fixed;
136
+ },
137
+
138
+ // Repair 3: Fix single quotes to double quotes
139
+ (str) => {
140
+ const fixed = str.replace(/'/g, '"');
141
+ if (debug && fixed !== str) console.log("Repair 3: Converted single quotes to double quotes");
142
+ return fixed;
143
+ },
144
+
145
+ // Repair 4: Add missing closing braces
146
+ (str) => {
147
+ const openBraces = (str.match(/{/g) || []).length;
148
+ const closeBraces = (str.match(/}/g) || []).length;
149
+ if (openBraces > closeBraces) {
150
+ const fixed = str + '}'.repeat(openBraces - closeBraces);
151
+ if (debug) console.log(`Repair 4: Added ${openBraces - closeBraces} missing closing brace(s)`);
152
+ return fixed;
153
+ }
154
+ return str;
155
+ },
156
+
157
+ // Repair 5: Add missing closing brackets
158
+ (str) => {
159
+ const openBrackets = (str.match(/\[/g) || []).length;
160
+ const closeBrackets = (str.match(/]/g) || []).length;
161
+ if (openBrackets > closeBrackets) {
162
+ const fixed = str + ']'.repeat(openBrackets - closeBrackets);
163
+ if (debug) console.log(`Repair 5: Added ${openBrackets - closeBrackets} missing closing bracket(s)`);
164
+ return fixed;
165
+ }
166
+ return str;
167
+ },
168
+
169
+ // Repair 6: Unescape quotes (note: also strips backslashes from legitimately escaped quotes inside string values)
170
+ (str) => {
171
+ const fixed = str.replace(/\\"/g, '"');
172
+ if (debug && fixed !== str) console.log("Repair 6: Fixed escaped quotes");
173
+ return fixed;
174
+ },
175
+
176
+ // Repair 7: Remove control characters
177
+ (str) => {
178
+ // eslint-disable-next-line no-control-regex
179
+ const fixed = str.replace(/[\x00-\x1F\x7F]/g, '');
180
+ if (debug && fixed !== str) console.log("Repair 7: Removed control characters");
181
+ return fixed;
182
+ }
183
+ ];
184
+
185
+ let current = jsonString;
186
+
187
+ // Try each repair in sequence
188
+ for (const repair of repairs) {
189
+ current = repair(current);
190
+ }
191
+
192
+ // Try parsing after all repairs
193
+ try {
194
+ const parsed = JSON.parse(current);
195
+ if (debug) console.log("Successfully parsed after repairs\n");
196
+ return parsed;
197
+ } catch (error) {
198
+ // Last resort: try to extract just the atoms array if it's there
199
+ const atomsMatch = current.match(/"atoms"\s*:\s*(\[[\s\S]*\])/);
200
+ if (atomsMatch) {
201
+ try {
202
+ const atomsOnly = { atoms: JSON.parse(atomsMatch[1]) };
203
+ if (debug) console.log("Extracted and parsed atoms array\n");
204
+ return atomsOnly;
205
+ } catch (innerError) {
206
+ // Fall through to final error
207
+ }
208
+ }
209
+
210
+ // If all repairs fail, throw detailed error
211
+ throw new Error(
212
+ `JSON parse failed after all repair attempts.\n\n` +
213
+ `Original error: ${error.message}\n\n` +
214
+ `Attempted repairs:\n${current.substring(0, 500)}${current.length > 500 ? '...' : ''}\n\n` +
215
+ `Tip: Check if the LLM is following the JSON schema correctly.`
216
+ );
217
+ }
218
+ }
219
+
220
+ /**
221
+ * Validate parsed plan structure
222
+ */
223
+ static validatePlan(plan, debug = false) {
224
+ if (!plan || typeof plan !== 'object') {
225
+ throw new Error('Plan must be an object');
226
+ }
227
+
228
+ if (!Array.isArray(plan.atoms)) {
229
+ throw new Error('Plan must have an "atoms" array');
230
+ }
231
+
232
+ if (plan.atoms.length === 0) {
233
+ throw new Error('Plan must have at least one atom');
234
+ }
235
+
236
+ for (const atom of plan.atoms) {
237
+ if (typeof atom.id !== 'number') {
238
+ throw new Error(`Atom missing or invalid id: ${JSON.stringify(atom)}`);
239
+ }
240
+
241
+ if (!atom.kind || !['tool', 'decision', 'final'].includes(atom.kind)) {
242
+ throw new Error(`Atom ${atom.id} has invalid kind: ${atom.kind}`);
243
+ }
244
+
245
+ if (!atom.name || typeof atom.name !== 'string') {
246
+ throw new Error(`Atom ${atom.id} missing or invalid name`);
247
+ }
248
+
249
+ if (atom.dependsOn && !Array.isArray(atom.dependsOn)) {
250
+ throw new Error(`Atom ${atom.id} dependsOn must be an array`);
251
+ }
252
+ }
253
+
254
+ if (debug) {
255
+ console.log(`Plan structure validated: ${plan.atoms.length} atoms\n`);
256
+ }
257
+
258
+ return true;
259
+ }
260
+
261
+ /**
262
+ * Pretty print plan for debugging
263
+ */
264
+ static prettyPrint(plan) {
265
+ console.log("\nPLAN STRUCTURE:");
266
+ console.log("=".repeat(70));
267
+
268
+ for (const atom of plan.atoms) {
269
+ const deps = atom.dependsOn && atom.dependsOn.length > 0
270
+ ? ` (depends on: ${atom.dependsOn.join(', ')})`
271
+ : '';
272
+
273
+ console.log(` ${atom.id}. [${atom.kind}] ${atom.name}${deps}`);
274
+
275
+ if (atom.input && Object.keys(atom.input).length > 0) {
276
+ console.log(` Input: ${JSON.stringify(atom.input)}`);
277
+ }
278
+ }
279
+
280
+ console.log("=".repeat(70) + "\n");
281
+ }
282
+ }
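Two of the steps `JsonParser.parse` bundles — prefix stripping and the trailing-comma repair — can be demonstrated inline, without importing the class:

```javascript
// Dependency-free demo of cleanText's prefix stripping plus Repair 1
// (JsonParser.parse runs these same steps, among others):
const raw = 'Here\'s the plan: {"atoms": [{"id": 1, "kind": "tool",}]}';
let s = raw.replace(/^(Here's the plan:|JSON output:|Plan:|Output:)\s*/i, "").trim();
s = s.replace(/,(\s*[}\]])/g, "$1"); // Repair 1: drop trailing commas
const plan = JSON.parse(s);
console.log(plan.atoms[0].kind); // "tool"
```

Either fix alone would leave `JSON.parse` failing; applying the repairs in sequence is what makes the parser tolerant of messy LLM output.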
helper/prompt-debugger.js ADDED
@@ -0,0 +1,350 @@
1
+ import {LlamaText} from "node-llama-cpp";
2
+ import path from "path";
3
+ import fs from "fs/promises";
4
+
5
+ /**
6
+ * Output types for debugging
7
+ */
8
+ const OutputTypes = {
9
+ EXACT_PROMPT: 'exactPrompt',
10
+ CONTEXT_STATE: 'contextState',
11
+ STRUCTURED: 'structured'
12
+ };
13
+
14
+ /**
15
+ * Helper class for debugging and logging LLM prompts
16
+ */
17
+ export class PromptDebugger {
18
+ constructor(options = {}) {
19
+ this.outputDir = options.outputDir || './';
20
+ this.filename = options.filename;
21
+ this.includeTimestamp = options.includeTimestamp ?? false;
22
+ this.appendMode = options.appendMode ?? false;
23
+ // Configure which outputs to include
24
+ this.outputTypes = options.outputTypes || [OutputTypes.EXACT_PROMPT];
25
+ // Ensure outputTypes is always an array
26
+ if (!Array.isArray(this.outputTypes)) {
27
+ this.outputTypes = [this.outputTypes];
28
+ }
29
+ }
30
+
31
+ /**
32
+ * Captures only the exact prompt (user input + system + functions)
33
+ * @param {Object} params
34
+ * @param {Object} params.session - The chat session
35
+ * @param {string} params.prompt - The user prompt
36
+ * @param {string} params.systemPrompt - System prompt (optional)
37
+ * @param {Object} params.functions - Available functions (optional)
38
+ * @returns {Object} The exact prompt data
39
+ */
40
+ captureExactPrompt(params) {
41
+ const { session, prompt, systemPrompt, functions } = params;
42
+
43
+ const chatWrapper = session.chatWrapper;
44
+
45
+ // Build minimal history for exact prompt
46
+ const history = [{ type: 'user', text: prompt }];
47
+
48
+ if (systemPrompt) {
49
+ history.unshift({ type: 'system', text: systemPrompt });
50
+ }
51
+
52
+ // Generate the context state with just the current prompt
53
+ const state = chatWrapper.generateContextState({
54
+ chatHistory: history,
55
+ availableFunctions: functions,
56
+ systemPrompt: systemPrompt
57
+ });
58
+
59
+ const formattedPrompt = state.contextText.toString();
60
+
61
+ return {
62
+ exactPrompt: formattedPrompt,
63
+ timestamp: new Date().toISOString(),
64
+ prompt,
65
+ systemPrompt,
66
+ functions: functions ? Object.keys(functions) : []
67
+ };
68
+ }
69
+
70
+ /**
71
+ * Captures the full context state (includes assistant responses)
72
+ * @param {Object} params
73
+ * @param {Object} params.session - The chat session
74
+ * @param {Object} params.model - The loaded model
75
+ * @returns {Object} The context state data
76
+ */
77
+ captureContextState(params) {
78
+ const { session, model } = params;
79
+
80
+ // Get the actual context from the session after responses
81
+ const contextState = model.detokenize(session.sequence.contextTokens, true);
82
+
83
+ return {
84
+ contextState,
85
+ timestamp: new Date().toISOString(),
86
+ tokenCount: session.sequence.contextTokens.length
87
+ };
88
+ }
89
+
90
+ /**
91
+ * Captures the structured token representation
92
+ * @param {Object} params
93
+ * @param {Object} params.session - The chat session
94
+ * @param {Object} params.model - The loaded model
95
+ * @returns {Object} The structured token data
96
+ */
97
+ captureStructured(params) {
98
+ const { session, model } = params;
99
+
100
+ const structured = LlamaText.fromTokens(model.tokenizer, session.sequence.contextTokens);
101
+
102
+ return {
103
+ structured,
104
+ timestamp: new Date().toISOString(),
105
+ tokenCount: session.sequence.contextTokens.length
106
+ };
107
+ }
108
+
109
+ /**
110
+ * Captures all configured output types
111
+ * @param {Object} params - Contains all possible parameters
112
+ * @returns {Object} Combined captured data based on configuration
113
+ */
114
+ captureAll(params) {
115
+ const result = {
116
+ timestamp: new Date().toISOString()
117
+ };
118
+
119
+ if (this.outputTypes.includes(OutputTypes.EXACT_PROMPT)) {
120
+ const exactData = this.captureExactPrompt(params);
121
+ result.exactPrompt = exactData.exactPrompt;
122
+ result.prompt = exactData.prompt;
123
+ result.systemPrompt = exactData.systemPrompt;
124
+ result.functions = exactData.functions;
125
+ }
126
+
127
+ if (this.outputTypes.includes(OutputTypes.CONTEXT_STATE)) {
128
+ const contextData = this.captureContextState(params);
129
+ result.contextState = contextData.contextState;
130
+ result.contextTokenCount = contextData.tokenCount;
131
+ }
132
+
133
+ if (this.outputTypes.includes(OutputTypes.STRUCTURED)) {
134
+ const structuredData = this.captureStructured(params);
135
+ result.structured = structuredData.structured;
136
+ result.structuredTokenCount = structuredData.tokenCount;
137
+ }
138
+
139
+ return result;
140
+ }
141
+
142
+ /**
143
+ * Formats the captured data based on configuration
144
+ * @param {Object} capturedData - Data from capture methods
145
+ * @returns {string} Formatted output
146
+ */
147
+ formatOutput(capturedData) {
148
+ let output = `\n========== PROMPT DEBUG OUTPUT ==========\n`;
149
+ output += `Timestamp: ${capturedData.timestamp}\n`;
150
+
151
+ if (capturedData.prompt) {
152
+ output += `Original Prompt: ${capturedData.prompt}\n`;
153
+ }
154
+
155
+ if (capturedData.systemPrompt) {
156
+ output += `System Prompt: ${capturedData.systemPrompt.substring(0, 50)}...\n`;
157
+ }
158
+
159
+ if (capturedData.functions && capturedData.functions.length > 0) {
160
+ output += `Functions: ${capturedData.functions.join(', ')}\n`;
161
+ }
162
+
163
+ if (capturedData.exactPrompt) {
164
+ output += `\n=== EXACT PROMPT ===\n`;
165
+ output += capturedData.exactPrompt;
166
+ output += `\n`;
167
+ }
168
+
169
+ if (capturedData.contextState) {
170
+ output += `Token Count: ${capturedData.contextTokenCount || 'N/A'}\n`;
171
+
172
+ output += `\n=== CONTEXT STATE ===\n`;
173
+ output += capturedData.contextState;
174
+ output += `\n`;
175
+ }
176
+
177
+ if (capturedData.structured) {
178
+ output += `\n=== STRUCTURED ===\n`;
179
+ output += `Token Count: ${capturedData.structuredTokenCount || 'N/A'}\n`;
180
+ output += JSON.stringify(capturedData.structured, null, 2);
181
+ output += `\n`;
182
+ }
183
+
184
+ output += `==========================================\n`;
185
+ return output;
186
+ }
187
+
188
+ /**
189
+ * Saves data to file
190
+ * @param {Object} capturedData - Data to save
191
+ * @param {null} customFilename - Optional custom filename
192
+ */
193
+ async saveToFile(capturedData, customFilename = null) {
194
+ const content = this.formatOutput(capturedData);
195
+
196
+ let filename = customFilename || this.filename;
197
+
198
+ if (this.includeTimestamp) {
199
+ const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
200
+ const ext = path.extname(filename);
201
+ const base = path.basename(filename, ext);
202
+ filename = `${base}_${timestamp}${ext}`;
203
+ }
204
+
205
+ const filepath = path.join(this.outputDir, filename);
206
+
207
+ if (this.appendMode) {
208
+ await fs.appendFile(filepath, content, 'utf8');
209
+ } else {
210
+ await fs.writeFile(filepath, content, 'utf8');
211
+ }
212
+
213
+ console.log(`Prompt debug output written to ${filepath}`);
214
+ return filepath;
215
+ }
216
+
217
+ /**
218
+ * Debug exact prompt only - minimal params needed
219
+ * @param {Object} params - session, prompt, systemPrompt (optional), functions (optional)
220
+ * @param {string|null} customFilename - Optional custom filename
221
+ */
222
+ async debugExactPrompt(params, customFilename = null) {
223
+ const oldOutputTypes = this.outputTypes;
224
+ this.outputTypes = [OutputTypes.EXACT_PROMPT];
225
+ const capturedData = this.captureAll(params);
226
+ const filepath = await this.saveToFile(capturedData, customFilename);
227
+ this.outputTypes = oldOutputTypes;
228
+ return { capturedData, filepath };
229
+ }
230
+
231
+ /**
232
+ * Debug context state only - needs session and model
233
+ * @param {Object} params - session, model
234
+ * @param {string|null} customFilename - Optional custom filename
235
+ */
236
+ async debugContextState(params, customFilename = null) {
237
+ const oldOutputTypes = this.outputTypes;
238
+ this.outputTypes = [OutputTypes.CONTEXT_STATE];
239
+ const capturedData = this.captureAll(params);
240
+ const filepath = await this.saveToFile(capturedData, customFilename);
241
+ this.outputTypes = oldOutputTypes;
242
+ return { capturedData, filepath };
243
+ }
244
+
245
+ /**
246
+ * Debug structured only - needs session and model
247
+ * @param {Object} params - session, model
248
+ * @param {string|null} customFilename - Optional custom filename
249
+ */
250
+ async debugStructured(params, customFilename = null) {
251
+ const oldOutputTypes = this.outputTypes;
252
+ this.outputTypes = [OutputTypes.STRUCTURED];
253
+ const capturedData = this.captureAll(params);
254
+ const filepath = await this.saveToFile(capturedData, customFilename);
255
+ this.outputTypes = oldOutputTypes;
256
+ return { capturedData, filepath };
257
+ }
258
+
259
+ /**
260
+ * Debug with configured output types
261
+ * @param {Object} params - All parameters (session, model, prompt, etc.)
262
+ * @param {string|null} customFilename - Optional custom filename
263
+ */
264
+ async debug(params, customFilename = null) {
265
+ const capturedData = this.captureAll(params);
266
+ //const filepath = await this.saveToFile(capturedData, customFilename);
267
+ return { capturedData };
268
+ }
269
+
270
+ /**
271
+ * Log to console only
272
+ * @param {Object} params - Parameters based on configured output types
273
+ */
274
+ logToConsole(params) {
275
+ const capturedData = this.captureAll(params);
276
+ console.log(this.formatOutput(capturedData));
277
+ return capturedData;
278
+ }
279
+
280
+ /**
281
+ * Log exact prompt to console
282
+ */
283
+ logExactPrompt(params) {
284
+ const capturedData = this.captureExactPrompt(params);
285
+ console.log(this.formatOutput(capturedData));
286
+ return capturedData;
287
+ }
288
+
289
+ /**
290
+ * Log context state to console
291
+ */
292
+ logContextState(params) {
293
+ const capturedData = this.captureContextState(params);
294
+ console.log(this.formatOutput(capturedData));
295
+ return capturedData;
296
+ }
297
+
298
+ /**
299
+ * Log structured to console
300
+ */
301
+ logStructured(params) {
302
+ const capturedData = this.captureStructured(params);
303
+ console.log(this.formatOutput(capturedData));
304
+ return capturedData;
305
+ }
306
+ }
307
+
308
+ /**
309
+ * Quick function to debug exact prompt only
310
+ */
311
+ async function debugExactPrompt(params, options = {}) {
312
+ const promptDebugger = new PromptDebugger({
313
+ ...options,
314
+ outputTypes: [OutputTypes.EXACT_PROMPT]
315
+ });
316
+ return await promptDebugger.debug(params);
317
+ }
318
+
319
+ /**
320
+ * Quick function to debug context state only
321
+ */
322
+ async function debugContextState(params, options = {}) {
323
+ const promptDebugger = new PromptDebugger({
324
+ ...options,
325
+ outputTypes: [OutputTypes.CONTEXT_STATE]
326
+ });
327
+ return await promptDebugger.debug(params);
328
+ }
329
+
330
+ /**
331
+ * Quick function to debug structured only
332
+ */
333
+ async function debugStructured(params, options = {}) {
334
+ const promptDebugger = new PromptDebugger({
335
+ ...options,
336
+ outputTypes: [OutputTypes.STRUCTURED]
337
+ });
338
+ return await promptDebugger.debug(params);
339
+ }
340
+
341
+ /**
342
+ * Quick function to debug all outputs
343
+ */
344
+ async function debugAll(params, options = {}) {
345
+ const promptDebugger = new PromptDebugger({
346
+ ...options,
347
+ outputTypes: [OutputTypes.EXACT_PROMPT, OutputTypes.CONTEXT_STATE, OutputTypes.STRUCTURED]
348
+ });
349
+ return await promptDebugger.debug(params);
350
+ }
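The `includeTimestamp` filename construction in `saveToFile` can be sketched standalone; this uses plain string operations in place of `path.extname`/`path.basename` to keep the sketch dependency-free:

```javascript
// Mirrors saveToFile's timestamped-filename construction (includeTimestamp: true):
const filename = "aot_calculator.txt";
const stamp = new Date().toISOString().replace(/[:.]/g, "-"); // ':' and '.' are invalid in many filesystems
const dot = filename.lastIndexOf(".");
const outName = `${filename.slice(0, dot)}_${stamp}${filename.slice(dot)}`;
console.log(outName); // e.g. aot_calculator_2025-01-01T00-00-00-000Z.txt
```

Sanitizing the ISO timestamp this way yields one unique log file per run while keeping the original extension.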
logs/.gitkeep ADDED
File without changes
package-lock.json ADDED
The diff for this file is too large to render. See raw diff
 
package.json ADDED
@@ -0,0 +1,18 @@
1
+ {
2
+ "name": "ai-agents",
3
+ "version": "1.0.0",
4
+ "description": "",
5
+ "main": "index.js",
6
+ "scripts": {
7
+ "test": "echo \"Error: no test specified\" && exit 1"
8
+ },
9
+ "type": "module",
10
+ "keywords": [],
11
+ "author": "",
12
+ "license": "ISC",
13
+ "dependencies": {
14
+ "dotenv": "^17.2.3",
15
+ "node-llama-cpp": "^3.14.0",
16
+ "openai": "^6.7.0"
17
+ }
18
+ }
run_classifier.js ADDED
@@ -0,0 +1,349 @@
1
+ /**
2
+ * Part 1 Capstone Solution: Smart Email Classifier
3
+ *
4
+ * Build an AI system that organizes your inbox by classifying emails into categories.
5
+ *
6
+ * Skills Used:
7
+ * - Runnables for processing pipeline
8
+ * - Messages for structured classification
9
+ * - LLM wrapper for flexible model switching
10
+ * - Context for classification history
11
+ *
12
+ * Difficulty: ⭐⭐☆☆☆
13
+ */
14
+
15
+ import { SystemMessage, HumanMessage, Runnable, LlamaCppLLM } from './src/index.js';
16
+ import { BaseCallback } from './src/utils/callbacks.js';
17
+ import { readFileSync } from 'fs';
18
+
19
+ // ============================================================================
20
+ // EMAIL CLASSIFICATION CATEGORIES
21
+ // ============================================================================
22
+
23
+ const CATEGORIES = {
24
+ SPAM: 'Spam',
25
+ INVOICE: 'Invoice',
26
+ MEETING: 'Meeting Request',
27
+ URGENT: 'Urgent',
28
+ PERSONAL: 'Personal',
29
+ OTHER: 'Other'
30
+ };
31
+
32
+ // ============================================================================
33
+ // Email Parser Runnable
34
+ // ============================================================================
35
+
36
+ /**
37
+ * Parses raw email text into structured format
38
+ *
39
+ * Input: { subject: string, body: string, from: string }
40
+ * Output: { subject, body, from, timestamp }
41
+ */
42
+ class EmailParserRunnable extends Runnable {
43
+ async _call(input, config) {
44
+ // Validate required fields
45
+ if (!input.subject || !input.body || !input.from) {
46
+ throw new Error('Email must have subject, body, and from fields');
47
+ }
48
+
49
+ // Parse and structure the email
50
+ return {
51
+ subject: input.subject.trim(),
52
+ body: input.body.trim(),
53
+ from: input.from.trim(),
54
+ timestamp: new Date().toISOString()
55
+ };
56
+ }
57
+ }

// ============================================================================
// Email Classifier Runnable
// ============================================================================

/**
 * Classifies an email using the LLM.
 *
 * Input:  { subject, body, from, timestamp }
 * Output: { ...email, category, confidence, reason }
 */
class EmailClassifierRunnable extends Runnable {
  constructor(llm) {
    super();
    this.llm = llm;
  }

  async _call(input, config) {
    // Build the classification prompt
    const messages = this._buildPrompt(input);

    // Call the LLM
    const response = await this.llm.invoke(messages, config);

    // Parse the LLM response
    const classification = this._parseClassification(response.content);

    // Return the email with its classification attached
    return {
      ...input,
      category: classification.category,
      confidence: classification.confidence,
      reason: classification.reason
    };
  }

  _buildPrompt(email) {
    const systemPrompt = new SystemMessage(`You are an email classification assistant. Your task is to classify emails into one of these categories:

Categories:
- Spam: Unsolicited promotional emails, advertisements with excessive punctuation/caps, phishing attempts, scams
- Invoice: Bills, payment requests, financial documents, receipts
- Meeting Request: Meeting invitations, calendar requests, scheduling, availability inquiries
- Urgent: Time-sensitive matters requiring immediate attention, security alerts, critical notifications
- Personal: Personal correspondence from friends/family (look for personal tone and familiar email addresses)
- Other: Legitimate newsletters, updates, informational content, everything else that doesn't fit above

Important distinctions:
- Legitimate newsletters (tech updates, subscriptions) should be "Other", not Spam
- Spam has excessive punctuation (!!!, ALL CAPS), pushy language, or suspicious intent
- Personal emails have familiar sender addresses and a casual tone

Respond in this exact JSON format:
{
  "category": "Category Name",
  "confidence": 0.95,
  "reason": "Brief explanation"
}

Confidence should be between 0 and 1.`);

    const userPrompt = new HumanMessage(`Classify this email:

From: ${email.from}
Subject: ${email.subject}
Body: ${email.body}

Provide your classification in JSON format.`);

    return [systemPrompt, userPrompt];
  }

  _parseClassification(response) {
    try {
      // Try to find JSON in the response
      const jsonMatch = response.match(/\{[\s\S]*\}/);
      if (!jsonMatch) {
        throw new Error('No JSON found in response');
      }

      const parsed = JSON.parse(jsonMatch[0]);

      // Validate the parsed response
      if (!parsed.category || parsed.confidence === undefined || !parsed.reason) {
        throw new Error('Invalid classification format');
      }

      // Ensure confidence is a number between 0 and 1
      const confidence = Math.max(0, Math.min(1, parseFloat(parsed.confidence)));

      return {
        category: parsed.category,
        confidence: confidence,
        reason: parsed.reason
      };
    } catch (error) {
      // Fallback classification if parsing fails
      console.warn('Failed to parse LLM response, using fallback:', error.message);
      return {
        category: CATEGORIES.OTHER,
        confidence: 0.5,
        reason: 'Failed to parse classification'
      };
    }
  }
}

// ============================================================================
// Classification History Callback
// ============================================================================

/**
 * Tracks classification history using callbacks.
 */
class ClassificationHistoryCallback extends BaseCallback {
  constructor() {
    super();
    this.history = [];
  }

  async onEnd(runnable, output, config) {
    // Only track EmailClassifierRunnable results
    if (runnable.name === 'EmailClassifierRunnable' && output.category) {
      this.history.push({
        timestamp: output.timestamp,
        from: output.from,
        subject: output.subject,
        category: output.category,
        confidence: output.confidence,
        reason: output.reason
      });
    }
  }

  getHistory() {
    return this.history;
  }

  getStatistics() {
    if (this.history.length === 0) {
      return {
        total: 0,
        byCategory: {},
        averageConfidence: 0
      };
    }

    // Count by category
    const byCategory = {};
    let totalConfidence = 0;

    for (const entry of this.history) {
      byCategory[entry.category] = (byCategory[entry.category] || 0) + 1;
      totalConfidence += entry.confidence;
    }

    return {
      total: this.history.length,
      byCategory: byCategory,
      averageConfidence: totalConfidence / this.history.length
    };
  }

  printHistory() {
    console.log('\n📧 Classification History:');
    console.log('─'.repeat(70));

    for (const entry of this.history) {
      console.log(`\n✉️  From: ${entry.from}`);
      console.log(`   Subject: ${entry.subject}`);
      console.log(`   Category: ${entry.category}`);
      console.log(`   Confidence: ${(entry.confidence * 100).toFixed(1)}%`);
      console.log(`   Reason: ${entry.reason}`);
    }
  }

  printStatistics() {
    const stats = this.getStatistics();

    console.log('\n📊 Classification Statistics:');
    console.log('─'.repeat(70));
    console.log(`Total Emails: ${stats.total}\n`);

    if (stats.total > 0) {
      console.log('By Category:');
      for (const [category, count] of Object.entries(stats.byCategory)) {
        const percentage = ((count / stats.total) * 100).toFixed(1);
        console.log(`  ${category}: ${count} (${percentage}%)`);
      }

      console.log(`\nAverage Confidence: ${(stats.averageConfidence * 100).toFixed(1)}%`);
    }
  }
}

// ============================================================================
// Email Classification Pipeline
// ============================================================================

/**
 * Complete pipeline: Parse → Classify → Store
 */
class EmailClassificationPipeline {
  constructor(llm) {
    this.parser = new EmailParserRunnable();
    this.classifier = new EmailClassifierRunnable(llm);
    this.historyCallback = new ClassificationHistoryCallback();

    // Build the pipeline: parser -> classifier
    this.pipeline = this.parser.pipe(this.classifier);
  }

  async classify(email) {
    // Run the email through the pipeline with the history callback attached
    const config = {
      callbacks: [this.historyCallback]
    };

    return await this.pipeline.invoke(email, config);
  }

  getHistory() {
    return this.historyCallback.getHistory();
  }

  getStatistics() {
    return this.historyCallback.getStatistics();
  }

  printHistory() {
    this.historyCallback.printHistory();
  }

  printStatistics() {
    this.historyCallback.printStatistics();
  }
}

// ============================================================================
// TEST DATA
// ============================================================================

const TEST_EMAILS = JSON.parse(
  readFileSync(new URL('./test-emails.json', import.meta.url), 'utf-8')
);

// ============================================================================
// MAIN FUNCTION
// ============================================================================

async function main() {
  console.log('=== Part 1 Capstone: Smart Email Classifier ===\n');

  // Initialize the LLM
  const llm = new LlamaCppLLM({
    modelPath: './models/Qwen3-1.7B-Q8_0.gguf', // Adjust to your model
    temperature: 0.1, // Low temperature for consistent classification
    maxTokens: 200
  });

  // Create the classification pipeline
  const pipeline = new EmailClassificationPipeline(llm);

  console.log('📬 Processing emails...\n');

  // Classify each test email
  for (const email of TEST_EMAILS) {
    try {
      const result = await pipeline.classify(email);

      console.log(`✉️  Email from: ${result.from}`);
      console.log(`   Subject: ${result.subject}`);
      console.log(`   Category: ${result.category}`);
      console.log(`   Confidence: ${(result.confidence * 100).toFixed(1)}%`);
      console.log(`   Reason: ${result.reason}\n`);
    } catch (error) {
      console.error(`❌ Error classifying email from ${email.from}:`, error.message);
    }
  }

  // Print history and statistics
  pipeline.printHistory();
  pipeline.printStatistics();

  // Cleanup
  await llm.dispose();

  console.log('\n✓ Capstone Project Complete!');
}

// Run the project
main().catch(console.error);
secrets.local.md ADDED
@@ -0,0 +1,22 @@
# 🔐 ANTIGRAVITY SECURE VAULT (LOCAL ONLY)

> [!CAUTION]
> **NEVER PUSH THIS FILE TO GITHUB.**
> This file contains sensitive information. Git has been configured to ignore it.

## 🔑 GITHUB TOKENS
- **dahanhstudio**: `ghp_[REDACTED]`
- **NungLon01**: `ghp_[REDACTED]`
- **lenzcomvth**: `ghp_[REDACTED]`

---

## ☁️ CLOUDFLARE TOKENS
- **API Token (Workers)**: `[REDACTED]`

---

## 🤗 HUGGINGFACE TOKEN
- **lenzcom account**: `hf_[REDACTED]`

---

*Generated by Antigravity Secure Protocol*