lenzcom committed
Commit e706de2 · verified · 1 parent: b5167d8

Upload folder using huggingface_hub

This view is limited to 50 files because the commit contains more changes.

Files changed (50):
  1. .env.example +1 -0
  2. .gitignore +11 -0
  3. CONTRIBUTING.md +118 -0
  4. DOWNLOAD.md +24 -0
  5. Dockerfile +28 -0
  6. LICENSE.md +21 -0
  7. PROMPTING.md +160 -0
  8. README.md +504 -10
  9. SUMMARY_COMPOSITION.md +26 -0
  10. SUMMARY_FOUNDATION.md +46 -0
  11. SUMMARY_FULL.md +56 -0
  12. examples/01_intro/CODE.md +112 -0
  13. examples/01_intro/CONCEPT.md +175 -0
  14. examples/01_intro/intro.js +36 -0
  15. examples/02_openai-intro/CODE.md +394 -0
  16. examples/02_openai-intro/CONCEPT.md +950 -0
  17. examples/02_openai-intro/openai-intro.js +205 -0
  18. examples/03_translation/CODE.md +231 -0
  19. examples/03_translation/CONCEPT.md +302 -0
  20. examples/03_translation/translation.js +82 -0
  21. examples/04_think/CODE.md +257 -0
  22. examples/04_think/CONCEPT.md +368 -0
  23. examples/04_think/think.js +49 -0
  24. examples/05_batch/CODE.md +323 -0
  25. examples/05_batch/CONCEPT.md +365 -0
  26. examples/05_batch/batch.js +60 -0
  27. examples/06_coding/CODE.md +380 -0
  28. examples/06_coding/CONCEPT.md +400 -0
  29. examples/06_coding/coding.js +47 -0
  30. examples/07_simple-agent/CODE.md +368 -0
  31. examples/07_simple-agent/CONCEPT.md +69 -0
  32. examples/07_simple-agent/simple-agent.js +62 -0
  33. examples/08_simple-agent-with-memory/CODE.md +247 -0
  34. examples/08_simple-agent-with-memory/CONCEPT.md +249 -0
  35. examples/08_simple-agent-with-memory/agent-memory.json +19 -0
  36. examples/08_simple-agent-with-memory/memory-manager.js +137 -0
  37. examples/08_simple-agent-with-memory/simple-agent-with-memory.js +93 -0
  38. examples/09_react-agent/CODE.md +278 -0
  39. examples/09_react-agent/CONCEPT.md +372 -0
  40. examples/09_react-agent/react-agent.js +241 -0
  41. examples/10_aot-agent/CODE.md +178 -0
  42. examples/10_aot-agent/CONCEPT.md +265 -0
  43. examples/10_aot-agent/aot-agent.js +416 -0
  44. helper/json-parser.js +282 -0
  45. helper/prompt-debugger.js +350 -0
  46. logs/.gitkeep +0 -0
  47. package-lock.json +0 -0
  48. package.json +18 -0
  49. run_classifier.js +349 -0
  50. secrets.local.md +22 -0
.env.example ADDED
@@ -0,0 +1 @@
OPENAI_API_KEY=your_api_key_here
.gitignore ADDED
@@ -0,0 +1,11 @@
models
node_modules
.idea
.env
internal
ui
*.txt
node-llama-docs

frontend*
VIDEO_SCRIPT.md
CONTRIBUTING.md ADDED
@@ -0,0 +1,118 @@
# Contributing Guidelines

Thank you for considering contributing to AI Agents from Scratch!

## Project Philosophy

This repository teaches AI agent fundamentals by building from scratch. Every contribution should support this learning mission.

**Core Principles:**
- **Clarity over cleverness** - Code should be easy to understand
- **Fundamentals first** - No black boxes or magic
- **Progressive learning** - Each example builds on the previous
- **Local-first** - No API dependencies

## Types of Contributions

### Bug Reports
Found something broken? Open an issue with:
- Which example (`intro/`, `react-agent/`, etc.)
- What you expected vs. what happened
- Your environment (Node version, OS, model used)
- Steps to reproduce

### Documentation Improvements
- Typos and grammar fixes
- Clearer explanations
- Better code comments
- Additional examples in documentation
- Diagrams and visualizations

### New Examples
Want to add a new agent pattern? Great! Please:
1. **Open an issue first** - let's discuss if it fits
2. Follow the existing structure:
   - `pattern-name/pattern-name.js` - Working code
   - `pattern-name/CODE.md` - Detailed code walkthrough
   - `pattern-name/CONCEPT.md` - Why it matters, use cases
3. Keep it simple and well-commented
4. Test thoroughly with at least one model

### Code Improvements
- Performance optimizations (with benchmarks)
- Better error handling
- Clearer variable names
- More helpful console output

## What We're Not Looking For

- Framework integrations (LangChain, etc.) - this repo teaches what they do
- Cloud API examples - keep it local
- Production features (monitoring, scaling) - this is educational
- Complex abstractions - keep it beginner-friendly

## Contribution Process

1. **Fork** the repository
2. **Create a branch**: `git checkout -b fix/issue-description`
3. **Make changes** and test thoroughly
4. **Commit** with clear messages: `git commit -m "Fix: clarify ReAct loop explanation"`
5. **Push**: `git push origin fix/issue-description`
6. **Open a Pull Request** with:
   - Clear title
   - Description of what changed and why
   - Which issue it addresses (if any)

## Code Standards

- Use clear, descriptive variable names
- Add comments explaining *why*, not just *what*
- Follow existing code style (no linter, just match the patterns)
- Keep examples self-contained (one file when possible)
- Test with Qwen or Llama models before submitting

## Documentation Standards

- Use clear, simple language
- Explain concepts before code
- Include diagrams where helpful (ASCII art is fine!)
- Provide real-world use cases
- Link to related examples

## Example Structure
```
new-pattern/
├── new-pattern.js    # The working code
├── CODE.md           # Line-by-line walkthrough
└── CONCEPT.md        # High-level explanation
```

**CODE.md should include:**
- Prerequisites
- Step-by-step code breakdown
- How to run it
- Expected output

**CONCEPT.md should include:**
- What problem it solves
- Why this pattern matters
- Real-world applications
- Simple diagrams

## Getting Help

- Not sure if your idea fits? **Open an issue to discuss**
- Stuck on implementation? **Ask in the issue**
- Want to pair on something? **Reach out!**

## License

By contributing, you agree that your contributions will be licensed under the same license as the project (MIT).

## Recognition

All contributors will be recognized in the README. Thank you for helping others learn!

---

**Questions?** Open an issue or reach out. Happy to help guide your contribution!
DOWNLOAD.md ADDED
@@ -0,0 +1,24 @@
# Download the models used in this repository

You can adjust the quantization level to balance model precision and file size:
- Use `:Q8_0` for higher precision and better output quality, but note that it requires more memory and storage.
- Use `:Q6_K` for a good balance between size and accuracy (recommended default).
- Use `:Q5_K_S` for a smaller model that loads faster and uses less memory, but with slightly lower precision.

```bash
npx --no node-llama-cpp pull --dir ./models hf:Qwen/Qwen3-1.7B-GGUF:Q8_0 --filename Qwen3-1.7B-Q8_0.gguf
```

```bash
npx --no node-llama-cpp pull --dir ./models hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf
```

```bash
npx --no node-llama-cpp pull --dir ./models hf:unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q6_K --filename DeepSeek-R1-0528-Qwen3-8B-Q6_K.gguf
```

```bash
npx --no node-llama-cpp pull --dir ./models hf:giladgd/Apertus-8B-Instruct-2509-GGUF:Q6_K
```
Dockerfile ADDED
@@ -0,0 +1,28 @@
FROM node:18-slim

# Install dependencies for building node-llama-cpp
RUN apt-get update && apt-get install -y python3 make g++ curl

WORKDIR /app

# Copy package files
COPY package*.json ./

# Install npm dependencies
RUN npm install

# Copy source code
COPY . .

# Create models directory
RUN mkdir -p models

# Download the model during build (so it's baked into the image)
# Using direct download URL for speed if possible, or use node-llama-cpp pull
RUN npx --no node-llama-cpp pull --dir ./models hf:Qwen/Qwen3-1.7B-GGUF:Q8_0 --filename Qwen3-1.7B-Q8_0.gguf

# Expose the port HF expects
EXPOSE 7860

# Start the server
CMD ["node", "server.js"]
LICENSE.md ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 [Your Name]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
PROMPTING.md ADDED
@@ -0,0 +1,160 @@
# Prompt Engineering

Prompt engineering is the quickest and most straightforward way to shape how an agent behaves: its personality, its function, and its choices (such as when it should use tools). Agents operate on two categories of prompt: system-level and user-level.

User-level prompts are the messages people type during a conversation. They vary with each interaction and are outside the developer's control.

System-level prompts contain instructions set by developers that stay constant throughout the dialogue. They define the agent's tone, capabilities, limitations, and guidelines for tool usage.

For real-world reference, look at the system prompts Anthropic publishes:

https://docs.claude.com/en/release-notes/system-prompts#september-29-2025

## Prompt Design

When creating prompts for agents, you need to achieve two things:

1. Make the agent solve problems well

   - Help it complete complex tasks correctly
   - Enable clear, logical thinking
   - Reduce mistakes

2. Keep the agent's personality consistent

   - Define who the agent is and how it speaks
   - Match your brand's voice
   - Respond with appropriate emotion for each situation

Both goals matter equally. An accurate answer delivered rudely hurts the user experience. A friendly answer that doesn't actually help is useless.

## Prompt Strategies

### Agent Roles

Giving the LLM a specific role improves its responses - it naturally adopts that role's vocabulary and expertise. Examples:

- "You are a pediatrician" → Uses medical terms, discusses child development, recommends age-appropriate treatments
- "You are a chef" → Explains cooking techniques, suggests ingredient substitutions, discusses flavor profiles
- "You are a high school math teacher" → Breaks down problems step-by-step, uses simple language, provides practice examples
- "You are a startup founder" → Focuses on growth, uses business metrics, thinks about scalability

Make roles specific:
Instead of: "You are a writer"
Better: "You are a tech blogger who simplifies complex AI concepts for beginners"

Roles work best for specialized questions and should be set in system prompts.
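To make this concrete, here is a minimal sketch of wiring a role into a system prompt. `buildRoleMessages` is an illustrative helper, not part of this repository; it just produces the message list most chat APIs and wrappers accept.

```javascript
// Illustrative helper (not from this repo): combine a role description
// with behavioral guidance into a system + user message list.
function buildRoleMessages(role, userPrompt) {
  const systemPrompt =
    `You are ${role}. Stay in character: use the vocabulary, ` +
    `priorities, and expertise of this role in every answer.`;
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: userPrompt },
  ];
}

const messages = buildRoleMessages(
  "a tech blogger who simplifies complex AI concepts for beginners",
  "Explain what a context window is."
);
console.log(messages[0].content);
```

The same messages would then be passed to whatever chat session or API you use; only the system prompt changes per role.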
### Be Specific, Not Vague

LLMs interpret instructions literally. Vague prompts produce random results; specific prompts produce consistent outputs. Vague vs. specific examples:

❌ Vague: "Write something about dogs"
✅ Specific: "Write a 3-paragraph guide on training a puppy to sit"

❌ Vague: "Make it better"
✅ Specific: "Fix grammar errors and shorten to under 100 words"

❌ Vague: "Be professional"
✅ Specific: "Use formal language, avoid contractions, address the reader as 'you'"

❌ Vague: "Analyze this data"
✅ Specific: "Find the top 3 trends and explain what caused each one"

Why it matters: The LLM has thousands of ways to interpret vague instructions. It will guess what you want, and often guess wrong. Clear instructions eliminate guesswork and give you control over the output.

Rule of thumb: If a human assistant would need to ask clarifying questions, your prompt is too vague.

### Structuring LLM Inputs with JSON

Using JSON to structure your input helps LLMs understand tasks more clearly and makes integration easier. Instead of sending a blob of text, break your request into labeled parts like task, input, constraints, and output_format.

**Benefits**
- Clarity: JSON keys show the model what each part means.
- Reliability: Easier to parse and validate responses.
- Consistency: Reduces random or narrative answers.
- Integration: Works well with APIs and schemas.

**Best Practices**
- Keep it simple and shallow; avoid deep nesting.
- Use descriptive keys ("task", "context", "constraints").
- Tell the model the exact output format (e.g., "Respond with valid JSON only").
- Optionally define a JSON Schema to enforce structure.
- Always validate the response in your code.

**Example**
```json
{
  "task": "summarize",
  "input_text": "Article text here.",
  "constraints": {
    "max_words": 100,
    "audience": "non-technical"
  },
  "output_format": {
    "type": "JSON",
    "schema": {
      "summary": "string",
      "key_points": ["string"]
    }
  }
}
```

This structured format helps the model separate what to do, what data to use, and how to reply, resulting in more consistent, machine-readable outputs.
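The "always validate the response" advice can be sketched in a few lines of JavaScript. `validateSummaryReply` is an illustrative name, and the fields it checks (`summary`, `key_points`) follow the schema used in this section's example.

```javascript
// Illustrative validator: check a model's raw reply against the expected
// shape before using it. Returns {ok, value} or {ok, error} instead of throwing.
function validateSummaryReply(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "reply is not valid JSON" };
  }
  if (typeof parsed.summary !== "string") {
    return { ok: false, error: "missing string field: summary" };
  }
  if (!Array.isArray(parsed.key_points) ||
      !parsed.key_points.every((p) => typeof p === "string")) {
    return { ok: false, error: "key_points must be an array of strings" };
  }
  return { ok: true, value: parsed };
}

console.log(validateSummaryReply('{"summary":"ok","key_points":["a"]}').ok); // true
```

On a failed check you can re-prompt the model with the error message, rather than letting malformed output flow into the rest of your code.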
### Few-Shot Prompting

Few-shot prompting means giving the LLM a few examples of what you want before asking it to do a new task. It's like showing a student two or three solved problems so they understand the pattern.

Example
```
Example 1:
Feedback: "The room was clean and quiet."
Category: Positive

Example 2:
Feedback: "The staff were rude and unhelpful."
Category: Negative

Example 3:
Feedback: "Breakfast was okay, but the coffee was cold."
Category: Neutral

Now categorize this:
Feedback: "The view from the balcony was amazing!"
Category:
```

The model learns from the examples and continues in the same style - here, it would answer:
"Positive"

Few-shot prompts are useful when you want consistent tone, format, or logic without retraining the model.
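In code, assembling such a prompt from stored examples is straightforward. This builder is an illustrative sketch, not a utility from this repository:

```javascript
// Illustrative sketch: build a few-shot classification prompt
// from an array of {feedback, category} examples.
function buildFewShotPrompt(examples, newFeedback) {
  const shots = examples
    .map((ex, i) =>
      `Example ${i + 1}:\nFeedback: "${ex.feedback}"\nCategory: ${ex.category}`)
    .join("\n\n");
  return `${shots}\n\nNow categorize this:\nFeedback: "${newFeedback}"\nCategory:`;
}

const prompt = buildFewShotPrompt(
  [
    { feedback: "The room was clean and quiet.", category: "Positive" },
    { feedback: "The staff were rude and unhelpful.", category: "Negative" },
  ],
  "The view from the balcony was amazing!"
);
console.log(prompt);
```

Keeping examples as data makes it easy to swap in different shot sets per task without rewriting the prompt string by hand.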
### Chain of Thought

Chain of thought means asking the LLM to think step by step instead of jumping straight to the answer. It helps the model reason better, especially for logic, math, or multi-step problems.

Example
```
Question: If 3 apples cost $6, how much do 5 apples cost?
Let's think step by step.

Model reasoning:
3 apples → $6 → each apple costs $2.
5 apples × $2 = $10.

Answer: $10
```

By encouraging step-by-step thinking, you help the model make fewer mistakes and explain its reasoning clearly.
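Two small helpers show how this is often applied in practice: nudge the model to reason, then extract only the final line for downstream use. Both are illustrative sketches, and `extractAnswer` assumes the reply ends with an "Answer:" line as in the example above.

```javascript
// Illustrative: append the classic chain-of-thought trigger.
function withChainOfThought(question) {
  return `${question}\nLet's think step by step.`;
}

// Illustrative: pull the final answer out of a step-by-step reply,
// assuming the model ends with a line like "Answer: $10".
function extractAnswer(modelOutput) {
  const match = modelOutput.match(/Answer:\s*(.+)/);
  return match ? match[1].trim() : modelOutput.trim();
}

const reply = "3 apples → $6 → each apple costs $2.\n5 × $2 = $10.\nAnswer: $10";
console.log(extractAnswer(reply)); // "$10"
```

Separating the reasoning from the extracted answer lets you log the full chain of thought while only acting on the final result.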
README.md CHANGED
@@ -1,10 +1,504 @@

Removed (old Space placeholder):
- ---
- title: Email
- emoji: 🦀
- colorFrom: yellow
- colorTo: gray
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Added:
> **Read the full interactive version:**
> This repository is part of **AI Agents From Scratch** - a hands-on learning series where we build AI agents *step by step*, explain every design decision, and visualize what's happening under the hood.
>
> 👉 **https://agentsfromscratch.com**
>
> If you prefer **long-form explanations, diagrams, and conceptual deep dives**, start there - then come back here to explore the code.


# AI Agents From Scratch

Learn to build AI agents locally without frameworks. Understand what happens under the hood before using production frameworks.

## Purpose

This repository teaches you to build AI agents from first principles using **local LLMs** and **node-llama-cpp**. By working through these examples, you'll understand:

- How LLMs work at a fundamental level
- What agents really are (LLM + tools + patterns)
- How different agent architectures function
- Why frameworks make certain design choices

**Philosophy**: Learn by building. Understand deeply, then use frameworks wisely.

## Related Projects

### [AI Product from Scratch](https://github.com/pguso/ai-product-from-scratch)

[![TypeScript](https://img.shields.io/badge/TypeScript-007ACC?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
[![React](https://img.shields.io/badge/React-20232A?logo=react&logoColor=61DAFB)](https://reactjs.org/)
[![Node.js](https://img.shields.io/badge/Node.js-339933?logo=node.js&logoColor=white)](https://nodejs.org/)

Learn AI product development fundamentals with local LLMs. Covers prompt engineering, structured output, multi-step reasoning, API design, and frontend integration through 10 comprehensive lessons with visual diagrams.

### [AI Agents from Scratch in Python](https://github.com/pguso/agents-from-scratch)

![Python](https://img.shields.io/badge/Python-3776AB?logo=python&logoColor=white)

## Next Phase: Build LangChain & LangGraph Concepts From Scratch

> After mastering the fundamentals, the next stage of this project walks you through **re-implementing the core parts of LangChain and LangGraph** in plain JavaScript using local models.
> This is **not** about building a new framework; it's about understanding *how frameworks work*.

## Phase 1: Agent Fundamentals - From LLMs to ReAct

### Prerequisites
- Node.js 18+
- At least 8GB RAM (16GB recommended)
- Download models and place them in the `./models/` folder; details in [DOWNLOAD.md](DOWNLOAD.md)

### Installation
```bash
npm install
```

### Run Examples
```bash
node examples/01_intro/intro.js
node examples/07_simple-agent/simple-agent.js
node examples/09_react-agent/react-agent.js
```

## Learning Path

Follow these examples in order to build understanding progressively:

### 1. **Introduction** - Basic LLM Interaction
`examples/01_intro/` | [Code](examples/01_intro/intro.js) | [Code Explanation](examples/01_intro/CODE.md) | [Concepts](examples/01_intro/CONCEPT.md)

**What you'll learn:**
- Loading and running a local LLM
- Basic prompt/response cycle

**Key concepts**: Model loading, context, inference pipeline, token generation

---

### 2. (Optional) **OpenAI Intro** - Using Proprietary Models
`examples/02_openai-intro/` | [Code](examples/02_openai-intro/openai-intro.js) | [Code Explanation](examples/02_openai-intro/CODE.md) | [Concepts](examples/02_openai-intro/CONCEPT.md)

**What you'll learn:**
- How to call hosted LLMs (like GPT-4)
- Temperature control
- Token usage

**Key concepts**: Inference endpoints, network latency, cost vs. control, data privacy, vendor dependence

---

### 3. **Translation** - System Prompts & Specialization
`examples/03_translation/` | [Code](examples/03_translation/translation.js) | [Code Explanation](examples/03_translation/CODE.md) | [Concepts](examples/03_translation/CONCEPT.md)

**What you'll learn:**
- Using system prompts to specialize agents
- Output format control
- Role-based behavior
- Chat wrappers for different models

**Key concepts**: System prompts, agent specialization, behavioral constraints, prompt engineering

---

### 4. **Think** - Reasoning & Problem Solving
`examples/04_think/` | [Code](examples/04_think/think.js) | [Code Explanation](examples/04_think/CODE.md) | [Concepts](examples/04_think/CONCEPT.md)

**What you'll learn:**
- Configuring LLMs for logical reasoning
- Complex quantitative problems
- Limitations of pure LLM reasoning
- When to use external tools

**Key concepts**: Reasoning agents, problem decomposition, cognitive tasks, reasoning limitations

---

### 5. **Batch** - Parallel Processing
`examples/05_batch/` | [Code](examples/05_batch/batch.js) | [Code Explanation](examples/05_batch/CODE.md) | [Concepts](examples/05_batch/CONCEPT.md)

**What you'll learn:**
- Processing multiple requests concurrently
- Context sequences for parallelism
- GPU batch processing
- Performance optimization

**Key concepts**: Parallel execution, sequences, batch size, throughput optimization

---

### 6. **Coding** - Streaming & Response Control
`examples/06_coding/` | [Code](examples/06_coding/coding.js) | [Code Explanation](examples/06_coding/CODE.md) | [Concepts](examples/06_coding/CONCEPT.md)

**What you'll learn:**
- Real-time streaming responses
- Token limits and budget management
- Progressive output display
- User experience optimization

**Key concepts**: Streaming, token-by-token generation, response control, real-time feedback

---

### 7. **Simple Agent** - Function Calling (Tools)
`examples/07_simple-agent/` | [Code](examples/07_simple-agent/simple-agent.js) | [Code Explanation](examples/07_simple-agent/CODE.md) | [Concepts](examples/07_simple-agent/CONCEPT.md)

**What you'll learn:**
- Function calling / tool use fundamentals
- Defining tools the LLM can use
- JSON Schema for parameters
- How LLMs decide when to use tools

**Key concepts**: Function calling, tool definitions, agent decision making, action-taking

**This is where text generation becomes agency!**

---

### 8. **Simple Agent with Memory** - Persistent State
`examples/08_simple-agent-with-memory/` | [Code](examples/08_simple-agent-with-memory/simple-agent-with-memory.js) | [Code Explanation](examples/08_simple-agent-with-memory/CODE.md) | [Concepts](examples/08_simple-agent-with-memory/CONCEPT.md)

**What you'll learn:**
- Persisting information across sessions
- Long-term memory management
- Facts and preferences storage
- Memory retrieval strategies

**Key concepts**: Persistent memory, state management, memory systems, context augmentation

---

### 9. **ReAct Agent** - Reasoning + Acting
`examples/09_react-agent/` | [Code](examples/09_react-agent/react-agent.js) | [Code Explanation](examples/09_react-agent/CODE.md) | [Concepts](examples/09_react-agent/CONCEPT.md)

**What you'll learn:**
- ReAct pattern (Reason → Act → Observe)
- Iterative problem solving
- Step-by-step tool use
- Self-correction loops

**Key concepts**: ReAct pattern, iterative reasoning, observation-action cycles, multi-step agents

**This is the foundation of modern agent frameworks!**

---

### 10. **AoT Agent** - Atom of Thought Planning
`examples/10_aot-agent/` | [Code](examples/10_aot-agent/aot-agent.js) | [Code Explanation](examples/10_aot-agent/CODE.md) | [Concepts](examples/10_aot-agent/CONCEPT.md)

**What you'll learn:**
- Atom of Thought methodology
- Atomic planning for multi-step computations
- Dependency management between operations
- Structured JSON output for reasoning plans
- Deterministic execution of plans

**Key concepts**: AoT planning, atomic operations, dependency resolution, plan validation, structured reasoning

---

## Documentation Structure

Each example folder contains:

- **`<name>.js`** - The working code example
- **`CODE.md`** - Step-by-step code explanation
  - Line-by-line breakdowns
  - What each part does
  - How it works
- **`CONCEPT.md`** - High-level concepts
  - Why it matters for agents
  - Architectural patterns
  - Real-world applications
  - Simple diagrams

## Core Concepts

### What is an AI Agent?

```
AI Agent = LLM + System Prompt + Tools + Memory + Reasoning Pattern
           ─┬─   ──────┬──────   ──┬──   ──┬───   ────────┬────────
            │          │           │       │              │
          Brain     Identity     Hands   State         Strategy
```

### Evolution of Capabilities

```
1. intro          → Basic LLM usage
2. translation    → Specialized behavior (system prompts)
3. think          → Reasoning ability
4. batch          → Parallel processing
5. coding         → Streaming & control
6. simple-agent   → Tool use (function calling)
7. memory-agent   → Persistent state
8. react-agent    → Strategic reasoning + tool use
```

### Architecture Patterns

**Simple Agent (Steps 1-5)**
```
User → LLM → Response
```

**Tool-Using Agent (Step 6)**
```
User → LLM ⟷ Tools → Response
```

**Memory Agent (Step 7)**
```
User → LLM ⟷ Tools → Response
        ↕
      Memory
```

**ReAct Agent (Step 8)**
```
User → LLM → Think → Act → Observe
        ↑      ↓      ↓      ↓
        └──────┴──────┴──────┘
         Iterate until solved
```
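Stripped of the LLM itself, the control flow of that loop can be sketched in plain JavaScript with a scripted stand-in for the model. Everything here is illustrative; see `react-agent.js` for the real implementation:

```javascript
// Illustrative ReAct skeleton: the "llm" argument decides the next move
// from the transcript; tools execute actions; observations feed back in.
const tools = {
  add: ({ a, b }) => a + b,
};

function reactLoop(llm, maxSteps = 5) {
  const transcript = [];
  for (let step = 0; step < maxSteps; step++) {
    const move = llm(transcript);                  // Think
    if (move.type === "final") return move.answer; // done
    const result = tools[move.tool](move.args);    // Act
    transcript.push({ tool: move.tool, result });  // Observe
  }
  throw new Error("step limit reached");
}

// Scripted stand-in for a model: call a tool once, then answer
// from the observation it just made.
const scriptedLlm = (transcript) =>
  transcript.length === 0
    ? { type: "tool", tool: "add", args: { a: 2, b: 3 } }
    : { type: "final", answer: `The sum is ${transcript[0].result}` };

console.log(reactLoop(scriptedLlm)); // "The sum is 5"
```

In the real agent, `llm` is a model call that reads the transcript and emits either a tool request or a final answer; the step limit guards against loops that never terminate.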
263
+
264
+ ## ️ Helper Utilities
265
+
266
+ ### PromptDebugger
267
+ `helper/prompt-debugger.js`
268
+
269
+ Utility for debugging prompts sent to the LLM. Shows exactly what the model sees, including:
270
+ - System prompts
271
+ - Function definitions
272
+ - Conversation history
273
+ - Context state
274
+
275
+ Usage example in `simple-agent/simple-agent.js`
276
+
277
+ ## ️ Project Structure - Fundamentals
278
+
279
+ ```
280
+ ai-agents/
281
+ ├── README.md ← You are here
282
+ ├─ examples/
283
+ ├── 01_intro/
284
+ │ ├── intro.js
285
+ │ ├── CODE.md
286
+ │ └── CONCEPT.md
287
+ ├── 02_openai-intro/
288
+ │ ├── openai-intro.js
289
+ │ ├── CODE.md
290
+ │ └── CONCEPT.md
291
+ ├── 03_translation/
292
+ │ ├── translation.js
293
+ │ ├── CODE.md
294
+ │ └── CONCEPT.md
295
+ ├── 04_think/
296
+ │ ├── think.js
297
+ │ ├── CODE.md
298
+ │ └── CONCEPT.md
299
+ ├── 05_batch/
300
+ │ ├── batch.js
301
+ │ ├── CODE.md
302
+ │ └── CONCEPT.md
303
+ ├── 06_coding/
304
+ │ ├── coding.js
305
+ │ ├── CODE.md
306
+ │ └── CONCEPT.md
307
+ ├── 07_simple-agent/
308
+ │ ├── simple-agent.js
309
+ │ ├── CODE.md
310
+ │ └── CONCEPT.md
311
+ ├── 08_simple-agent-with-memory/
312
+ │ ├── simple-agent-with-memory.js
313
+ │ ├── memory-manager.js
314
+ │ ├── CODE.md
315
+ │ └── CONCEPT.md
316
+ ├── 09_react-agent/
317
+ │ ├── react-agent.js
318
+ │ ├── CODE.md
319
+ │ └── CONCEPT.md
320
+ ├── helper/
321
+ │ └── prompt-debugger.js
322
+ ├── models/ ← Place your GGUF models here
323
+ └── logs/ ← Debug outputs
324
+ ```
325
+
326
+ ## Phase 2: Building a Production Framework (Tutorial)
327
+
328
+ After mastering the fundamentals above, **Phase 2** takes you from scratch examples to production-grade framework design. You'll rebuild core concepts from **LangChain** and **LangGraph** to understand how real frameworks work internally.
329
+
330
+ ### What You'll Build
331
+
332
+ A lightweight but complete agent framework with:
333
+ - **Runnable Interface**, The composability pattern that powers everything
334
+ - **Message System**, Typed conversation structures (Human, AI, System, Tool)
335
+ - **Chains**, Composing multiple operations into pipelines
336
+ - **Memory**, Persistent state across conversations
337
+ - **Tools**, Function calling and external integrations
338
+ - **Agents**, Decision-making loops (ReAct, Tool-calling)
339
+ - **Graphs**, State machines for complex workflows (LangGraph concepts)
340
+
### Learning Approach

**Tutorial-first**: Step-by-step lessons with exercises
**Implementation-driven**: Build each component yourself
**Framework-compatible**: Learn patterns used in LangChain.js

### Structure Overview

```
tutorial/
├── 01-foundation/            # 1. Core Abstractions
│   ├── 01-runnable/
│   │   ├── lesson.md         # Why Runnable matters
│   │   ├── exercises/        # Hands-on practice
│   │   └── solutions/        # Reference implementations
│   ├── 02-messages/          # Structuring conversations
│   ├── 03-llm-wrapper/       # Wrapping node-llama-cpp
│   └── 04-context/           # Configuration & callbacks
│
├── 02-composition/           # 2. Building Chains
│   ├── 01-prompts/           # Template system
│   ├── 02-parsers/           # Structured outputs
│   ├── 03-llm-chain/         # Your first chain
│   ├── 04-piping/            # Composition patterns
│   └── 05-memory/            # Conversation state
│
├── 03-agency/                # 3. Tools & Agents
│   ├── 01-tools/             # Function definitions
│   ├── 02-tool-executor/     # Safe execution
│   ├── 03-simple-agent/      # Basic agent loop
│   ├── 04-react-agent/       # Reasoning + Acting
│   └── 05-structured-agent/  # JSON mode
│
└── 04-graphs/                # 4. State Machines
    ├── 01-state-basics/      # Nodes & edges
    ├── 02-channels/          # State management
    ├── 03-conditional-edges/ # Dynamic routing
    ├── 04-executor/          # Running workflows
    ├── 05-checkpointing/     # Persistence
    └── 06-agent-graph/       # Agents as graphs

src/
├── core/                     # Runnable, Messages, Context
├── llm/                      # LlamaCppLLM wrapper
├── prompts/                  # Template system
├── chains/                   # LLMChain, SequentialChain
├── tools/                    # BaseTool, built-in tools
├── agents/                   # AgentExecutor, ReActAgent
├── memory/                   # BufferMemory, WindowMemory
└── graph/                    # StateGraph, CompiledGraph
```

### Why This Matters

**Understanding beats using**: When you know how frameworks work internally, you can:
- Debug issues faster
- Customize behavior confidently
- Make architectural decisions wisely
- Build your own extensions
- Read framework source code fluently

**Learn once, use everywhere**: The patterns you'll learn (Runnable, composition, state machines) apply to:
- LangChain.js: you'll understand their abstractions
- LangGraph.js: you'll grasp state management
- Any agent framework: same core concepts
- Your own projects: build custom solutions

### Getting Started with Phase 2

After completing the fundamentals (intro → react-agent), start the tutorial:

[Overview](tutorial/README.md)

```bash
# Start with the foundation
cd tutorial/01-foundation/01-runnable
cat lesson.md                    # Read the lesson
node exercises/01-*.js           # Complete the exercises
node solutions/01-*-solution.js  # Check your work
```

Each lesson includes:
- **Conceptual explanation**: why it matters
- **Code walkthrough**: how to build it
- **Exercises**: practice implementing
- **Solutions**: reference code
- **Real-world examples**: practical usage

**Time commitment**: ~8 weeks, 3-5 hours/week

### What You'll Achieve

By the end, you'll have:
1. Built a working agent framework from scratch
2. Understood how LangChain/LangGraph work internally
3. Mastered composability patterns
4. Created reusable components (tools, chains, agents)
5. Implemented state machines for complex workflows
6. Gained the confidence to use or extend any framework

**Then**: Use LangChain.js in production, knowing exactly what happens under the hood.

---

## Key Takeaways

### After Phase 1 (Fundamentals), you'll understand:

1. **LLMs are stateless**: Context must be managed explicitly
2. **System prompts shape behavior**: Same model, different roles
3. **Function calling enables agency**: Tools transform text generators into agents
4. **Memory is essential**: Agents need to remember across sessions
5. **Reasoning patterns matter**: ReAct beats simple prompting for complex tasks
6. **Performance matters**: Parallel processing, streaming, token limits
7. **Debugging is crucial**: See exactly what the model receives

### After Phase 2 (Framework Tutorial), you'll master:

1. **The Runnable pattern**: Why everything in frameworks uses one interface
2. **Composition over configuration**: Building complex systems from simple parts
3. **Message-driven architecture**: How frameworks structure conversations
4. **Chain abstraction**: Connecting prompts, LLMs, and parsers seamlessly
5. **Tool orchestration**: Safe execution with timeouts and error handling
6. **Agent execution loops**: The mechanics of decision-making agents
7. **State machines**: Managing complex workflows with graphs
8. **Production patterns**: Error handling, retries, streaming, and debugging

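Tool orchestration with timeouts, for instance, can be sketched with `Promise.race`. This is an illustrative pattern, not the tutorial's exact executor API (the `tool.call` shape is an assumption):

```javascript
// Sketch of safe tool execution with a timeout (illustrative; not the tutorial's exact API).
async function executeTool(tool, input, timeoutMs = 5000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Tool "${tool.name}" timed out`)), timeoutMs);
  });
  try {
    // Whichever settles first wins: the tool call or the timeout
    return await Promise.race([tool.call(input), timeout]);
  } catch (err) {
    return `Error: ${err.message}`; // surface errors as text the agent can read
  } finally {
    clearTimeout(timer);
  }
}

const slowTool = { name: "slow", call: () => new Promise(r => setTimeout(() => r("done"), 50)) };
console.log(await executeTool(slowTool, {}, 1000)); // prints "done"
```

Returning errors as readable text (instead of throwing) lets the agent loop see the failure and try something else — a common design choice in agent executors.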
### What frameworks give you:

Now that you understand the fundamentals, frameworks like LangChain, CrewAI, or AutoGPT provide:
- Pre-built reasoning patterns and agent templates
- Extensive tool libraries and integrations
- Production-ready error handling and retries
- Multi-agent orchestration
- Observability and monitoring
- Community extensions and plugins

**You'll use them better because you know what they're doing under the hood.**

## Additional Resources

- **node-llama-cpp**: [GitHub](https://github.com/withcatai/node-llama-cpp)
- **Model Hub**: [Hugging Face](https://huggingface.co/models?library=gguf)
- **GGUF Format**: Quantized models for local inference

## Contributing

This is a learning resource. Feel free to:
- Suggest improvements to the documentation
- Add more example patterns
- Fix bugs or unclear explanations
- Share what you built!

## License

Educational resource - use and modify as needed for learning.

---

**Built with ❤️ for people who want to truly understand AI agents**

Start with `intro/` and work your way through. Each example builds on the previous one. Read both CODE.md and CONCEPT.md for full understanding.

Happy learning!
SUMMARY_COMPOSITION.md ADDED
@@ -0,0 +1,26 @@
# Knowledge Summary: AI Agents from Scratch - Composition

This document summarizes the concepts behind composing components into more powerful AI systems.

## 1. Prompts
Instead of hardcoding strings, we use **templates** to manage LLM input.

* **PromptTemplate**: Basic template with variable placeholders. Separates code logic from text content.
* **ChatPromptTemplate**: Template specialized for chat models (such as GPT-4, Llama 3).
    * Structures the conversation as a list of messages: `System`, `Human`, `AI`.
    * Supports injecting variables into each message type.
    * The standard for modern AI applications.
* **PipelinePromptTemplate**: Composes several small templates into one larger template, keeping complex prompts manageable.

## 2. Output Parsers
Convert raw LLM text into data structures the application can use (JSON, Object, Array).

* **The problem:** LLM output is often inconsistent and hard to parse with regex.
* **StructuredOutputParser**: The most powerful tool here.
    * **Schema definition**: Explicitly defines fields, types, descriptions, and allowed values (enums).
    * **Format instructions**: The parser auto-generates formatting guidance (e.g. "Respond in JSON format...") to inject into the prompt.
    * **Validation**: Automatically checks whether the returned result matches the schema.
* **Benefit:** Reliability for the system, turning the AI from a "chatbot" into a "data-processing tool".

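The schema → format instructions → validation cycle can be sketched as follows. This is an illustrative toy, not LangChain's actual `StructuredOutputParser` class (the schema shape and method bodies are assumptions):

```javascript
// Illustrative sketch of the StructuredOutputParser pattern (shape/names are assumptions).
class StructuredOutputParser {
  constructor(schema) { this.schema = schema; } // e.g. { sentiment: { type: "string", enum: [...] } }

  // Text injected into the prompt so the model knows the expected shape
  getFormatInstructions() {
    const fields = Object.entries(this.schema)
      .map(([name, f]) => `"${name}": ${f.type}${f.enum ? ` (one of: ${f.enum.join(", ")})` : ""}`)
      .join("\n");
    return `Respond ONLY with JSON matching:\n{\n${fields}\n}`;
  }

  // Parse + validate the raw LLM text against the schema
  parse(text) {
    const obj = JSON.parse(text.match(/\{[\s\S]*\}/)[0]); // tolerate extra prose around the JSON
    for (const [name, f] of Object.entries(this.schema)) {
      if (typeof obj[name] !== f.type) throw new Error(`Field "${name}" must be ${f.type}`);
      if (f.enum && !f.enum.includes(obj[name])) throw new Error(`Invalid value for "${name}"`);
    }
    return obj;
  }
}

const parser = new StructuredOutputParser({ sentiment: { type: "string", enum: ["positive", "negative"] } });
console.log(parser.parse('Sure! {"sentiment": "positive"}')); // { sentiment: 'positive' }
```

The instructions go into the prompt before the call; the validation runs on the response after it — the parser brackets the LLM on both sides.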
---
*This document was generated automatically by the Antigravity IDE after self-study and code analysis.*
SUMMARY_FOUNDATION.md ADDED
@@ -0,0 +1,46 @@
# Knowledge Summary: AI Agents from Scratch - Foundation

This document summarizes the core concepts from the first four lessons in the "AI Agents from Scratch" series.

## 1. Runnable
A **Runnable** is the framework's "LEGO brick", standardizing the interface for every component (LLM, parser, tool).

* **The contract:** Every Runnable must implement the `_call(input, config)` method.
* **Three execution methods:**
    1. `invoke(input)`: Runs once (1 input -> 1 output).
    2. `stream(input)`: Returns the result as real-time chunks.
    3. `batch([inputs])`: Processes a list of inputs in parallel for throughput.
* **Benefit:** Lets you chain different components together easily with `.pipe()`.

## 2. Messages
Instead of plain strings, conversations are structured as objects that are easy to manage and filter.

* **Message types:**
    * `SystemMessage`: System instructions; sets the AI's behavior/persona.
    * `HumanMessage`: A message from the user.
    * `AIMessage`: A response from the AI.
    * `ToolMessage`: The result of a tool call (function calling).
* **Conversation management:** You need a mechanism (such as `ConversationHistory`) to store messages, cap the length (sliding window), and filter messages by type.

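A sliding-window history like the one described above can be sketched in a few lines. The class name matches the summary, but the implementation details are our own illustration:

```javascript
// Sketch of a sliding-window conversation history (illustrative implementation).
class ConversationHistory {
  constructor(maxMessages = 10) {
    this.maxMessages = maxMessages;
    this.messages = [];
  }
  add(role, content) {
    this.messages.push({ role, content });
    // Sliding window: drop the oldest messages once the cap is exceeded
    if (this.messages.length > this.maxMessages) {
      this.messages = this.messages.slice(-this.maxMessages);
    }
  }
  // Filter by type, e.g. only the user's turns
  byRole(role) {
    return this.messages.filter(m => m.role === role);
  }
}

const history = new ConversationHistory(3);
["a", "b", "c", "d"].forEach(text => history.add("human", text));
console.log(history.messages.map(m => m.content)); // [ 'b', 'c', 'd' ]
```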
## 3. LLM Wrapper
An **LLM wrapper** turns a raw LLM library (such as `node-llama-cpp`) into a **Runnable**.

* **Role:** Acts as an adapter.
* **Responsibilities:**
    * Convert the input (a string or a list of Messages) into a format the model understands.
    * Handle calling the model (generate/stream).
    * Return the result as an `AIMessage`.
* **Result:** Models can be swapped easily without touching the rest of the system.

## 4. Context & Configuration
**RunnableConfig** is the mechanism for passing information through the whole pipeline without cluttering the code.

* **Problem it solves:** Avoids threading configuration parameters (such as `userId` or a `debug` flag) through every function by hand.
* **Parts of the config:**
    * `callbacks`: Hooks for observation (logs, metrics) at start/end/error points.
    * `metadata`: Contextual data (user ID, session ID).
    * `configurable`: Runtime overrides, e.g. changing the LLM's `temperature` for a specific request.
* **Applications:** Very useful for A/B testing, centralized logging, and multi-user management.

---
*This document was generated automatically by the Antigravity IDE after self-study and code analysis.*
SUMMARY_FULL.md ADDED
@@ -0,0 +1,56 @@
# Knowledge Summary: AI Agents from Scratch

This document summarizes the core concepts and design patterns learned from the "AI Agents from Scratch" repository.

## PART 1: FOUNDATION

### 1. Runnable
A **Runnable** is the framework's "LEGO brick", standardizing the interface for every component.
* **The contract:** Implement the `_call(input, config)` method.
* **Three modes:** `invoke` (single), `stream` (chunks), `batch` (parallel).
* **Composition:** Chain easily with `.pipe()`.

### 2. Messages
Use object classes instead of bare strings.
* **SystemMessage**: Instructions, persona.
* **HumanMessage**: User input.
* **AIMessage**: Model output.
* **ToolMessage**: Function-call results.

### 3. LLM Wrapper
Wraps a raw model (node-llama-cpp) as a **Runnable** to unify the interface and make models easy to swap.

### 4. Context & Configuration
Pass a `RunnableConfig` through the whole pipeline.
* `callbacks`: Logging, metrics, side effects.
* `metadata`: User/session context.
* `configurable`: Runtime overrides (e.g. changing temperature dynamically).

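The config-threading idea can be sketched as a small helper that fires callbacks around an `invoke` call. The field names mirror the summary above; the function itself is our illustration, not a specific library's API:

```javascript
// Sketch of passing a RunnableConfig with callbacks through a pipeline
// (illustrative; field names mirror the summary, not a specific library).
async function runWithConfig(runnable, input, config = {}) {
  const { callbacks = {}, metadata = {}, configurable = {} } = config;
  callbacks.onStart?.({ input, metadata });
  try {
    const output = await runnable.invoke(input, { metadata, configurable });
    callbacks.onEnd?.({ output });
    return output;
  } catch (err) {
    callbacks.onError?.(err);
    throw err;
  }
}

// A toy runnable that reads a runtime override from `configurable`
const shout = {
  invoke: async (text, { configurable }) =>
    text.toUpperCase() + "!".repeat(configurable.excitement ?? 1),
};
const logs = [];
const result = await runWithConfig(shout, "hi", {
  callbacks: { onStart: () => logs.push("start"), onEnd: () => logs.push("end") },
  metadata: { userId: "u1" },
  configurable: { excitement: 3 }, // runtime override
});
console.log(result, logs); // HI!!! [ 'start', 'end' ]
```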
---

## PART 2: COMPOSITION

### 1. Prompts
Manage LLM input with templates.
* **PromptTemplate**: Separates logic from text; supports variables.
* **ChatPromptTemplate**: Structures multi-turn conversations.

### 2. Output Parsers
Convert raw LLM text into structured data.
* **StructuredOutputParser**: Defines a schema (JSON), auto-generates `format_instructions`, and validates the result. Solves the problem of inconsistent LLM output.

---

## PART 3: PROJECT PATTERNS

From the **Smart Email Classifier** project, a reference architecture for classification/text-processing tasks:

1. **Separation of concerns:**
    * `ParserRunnable`: Only cleans and normalizes the input data.
    * `ClassifierRunnable`: Only calls the LLM and handles the classification logic.
2. **Pipeline:** Connect `Parser -> Classifier`.
3. **Side effects via callbacks:** Use callbacks to log history and compute statistics, keeping the main code clean.
4. **Strict system prompts:** Use a detailed system prompt to define the categories and force JSON output.

---
*This document was generated automatically by the Antigravity IDE.*
examples/01_intro/CODE.md ADDED
@@ -0,0 +1,112 @@
# Code Explanation: intro.js

This file demonstrates the most basic interaction with a local LLM (Large Language Model) using node-llama-cpp.

## Step-by-Step Code Breakdown

### 1. Import Required Modules
```javascript
import {
    getLlama,
    LlamaChatSession,
} from "node-llama-cpp";
import {fileURLToPath} from "url";
import path from "path";
```
- **getLlama**: Main function to initialize the llama.cpp runtime
- **LlamaChatSession**: Class for managing chat conversations with the model
- **fileURLToPath** and **path**: Standard Node.js modules for handling file paths

### 2. Set Up Directory Path
```javascript
const __dirname = path.dirname(fileURLToPath(import.meta.url));
```
- Since ES modules don't have `__dirname` by default, we create it manually
- This gives us the directory path of the current file
- Needed to locate the model file relative to this script

### 3. Initialize Llama Runtime
```javascript
const llama = await getLlama();
```
- Creates the main llama.cpp instance
- This initializes the underlying C++ runtime for model inference
- Must be done before loading any models

### 4. Load the Model
```javascript
const model = await llama.loadModel({
    modelPath: path.join(
        __dirname,
        "..",
        "..",
        "models",
        "Qwen3-1.7B-Q8_0.gguf"
    )
});
```
- Loads a quantized model file (GGUF format)
- **Qwen3-1.7B-Q8_0.gguf**: A 1.7 billion parameter model, quantized to 8-bit
- The model is stored in the `models` folder at the repository root, two levels up from this script
- Loading the model into memory takes a few seconds

### 5. Create a Context
```javascript
const context = await model.createContext();
```
- A **context** represents the model's working memory
- It holds the conversation history and current state
- Has a fixed size limit (default: the model's maximum context size)
- All prompts and responses are stored in this context

### 6. Create a Chat Session
```javascript
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
});
```
- **LlamaChatSession**: High-level API for chat-style interactions
- Uses a sequence from the context to maintain conversation state
- Automatically handles prompt formatting and response parsing

### 7. Define the Prompt
```javascript
const prompt = `do you know node-llama-cpp`;
```
- A simple question to test whether the model knows about the library we're using
- This will be sent to the model for processing

### 8. Send Prompt and Get Response
```javascript
const a1 = await session.prompt(prompt);
console.log("AI: " + a1);
```
- **session.prompt()**: Sends the prompt to the model and waits for completion
- The model generates a response based on its training
- We log the response to the console with an "AI:" prefix

### 9. Clean Up Resources
```javascript
session.dispose()
context.dispose()
model.dispose()
llama.dispose()
```
- **Important**: Always dispose of resources when done
- Frees up memory and GPU resources
- Prevents memory leaks in long-running applications
- Must be done in this order (session → context → model → llama)

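To guarantee cleanup even when a prompt throws, the disposal calls can be wrapped in `try`/`finally`. This is a hardening pattern we add for illustration; it is not part of intro.js itself:

```javascript
// Sketch: guaranteeing cleanup with try/finally (not part of intro.js itself).
// Assumes llama, model, context, and session were created as in the steps above.
async function runPrompt(session, context, model, llama, prompt) {
  try {
    return await session.prompt(prompt);
  } finally {
    // Dispose in reverse order of creation: session → context → model → llama
    session.dispose();
    context.dispose();
    model.dispose();
    llama.dispose();
  }
}
```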
## Key Concepts Demonstrated

1. **Basic LLM initialization**: Loading a model and creating an inference context
2. **Simple prompting**: Sending a question and receiving a response
3. **Resource management**: Proper cleanup of allocated resources

## Expected Output

When you run this script, you should see output like:
```
AI: Yes, I'm familiar with node-llama-cpp. It's a Node.js binding for llama.cpp...
```

The exact response will vary based on the model's training data and generation parameters.
examples/01_intro/CONCEPT.md ADDED
@@ -0,0 +1,175 @@
# Concept: Basic LLM Interaction

## Overview

This example introduces the fundamental concepts of working with a Large Language Model (LLM) running locally on your machine. It demonstrates the simplest possible interaction: loading a model and asking it a question.

## What is a Local LLM?

A **Local LLM** is an AI language model that runs entirely on your own computer, without requiring internet connectivity or external API calls. Key benefits:

- **Privacy**: Your data never leaves your machine
- **Cost**: No per-token API charges
- **Control**: Full control over model selection and parameters
- **Offline**: Works without an internet connection

## Core Components

### 1. Model Files (GGUF Format)

```
┌─────────────────────────────┐
│   Qwen3-1.7B-Q8_0.gguf      │
│   (Model Weights File)      │
│                             │
│ • Stores learned patterns   │
│ • Quantized for efficiency  │
│ • Loaded into RAM/VRAM      │
└─────────────────────────────┘
```

- **GGUF**: File format optimized for llama.cpp
- **Quantization**: Reduces model size (e.g., 8-bit instead of 16-bit)
- **Trade-off**: Smaller size and faster speed vs. slight quality loss

### 2. The Inference Pipeline

```
User Input → Model → Generation → Response
    ↓          ↓         ↓            ↓
 "Hello"    Context   Sampling   "Hi there!"
```

**Flow Diagram:**
```
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Prompt  │ --> │ Context  │ --> │  Model   │ --> │ Response │
│          │     │ (Memory) │     │ (Weights)│     │  (Text)  │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
```

### 3. Context Window

The **context** is the model's working memory:

```
┌─────────────────────────────────────────┐
│            Context Window               │
│   ┌─────────────────────────────────┐   │
│   │ System Prompt (if any)          │   │
│   ├─────────────────────────────────┤   │
│   │ User: "do you know node-llama?" │   │
│   ├─────────────────────────────────┤   │
│   │ AI: "Yes, I'm familiar..."      │   │
│   ├─────────────────────────────────┤   │
│   │ (Space for more conversation)   │   │
│   └─────────────────────────────────┘   │
└─────────────────────────────────────────┘
```

- Limited size (e.g., 2048, 4096, or 8192 tokens)
- When full, old messages must be removed
- All previous messages influence the next response

## How LLMs Generate Responses

### Token-by-Token Generation

LLMs don't generate entire sentences at once. They predict one **token** (word piece) at a time:

```
Prompt: "What is AI?"

Generation Process:
"What is AI?"            → [Model] → "AI"
"What is AI? AI"         → [Model] → "is"
"What is AI? AI is"      → [Model] → "a"
"What is AI? AI is a"    → [Model] → "field"
... continues until a stop condition
```

**Visualization:**
```
Input Prompt
     ↓
┌────────────┐
│   Model    │ → Token 1: "AI"
│ Processes  │ → Token 2: "is"
│ & Predicts │ → Token 3: "a"
└────────────┘ → Token 4: "field"
               → ...
```

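The loop above can be sketched with a toy "model" that simply looks up the next token. This illustrates autoregressive decoding; it is not how node-llama-cpp works internally:

```javascript
// Toy sketch of autoregressive (token-by-token) generation.
// `toyModel` is a stand-in for a real LLM's next-token prediction.
const toyModel = (text) => {
  const continuations = { "What is AI?": "AI", "What is AI? AI": "is", "What is AI? AI is": "a" };
  return continuations[text] ?? null; // null acts as the stop condition
};

function generate(prompt) {
  let text = prompt;
  const tokens = [];
  // Each step feeds the whole text so far back into the model
  for (let next = toyModel(text); next !== null; next = toyModel(text)) {
    tokens.push(next);
    text += " " + next;
  }
  return tokens;
}

console.log(generate("What is AI?")); // [ 'AI', 'is', 'a' ]
```

The key property to notice: every generated token becomes part of the input for the next prediction, which is why long outputs consume context-window space.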
## Key Concepts for AI Agents

### 1. Stateless Processing
- Each prompt is independent unless you maintain context
- The model has no memory between different script runs
- To build an "agent", you need to:
  - Keep the context alive between prompts
  - Maintain conversation history
  - Add tools/functions (covered in later examples)

### 2. Prompt Engineering Basics
The way you phrase questions affects the response:

```
❌ Poor:   "node-llama-cpp"
✅ Better: "do you know node-llama-cpp"
✅ Best:   "Explain what node-llama-cpp is and how it works"
```

### 3. Resource Management
LLMs consume significant resources:

```
   Model Loading
        ↓
┌─────────────────┐
│  RAM/VRAM Usage │ ← Models need gigabytes
│  CPU/GPU Time   │ ← Inference takes time
│  Memory Leaks?  │ ← Must clean up properly
└─────────────────┘
        ↓
  Proper Disposal
```

## Why This Matters for Agents

This basic example establishes the foundation for AI agents:

1. **Agents need LLMs to "think"**: The model processes information and generates responses
2. **Agents need context**: To maintain state across interactions
3. **Agents need structure**: Later examples add tools, memory, and reasoning loops

## Next Steps

After understanding basic prompting, explore:
- **System prompts**: Giving the model a specific role or behavior
- **Function calling**: Allowing the model to use tools
- **Memory**: Persisting information across sessions
- **Reasoning patterns**: Like ReAct (Reasoning + Acting)

## Diagram: Complete Architecture

```
┌──────────────────────────────────────────────────┐
│                Your Application                  │
│  ┌────────────────────────────────────────────┐  │
│  │          node-llama-cpp Library            │  │
│  │  ┌──────────────────────────────────────┐  │  │
│  │  │       llama.cpp (C++ Runtime)        │  │  │
│  │  │  ┌────────────────────────────────┐  │  │  │
│  │  │  │      Model File (GGUF)         │  │  │  │
│  │  │  │  • Qwen3-1.7B-Q8_0.gguf        │  │  │  │
│  │  │  └────────────────────────────────┘  │  │  │
│  │  └──────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────┘
                       ↓
               ┌──────────────┐
               │  CPU / GPU   │
               └──────────────┘
```

This layered architecture allows you to build sophisticated AI agents on top of basic LLM interactions.
examples/01_intro/intro.js ADDED
@@ -0,0 +1,36 @@
import {
    getLlama,
    LlamaChatSession,
} from "node-llama-cpp";
import {fileURLToPath} from "url";
import path from "path";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(
        __dirname,
        '..',
        '..',
        'models',
        'Qwen3-1.7B-Q8_0.gguf'
    )
});

const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
});

const prompt = `do you know node-llama-cpp`;

const a1 = await session.prompt(prompt);
console.log("AI: " + a1);

session.dispose();
context.dispose();
model.dispose();
llama.dispose();
examples/02_openai-intro/CODE.md ADDED
@@ -0,0 +1,394 @@
# Code Explanation: OpenAI Intro

This guide walks through each example in `openai-intro.js`, explaining how to work with OpenAI's API from the ground up.

## Requirements

Before running this example, you'll need an OpenAI account, an API key, and a valid billing method.

### Get an API Key

https://platform.openai.com/api-keys

### Add a Billing Method

https://platform.openai.com/settings/organization/billing/overview

### Configure Environment Variables

```bash
cp .env.example .env
```
Then edit `.env` and add your actual API key.

## Setup and Initialization

```javascript
import OpenAI from 'openai';
import 'dotenv/config';

const client = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
});
```

**What's happening:**
- `import OpenAI from 'openai'` - Import the official OpenAI SDK for Node.js
- `import 'dotenv/config'` - Load environment variables from the `.env` file
- `new OpenAI({...})` - Create a client instance that handles API authentication and requests
- `process.env.OPENAI_API_KEY` - Your API key from platform.openai.com (never hardcode this!)

**Why it matters:** The client object is your interface to OpenAI's models. All API calls go through this client.

---

## Example 1: Basic Chat Completion

```javascript
const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
        { role: 'user', content: 'What is node-llama-cpp?' }
    ],
});

console.log(response.choices[0].message.content);
```

**What's happening:**
- `chat.completions.create()` - The primary method for sending messages to ChatGPT models
- `model: 'gpt-4o'` - Specifies which model to use (gpt-4o is a recent, highly capable model)
- `messages` array - Contains the conversation history
- `role: 'user'` - Indicates this message comes from the user (you)
- `response.choices[0]` - The API returns an array of possible responses; we take the first one
- `message.content` - The actual text response from the AI

**Response structure:**
```javascript
{
    id: 'chatcmpl-...',
    object: 'chat.completion',
    created: 1234567890,
    model: 'gpt-4o',
    choices: [
        {
            index: 0,
            message: {
                role: 'assistant',
                content: 'node-llama-cpp is a...'
            },
            finish_reason: 'stop'
        }
    ],
    usage: {
        prompt_tokens: 10,
        completion_tokens: 50,
        total_tokens: 60
    }
}
```

---

## Example 2: System Prompts

```javascript
const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
        { role: 'system', content: 'You are a coding assistant that talks like a pirate.' },
        { role: 'user', content: 'Explain what async/await does in JavaScript.' }
    ],
});
```

**What's happening:**
- `role: 'system'` - A special message type that sets the AI's behavior and personality
- System messages are processed first and influence all subsequent responses
- The model will maintain this behavior throughout the conversation

**Why it matters:** System prompts are how you specialize AI behavior. They're the foundation of creating focused agents with specific roles (translator, coder, analyst, etc.).

**Key insight:** Same model + different system prompts = completely different agents!

---

## Example 3: Temperature Control

```javascript
// Focused response
const focusedResponse = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
    temperature: 0.2,
});

// Creative response
const creativeResponse = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
    temperature: 1.5,
});
```

**What's happening:**
- `temperature` - Controls randomness in the output (range: 0.0 to 2.0)
- **Low temperature (0.0 - 0.3):**
  - More focused and deterministic
  - Same input → similar output
  - Best for: factual answers, code generation, data extraction
- **Medium temperature (0.7 - 1.0):**
  - Balanced creativity and coherence
  - Default for most use cases
- **High temperature (1.2 - 2.0):**
  - More creative and varied
  - Same input → very different outputs
  - Best for: creative writing, brainstorming, story generation

**Real-world usage:**
- Code completion: temperature 0.2
- Customer support: temperature 0.5
- Creative content: temperature 1.2

---

## Example 4: Conversation Context

```javascript
const messages = [
    { role: 'system', content: 'You are a helpful coding tutor.' },
    { role: 'user', content: 'What is a Promise in JavaScript?' },
];

const response1 = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: messages,
});

// Add AI response to history
messages.push(response1.choices[0].message);

// Add follow-up question
messages.push({ role: 'user', content: 'Can you show me a simple example?' });

// Second request with full context
const response2 = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: messages,
});
```

**What's happening:**
- OpenAI models are **stateless** - they don't remember previous conversations
- We maintain context by sending the entire conversation history with each request
- Each request is independent; you must include all relevant messages

**Message order in the array:**
1. System prompt (optional, but recommended first)
2. Previous user message
3. Previous assistant response
4. Current user message

**Why it matters:** This is how chatbots remember context. The full conversation is sent every time.

**Performance consideration:**
- More messages = more tokens = higher cost
- Longer conversations eventually hit token limits
- Real applications need conversation trimming or summarization strategies

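One simple trimming strategy keeps the system prompt plus only the most recent turns. This is an illustrative sketch (real applications may summarize older turns instead of dropping them):

```javascript
// Sketch of a simple trimming strategy: keep the system prompt plus the
// most recent turns (illustrative; summarization is a common alternative).
function trimMessages(messages, maxTurns = 6) {
  const system = messages.filter(m => m.role === 'system');
  const rest = messages.filter(m => m.role !== 'system');
  return [...system, ...rest.slice(-maxTurns)];
}

const history = [
  { role: 'system', content: 'You are a helpful coding tutor.' },
  ...Array.from({ length: 10 }, (_, i) => ({ role: i % 2 ? 'assistant' : 'user', content: `turn ${i}` })),
];
console.log(trimMessages(history, 4).length); // prints 5 (system + last 4 turns)
```

Pinning the system prompt matters: if trimming ever drops it, the model silently loses its role.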
199
+ ---
200
+
201
+ ## Example 5: Streaming Responses
202
+
203
+ ```javascript
204
+ const stream = await client.chat.completions.create({
205
+ model: 'gpt-4o',
206
+ messages: [
207
+ { role: 'user', content: 'Write a haiku about programming.' }
208
+ ],
209
+ stream: true, // Enable streaming
210
+ });
211
+
212
+ for await (const chunk of stream) {
213
+ const content = chunk.choices[0]?.delta?.content || '';
214
+ process.stdout.write(content);
215
+ }
216
+ ```
217
+
218
+ **What's happening:**
219
+ - `stream: true` - Instead of waiting for the complete response, receive it token-by-token
220
+ - `for await...of` - Iterate over the stream as chunks arrive
221
+ - `delta.content` - Each chunk contains a small piece of text (often just a word or partial word)
222
+ - `process.stdout.write()` - Write without newline to display text progressively
223
+
224
+ **Streaming vs. Non-streaming:**
225
+
226
+ **Non-streaming (default):**
227
+ ```
228
+ [Request sent]
229
+ [Wait 5 seconds...]
230
+ [Full response arrives]
231
+ ```
232
+
233
+ **Streaming:**
234
+ ```
235
+ [Request sent]
236
+ Once [chunk arrives: "Once"]
237
+ upon [chunk arrives: " upon"]
238
+ a [chunk arrives: " a"]
239
+ time [chunk arrives: " time"]
240
+ ...
241
+ ```
242
+
243
+ **Why it matters:**
244
+ - Better user experience (immediate feedback)
245
+ - Appears faster even though total time is similar
246
+ - Essential for real-time chat interfaces
247
+ - Allows early processing/display of partial results
248
+
249
+ **When to use streaming:**
250
+ - Interactive chat applications
251
+ - Long-form content generation
252
+ - When user experience matters more than simplicity
253
+
254
+ **When to NOT use streaming:**
255
+ - Simple scripts or automation
256
+ - When you need the complete response before processing
257
+ - Batch processing
258
+
259
+ ---
260
+
261
+ ## Example 6: Token Usage
262
+
263
+ ```javascript
264
+ const response = await client.chat.completions.create({
265
+ model: 'gpt-4o',
266
+ messages: [
267
+ { role: 'user', content: 'Explain recursion in 3 sentences.' }
268
+ ],
269
+ max_tokens: 100,
270
+ });
271
+
272
+ console.log("Token usage:");
273
+ console.log("- Prompt tokens: " + response.usage.prompt_tokens);
274
+ console.log("- Completion tokens: " + response.usage.completion_tokens);
275
+ console.log("- Total tokens: " + response.usage.total_tokens);
276
+ ```
277
+
278
+ **What's happening:**
279
+ - `max_tokens` - Limits the length of the AI's response
280
+ - `response.usage` - Contains token consumption details
281
+ - **Prompt tokens:** Your input (messages you sent)
282
+ - **Completion tokens:** AI's output (the response)
283
+ - **Total tokens:** Sum of both (what you're billed for)
284
+
285
+ **Understanding tokens:**
286
+ - Tokens ≠ words
287
+ - 1 token ≈ 0.75 words (in English)
288
+ - "hello" = 1 token
289
+ - "chatbot" = 2 tokens ("chat" + "bot")
290
+ - Punctuation counts toward your token total; spaces are usually folded into an adjacent word's token
291
+
292
+ **Why it matters:**
293
+ 1. **Cost control:** You pay per token
294
+ 2. **Context limits:** Models have maximum token limits (e.g., gpt-4o: 128,000 tokens)
295
+ 3. **Response control:** Use `max_tokens` to prevent overly long responses
296
+
297
+ **Practical limits:**
298
+ ```javascript
299
+ // Prevent runaway responses
300
+ max_tokens: 150, // ~100 words
301
+
302
+ // Brief responses
303
+ max_tokens: 50, // ~35 words
304
+
305
+ // Longer content
306
+ max_tokens: 1000, // ~750 words
307
+ ```
308
+
309
+ **Cost estimation (approximate):**
310
+ - GPT-4o: $5 per 1M input tokens, $15 per 1M output tokens
311
+ - GPT-3.5-turbo: $0.50 per 1M input tokens, $1.50 per 1M output tokens
312
+
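Those rates turn a `usage` object into a dollar estimate. A sketch (the rates are the approximations listed above and will drift as pricing changes):

```javascript
// Estimate the dollar cost of a single request from its usage object,
// using approximate per-million-token rates.
function estimateCost(usage, inputPerMillion = 5, outputPerMillion = 15) {
  const inputCost = (usage.prompt_tokens / 1_000_000) * inputPerMillion;
  const outputCost = (usage.completion_tokens / 1_000_000) * outputPerMillion;
  return inputCost + outputCost;
}
```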
313
+ ---
314
+
315
+ ## Example 7: Model Comparison
316
+
317
+ ```javascript
318
+ // GPT-4o - Most capable
319
+ const gpt4Response = await client.chat.completions.create({
320
+ model: 'gpt-4o',
321
+ messages: [{ role: 'user', content: prompt }],
322
+ });
323
+
324
+ // GPT-3.5-turbo - Faster and cheaper
325
+ const gpt35Response = await client.chat.completions.create({
326
+ model: 'gpt-3.5-turbo',
327
+ messages: [{ role: 'user', content: prompt }],
328
+ });
329
+ ```
330
+
331
+ **Available models:**
332
+
333
+ | Model | Best For | Speed | Cost | Context Window |
334
+ |-------|----------|-------|------|----------------|
335
+ | `gpt-4o` | Complex tasks, reasoning, accuracy | Medium | $$$ | 128K tokens |
336
+ | `gpt-4o-mini` | Balanced performance/cost | Fast | $$ | 128K tokens |
337
+ | `gpt-3.5-turbo` | Simple tasks, high volume | Very Fast | $ | 16K tokens |
338
+
339
+ **Choosing the right model:**
340
+ - **Use GPT-4o when:**
341
+ - Complex reasoning required
342
+ - High accuracy is critical
343
+ - Working with code or technical content
344
+ - Quality > speed/cost
345
+
346
+ - **Use GPT-4o-mini when:**
347
+ - Need good performance at lower cost
348
+ - Most general-purpose tasks
349
+
350
+ - **Use GPT-3.5-turbo when:**
351
+ - Simple classification or extraction
352
+ - High-volume, low-complexity tasks
353
+ - Speed is critical
354
+ - Budget constraints
355
+
356
+ **Pro tip:** Start with gpt-4o for development, then evaluate if cheaper models work for your use case.
357
+
358
+ ---
359
+
360
+ ## Error Handling
361
+
362
+ ```javascript
363
+ try {
364
+ await basicCompletion();
365
+ } catch (error) {
366
+ console.error("Error:", error.message);
367
+ if (error.message.includes('API key')) {
368
+ console.error("\nMake sure to set your OPENAI_API_KEY in a .env file");
369
+ }
370
+ }
371
+ ```
372
+
373
+ **Common errors:**
374
+ - `401 Unauthorized` - Invalid or missing API key
375
+ - `429 Too Many Requests` - Rate limit exceeded
376
+ - `500 Internal Server Error` - OpenAI service issue
377
+ - `Context length exceeded` - Too many tokens in conversation
378
+
379
+ **Best practices:**
380
+ - Always use try-catch with async calls
381
+ - Check error types and provide helpful messages
382
+ - Implement retry logic for transient failures
383
+ - Monitor token usage to avoid limit errors
384
+
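The practices above can be combined into a small status-dispatch helper. A sketch, assuming the SDK error exposes an HTTP `status` property (openai-node API errors do); the messages themselves are illustrative:

```javascript
// Map common API error statuses to actionable messages.
function describeApiError(error) {
  switch (error.status) {
    case 401: return 'Invalid or missing API key - check OPENAI_API_KEY';
    case 429: return 'Rate limit exceeded - retry with exponential backoff';
    case 500: return 'OpenAI service issue - retry later';
    default:  return `Unexpected error: ${error.message}`;
  }
}
```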
385
+ ---
386
+
387
+ ## Key Takeaways
388
+
389
+ 1. **Stateless Nature:** Models don't remember. You send full context each time.
390
+ 2. **Message Roles:** `system` (behavior), `user` (input), `assistant` (AI response)
391
+ 3. **Temperature:** Controls creativity (0 = focused, 2 = creative)
392
+ 4. **Streaming:** Better UX for real-time applications
393
+ 5. **Token Management:** Monitor usage for cost and limits
394
+ 6. **Model Selection:** Choose based on task complexity and budget
examples/02_openai-intro/CONCEPT.md ADDED
@@ -0,0 +1,950 @@
1
+ # Concepts: Understanding OpenAI APIs
2
+
3
+ This guide explains the fundamental concepts behind working with OpenAI's language models, which form the foundation for building AI agents.
4
+
5
+ ## What is the OpenAI API?
6
+
7
+ The OpenAI API provides programmatic access to powerful language models like GPT-4o and GPT-3.5-turbo. Instead of running models locally, you send requests to OpenAI's servers and receive responses.
8
+
9
+ **Key characteristics:**
10
+ - **Cloud-based:** Models run on OpenAI's infrastructure
11
+ - **Pay-per-use:** Charged by token consumption
12
+ - **Production-ready:** Enterprise-grade reliability and performance
13
+ - **Latest models:** Immediate access to newest model releases
14
+
15
+ **Comparison with Local LLMs (like node-llama-cpp):**
16
+
17
+ | Aspect | OpenAI API | Local LLMs |
18
+ |--------|------------|------------|
19
+ | **Setup** | API key only | Download models, need GPU/RAM |
20
+ | **Cost** | Pay per token | Free after initial setup |
21
+ | **Performance** | Consistent, high-quality | Depends on your hardware |
22
+ | **Privacy** | Data sent to OpenAI | Completely local/private |
23
+ | **Scalability** | Unlimited (with payment) | Limited by your hardware |
24
+
25
+ ---
26
+
27
+ ## The Chat Completions API
28
+
29
+ ### Request-Response Cycle
30
+
31
+ ```
32
+ You (Client) OpenAI (Server)
33
+ | |
34
+ | POST /v1/chat/completions |
35
+ | { |
36
+ | model: "gpt-4o", |
37
+ | messages: [...] |
38
+ | } |
39
+ |------------------------------->|
40
+ | |
41
+ | [Processing...] |
42
+ | [Model inference] |
43
+ | [Generate response] |
44
+ | |
45
+ | Response |
46
+ | { |
47
+ | choices: [{ |
48
+ | message: { |
49
+ | content: "..." |
50
+ | } |
51
+ | }] |
52
+ | } |
53
+ |<-------------------------------|
54
+ | |
55
+ ```
56
+
57
+ **Key point:** Each request is independent. The API doesn't store conversation history.
58
+
59
+ ---
60
+
61
+ ## Message Roles: The Conversation Structure
62
+
63
+ Every message has a `role` that determines its purpose:
64
+
65
+ ### 1. System Messages
66
+
67
+ ```javascript
68
+ { role: 'system', content: 'You are a helpful Python tutor.' }
69
+ ```
70
+
71
+ **Purpose:** Define the AI's behavior, personality, and capabilities
72
+
73
+ **Think of it as:**
74
+ - The AI's "job description"
75
+ - Invisible to the end user
76
+ - Sets constraints and guidelines
77
+
78
+ **Examples:**
79
+ ```javascript
80
+ // Specialist agent
81
+ "You are an expert SQL database administrator."
82
+
83
+ // Tone and style
84
+ "You are a friendly customer support agent. Be warm and empathetic."
85
+
86
+ // Output format control
87
+ "You are a JSON API. Always respond with valid JSON, never plain text."
88
+
89
+ // Behavioral constraints
90
+ "You are a code reviewer. Be constructive and focus on best practices."
91
+ ```
92
+
93
+ **Best practices:**
94
+ - Keep it concise but specific
95
+ - Place at the beginning of the messages array
96
+ - Update it to change agent behavior
97
+ - Use for ethical guidelines and output formatting
98
+
99
+ ### 2. User Messages
100
+
101
+ ```javascript
102
+ { role: 'user', content: 'How do I use async/await?' }
103
+ ```
104
+
105
+ **Purpose:** Represent the human's input or questions
106
+
107
+ **Think of it as:**
108
+ - What you're asking the AI
109
+ - The prompt or query
110
+ - The instruction to follow
111
+
112
+ ### 3. Assistant Messages
113
+
114
+ ```javascript
115
+ { role: 'assistant', content: 'Async/await is a way to handle promises...' }
116
+ ```
117
+
118
+ **Purpose:** Represent the AI's previous responses
119
+
120
+ **Think of it as:**
121
+ - The AI's conversation history
122
+ - Context for follow-up questions
123
+ - What the AI has already said
124
+
125
+ ### Conversation Flow Example
126
+
127
+ ```javascript
128
+ [
129
+ { role: 'system', content: 'You are a math tutor.' },
130
+
131
+ // First exchange
132
+ { role: 'user', content: 'What is 15 * 24?' },
133
+ { role: 'assistant', content: '15 * 24 = 360' },
134
+
135
+ // Follow-up (knows context)
136
+ { role: 'user', content: 'What about dividing that by 3?' },
137
+ { role: 'assistant', content: '360 ÷ 3 = 120' },
138
+ ]
139
+ ```
140
+
141
+ **Why this matters:** The role structure enables:
142
+ 1. **Context awareness:** AI understands conversation history
143
+ 2. **Behavior control:** System prompts shape responses
144
+ 3. **Multi-turn conversations:** Natural back-and-forth dialogue
145
+
146
+ ---
147
+
148
+ ## Statelessness: A Critical Concept
149
+
150
+ **Most important principle:** OpenAI's API is stateless.
151
+
152
+ ### What does stateless mean?
153
+
154
+ Each API call is independent. The model doesn't remember previous requests.
155
+
156
+ ```
157
+ Request 1: "My name is Alice"
158
+ Response 1: "Hello Alice!"
159
+
160
+ Request 2: "What's my name?"
161
+ Response 2: "I don't know your name." ← No memory!
162
+ ```
163
+
164
+ ### How to maintain context
165
+
166
+ **You must send the full conversation history:**
167
+
168
+ ```javascript
169
+ const messages = [];
170
+
171
+ // First turn
172
+ messages.push({ role: 'user', content: 'My name is Alice' });
173
+ const response1 = await client.chat.completions.create({
174
+ model: 'gpt-4o',
175
+ messages: messages // ["My name is Alice"]
176
+ });
177
+ messages.push(response1.choices[0].message);
178
+
179
+ // Second turn - include full history
180
+ messages.push({ role: 'user', content: "What's my name?" });
181
+ const response2 = await client.chat.completions.create({
182
+ model: 'gpt-4o',
183
+ messages: messages // Full conversation!
184
+ });
185
+ ```
186
+
187
+ ### Implications
188
+
189
+ **Benefits:**
190
+ - ✅ Simple architecture (no server-side state)
191
+ - ✅ Easy to scale (any server can handle any request)
192
+ - ✅ Full control over context (you decide what to include)
193
+
194
+ **Challenges:**
195
+ - ❌ You manage conversation history
196
+ - ❌ Token costs increase with conversation length
197
+ - ❌ Must implement your own memory/persistence
198
+ - ❌ Context window limits eventually hit
199
+
200
+ **Real-world solutions:**
201
+ ```javascript
202
+ // Trim old messages when too long
203
+ if (messages.length > 20) {
204
+ messages = [messages[0], ...messages.slice(-10)]; // Keep system + last 10
205
+ }
206
+
207
+ // Summarize old context
208
+ if (totalTokens > 10000) {
209
+ const summary = await summarizeConversation(messages);
210
+ messages = [systemMessage, summary, ...recentMessages];
211
+ }
212
+ ```
213
+
214
+ ---
215
+
216
+ ## Temperature: Controlling Randomness
217
+
218
+ Temperature controls how "creative" or "random" the model's output is.
219
+
220
+ ### How it works technically
221
+
222
+ When generating each token, the model assigns probabilities to possible next tokens:
223
+
224
+ ```
225
+ Input: "The sky is"
226
+ Possible next tokens:
227
+ - "blue" → 70% probability
228
+ - "clear" → 15% probability
229
+ - "dark" → 10% probability
230
+ - "purple" → 5% probability
231
+ ```
232
+
233
+ **Temperature modifies these probabilities:**
234
+
235
+ **Temperature = 0.0 (Deterministic)**
236
+ ```
237
+ Always pick the highest probability token
238
+ "The sky is blue" ← Same output every time
239
+ ```
240
+
241
+ **Temperature = 0.7 (Balanced)**
242
+ ```
243
+ Sample probabilistically with slight randomness
244
+ "The sky is blue" or "The sky is clear"
245
+ ```
246
+
247
+ **Temperature = 1.5 (Creative)**
248
+ ```
249
+ Flatten probabilities, allow unlikely choices
250
+ "The sky is purple" or "The sky is dancing" ← More surprising!
251
+ ```
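The effect can be sketched numerically: raising each probability to the power 1/T and renormalizing is mathematically the same as dividing the logits by T before the softmax. A toy illustration, not the model's actual sampling code:

```javascript
// Rescale a token probability distribution with temperature T.
// T < 1 sharpens toward the top token; T > 1 flattens the distribution.
function applyTemperature(probs, T) {
  const scaled = probs.map(p => Math.pow(p, 1 / T));
  const sum = scaled.reduce((a, b) => a + b, 0);
  return scaled.map(s => s / sum);
}
```

For the example distribution above, `[0.7, 0.15, 0.1, 0.05]`, T = 0.5 pushes the top token to roughly 0.93, while T = 2 pulls it down to roughly 0.47.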
252
+
253
+ ### Practical Guidelines
254
+
255
+ **Temperature 0.0 - 0.3: Focused Tasks**
256
+ - Code generation
257
+ - Data extraction
258
+ - Factual Q&A
259
+ - Classification
260
+ - Translation
261
+
262
+ Example:
263
+ ```javascript
264
+ // Extract JSON from text - needs consistency
265
+ temperature: 0.1
266
+ ```
267
+
268
+ **Temperature 0.5 - 0.9: Balanced Tasks**
269
+ - General conversation
270
+ - Customer support
271
+ - Content summarization
272
+ - Educational content
273
+
274
+ Example:
275
+ ```javascript
276
+ // Friendly chatbot
277
+ temperature: 0.7
278
+ ```
279
+
280
+ **Temperature 1.0 - 2.0: Creative Tasks**
281
+ - Story writing
282
+ - Brainstorming
283
+ - Poetry/creative content
284
+ - Generating variations
285
+
286
+ Example:
287
+ ```javascript
288
+ // Generate 10 different marketing taglines
289
+ temperature: 1.3
290
+ ```
291
+
292
+ ---
293
+
294
+ ## Streaming: Real-time Responses
295
+
296
+ ### Non-Streaming (Default)
297
+
298
+ ```
299
+ User: "Tell me a story"
300
+ [Wait...]
301
+ [Wait...]
302
+ [Wait...]
303
+ Response: "Once upon a time, there was a..." (all at once)
304
+ ```
305
+
306
+ **Pros:**
307
+ - Simple to implement
308
+ - Easy to handle errors
309
+ - Get complete response before processing
310
+
311
+ **Cons:**
312
+ - Appears slow for long responses
313
+ - No feedback during generation
314
+ - Poor user experience for chat
315
+
316
+ ### Streaming
317
+
318
+ ```
319
+ User: "Tell me a story"
320
+ "Once"
321
+ "Once upon"
322
+ "Once upon a"
323
+ "Once upon a time"
324
+ "Once upon a time there"
325
+ ...
326
+ ```
327
+
328
+ **Pros:**
329
+ - Immediate feedback
330
+ - Appears faster
331
+ - Better user experience
332
+ - Can process tokens as they arrive
333
+
334
+ **Cons:**
335
+ - More complex code
336
+ - Harder error handling
337
+ - Can't see full response before displaying
338
+
339
+ ### When to Use Each
340
+
341
+ **Use Non-Streaming:**
342
+ - Batch processing scripts
343
+ - When you need to analyze the full response
344
+ - Simple command-line tools
345
+ - API endpoints that return complete results
346
+
347
+ **Use Streaming:**
348
+ - Chat interfaces
349
+ - Interactive applications
350
+ - Long-form content generation
351
+ - Any user-facing application where UX matters
352
+
353
+ ---
354
+
355
+ ## Tokens: The Currency of LLMs
356
+
357
+ ### What are tokens?
358
+
359
+ Tokens are the fundamental units that language models process. They're not exactly words, but pieces of text.
360
+
361
+ **Tokenization examples:**
362
+ ```
363
+ "Hello world" → ["Hello", " world"] = 2 tokens
364
+ "coding" → ["coding"] = 1 token
365
+ "uncoded" → ["un", "coded"] = 2 tokens
366
+ ```
367
+
368
+ ### Why tokens matter
369
+
370
+ **1. Cost**
371
+ You pay per token (input + output):
372
+ ```
373
+ Request: 100 tokens
374
+ Response: 150 tokens
375
+ Total billed: 250 tokens
376
+ ```
377
+
378
+ **2. Context Limits**
379
+ Each model has a maximum token limit:
380
+ ```
381
+ gpt-4o: 128,000 tokens (≈96,000 words)
382
+ gpt-3.5-turbo: 16,384 tokens (≈12,000 words)
383
+ ```
384
+
385
+ **3. Performance**
386
+ More tokens = longer processing time and higher cost
387
+
388
+ ### Managing Token Usage
389
+
390
+ **Monitor usage:**
391
+ ```javascript
392
+ console.log(response.usage.total_tokens);
393
+ // Track cumulative usage for budgeting
394
+ ```
395
+
396
+ **Limit response length:**
397
+ ```javascript
398
+ max_tokens: 150 // Cap the response
399
+ ```
400
+
401
+ **Trim conversation history:**
402
+ ```javascript
403
+ // Keep only recent messages
404
+ if (messages.length > 20) {
405
+ messages = messages.slice(-20);
406
+ }
407
+ ```
408
+
409
+ **Estimate before sending:**
410
+ ```javascript
411
+ import { encode } from 'gpt-tokenizer';
412
+
413
+ const text = "Your message here";
414
+ const tokens = encode(text).length;
415
+ console.log(`Estimated tokens: ${tokens}`);
416
+ ```
417
+
418
+ ---
419
+
420
+ ## Model Selection: Choosing the Right Tool
421
+
422
+ ### GPT-4o: The Powerhouse
423
+
424
+ **Best for:**
425
+ - Complex reasoning tasks
426
+ - Code generation and debugging
427
+ - Technical content
428
+ - Tasks requiring high accuracy
429
+ - Working with structured data
430
+
431
+ **Characteristics:**
432
+ - Most capable model
433
+ - Higher cost
434
+ - Slower than GPT-3.5
435
+ - Best for quality-critical applications
436
+
437
+ **Example use cases:**
438
+ - Legal document analysis
439
+ - Complex code refactoring
440
+ - Research and analysis
441
+ - Educational tutoring
442
+
443
+ ### GPT-4o-mini: The Balanced Choice
444
+
445
+ **Best for:**
446
+ - General-purpose applications
447
+ - Good balance of cost and performance
448
+ - Most everyday tasks
449
+
450
+ **Characteristics:**
451
+ - Good performance
452
+ - Moderate cost
453
+ - Fast response times
454
+ - Sweet spot for many applications
455
+
456
+ **Example use cases:**
457
+ - Customer support chatbots
458
+ - Content summarization
459
+ - General Q&A
460
+ - Moderate complexity tasks
461
+
462
+ ### GPT-3.5-turbo: The Speed Demon
463
+
464
+ **Best for:**
465
+ - High-volume, simple tasks
466
+ - Speed-critical applications
467
+ - Budget-conscious projects
468
+ - Classification and extraction
469
+
470
+ **Characteristics:**
471
+ - Very fast
472
+ - Lowest cost
473
+ - Good for simple tasks
474
+ - Less capable reasoning
475
+
476
+ **Example use cases:**
477
+ - Sentiment analysis
478
+ - Text classification
479
+ - Simple formatting
480
+ - High-throughput processing
481
+
482
+ ### Decision Framework
483
+
484
+ ```
485
+ Is task critical and complex?
486
+ ├─ YES → GPT-4o
487
+ └─ NO
488
+ └─ Is speed important and task simple?
489
+ ├─ YES → GPT-3.5-turbo
490
+ └─ NO → GPT-4o-mini
491
+ ```
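The tree above translates directly into a helper (the boolean flag names are illustrative assumptions):

```javascript
// A literal encoding of the decision tree above.
function chooseModel({ criticalAndComplex = false, speedCriticalAndSimple = false } = {}) {
  if (criticalAndComplex) return 'gpt-4o';
  if (speedCriticalAndSimple) return 'gpt-3.5-turbo';
  return 'gpt-4o-mini';
}
```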
492
+
493
+ ---
494
+
495
+ ## Error Handling and Resilience
496
+
497
+ ### Common Error Scenarios
498
+
499
+ **1. Authentication Errors (401)**
500
+ ```javascript
501
+ // Invalid API key
502
+ Error: Incorrect API key provided
503
+ ```
504
+
505
+ **2. Rate Limiting (429)**
506
+ ```javascript
507
+ // Too many requests
508
+ Error: Rate limit exceeded
509
+ ```
510
+
511
+ **3. Token Limits (400)**
512
+ ```javascript
513
+ // Context too long
514
+ Error: This model's maximum context length is 16385 tokens
515
+ ```
516
+
517
+ **4. Service Errors (500)**
518
+ ```javascript
519
+ // OpenAI service issue
520
+ Error: The server had an error processing your request
521
+ ```
522
+
523
+ ### Best Practices
524
+
525
+ **1. Always use try-catch:**
526
+ ```javascript
527
+ try {
528
+ const response = await client.chat.completions.create({...});
529
+ } catch (error) {
530
+ if (error.status === 429) {
531
+ // Implement backoff and retry
532
+ } else if (error.status === 500) {
533
+ // Retry with exponential backoff
534
+ } else {
535
+ // Log and handle appropriately
536
+ }
537
+ }
538
+ ```
539
+
540
+ **2. Implement retry logic:**
541
+ ```javascript
542
+ async function retryWithBackoff(fn, maxRetries = 3) {
543
+ for (let i = 0; i < maxRetries; i++) {
544
+ try {
545
+ return await fn();
546
+ } catch (error) {
547
+ if (i === maxRetries - 1) throw error;
548
+ await new Promise(resolve => setTimeout(resolve, 2 ** i * 1000)); // Exponential backoff
549
+ }
550
+ }
551
+ }
552
+ ```
553
+
554
+ **3. Monitor token usage:**
555
+ ```javascript
556
+ let totalTokens = 0;
557
+ totalTokens += response.usage.total_tokens;
558
+
559
+ if (totalTokens > MONTHLY_BUDGET_TOKENS) {
560
+ throw new Error('Monthly token budget exceeded');
561
+ }
562
+ ```
563
+
564
+ ---
565
+
566
+ ## Architectural Patterns
567
+
568
+ ### Pattern 1: Simple Request-Response
569
+
570
+ **Use case:** One-off queries, simple automation
571
+
572
+ ```javascript
573
+ const response = await client.chat.completions.create({
574
+ model: 'gpt-4o',
575
+ messages: [{ role: 'user', content: query }]
576
+ });
577
+ ```
578
+
579
+ **Pros:** Simple, easy to understand
580
+ **Cons:** No context, no memory
581
+
582
+ ### Pattern 2: Stateful Conversation
583
+
584
+ **Use case:** Chat applications, tutoring, customer support
585
+
586
+ ```javascript
587
+ class Conversation {
588
+ constructor() {
589
+ this.messages = [
590
+ { role: 'system', content: 'Your behavior' }
591
+ ];
592
+ }
593
+
594
+ async ask(userMessage) {
595
+ this.messages.push({ role: 'user', content: userMessage });
596
+
597
+ const response = await client.chat.completions.create({
598
+ model: 'gpt-4o',
599
+ messages: this.messages
600
+ });
601
+
602
+ this.messages.push(response.choices[0].message);
603
+ return response.choices[0].message.content;
604
+ }
605
+ }
606
+ ```
607
+
608
+ **Pros:** Maintains context, natural conversation
609
+ **Cons:** Token costs grow, needs management
610
+
611
+ ### Pattern 3: Specialized Agents
612
+
613
+ **Use case:** Domain-specific applications
614
+
615
+ ```javascript
616
+ class PythonTutor {
617
+ async help(question) {
618
+ return await client.chat.completions.create({
619
+ model: 'gpt-4o',
620
+ messages: [
621
+ {
622
+ role: 'system',
623
+ content: 'You are an expert Python tutor. Explain concepts clearly with code examples.'
624
+ },
625
+ { role: 'user', content: question }
626
+ ],
627
+ temperature: 0.3 // Focused responses
628
+ });
629
+ }
630
+ }
631
+ ```
632
+
633
+ **Pros:** Consistent behavior, optimized for domain
634
+ **Cons:** Less flexible
635
+
636
+ ---
637
+
638
+ ## Hybrid Approach: Combining Proprietary and Open Source Models
639
+
640
+ In real-world projects, the best solution often isn't choosing between OpenAI and local LLMs - it's using **both strategically**.
641
+
642
+ ### Why Use a Hybrid Approach?
643
+
644
+ **Cost optimization:** Use expensive models only when necessary
645
+ **Privacy compliance:** Keep sensitive data local while leveraging cloud for general tasks
646
+ **Performance balance:** Fast local models for simple tasks, powerful cloud models for complex ones
647
+ **Reliability:** Fallback options when one service is down
648
+ **Flexibility:** Match the right tool to each specific task
649
+
650
+ ### Common Hybrid Architectures
651
+
652
+ #### Pattern 1: Tiered Processing
653
+
654
+ ```
655
+ Simple tasks → Local LLM (fast, free, private)
656
+ ↓ If complex
657
+ Complex tasks → OpenAI API (powerful, accurate)
658
+ ```
659
+
660
+ **Example workflow:**
661
+ ```javascript
662
+ async function processQuery(query) {
663
+ const complexity = await assessComplexity(query);
664
+
665
+ if (complexity < 0.5) {
666
+ // Use local model for simple queries
667
+ return await localLLM.generate(query);
668
+ } else {
669
+ // Use OpenAI for complex reasoning
670
+ return await openai.chat.completions.create({
671
+ model: 'gpt-4o',
672
+ messages: [{ role: 'user', content: query }]
673
+ });
674
+ }
675
+ }
676
+ ```
677
+
678
+ **Use cases:**
679
+ - Customer support: Local model for FAQs, GPT-4 for complex issues
680
+ - Code generation: Local for simple scripts, GPT-4 for architecture
681
+ - Content moderation: Local for obvious cases, cloud for edge cases
682
+
683
+ #### Pattern 2: Privacy-Based Routing
684
+
685
+ ```
686
+ Public data → OpenAI (best quality)
687
+ Sensitive data → Local LLM (private, secure)
688
+ ```
689
+
690
+ **Example:**
691
+ ```javascript
692
+ async function handleRequest(data, containsSensitiveInfo) {
693
+ if (containsSensitiveInfo) {
694
+ // Process locally - data never leaves your infrastructure
695
+ return await localLLM.generate(data, {
696
+ systemPrompt: "You are a HIPAA-compliant assistant"
697
+ });
698
+ } else {
699
+ // Use cloud for better quality
700
+ return await openai.chat.completions.create({
701
+ model: 'gpt-4o',
702
+ messages: [{ role: 'user', content: data }]
703
+ });
704
+ }
705
+ }
706
+ ```
707
+
708
+ **Use cases:**
709
+ - Healthcare: Patient data → Local, General medical info → OpenAI
710
+ - Finance: Transaction details → Local, Market analysis → OpenAI
711
+ - Legal: Client communications → Local, Legal research → OpenAI
712
+
713
+ #### Pattern 3: Specialized Agent Ecosystem
714
+
715
+ ```
716
+ Agent 1 (Local): Fast classifier
717
+ ↓ Routes to
718
+ Agent 2 (OpenAI): Deep analyzer
719
+ ↓ Routes to
720
+ Agent 3 (Local): Action executor
721
+ ```
722
+
723
+ **Example:**
724
+ ```javascript
725
+ class MultiModelAgent {
726
+ async process(input) {
727
+ // Step 1: Local model classifies intent (fast, cheap)
728
+ const intent = await localLLM.classify(input);
729
+
730
+ // Step 2: Route to appropriate handler
731
+ if (intent.requiresReasoning) {
732
+ // Complex reasoning with GPT-4
733
+ const analysis = await openai.chat.completions.create({
734
+ model: 'gpt-4o',
735
+ messages: [{ role: 'user', content: input }]
736
+ });
737
+ return analysis.choices[0].message.content;
738
+ } else {
739
+ // Simple response with local model
740
+ return await localLLM.generate(input);
741
+ }
742
+ }
743
+ }
744
+ ```
745
+
746
+ **Use cases:**
747
+ - Multi-stage pipelines with different complexity levels
748
+ - Agent systems where each agent has specialized capabilities
749
+ - Workflows requiring both speed and intelligence
750
+
751
+ #### Pattern 4: Development vs Production
752
+
753
+ ```
754
+ Development → OpenAI (fast iteration, best results)
755
+ ↓ Optimize
756
+ Production → Local LLM (cost-effective, private)
757
+ ```
758
+
759
+ **Workflow:**
760
+ ```javascript
761
+ const MODEL_PROVIDER = process.env.NODE_ENV === 'production'
762
+ ? 'local'
763
+ : 'openai';
764
+
765
+ async function generateResponse(prompt) {
766
+ if (MODEL_PROVIDER === 'local') {
767
+ return await localLLM.generate(prompt);
768
+ } else {
769
+ return await openai.chat.completions.create({
770
+ model: 'gpt-4o',
771
+ messages: [{ role: 'user', content: prompt }]
772
+ });
773
+ }
774
+ }
775
+ ```
776
+
777
+ **Strategy:**
778
+ 1. Develop with GPT-4 to get best results quickly
779
+ 2. Fine-tune prompts and test thoroughly
780
+ 3. Switch to local model for production
781
+ 4. Fall back to OpenAI for edge cases
782
+
783
+ #### Pattern 5: Ensemble Approach
784
+
785
+ ```
786
+ Query → [Local Model, OpenAI, Another API]
787
+ ↓ ↓ ↓
788
+ Response Response Response
789
+ ↓ ↓ ↓
790
+ Aggregator / Validator
791
+
792
+ Best Response
793
+ ```
794
+
795
+ **Example:**
796
+ ```javascript
797
+ async function ensembleGenerate(prompt) {
798
+ // Get responses from multiple sources
799
+ const [local, openai, backup] = await Promise.allSettled([
800
+ localLLM.generate(prompt),
801
+ openaiClient.chat.completions.create({
802
+ model: 'gpt-4o',
803
+ messages: [{ role: 'user', content: prompt }]
804
+ }),
805
+ backupAPI.generate(prompt)
806
+ ]);
807
+
808
+ // Use validator to pick best or combine
809
+ return validator.selectBest([local, openai, backup]);
810
+ }
811
+ ```
812
+
813
+ **Use cases:**
814
+ - Critical applications requiring high confidence
815
+ - Fact-checking and verification
816
+ - Reducing hallucinations through consensus
817
+
818
+ ### Cost-Benefit Analysis
819
+
820
+ #### Scenario: Customer Support Chatbot (10,000 queries/day)
821
+
822
+ **Option A: OpenAI Only**
823
+ ```
824
+ 10,000 queries × 500 tokens avg = 5M tokens/day
825
+ Cost: ~$25-50/day = ~$750-1500/month
826
+ Pros: Highest quality, zero infrastructure
827
+ Cons: Expensive at scale, privacy concerns
828
+ ```
829
+
830
+ **Option B: Local LLM Only**
831
+ ```
832
+ Infrastructure: $100-500/month (server/GPU)
833
+ Cost: $100-500/month
834
+ Pros: Predictable costs, private, unlimited usage
835
+ Cons: Setup complexity, maintenance, lower quality
836
+ ```
837
+
838
+ **Option C: Hybrid (80% local, 20% OpenAI)**
839
+ ```
840
+ 8,000 simple queries → Local LLM (free after setup)
841
+ 2,000 complex queries → OpenAI (~$5-10/day)
842
+ Infrastructure: $100-500/month
843
+ API costs: $150-300/month
844
+ Total: $250-800/month
845
+ Pros: Cost-effective, high quality when needed, flexible
846
+ Cons: More complex architecture
847
+ ```
848
+
849
+ **Winner for most projects: Hybrid approach** ✓
850
+
851
+ ### Decision Framework
852
+
853
+ ```
854
+ START: New query arrives
855
+
856
+ Is data sensitive/regulated?
857
+ ├─ YES → Use local model (privacy first)
858
+ └─ NO → Continue
859
+
860
+ Is task simple/repetitive?
861
+ ├─ YES → Use local model (cost-effective)
862
+ └─ NO → Continue
863
+
864
+ Is high accuracy critical?
865
+ ├─ YES → Use OpenAI (quality first)
866
+ └─ NO → Continue
867
+
868
+ Is it high volume?
869
+ ├─ YES → Use local model (cost at scale)
870
+ └─ NO → Use OpenAI (simplicity)
871
+ ```
872
+
873
+ ### The Future: Intelligent Model Selection
874
+
875
+ Advanced systems will automatically choose models based on real-time factors:
876
+
877
+ ```javascript
878
+ class IntelligentModelSelector {
879
+ async selectModel(query, context) {
880
+ const factors = {
881
+ complexity: await this.analyzeComplexity(query),
882
+ latency: context.userTolerance,
883
+ budget: context.remainingBudget,
884
+ accuracy: context.requiredConfidence,
885
+ privacy: context.dataClassification
886
+ };
887
+
888
+ // ML model predicts best provider
889
+ const selection = await this.mlSelector.predict(factors);
890
+
891
+ return {
892
+ provider: selection.provider, // 'local' | 'openai-mini' | 'openai-4'
893
+ confidence: selection.confidence,
894
+ reasoning: selection.reasoning
895
+ };
896
+ }
897
+ }
898
+ ```
899
+
900
+ ### Key Takeaway
901
+
902
+ **You don't have to choose.** Modern AI applications benefit from using the right model for each task:
903
+ - **Cloud for capability:** Complex reasoning, critical accuracy, rapid development (OpenAI, Claude, or self-hosted large open-source models)
904
+ - **Local for scale:** Privacy, cost control, high volume, offline operation
905
+ - **Both for success:** Cost-effective, flexible, reliable production systems
906
+
907
+ The best architecture leverages the strengths of each approach while mitigating their weaknesses.
908
+
909
+ ---
910
+
911
+ ## Preparing for Agents
912
+
913
+ The concepts covered here are **foundational** for building AI agents:
914
+
915
+ ### You now understand:
916
+
917
+ - **How to communicate with LLMs** (API basics)
918
+ - **How to shape behavior** (system prompts)
919
+ - **How to maintain context** (message history)
920
+ - **How to control output** (temperature, tokens)
921
+ - **How to handle responses** (streaming, errors)
922
+
923
+ ### What's next for agents:
924
+
925
+ - **Function calling / Tool use** - Let the AI take actions
926
+ - **Memory systems** - Persistent state across sessions
927
+ - **ReAct patterns** - Iterative reasoning and observation
928
+
929
+ **Bottom line:** You can't build good agents without mastering these fundamentals. Every agent pattern builds on this foundation.
930
+
931
+ ---
932
+
933
+ ## Key Insights
934
+
935
+ 1. **Statelessness is power and burden:** You control context, but you must manage it
936
+ 2. **System prompts are your secret weapon:** Same model → different behaviors
937
+ 3. **Temperature changes everything:** Match it to your task type
938
+ 4. **Tokens are the real currency:** Monitor and optimize usage
939
+ 5. **Model choice matters:** Don't use a sledgehammer for a nail
940
+ 6. **Streaming improves UX:** Use it for user-facing applications
941
+ 7. **Error handling is not optional:** The network will fail, plan for it
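Insight 7 is easy to apply with a small wrapper. A minimal retry-with-exponential-backoff sketch (the attempt count and delays are illustrative; tune them for your workload):

```javascript
// Retry an async call with exponential backoff before giving up.
async function withRetry(fn, attempts = 3, baseDelayMs = 500) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err; // out of retries
      // Wait 500ms, 1000ms, 2000ms, ... between attempts
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
}
```

Wrap any API call, e.g. `withRetry(() => client.chat.completions.create({ ... }))`.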
942
+
943
+ ---
944
+
945
+ ## Further Reading
946
+
947
+ - [OpenAI API Documentation](https://platform.openai.com/docs/api-reference)
948
+ - [OpenAI Cookbook](https://cookbook.openai.com/)
949
+ - [Best Practices for Prompt Engineering](https://platform.openai.com/docs/guides/prompt-engineering)
950
+ - [Token Counting](https://platform.openai.com/tokenizer)
examples/02_openai-intro/openai-intro.js ADDED
@@ -0,0 +1,205 @@
1
+ import OpenAI from 'openai';
2
+ import 'dotenv/config';
3
+
4
+ // Initialize OpenAI client
5
+ // Create an API key at https://platform.openai.com/api-keys
6
+ const client = new OpenAI({
7
+ apiKey: process.env.OPENAI_API_KEY,
8
+ });
9
+
10
+ console.log("=== OpenAI Intro: Understanding the Basics ===\n");
11
+
12
+ // ============================================
13
+ // EXAMPLE 1: Basic Chat Completion
14
+ // ============================================
15
+ async function basicCompletion() {
16
+ console.log("--- Example 1: Basic Chat Completion ---");
17
+
18
+ const response = await client.chat.completions.create({
19
+ model: 'gpt-4o',
20
+ messages: [
21
+ { role: 'user', content: 'What is node-llama-cpp?' }
22
+ ],
23
+ });
24
+
25
+ console.log("AI: " + response.choices[0].message.content);
26
+ console.log("\n");
27
+ }
28
+
29
+ // ============================================
30
+ // EXAMPLE 2: Using System Prompts
31
+ // ============================================
32
+ async function systemPromptExample() {
33
+ console.log("--- Example 2: System Prompts (Behavioral Control) ---");
34
+
35
+ const response = await client.chat.completions.create({
36
+ model: 'gpt-4o',
37
+ messages: [
38
+ { role: 'system', content: 'You are a coding assistant that talks like a pirate.' },
39
+ { role: 'user', content: 'Explain what async/await does in JavaScript.' }
40
+ ],
41
+ });
42
+
43
+ console.log("AI: " + response.choices[0].message.content);
44
+ console.log("\n");
45
+ }
46
+
47
+ // ============================================
48
+ // EXAMPLE 3: Temperature and Creativity
49
+ // ============================================
50
+ async function temperatureExample() {
51
+ console.log("--- Example 3: Temperature Control ---");
52
+
53
+ const prompt = "Write a one-sentence tagline for a coffee shop.";
54
+
55
+ // Low temperature = more focused and deterministic
56
+ const focusedResponse = await client.chat.completions.create({
57
+ model: 'gpt-4o',
58
+ messages: [{ role: 'user', content: prompt }],
59
+ temperature: 0.2,
60
+ });
61
+
62
+ // High temperature = more creative and varied
63
+ const creativeResponse = await client.chat.completions.create({
64
+ model: 'gpt-4o',
65
+ messages: [{ role: 'user', content: prompt }],
66
+ temperature: 1.5,
67
+ });
68
+
69
+ console.log("Low temp (0.2): " + focusedResponse.choices[0].message.content);
70
+ console.log("High temp (1.5): " + creativeResponse.choices[0].message.content);
71
+ console.log("\n");
72
+ }
73
+
74
+ // ============================================
75
+ // EXAMPLE 4: Conversation with Context
76
+ // ============================================
77
+ async function conversationContext() {
78
+ console.log("--- Example 4: Multi-turn Conversation ---");
79
+
80
+ // Build conversation history
81
+ const messages = [
82
+ { role: 'system', content: 'You are a helpful coding tutor.' },
83
+ { role: 'user', content: 'What is a Promise in JavaScript?' },
84
+ ];
85
+
86
+ // First response
87
+ const response1 = await client.chat.completions.create({
88
+ model: 'gpt-4o',
89
+ messages: messages,
90
+ max_tokens: 150,
91
+ });
92
+
93
+ console.log("User: What is a Promise in JavaScript?");
94
+ console.log("AI: " + response1.choices[0].message.content);
95
+
96
+ // Add AI response to history
97
+ messages.push(response1.choices[0].message);
98
+
99
+ // Add follow-up question
100
+ messages.push({ role: 'user', content: 'Can you show me a simple example?' });
101
+
102
+ // Second response (with context)
103
+ const response2 = await client.chat.completions.create({
104
+ model: 'gpt-4o',
105
+ messages: messages,
106
+ });
107
+
108
+ console.log("\nUser: Can you show me a simple example?");
109
+ console.log("AI: " + response2.choices[0].message.content);
110
+ console.log("\n");
111
+ }
112
+
113
+ // ============================================
114
+ // EXAMPLE 5: Streaming Responses
115
+ // ============================================
116
+ async function streamingExample() {
117
+ console.log("--- Example 5: Streaming Response ---");
118
+ console.log("AI: ");
119
+
120
+ const stream = await client.chat.completions.create({
121
+ model: 'gpt-4o',
122
+ messages: [
123
+ { role: 'user', content: 'Write a haiku about programming.' }
124
+ ],
125
+ stream: true,
126
+ });
127
+
128
+ for await (const chunk of stream) {
129
+ const content = chunk.choices[0]?.delta?.content || '';
130
+ process.stdout.write(content);
131
+ }
132
+
133
+ console.log("\n\n");
134
+ }
135
+
136
+ // ============================================
137
+ // EXAMPLE 6: Token Usage and Limits
138
+ // ============================================
139
+ async function tokenUsageExample() {
140
+ console.log("--- Example 6: Understanding Token Usage ---");
141
+
142
+ const response = await client.chat.completions.create({
143
+ model: 'gpt-4o',
144
+ messages: [
145
+ { role: 'user', content: 'Explain recursion in 3 sentences.' }
146
+ ],
147
+ max_tokens: 100,
148
+ });
149
+
150
+ console.log("AI: " + response.choices[0].message.content);
151
+ console.log("\nToken usage:");
152
+ console.log("- Prompt tokens: " + response.usage.prompt_tokens);
153
+ console.log("- Completion tokens: " + response.usage.completion_tokens);
154
+ console.log("- Total tokens: " + response.usage.total_tokens);
155
+ console.log("\n");
156
+ }
157
+
158
+ // ============================================
159
+ // EXAMPLE 7: Model Comparison
160
+ // ============================================
161
+ async function modelComparison() {
162
+ console.log("--- Example 7: Different Models ---");
163
+
164
+ const prompt = "What's 25 * 47?";
165
+
166
+ // GPT-4o - Most capable
167
+ const gpt4Response = await client.chat.completions.create({
168
+ model: 'gpt-4o',
169
+ messages: [{ role: 'user', content: prompt }],
170
+ });
171
+
172
+ // GPT-3.5-turbo - Faster and cheaper
173
+ const gpt35Response = await client.chat.completions.create({
174
+ model: 'gpt-3.5-turbo',
175
+ messages: [{ role: 'user', content: prompt }],
176
+ });
177
+
178
+ console.log("GPT-4o: " + gpt4Response.choices[0].message.content);
179
+ console.log("GPT-3.5-turbo: " + gpt35Response.choices[0].message.content);
180
+ console.log("\n");
181
+ }
182
+
183
+ // ============================================
184
+ // Run all examples
185
+ // ============================================
186
+ async function main() {
187
+ try {
188
+ await basicCompletion();
189
+ await systemPromptExample();
190
+ await temperatureExample();
191
+ await conversationContext();
192
+ await streamingExample();
193
+ await tokenUsageExample();
194
+ await modelComparison();
195
+
196
+ console.log("=== All examples completed! ===");
197
+ } catch (error) {
198
+ console.error("Error:", error.message);
199
+ if (error.message.includes('API key')) {
200
+ console.error("\nMake sure to set your OPENAI_API_KEY in a .env file");
201
+ }
202
+ }
203
+ }
204
+
205
+ main();
examples/03_translation/CODE.md ADDED
@@ -0,0 +1,231 @@
1
+ # Code Explanation: translation.js
2
+
3
+ This file demonstrates how to use **system prompts** to specialize an AI agent for a specific task - in this case, professional German translation.
4
+
5
+ ## Step-by-Step Code Breakdown
6
+
7
+ ### 1. Import Required Modules
8
+ ```javascript
9
+ import {
10
+ getLlama, LlamaChatSession,
11
+ } from "node-llama-cpp";
12
+ import {fileURLToPath} from "url";
13
+ import path from "path";
14
+ ```
15
+ - Imports are the same as the intro example
16
+
17
+ ### 2. Initialize and Load Model
18
+ ```javascript
19
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
20
+
21
+ const llama = await getLlama();
22
+ const model = await llama.loadModel({
23
+ modelPath: path.join(
24
+ __dirname,
25
+ "../",
26
+ "models",
27
+ "hf_giladgd_Apertus-8B-Instruct-2509.Q6_K.gguf"
28
+ )
29
+ });
30
+ ```
31
+
32
+ #### Why Apertus-8B?
33
+ Apertus-8B is a multilingual language model specifically trained to support over 1,000 languages, with 40% of its training data in non-English languages. This makes it an excellent choice for translation tasks because:
34
+
35
+ 1. **Massive Multilingual Coverage**: The model was trained on 15 trillion tokens across 1,811 natively supported languages, including underrepresented languages like Swiss German and Romansh
36
+ 2. **Larger Size**: With 8 billion parameters, it's larger than the intro.js example, providing better understanding and output quality
37
+ 3. **Translation-Focused Training**: The model was explicitly designed for applications including translation systems
38
+ 4. **Q6_K Quantization**: 6-bit quantization provides a good balance between quality and file size
39
+
40
+ **Experiment suggestion**: Try swapping this model with others to compare translation quality! For example:
41
+ - Use a smaller 3B model to see how size affects translation accuracy
42
+ - Use a monolingual model to demonstrate why multilingual training matters
43
+ - Use a general-purpose model without translation-specific training
44
+
45
+ Read more about Apertus on [arXiv](https://arxiv.org/abs/2509.14233).
46
+
47
+ ### 3. Create Context and Chat Session with System Prompt
48
+ ```javascript
49
+ const context = await model.createContext();
50
+ const session = new LlamaChatSession({
51
+ contextSequence: context.getSequence(),
52
+ systemPrompt: `Du bist ein erfahrener wissenschaftlicher Übersetzer...`
53
+ });
54
+ ```
55
+
56
+ **Key difference from intro.js**: The **systemPrompt**!
57
+
58
+ #### What is a System Prompt?
59
+ The system prompt defines the agent's role, behavior, and rules. It's like giving the AI a job description:
60
+
61
+ ```
62
+ ┌─────────────────────────────────────┐
63
+ │ System Prompt │
64
+ │ "You are a professional translator"│
65
+ │ + Detailed instructions │
66
+ │ + Rules to follow │
67
+ └─────────────────────────────────────┘
68
+
69
+ Affects every response
70
+ ```
71
+
72
+ ### 4. The System Prompt Breakdown
73
+
74
+ The system prompt (in German) tells the model:
75
+
76
+ **Role:**
77
+ ```
78
+ "Du bist ein erfahrener wissenschaftlicher Übersetzer für technische Texte
79
+ aus dem Englischen ins Deutsche."
80
+ ```
81
+ Translation: "You are an experienced scientific translator for technical texts from English to German."
82
+
83
+ **Task:**
84
+ ```
85
+ "Deine Aufgabe: Erstelle eine inhaltlich exakte Übersetzung..."
86
+ ```
87
+ Translation: "Your task: Create a content-accurate translation that maintains full meaning and technical precision."
88
+
89
+ **Rules (Lines 33-41):**
90
+ 1. Preserve every technical statement exactly
91
+ 2. Use idiomatic, fluent German
92
+ 3. Avoid literal sentence structures
93
+ 4. Use correct terminology (e.g., "Multi-Agenten-System")
94
+ 5. Use German typography for numbers (e.g., "54 %")
95
+ 6. Adapt compound terms to German grammar
96
+ 7. Shorten overly complex sentences while preserving meaning
97
+ 8. Use neutral, scientific style
98
+
99
+ **Critical Instruction (Line 48):**
100
+ ```
101
+ "DO NOT add any additional text or explanation. ONLY respond with the translated text"
102
+ ```
103
+ - Forces the model to return ONLY the translation
104
+ - No "Here's the translation:" prefix
105
+ - No explanations or commentary
106
+
107
+ ### 5. The Translation Query
108
+ ```javascript
109
+ const q1 = `Translate this text into german:
110
+
111
+ We address the long-horizon gap in large language model (LLM) agents by en-
112
+ abling them to sustain coherent strategies in adversarial, stochastic environments.
113
+ ...
114
+ `;
115
+ ```
116
+ - Contains a scientific abstract about LLM agents (HexMachina paper)
117
+ - Complex technical content with specialized terms
118
+ - Tests the model's ability to:
119
+ - Understand technical AI/ML concepts
120
+ - Translate accurately
121
+ - Follow the detailed system prompt rules
122
+
123
+ ### 6. Execute Translation
124
+ ```javascript
125
+ const a1 = await session.prompt(q1);
126
+ console.log("AI: " + a1);
127
+ ```
128
+ - Sends the translation request to the model
129
+ - The model will:
130
+ 1. Read the system prompt (its "role")
131
+ 2. Read the user's request
132
+ 3. Apply all the rules from the system prompt
133
+ 4. Generate a German translation
134
+
135
+ ### 7. Cleanup
136
+ ```javascript
137
+ session.dispose()
138
+ context.dispose()
139
+ model.dispose()
140
+ llama.dispose()
141
+ ```
142
+ - Same cleanup as intro.js
143
+ - Always dispose resources when done
144
+
145
+ ## Key Concepts Demonstrated
146
+
147
+ ### 1. System Prompts for Specialization
148
+ System prompts transform a general-purpose LLM into a specialized agent:
149
+
150
+ ```
151
+ General LLM + System Prompt = Specialized Agent
152
+ (Translator, Coder, Analyst, etc.)
153
+ ```
154
+
155
+ ### 2. Detailed Instructions Matter
156
+ Compare these approaches:
157
+
158
+ **❌ Minimal approach:**
159
+ ```javascript
160
+ systemPrompt: "Translate to German"
161
+ ```
162
+
163
+ **✅ This example (detailed):**
164
+ ```javascript
165
+ systemPrompt: `
166
+ You are a professional translator
167
+ Follow these rules:
168
+ - Rule 1
169
+ - Rule 2
170
+ - Rule 3
171
+ ...
172
+ `
173
+ ```
174
+
175
+ The detailed approach gives much better, more consistent results.
176
+
177
+ ### 3. Constraining Output Format
178
+ The line "DO NOT add any additional text" demonstrates output control:
179
+
180
+ **Without constraint:**
181
+ ```
182
+ AI: Here's the translation of the text you provided:
183
+
184
+ [German text]
185
+
186
+ I hope this helps! Let me know if you need anything else.
187
+ ```
188
+
189
+ **With constraint:**
190
+ ```
191
+ AI: [German text only]
192
+ ```
193
+
194
+ ## What Makes This an "Agent"?
195
+
196
+ This is a **specialized agent** because:
197
+
198
+ 1. **Specific Role**: Has a defined purpose (translation)
199
+ 2. **Constrained Behavior**: Follows specific rules and guidelines
200
+ 3. **Consistent Output**: Produces predictable, formatted results
201
+ 4. **Domain Expertise**: Optimized for scientific/technical content
202
+
203
+ ## Expected Output
204
+
205
+ When run, you'll see a German translation of the English abstract, following all the rules:
206
+ - Proper German scientific style
207
+ - Correct technical terminology
208
+ - German number formatting (e.g., "54 %")
209
+ - No extra commentary
210
+
211
+ The quality depends on the model's training and size.
212
+
213
+ ## Experimentation Ideas
214
+
215
+ 1. **Try different models**:
216
+ - Swap Apertus-8B with a smaller model (3B) to see size impact
217
+ - Try a monolingual English model to demonstrate the importance of multilingual training
218
+ - Use models with different quantization levels (Q4, Q6, Q8) to compare quality vs. size
219
+
220
+ 2. **Modify the system prompt**:
221
+ - Remove specific rules one by one to see their impact
222
+ - Change the translation target language
223
+ - Adjust the style (formal vs. casual)
224
+
225
+ 3. **Test with different content**:
226
+ - Technical documentation
227
+ - Creative writing
228
+ - Business communications
229
+ - Simple vs. complex sentences
230
+
231
+ Each experiment will help you understand how system prompts, model selection, and prompt engineering work together to create effective AI agents.
examples/03_translation/CONCEPT.md ADDED
@@ -0,0 +1,302 @@
1
+ # Concept: System Prompts & Agent Specialization
2
+
3
+ ## Overview
4
+
5
+ This example demonstrates how to transform a general-purpose LLM into a **specialized agent** using **system prompts**. The key insight: you don't need different models for different tasks—you need different instructions.
6
+
7
+ ## What is a System Prompt?
8
+
9
+ A **system prompt** is a persistent instruction that shapes the AI's behavior for an entire conversation session.
10
+
11
+ ### Analogy
12
+ Think of hiring someone for a job:
13
+
14
+ ```
15
+ Without System Prompt With System Prompt
16
+ ───────────────────── ──────────────────────
17
+ "Hi, I'm an AI." "I'm a professional translator
18
+ with expertise in scientific
19
+ What do you want?" German. I follow strict quality
20
+ guidelines and output format."
21
+ ```
22
+
23
+ ## How System Prompts Work
24
+
25
+ ### The Context Structure
26
+
27
+ ```
28
+ ┌─────────────────────────────────────────────┐
29
+ │ CONTEXT WINDOW │
30
+ │ │
31
+ │ ┌───────────────────────────────────────┐ │
32
+ │ │ SYSTEM PROMPT (Always present) │ │
33
+ │ │ "You are a professional translator..." │
34
+ │ │ "Follow these rules..." │ │
35
+ │ └───────────────────────────────────────┘ │
36
+ │ ↓ │
37
+ │ ┌───────────────────────────────────────┐ │
38
+ │ │ USER MESSAGES │ │
39
+ │ │ "Translate this text..." │ │
40
+ │ └───────────────────────────────────────┘ │
41
+ │ ↓ │
42
+ │ ┌───────────────────────────────────────┐ │
43
+ │ │ AI RESPONSES │ │
44
+ │ │ (Shaped by system prompt) │ │
45
+ │ └───────────────────────────────────────┘ │
46
+ └─────────────────────────────────────────────┘
47
+ ```
48
+
49
+ The system prompt sits at the top of the context and influences **every** response.
50
+
51
+ ## Agent Specialization Pattern
52
+
53
+ ### Transformation Flow
54
+
55
+ ```
56
+ ┌──────────────────┐ ┌─────────────────┐ ┌──────────────────┐
57
+ │ General Model │ + │ System Prompt │ = │ Specialized Agent│
58
+ │ │ │ │ │ │
59
+ │ • Knows many │ │ • Define role │ │ • Translation │
60
+ │ things │ │ • Set rules │ │ Agent │
61
+ │ • No specific │ │ • Constrain │ │ • Coding Agent │
62
+ │ role │ │ output │ │ • Analysis Agent │
63
+ └──────────────────┘ └─────────────────┘ └──────────────────┘
64
+ ```
65
+
66
+ ### Example Specializations
67
+
68
+ **Translation Agent (this example):**
69
+ ```
70
+ System Prompt = Role + Rules + Output Format
71
+ ```
72
+
73
+ **Code Assistant:**
74
+ ```javascript
75
+ systemPrompt: "You are an expert programmer.
76
+ Always provide working code with comments.
77
+ Explain complex logic."
78
+ ```
79
+
80
+ **Data Analyst:**
81
+ ```javascript
82
+ systemPrompt: "You are a data analyst.
83
+ Always show your calculations step-by-step.
84
+ Cite data sources when available."
85
+ ```
86
+
87
+ ## Anatomy of an Effective System Prompt
88
+
89
+ ### The 5 Components
90
+
91
+ ```
92
+ ┌─────────────────────────────────────────┐
93
+ │ 1. ROLE DEFINITION │
94
+ │ "You are a [specific role]..." │
95
+ ├─────────────────────────────────────────┤
96
+ │ 2. TASK DESCRIPTION │
97
+ │ "Your goal is to..." │
98
+ ├─────────────────────────────────────────┤
99
+ │ 3. BEHAVIORAL RULES │
100
+ │ "Always do X, Never do Y..." │
101
+ ├─────────────────────────────────────────┤
102
+ │ 4. OUTPUT FORMAT │
103
+ │ "Format your response as..." │
104
+ ├─────────────────────────────────────────┤
105
+ │ 5. CONSTRAINTS │
106
+ │ "Do NOT include..." │
107
+ └─────────────────────────────────────────┘
108
+ ```
109
+
110
+ ### This Example's Structure
111
+
112
+ ```
113
+ Role: "Professional scientific translator"
114
+ Task: "Translate English to German with precision"
115
+ Rules: 8 specific translation guidelines
116
+ Format: Idiomatic German, scientific style
117
+ Constraints: "ONLY translated text, no explanation"
118
+ ```
119
+
120
+ ## Why Detailed System Prompts Matter
121
+
122
+ ### Comparison Study
123
+
124
+ **Minimal System Prompt:**
125
+ ```javascript
126
+ systemPrompt: "Translate to German"
127
+ ```
128
+
129
+ **Result:**
130
+ - May add unnecessary explanations
131
+ - Inconsistent terminology
132
+ - Mixed formality levels
133
+ - Extra conversational text
134
+
135
+ **Detailed System Prompt (this example):**
136
+ ```javascript
137
+ systemPrompt: `You are a professional translator...
138
+ - Rule 1: Preserve technical accuracy
139
+ - Rule 2: Use idiomatic German
140
+ - Rule 3: Follow scientific conventions
141
+ ...
142
+ DO NOT add any explanations`
143
+ ```
144
+
145
+ **Result:**
146
+ - ✅ Consistent quality
147
+ - ✅ Correct terminology
148
+ - ✅ Proper formatting
149
+ - ✅ Only translation output
150
+
151
+ ### Quality Impact
152
+
153
+ ```
154
+ Detail Level Output Quality
155
+ ─────────── ─────────────────
156
+ Very minimal → Unpredictable
157
+ Basic role → Somewhat consistent
158
+ Detailed → Highly consistent ⭐
159
+ Over-detailed → May confuse model
160
+ ```
161
+
162
+ ## System Prompt Design Patterns
163
+
164
+ ### Pattern 1: Role-Playing
165
+ ```
166
+ "You are a [profession] with expertise in [domain]..."
167
+ ```
168
+ Makes the model adopt that perspective.
169
+
170
+ ### Pattern 2: Rule-Based
171
+ ```
172
+ "Follow these rules:
173
+ 1. Always...
174
+ 2. Never...
175
+ 3. When X, do Y..."
176
+ ```
177
+ Explicit constraints lead to predictable behavior.
178
+
179
+ ### Pattern 3: Output Formatting
180
+ ```
181
+ "Format your response as:
182
+ - JSON
183
+ - Markdown
184
+ - Plain text only
185
+ - Step-by-step list"
186
+ ```
187
+ Controls the structure of responses.
188
+
189
+ ### Pattern 4: Contextual Awareness
190
+ ```
191
+ "You remember: [previous facts]
192
+ You know that: [domain knowledge]
193
+ Current situation: [context]"
194
+ ```
195
+ Primes the model with relevant information.
196
+
197
+ ## How This Relates to AI Agents
198
+
199
+ ### Agent = Model + System Prompt + Tools
200
+
201
+ ```
202
+ ┌────────────────────────────────────────────┐
203
+ │ AI Agent │
204
+ │ │
205
+ │ ┌──────────────────────────────────────┐ │
206
+ │ │ System Prompt (Agent's "Identity") │ │
207
+ │ └──────────────────────────────────────┘ │
208
+ │ ↓ │
209
+ │ ┌──────────────────────────────────────┐ │
210
+ │ │ LLM (Agent's "Brain") │ │
211
+ │ └──────────────────────────────────────┘ │
212
+ │ ↓ │
213
+ │ ┌──────────────────────────────────────┐ │
214
+ │ │ Tools (Agent's "Hands") [Optional] │ │
215
+ │ └──────────────────────────────────────┘ │
216
+ └────────────────────────────────────────────┘
217
+ ```
218
+
219
+ **In this example:**
220
+ - System Prompt: "You are a translator..."
221
+ - LLM: Apertus-8B model
222
+ - Tools: None (translation is done by the model itself)
223
+
224
+ **In more complex agents:**
225
+ - System Prompt: "You are a research assistant..."
226
+ - LLM: Any model
227
+ - Tools: Web search, calculator, file access, etc.
228
+
229
+ ## Practical Applications
230
+
231
+ ### 1. Domain Specialization
232
+ ```
233
+ Medical → "You are a medical professional..."
234
+ Legal → "You are a legal expert..."
235
+ Technical → "You are an engineer..."
236
+ ```
237
+
238
+ ### 2. Output Control
239
+ ```
240
+ JSON API → "Always respond in valid JSON"
241
+ Markdown → "Format all responses as markdown"
242
+ Code → "Only output executable code"
243
+ ```
244
+
245
+ ### 3. Behavioral Constraints
246
+ ```
247
+ Concise → "Use maximum 2 sentences"
248
+ Detailed → "Explain thoroughly with examples"
249
+ Neutral → "Avoid opinions, state only facts"
250
+ ```
251
+
252
+ ### 4. Multi-Language Support
253
+ ```
254
+ systemPrompt: `You are a multilingual assistant.
255
+ Respond in the same language as the input.`
256
+ ```
257
+
258
+ ## Chat Wrappers Explained
259
+
260
+ Different models need different conversation formats:
261
+
262
+ ```
263
+ Model Type Format Needed Wrapper
264
+ ────────────── ─────────────────── ─────────────────
265
+ Llama 2/3 Llama format LlamaChatWrapper
266
+ GPT-style ChatML format ChatMLWrapper
267
+ Harmony models Harmony format HarmonyChatWrapper
268
+ ```
269
+
270
+ **What they do:**
271
+ ```
272
+ Your Message → [Chat Wrapper] → Formatted Prompt → Model
273
+
274
+ Adds special tokens:
275
+ <|system|>, <|user|>, <|assistant|>
276
+ ```
277
+
278
+ The wrapper ensures the model understands which part is the system prompt, which is the user message, etc.
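As a rough illustration of what such a wrapper produces, here is a ChatML-style formatter (the exact special tokens vary per model; node-llama-cpp picks and applies the right wrapper for you):

```javascript
// Illustrative only: roughly what a ChatML-style wrapper does to your messages.
// Real wrappers also handle model-specific tokenization details.
function toChatML(messages) {
  const formatted = messages
    .map(m => `<|im_start|>${m.role}\n${m.content}<|im_end|>`)
    .join('\n');
  // Leave the prompt open for the assistant's reply
  return formatted + '\n<|im_start|>assistant\n';
}
```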
279
+
280
+ ## Key Takeaways
281
+
282
+ 1. **System prompts are powerful**: They fundamentally change how the model behaves
283
+ 2. **Detailed is better**: More specific instructions = more consistent results
284
+ 3. **Structure matters**: Role + Rules + Format + Constraints
285
+ 4. **No retraining needed**: Same model, different behaviors
286
+ 5. **Foundation for agents**: System prompts are the first step in building specialized agents
287
+
288
+ ## Evolution Path
289
+
290
+ ```
291
+ 1. Basic Prompting (intro.js)
292
+
293
+ 2. System Prompts (translation.js) ← You are here
294
+
295
+ 3. System Prompts + Tools (simple-agent.js)
296
+
297
+ 4. Multi-turn reasoning (react-agent.js)
298
+
299
+ 5. Full Agent Systems
300
+ ```
301
+
302
+ This example bridges the gap between basic LLM usage and true agent behavior by showing how to specialize through instructions.
examples/03_translation/translation.js ADDED
@@ -0,0 +1,82 @@
1
+ import {
2
+ getLlama,
3
+ LlamaChatSession,
4
+ } from "node-llama-cpp";
5
+ import {fileURLToPath} from "url";
6
+ import path from "path";
7
+
8
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
9
+
10
+ const llama = await getLlama({
11
+ logLevel: 'error'
12
+ });
13
+ const model = await llama.loadModel({
14
+ modelPath: path.join(
15
+ __dirname,
16
+ '..',
17
+ '..',
18
+ 'models',
19
+ 'hf_giladgd_Apertus-8B-Instruct-2509.Q6_K.gguf'
20
+ )
21
+ });
22
+
23
+ const context = await model.createContext();
24
+ const session = new LlamaChatSession({
25
+ contextSequence: context.getSequence(),
26
+ systemPrompt: `Du bist ein erfahrener wissenschaftlicher Übersetzer für technische Texte aus dem Englischen ins
27
+ Deutsche.
28
+
29
+ Deine Aufgabe: Erstelle eine inhaltlich exakte Übersetzung, die den vollen Sinn und die technische Präzision
30
+ des Originaltexts erhält.
31
+
32
+ Gleichzeitig soll die Übersetzung klar, natürlich und leicht lesbar auf Deutsch klingen – also so, wie ein
33
+ deutscher Wissenschaftler oder Ingenieur denselben Text schreiben würde.
34
+
35
+ Befolge diese Regeln:
36
+ Bewahre jede fachliche Aussage und Nuance exakt. Kein Inhalt darf verloren gehen oder verändert werden.
37
+ Verwende idiomatisches, flüssiges Deutsch, wie es in wissenschaftlichen Abstracts (z. B. NeurIPS, ICLR, AAAI) üblich ist.
38
+ Vermeide wörtliche Satzstrukturen. Formuliere so, wie ein deutscher Wissenschaftler denselben Inhalt selbst schreiben würde.
39
+ Verwende korrekte Terminologie (z. B. Multi-Agenten-System, Adapterlayer, Baseline, Strategieverbesserung).
40
+ Verwende bei Zahlen, Einheiten und Prozentangaben deutsche Typografie (z. B. „54 %“, „3 m“, „2 000“).
41
+ Passe zusammengesetzte Begriffe an die deutsche Grammatik an (z. B. „kontinuierlich lernendes System“ statt „kontinuierliches Lernen System“).
42
+ Kürze lange oder verschachtelte Sätze behutsam, ohne Bedeutung zu verändern, um Lesbarkeit zu verbessern.
43
+ Verwende einen neutralen, wissenschaftlichen Stil, ohne Werbesprache oder unnötige Ausschmückung.
44
+
45
+ Zusatzinstruktion:
46
+ Wenn der Originaltext englische Satzlogik enthält, restrukturiere den Satz so, dass er auf Deutsch elegant und klar klingt, aber denselben Inhalt vermittelt.
47
+
48
+ Zielqualität: Eine Übersetzung, die sich wie ein Originaltext liest – technisch präzise, flüssig und grammatikalisch einwandfrei.
49
+
50
+ DO NOT add any additional text or explanation. ONLY respond with the translated text.
51
+ `
52
+ });
53
+
54
+ const q1 = `Translate this text into German:
55
+
56
+ We address the long-horizon gap in large language model (LLM) agents by en-
57
+ abling them to sustain coherent strategies in adversarial, stochastic environments.
58
+ Settlers of Catan provides a challenging benchmark: success depends on balanc-
59
+ ing short- and long-term goals amid randomness, trading, expansion, and block-
60
+ ing. Prompt-centric LLM agents (e.g., ReAct, Reflexion) must re-interpret large,
61
+ evolving game states each turn, quickly saturating context windows and losing
62
+ strategic consistency. We propose HexMachina, a continual learning multi-agent
63
+ system that separates environment discovery (inducing an adapter layer without
64
+ documentation) from strategy improvement (evolving a compiled player through
65
+ code refinement and simulation). This design preserves executable artifacts, al-
66
+ lowing the LLM to focus on high-level strategy rather than per-turn reasoning. In
67
+ controlled Catanatron experiments, HexMachina learns from scratch and evolves
68
+ players that outperform the strongest human-crafted baseline (AlphaBeta), achiev-
69
+ ing a 54% win rate and surpassing prompt-driven and no-discovery baselines. Ab-
70
+ lations confirm that isolating pure strategy learning improves performance. Over-
71
+ all, artifact-centric continual learning transforms LLMs from brittle stepwise de-
72
+ ciders into stable strategy designers, advancing long-horizon autonomy.
73
+ `;
74
+
75
+ console.log('Translation started...')
76
+ const a1 = await session.prompt(q1);
77
+ console.log("AI: " + a1);
78
+
79
+ session.dispose()
80
+ context.dispose()
81
+ model.dispose()
82
+ llama.dispose()
examples/04_think/CODE.md ADDED
@@ -0,0 +1,257 @@
1
+ # Code Explanation: think.js
2
+
3
+ This file demonstrates using system prompts for **logical reasoning** and **quantitative problem-solving**, showing how to configure an LLM as a specialized reasoning agent.
4
+
5
+ ## Step-by-Step Code Breakdown
6
+
7
+ ### 1. Import and Setup (Lines 1-8)
8
+ ```javascript
9
+ import {
10
+ getLlama,
11
+ LlamaChatSession,
12
+ } from "node-llama-cpp";
13
+ import {fileURLToPath} from "url";
14
+ import path from "path";
15
+
16
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
17
+ ```
18
+ - Standard imports for LLM interaction
19
+ - Path setup for locating the model file
20
+
21
+ ### 2. Initialize and Load Model (Lines 10-18)
22
+ ```javascript
23
+ const llama = await getLlama();
24
+ const model = await llama.loadModel({
25
+ modelPath: path.join(
26
+ __dirname,
27
+ "..",
28
+ "..",
+ "models",
29
+ "Qwen3-1.7B-Q8_0.gguf"
30
+ )
31
+ });
32
+ ```
33
+ - Uses **Qwen3-1.7B-Q8_0**: A 1.7B parameter model with 8-bit quantization
34
+ - Smaller than the translation example (1.7B vs 8B parameters)
35
+ - Q8_0 quantization preserves nearly full quality at a modest size cost
36
+
37
+ ### 3. Define the System Prompt (Lines 19-24)
38
+ ```javascript
39
+ const systemPrompt = `You are an expert logical and quantitative reasoner.
40
+ Your goal is to analyze real-world word problems involving families, quantities, averages, and relationships
41
+ between entities, and compute the exact numeric answer.
42
+
43
+ Goal: Return the correct final number as a single value — no explanation, no reasoning steps, just the answer.
44
+ `
45
+ ```
46
+
47
+ **Key elements:**
48
+
49
+ 1. **Role**: "expert logical and quantitative reasoner"
50
+ - Sets expectations for mathematical/analytical thinking
51
+
52
+ 2. **Task Scope**: "real-world word problems involving families, quantities, averages, and relationships"
53
+ - Tells the model what type of problems to expect
54
+ - Primes it for complex counting and calculation tasks
55
+
56
+ 3. **Output Constraint**: "Return the correct final number as a single value — no explanation"
57
+ - Forces concise output
58
+ - Just the answer, not the work
59
+
60
+ ### Why This System Prompt Design?
61
+
62
+ The prompt is designed for the specific problem type:
63
+ - Word problems with complex family relationships
64
+ - Multiple nested conditions
65
+ - Requires careful tracking of people and quantities
66
+ - Needs arithmetic calculation
67
+
68
+ ### 4. Create Context and Session (Lines 25-29)
69
+ ```javascript
70
+ const context = await model.createContext();
71
+ const session = new LlamaChatSession({
72
+ contextSequence: context.getSequence(),
73
+ systemPrompt
74
+ });
75
+ ```
76
+ - Creates context for the conversation
77
+ - Initializes session with the reasoning system prompt
78
+ - No chat wrapper needed (using model's default format)
79
+
80
+ ### 5. The Complex Word Problem (Lines 31-40)
81
+ ```javascript
82
+ const prompt = `My family reunion is this week, and I was assigned the mashed potatoes to bring.
83
+ The attendees include my married mother and father, my twin brother and his family, my aunt and her family, my grandma
84
+ and her brother, her brother's daughter, and his daughter's family. All the adults but me have been married, and no one
85
+ is divorced or remarried, but my grandpa and my grandma's sister-in-law passed away last year. All living spouses are attending.
86
+ My brother has two children that are still kids, my aunt has one six-year-old, and my grandma's brother's daughter has
87
+ three kids under 12. I figure each adult will eat about 1.5 potatoes and each kid will eat about 1/2 a potato, except my
88
+ second cousins don't eat carbs. The average potato is about half a pound, and potatoes are sold in 5-pound bags.
89
+
90
+ How many whole bags of potatoes do I need?
91
+ `;
92
+ ```
93
+
94
+ **This is intentionally complex to test reasoning:**
95
+
96
+ **People to count:**
97
+ - Speaker (1)
98
+ - Mother and father (2)
99
+ - Twin brother + spouse (2)
100
+ - Brother's 2 kids (2)
101
+ - Aunt + spouse (2)
102
+ - Aunt's 1 kid (1)
103
+ - Grandma (1)
104
+ - Grandma's brother (1; his wife, grandma's sister-in-law, passed away)
105
+ - Brother's daughter + spouse (2)
106
+ - Their 3 kids (3, but don't eat carbs)
107
+
108
+ **Calculations needed:**
109
+ 1. Count total adults
110
+ 2. Count total kids
111
+ 3. Subtract non-eating kids
112
+ 4. Calculate potato needs: (adults × 1.5) + (eating kids × 0.5)
113
+ 5. Convert to pounds: total potatoes × 0.5 lbs
114
+ 6. Convert to bags: pounds ÷ 5, round up
115
+
116
+ **The complexity:**
117
+ - Family relationships (who's married to whom)
118
+ - Deceased people (subtract from count)
119
+ - Special dietary needs (second cousins don't eat carbs)
120
+ - Unit conversions (potatoes → pounds → bags)
121
+
122
+ ### 6. Execute and Display (Lines 42-43)
123
+ ```javascript
124
+ const answer = await session.prompt(prompt);
125
+ console.log(`AI: ${answer}`);
126
+ ```
127
+ - Sends the complex problem to the model
128
+ - The model uses its reasoning abilities to work through the problem
129
+ - Outputs just the final number (based on system prompt)
130
+
131
+ ### 7. Cleanup (Lines 45-48)
132
+ ```javascript
133
+ session.dispose()
134
+ context.dispose()
135
+ model.dispose()
136
+ llama.dispose()
137
+ ```
138
+ - Standard resource cleanup
139
+
140
+ ## Key Concepts Demonstrated
141
+
142
+ ### 1. Reasoning Agent Configuration
143
+ This shows how to configure an LLM for analytical thinking:
144
+
145
+ ```
146
+ System Prompt → LLM becomes a "reasoning engine"
147
+ ```
148
+
149
+ Instead of conversational AI, we get:
150
+ - Focused analytical processing
151
+ - Mathematical computation
152
+ - Logical deduction
153
+
154
+ ### 2. Output Format Control
155
+ Compare these approaches:
156
+
157
+ **Without constraint:**
158
+ ```
159
+ AI: Let me work through this step by step.
160
+ First, I'll count the adults...
161
+ [lengthy explanation]
162
+ So the answer is 2 bags.
163
+ ```
164
+
165
+ **With constraint (this example):**
166
+ ```
167
+ AI: 2
168
+ ```
169
+
170
+ ### 3. Problem Complexity Testing
171
+ This example tests the model's ability to:
172
+ - Parse complex natural language
173
+ - Track multiple entities and relationships
174
+ - Apply arithmetic operations
175
+ - Handle edge cases (deceased people, dietary restrictions)
176
+ - Perform unit conversions
177
+
178
+ ### 4. Specialized Task Agents
179
+ This demonstrates creating task-specific agents:
180
+
181
+ ```
182
+ General LLM + "Reasoning Agent" System Prompt = Math Problem Solver
183
+ ```
184
+
185
+ Same pattern works for:
186
+ - Logic puzzles
187
+ - Data analysis
188
+ - Scientific calculations
189
+ - Statistical reasoning
190
+
191
+ ## Challenges & Limitations
192
+
193
+ ### 1. Model Size Matters
194
+ The 1.7B parameter model may struggle with:
195
+ - Very complex counting problems
196
+ - Multi-step reasoning requiring working memory
197
+ - Edge cases in the problem
198
+
199
+ Larger models (7B, 13B+) generally perform better on reasoning tasks.
200
+
201
+ ### 2. Hidden Reasoning
202
+ The system prompt asks for "just the answer," so we don't see:
203
+ - The model's reasoning process
204
+ - Where it might have made mistakes
205
+ - Its confidence level
206
+
207
+ ### 3. No Tool Use
208
+ The model must do all calculations "in its head" without:
209
+ - A calculator
210
+ - Note-taking
211
+ - Step-by-step verification
212
+
213
+ Later examples (like react-agent) address this by giving the model tools.
214
+
215
+ ## Why This Matters for AI Agents
216
+
217
+ ### Reasoning is Fundamental
218
+ All useful agents need reasoning capabilities:
219
+ - **Planning agents**: Reason about sequences of actions
220
+ - **Research agents**: Analyze and synthesize information
221
+ - **Decision agents**: Evaluate options and consequences
222
+
223
+ ### System Prompt Shapes Behavior
224
+ This example shows that the same model can behave differently based on instructions:
225
+ - Translator agent (previous example)
226
+ - Reasoning agent (this example)
227
+ - Code agent (later examples)
228
+
229
+ ### Foundation for Complex Agents
230
+ Understanding how to prompt for reasoning is essential before adding:
231
+ - Tools (giving the model a calculator)
232
+ - Memory (remembering previous calculations)
233
+ - Multi-step processes (ReAct pattern)
234
+
235
+ ## Expected Output
236
+
237
+ Running this script should output something like:
238
+ ```
239
+ AI: 2
240
+ ```
241
+
242
+ The exact answer depends on the model's ability to:
243
+ - Correctly count all family members
244
+ - Apply the eating rates
245
+ - Convert units
246
+ - Round up for whole bags
247
+
248
+ ## Improving This Approach
249
+
250
+ To get better reasoning:
251
+ 1. **Use larger models**: 7B+ parameters
252
+ 2. **Add step-by-step prompting**: "Show your work"
253
+ 3. **Provide tools**: Give the model a calculator
254
+ 4. **Use chain-of-thought**: Encourage explicit reasoning
255
+ 5. **Verify answers**: Run multiple times or use multiple models
256
+
257
+ The react-agent example demonstrates some of these improvements.
examples/04_think/CONCEPT.md ADDED
@@ -0,0 +1,368 @@
1
+ # Concept: Reasoning & Problem-Solving Agents
2
+
3
+ ## Overview
4
+
5
+ This example demonstrates how to configure an LLM as a **reasoning agent** capable of analytical thinking and quantitative problem-solving. It shows the bridge between simple text generation and complex cognitive tasks.
6
+
7
+ ## What is a Reasoning Agent?
8
+
9
+ A **reasoning agent** is an LLM configured to perform logical analysis, mathematical computation, and multi-step problem-solving through careful system prompt design.
10
+
11
+ ### Human Analogy
12
+
13
+ ```
14
+ Regular Chat Reasoning Agent
15
+ ───────────── ──────────────────
16
+ "Can you help me?" "I am a mathematician.
17
+ "Sure! What do you need?" I analyze problems methodically
18
+ and compute exact answers."
19
+ ```
20
+
21
+ ## The Reasoning Challenge
22
+
23
+ ### Why Reasoning is Hard for LLMs
24
+
25
+ LLMs are trained on text prediction, not explicit reasoning:
26
+
27
+ ```
28
+ ┌───────────────────────────────────────┐
29
+ │ LLM Training │
30
+ │ "Predict next word in text" │
31
+ │ │
32
+ │ NOT explicitly trained for: │
33
+ │ • Step-by-step logic │
34
+ │ • Arithmetic computation │
35
+ │ • Tracking multiple variables │
36
+ │ • Systematic problem decomposition │
37
+ └───────────────────────────────────────┘
38
+ ```
39
+
40
+ However, they can learn reasoning patterns from training data and be guided by system prompts.
41
+
42
+ ## Reasoning Through System Prompts
43
+
44
+ ### Configuration Pattern
45
+
46
+ ```
47
+ ┌─────────────────────────────────────────┐
48
+ │ System Prompt Components │
49
+ ├─────────────────────────────────────────┤
50
+ │ 1. Role: "Expert reasoner" │
51
+ │ 2. Task: "Analyze and solve problems" │
52
+ │ 3. Method: "Compute exact answers" │
53
+ │ 4. Output: "Single numeric value" │
54
+ └─────────────────────────────────────────┘
55
+
56
+ Reasoning Behavior
57
+ ```
58
+
59
+ ### Types of Reasoning Tasks
60
+
61
+ **Quantitative Reasoning (this example):**
62
+ ```
63
+ Problem → Count entities → Calculate → Convert units → Answer
64
+ ```
65
+
66
+ **Logical Reasoning:**
67
+ ```
68
+ Premises → Apply rules → Deduce conclusions → Answer
69
+ ```
70
+
71
+ **Analytical Reasoning:**
72
+ ```
73
+ Data → Identify patterns → Form hypothesis → Conclude
74
+ ```
75
+
76
+ ## How LLMs "Reason"
77
+
78
+ ### Pattern Matching vs. True Reasoning
79
+
80
+ LLMs don't reason like humans, but they can:
81
+
82
+ ```
83
+ ┌─────────────────────────────────────────────┐
84
+ │ What LLMs Actually Do │
85
+ │ │
86
+ │ 1. Pattern Recognition │
87
+ │ "This looks like a counting problem" │
88
+ │ │
89
+ │ 2. Template Application │
90
+ │ "Similar problems follow this pattern" │
91
+ │ │
92
+ │ 3. Statistical Inference │
93
+ │ "These numbers likely combine this way" │
94
+ │ │
95
+ │ 4. Learned Procedures │
96
+ │ "I've seen this type of calculation" │
97
+ └─────────────────────────────────────────────┘
98
+ ```
99
+
100
+ ### The Reasoning Process
101
+
102
+ ```
103
+ Input: Complex Word Problem
104
+
105
+ ┌────────────┐
106
+ │ Parse │ Identify entities and relationships
107
+ └────────────┘
108
+
109
+ ┌────────────┐
110
+ │ Decompose │ Break into sub-problems
111
+ └────────────┘
112
+
113
+ ┌────────────┐
114
+ │ Calculate │ Apply arithmetic operations
115
+ └────────────┘
116
+
117
+ ┌────────────┐
118
+ │ Synthesize│ Combine results
119
+ └────────────┘
120
+
121
+ Final Answer
122
+ ```
123
+
124
+ ## Problem Complexity Hierarchy
125
+
126
+ ### Levels of Reasoning Difficulty
127
+
128
+ ```
129
+ Easy Hard
130
+ │ │
131
+ │ Simple Multi-step Nested Implicit │
132
+ │ Arithmetic Logic Conditions Reasoning│
133
+ │ │
134
+ └─────────────────────────────────────────────┘
135
+
136
+ Examples:
137
+ Easy: "What is 5 + 3?"
138
+ Medium: "If 3 apples cost $2 each, what's the total?"
139
+ Hard: "Count family members with complex relationships"
140
+ ```
141
+
142
+ ### This Example's Complexity
143
+
144
+ The potato problem is **highly complex**:
145
+
146
+ ```
147
+ ┌─────────────────────────────────────────┐
148
+ │ Complexity Factors │
149
+ ├─────────────────────────────────────────┤
150
+ │ ✓ Multiple entities (15+ people) │
151
+ │ ✓ Relationship reasoning (family tree)│
152
+ │ ✓ Conditional logic (if married then..)│
153
+ │ ✓ Negative conditions (deceased people)│
154
+ │ ✓ Special cases (dietary restrictions)│
155
+ │ ✓ Multiple calculations │
156
+ │ ✓ Unit conversions │
157
+ └─────────────────────────────────────────┘
158
+ ```
159
+
160
+ ## Limitations of Pure LLM Reasoning
161
+
162
+ ### Why This Approach Has Issues
163
+
164
+ ```
165
+ ┌────────────────────────────────────┐
166
+ │ Problem: No External Tools │
167
+ │ │
168
+ │ LLM must hold everything in │
169
+ │ "mental" context: │
170
+ │ • All entity counts │
171
+ │ • Intermediate calculations │
172
+ │ • Conversion factors │
173
+ │ • Final arithmetic │
174
+ │ │
175
+ │ Result: Prone to errors │
176
+ └────────────────────────────────────┘
177
+ ```
178
+
179
+ ### Common Failure Modes
180
+
181
+ **1. Counting Errors:**
182
+ ```
183
+ Problem: "Count 15 people with complex relationships"
184
+ LLM: "14" or "16" (off by one)
185
+ ```
186
+
187
+ **2. Arithmetic Mistakes:**
188
+ ```
189
+ Problem: "13 adults × 1.5 + 3 kids × 0.5"
190
+ LLM: May get intermediate steps wrong
191
+ ```
192
+
193
+ **3. Lost Context:**
194
+ ```
195
+ Problem: Multi-step with many facts
196
+ LLM: Forgets earlier information
197
+ ```
198
+
199
+ ## Improving Reasoning: Evolution Path
200
+
201
+ ### Level 1: Pure Prompting (This Example)
202
+ ```
203
+ User → LLM → Answer
204
+
205
+ System Prompt
206
+ ```
207
+
208
+ **Limitations:**
209
+ - All reasoning internal to LLM
210
+ - No verification
211
+ - No tools
212
+ - Hidden process
213
+
214
+ ### Level 2: Chain-of-Thought
215
+ ```
216
+ User → LLM → Show Work → Answer
217
+
218
+ "Explain your reasoning"
219
+ ```
220
+
221
+ **Improvements:**
222
+ - Visible reasoning steps
223
+ - Can catch some errors
224
+ - Still no tools
225
+
226
+ ### Level 3: Tool-Augmented (simple-agent)
227
+ ```
228
+ User → LLM ⟷ Tools → Answer
229
+ ↑ (Calculator)
230
+ System Prompt
231
+ ```
232
+
233
+ **Improvements:**
234
+ - External computation
235
+ - Reduced errors
236
+ - Verifiable steps
237
+
238
+ ### Level 4: ReAct Pattern (react-agent)
239
+ ```
240
+ User → LLM → Think → Act → Observe
241
+ ↑ ↓ ↓ ↓
242
+ System Reason Tool Result
243
+ Prompt Use
244
+ ↑ ↓ ↓
245
+ └───────────Iterate──┘
246
+ ```
247
+
248
+ **Best approach:**
249
+ - Explicit reasoning loop
250
+ - Tool use at each step
251
+ - Self-correction possible
252
+
253
+ ## System Prompt Design for Reasoning
254
+
255
+ ### Key Elements
256
+
257
+ **1. Role Definition:**
258
+ ```
259
+ "You are an expert logical and quantitative reasoner"
260
+ ```
261
+ Sets the mental framework.
262
+
263
+ **2. Task Specification:**
264
+ ```
265
+ "Analyze real-world word problems involving..."
266
+ ```
267
+ Defines the problem domain.
268
+
269
+ **3. Output Format:**
270
+ ```
271
+ "Return the correct final number as a single value"
272
+ ```
273
+ Controls response structure.
274
+
275
+ ### Design Patterns
276
+
277
+ **Pattern A: Direct Answer (This Example)**
278
+ ```
279
+ Prompt: [Problem]
280
+ Output: [Number]
281
+ ```
282
+ Pros: Concise, fast
283
+ Cons: No insight into reasoning
284
+
285
+ **Pattern B: Show Work**
286
+ ```
287
+ Prompt: [Problem] "Show your steps"
288
+ Output: Step 1: ... Step 2: ... Answer: [Number]
289
+ ```
290
+ Pros: Transparent, debuggable
291
+ Cons: Longer, may still have errors
292
+
293
+ **Pattern C: Self-Verification**
294
+ ```
295
+ Prompt: [Problem] "Solve, then verify"
296
+ Output: Solution + Verification + Final Answer
297
+ ```
298
+ Pros: More reliable
299
+ Cons: Slower, uses more tokens
300
+
301
+ ## Real-World Applications
302
+
303
+ ### Use Cases for Reasoning Agents
304
+
305
+ **1. Data Analysis:**
306
+ ```
307
+ Input: Dataset summary
308
+ Task: Compute statistics, identify trends
309
+ Output: Numerical insights
310
+ ```
311
+
312
+ **2. Planning:**
313
+ ```
314
+ Input: Goal + constraints
315
+ Task: Reason about optimal sequence
316
+ Output: Action plan
317
+ ```
318
+
319
+ **3. Decision Support:**
320
+ ```
321
+ Input: Options + criteria
322
+ Task: Evaluate and compare
323
+ Output: Recommended choice
324
+ ```
325
+
326
+ **4. Problem Solving:**
327
+ ```
328
+ Input: Complex scenario
329
+ Task: Break down and solve
330
+ Output: Solution
331
+ ```
332
+
333
+ ## Comparison: Different Agent Types
334
+
335
+ ```
336
+ Reasoning Tools Memory Multi-turn
337
+ ───────── ───── ────── ──────────
338
+ intro.js ✗ ✗ ✗ ✗
339
+ translation.js ~ ✗ ✗ ✗
340
+ think.js (here) ✓ ✗ ✗ ✗
341
+ simple-agent.js ✓ ✓ ✗ ~
342
+ memory-agent.js ✓ ✓ ✓ ✓
343
+ react-agent.js ✓✓ ✓ ~ ✓
344
+ ```
345
+
346
+ Legend:
347
+ - ✗ = Not present
348
+ - ~ = Limited/implicit
349
+ - ✓ = Present
350
+ - ✓✓ = Advanced/explicit
351
+
352
+ ## Key Takeaways
353
+
354
+ 1. **System prompts enable reasoning**: Proper configuration transforms an LLM into a reasoning agent
355
+ 2. **Limitations exist**: Pure LLM reasoning is prone to errors on complex problems
356
+ 3. **Tools help**: External computation (calculators, etc.) improves accuracy
357
+ 4. **Iteration matters**: Multi-step reasoning patterns (like ReAct) work better
358
+ 5. **Transparency is valuable**: Seeing the reasoning process helps debug and verify
359
+
360
+ ## Next Steps
361
+
362
+ After understanding basic reasoning:
363
+ - **Add tools**: Let the agent use calculators, databases, APIs
364
+ - **Implement verification**: Check answers, retry on errors
365
+ - **Use chain-of-thought**: Make reasoning explicit
366
+ - **Apply ReAct pattern**: Combine reasoning and tool use systematically
367
+
368
+ This example is the foundation for more sophisticated agent architectures that combine reasoning with external capabilities.
examples/04_think/think.js ADDED
@@ -0,0 +1,49 @@
1
+ import {
2
+ getLlama,
3
+ LlamaChatSession,
4
+ } from "node-llama-cpp";
5
+ import {fileURLToPath} from "url";
6
+ import path from "path";
7
+
8
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
9
+
10
+ const llama = await getLlama();
11
+ const model = await llama.loadModel({
12
+ modelPath: path.join(
13
+ __dirname,
14
+ '..',
15
+ '..',
16
+ 'models',
17
+ 'Qwen3-1.7B-Q8_0.gguf'
18
+ )
19
+ });
20
+ const systemPrompt = `You are an expert logical and quantitative reasoner.
21
+ Your goal is to analyze real-world word problems involving families, quantities, averages, and relationships
22
+ between entities, and compute the exact numeric answer.
23
+
24
+ Goal: Return the correct final number as a single value — no explanation, no reasoning steps, just the answer.
25
+ `
26
+ const context = await model.createContext();
27
+ const session = new LlamaChatSession({
28
+ contextSequence: context.getSequence(),
29
+ systemPrompt
30
+ });
31
+
32
+ const prompt = `My family reunion is this week, and I was assigned the mashed potatoes to bring.
33
+ The attendees include my married mother and father, my twin brother and his family, my aunt and her family, my grandma
34
+ and her brother, her brother's daughter, and his daughter's family. All the adults but me have been married, and no one
35
+ is divorced or remarried, but my grandpa and my grandma's sister-in-law passed away last year. All living spouses are attending.
36
+ My brother has two children that are still kids, my aunt has one six-year-old, and my grandma's brother's daughter has
37
+ three kids under 12. I figure each adult will eat about 1.5 potatoes and each kid will eat about 1/2 a potato, except my
38
+ second cousins don't eat carbs. The average potato is about half a pound, and potatoes are sold in 5-pound bags.
39
+
40
+ How many whole bags of potatoes do I need?
41
+ `;
42
+
43
+ const answer = await session.prompt(prompt);
44
+ console.log(`AI: ${answer}`);
45
+
46
+ session.dispose()
47
+ context.dispose()
48
+ model.dispose()
49
+ llama.dispose()
examples/05_batch/CODE.md ADDED
@@ -0,0 +1,323 @@
1
+ # Code Explanation: batch.js
2
+
3
+ This file demonstrates **parallel execution** of multiple LLM prompts using separate context sequences, enabling concurrent processing for better performance.
4
+
5
+ ## Step-by-Step Code Breakdown
6
+
7
+ ### 1. Import and Setup (Lines 1-10)
8
+ ```javascript
9
+ import {getLlama, LlamaChatSession} from "node-llama-cpp";
10
+ import path from "path";
11
+ import {fileURLToPath} from "url";
12
+
13
+ /**
14
+ * Asynchronous execution improves performance in GAIA benchmarks,
15
+ * multi-agent applications, and other high-throughput scenarios.
16
+ */
17
+
18
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
19
+ ```
20
+ - Standard imports for LLM interaction
21
+ - Comment explains the performance benefit
22
+ - **GAIA benchmark**: A standard for testing AI agent performance
23
+ - Useful for multi-agent systems that need to handle many requests
24
+
25
+ ### 2. Model Path Configuration (Lines 11-16)
26
+ ```javascript
27
+ const modelPath = path.join(
28
+ __dirname,
29
+ "../",
30
+ "models",
31
+ "DeepSeek-R1-0528-Qwen3-8B-Q6_K.gguf"
32
+ )
33
+ ```
34
+ - Uses **DeepSeek-R1**: An 8B parameter model optimized for reasoning
35
+ - **Q6_K quantization**: Balance between quality and size
36
+ - Model is loaded once and shared between sequences
37
+
38
+ ### 3. Initialize Llama and Load Model (Lines 18-19)
39
+ ```javascript
40
+ const llama = await getLlama();
41
+ const model = await llama.loadModel({modelPath});
42
+ ```
43
+ - Standard initialization
44
+ - Model is loaded into memory once
45
+ - Will be used by multiple sequences simultaneously
46
+
47
+ ### 4. Create Context with Multiple Sequences (Lines 20-23)
48
+ ```javascript
49
+ const context = await model.createContext({
50
+ sequences: 2,
51
+ batchSize: 1024 // The number of tokens that can be processed at once by the GPU.
52
+ });
53
+ ```
54
+
55
+ **Key parameters:**
56
+
57
+ - **sequences: 2**: Creates 2 independent conversation sequences
58
+ - Each sequence has its own conversation history
59
+ - Both share the same model and context memory pool
60
+ - Can be processed in parallel
61
+
62
+ - **batchSize: 1024**: Maximum tokens processed per GPU batch
63
+ - Larger = better GPU utilization
64
+ - Smaller = lower memory usage
65
+ - 1024 is a good balance for most GPUs
66
+
67
+ ### Why Multiple Sequences?
68
+
69
+ ```
70
+ Single Sequence (Sequential) Multiple Sequences (Parallel)
71
+ ───────────────────────── ──────────────────────────────
72
+ Process Prompt 1 → Response 1 Process Prompt 1 ──┐
73
+ Wait... ├→ Both responses
74
+ Process Prompt 2 → Response 2 Process Prompt 2 ──┘ in parallel!
75
+
76
+ Total Time: T1 + T2 Total Time: max(T1, T2)
77
+ ```
78
+
79
+ ### 5. Get Individual Sequences (Lines 25-26)
80
+ ```javascript
81
+ const sequence1 = context.getSequence();
82
+ const sequence2 = context.getSequence();
83
+ ```
84
+ - Retrieves two separate sequence objects from the context
85
+ - Each sequence maintains its own state
86
+ - They can be used independently for different conversations
87
+
88
+ ### 6. Create Separate Sessions (Lines 28-33)
89
+ ```javascript
90
+ const session1 = new LlamaChatSession({
91
+ contextSequence: sequence1
92
+ });
93
+ const session2 = new LlamaChatSession({
94
+ contextSequence: sequence2
95
+ });
96
+ ```
97
+ - Creates a chat session for each sequence
98
+ - Each session has its own conversation history
99
+ - Sessions are completely independent
100
+ - No system prompts in this example (could be added)
101
+
102
+ ### 7. Define Questions (Lines 35-36)
103
+ ```javascript
104
+ const q1 = "Hi there, how are you?";
105
+ const q2 = "How much is 6+6?";
106
+ ```
107
+ - Two completely different questions
108
+ - Will be processed simultaneously
109
+ - Different types: conversational vs. computational
110
+
111
+ ### 8. Parallel Execution with Promise.all (Lines 38-44)
112
+ ```javascript
113
+ const [
114
+ a1,
115
+ a2
116
+ ] = await Promise.all([
117
+ session1.prompt(q1),
118
+ session2.prompt(q2)
119
+ ]);
120
+ ```
121
+
122
+ **How this works:**
123
+
124
+ 1. `session1.prompt(q1)` starts asynchronously
125
+ 2. `session2.prompt(q2)` starts asynchronously (doesn't wait for #1)
126
+ 3. `Promise.all()` waits for BOTH to complete
127
+ 4. Returns results in array: [response1, response2]
128
+ 5. Destructures into `a1` and `a2`
129
+
130
+ **Key benefit**: Both prompts are processed at the same time, not one after another!
131
+
132
+ ### 9. Display Results (Lines 46-50)
133
+ ```javascript
134
+ console.log("User: " + q1);
135
+ console.log("AI: " + a1);
136
+
137
+ console.log("User: " + q2);
138
+ console.log("AI: " + a2);
139
+ ```
140
+ - Outputs both question-answer pairs
141
+ - Results appear in order despite parallel processing
142
+
143
+ ## Key Concepts Demonstrated
144
+
145
+ ### 1. Parallel Processing
146
+ Instead of:
147
+ ```javascript
148
+ // Sequential (slow)
149
+ const a1 = await session1.prompt(q1); // Wait
150
+ const a2 = await session2.prompt(q2); // Wait again
151
+ ```
152
+
153
+ We use:
154
+ ```javascript
155
+ // Parallel (fast)
156
+ const [a1, a2] = await Promise.all([
157
+ session1.prompt(q1),
158
+ session2.prompt(q2)
159
+ ]);
160
+ ```
161
+
162
+ ### 2. Context Sequences
163
+ A context can hold multiple independent sequences:
164
+
165
+ ```
166
+ ┌─────────────────────────────────────┐
167
+ │ Context (Shared) │
168
+ │ ┌───────────────────────────────┐ │
169
+ │ │ Model Weights (8B params) │ │
170
+ │ └───────────────────────────────┘ │
171
+ │ │
172
+ │ ┌─────────────┐ ┌─────────────┐ │
173
+ │ │ Sequence 1 │ │ Sequence 2 │ │
174
+ │ │ "Hi there" │ │ "6+6?" │ │
175
+ │ │ History... │ │ History... │ │
176
+ │ └─────────────┘ └─────────────┘ │
177
+ └─────────────────────────────────────┘
178
+ ```
179
+
180
+ ## Performance Comparison
181
+
182
+ ### Sequential Execution
183
+ ```
184
+ Request 1: 2 seconds
185
+ Request 2: 2 seconds
186
+ Total: 4 seconds
187
+ ```
188
+
189
+ ### Parallel Execution (This Example)
190
+ ```
191
+ Request 1: 2 seconds ──┐
192
+ Request 2: 2 seconds ──┤ Both running
193
+ Total: ~2 seconds └─ simultaneously
194
+ ```
195
+
196
+ **Speedup**: ~2x for 2 sequences, scales with more sequences
197
+
198
+ ## Use Cases
199
+
200
+ ### 1. Multi-User Applications
201
+ ```javascript
202
+ // Handle multiple users simultaneously
203
+ const [user1Response, user2Response, user3Response] = await Promise.all([
204
+ session1.prompt(user1Query),
205
+ session2.prompt(user2Query),
206
+ session3.prompt(user3Query)
207
+ ]);
208
+ ```
209
+
210
+ ### 2. Multi-Agent Systems
211
+ ```javascript
212
+ // Multiple agents working on different tasks
213
+ const [
214
+ plannerResponse,
215
+ analyzerResponse,
216
+ executorResponse
217
+ ] = await Promise.all([
218
+ plannerSession.prompt("Plan the task"),
219
+ analyzerSession.prompt("Analyze the data"),
220
+ executorSession.prompt("Execute step 1")
221
+ ]);
222
+ ```
223
+
224
+ ### 3. Benchmarking
225
+ ```javascript
226
+ // Test multiple prompts in parallel (one session per context sequence)
227
+ const results = await Promise.all(
228
+ testPrompts.map((prompt, i) => sessions[i].prompt(prompt))
229
+ );
230
+ ```
231
+
232
+ ### 4. A/B Testing
233
+ ```javascript
234
+ // Test different system prompts
235
+ const [responseA, responseB] = await Promise.all([
236
+ sessionWithPromptA.prompt(query),
237
+ sessionWithPromptB.prompt(query)
238
+ ]);
239
+ ```
240
+
241
+ ## Resource Considerations
242
+
243
+ ### Memory Usage
244
+ Each sequence needs memory for:
245
+ - Conversation history
246
+ - Intermediate computations
247
+ - KV cache (key-value cache for transformer attention)
248
+
249
+ **Rule of thumb**: More sequences = more memory needed
250
+
251
+ ### GPU Utilization
252
+ - **Single sequence**: May underutilize GPU
253
+ - **Multiple sequences**: Better GPU utilization
254
+ - **Too many sequences**: May exceed VRAM, causing slowdown
255
+
256
+ ### Optimal Number of Sequences
257
+ Depends on:
258
+ - Available VRAM
259
+ - Model size
260
+ - Context length
261
+ - Batch size
262
+
263
+ **Typical**: 2-8 sequences for consumer GPUs
264
+
265
+ ## Limitations & Considerations
266
+
267
+ ### 1. Shared Context Limit
268
+ All sequences share the same context memory pool:
269
+ ```
270
+ Total context size: 8192 tokens
271
+ Sequence 1: 4096 tokens
272
+ Sequence 2: 4096 tokens
273
+ Context fully allocated!
274
+ ```
275
+
276
+ ### 2. Not True Parallelism for CPU
277
+ On CPU-only systems, sequences are interleaved rather than truly parallel, but batching still improves overall throughput.
278
+
279
+ ### 3. Model Loading Overhead
280
+ The model is loaded once and shared, which is efficient. But initial loading still takes time.
281
+
282
+ ## Why This Matters for AI Agents
283
+
284
+ ### Efficiency in Production
285
+ Real-world agent systems need to:
286
+ - Handle multiple requests concurrently
287
+ - Respond quickly to users
288
+ - Make efficient use of hardware
289
+
290
+ ### Multi-Agent Architectures
291
+ Complex agent systems often have:
292
+ - **Planner agent**: Thinks about strategy
293
+ - **Executor agent**: Takes actions
294
+ - **Critic agent**: Evaluates results
295
+
296
+ These can run in parallel using separate sequences.
297
+
298
+ ### Scalability
299
+ This pattern is the foundation for:
300
+ - Web services with multiple users
301
+ - Batch processing of data
302
+ - Distributed agent systems
303
+
304
+ ## Best Practices
305
+
306
+ 1. **Match sequences to workload**: Don't create more than you need
307
+ 2. **Monitor memory usage**: Each sequence consumes VRAM
308
+ 3. **Use appropriate batch size**: Balance speed vs. memory
309
+ 4. **Clean up resources**: Always dispose when done
310
+ 5. **Handle errors**: Wrap Promise.all in try-catch
311
+
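For point 5, `Promise.allSettled` is often a better fit than a try-catch around `Promise.all`, because one failed sequence does not discard the other results. A self-contained sketch, with stub promises standing in for `session.prompt()` calls:

```javascript
// Stand-ins for sessions[i].prompt(q): one succeeds, one rejects
const tasks = [
  Promise.resolve("12"),
  Promise.reject(new Error("out of memory"))
];

// allSettled never throws: each entry reports "fulfilled" or "rejected"
const settled = await Promise.allSettled(tasks);

const answers = settled
  .filter((r) => r.status === "fulfilled")
  .map((r) => r.value);
const failures = settled
  .filter((r) => r.status === "rejected")
  .map((r) => r.reason.message);

console.log(answers);  // ["12"]
console.log(failures); // ["out of memory"]
```

With `Promise.all`, the single rejection would have thrown away the successful answer as well.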
312
+ ## Expected Output
313
+
314
+ Running this script should output something like:
315
+ ```
316
+ User: Hi there, how are you?
317
+ AI: Hello! I'm doing well, thank you for asking...
318
+
319
+ User: How much is 6+6?
320
+ AI: 12
321
+ ```
322
+
323
+ Both responses appear quickly because they were processed simultaneously!
examples/05_batch/CONCEPT.md ADDED
@@ -0,0 +1,365 @@
1
+ # Concept: Parallel Processing & Performance Optimization
2
+
3
+ ## Overview
4
+
5
+ This example demonstrates **concurrent execution** of multiple LLM requests using separate context sequences, a critical technique for building scalable AI agent systems.
6
+
7
+ ## The Performance Problem
8
+
9
+ ### Sequential Processing (Slow)
10
+
11
+ Traditional approach processes one request at a time:
12
+
13
+ ```
14
+ Request 1 ────────→ Response 1 (2s)
15
+
16
+ Request 2 ────────→ Response 2 (2s)
17
+
18
+ Total: 4 seconds
19
+ ```
20
+
21
+ ### Parallel Processing (Fast)
22
+
23
+ This example processes multiple requests simultaneously:
24
+
25
+ ```
26
+ Request 1 ────────→ Response 1 (2s) ──┐
27
+ ├→ Total: 2 seconds
28
+ Request 2 ────────→ Response 2 (2s) ──┘
29
+ (Both running at the same time)
30
+ ```
31
+
32
+ **Performance gain: 2x speedup!**
33
+
34
+ ## Core Concept: Context Sequences
35
+
36
+ ### Single vs. Multiple Sequences
37
+
38
+ ```
39
+ ┌────────────────────────────────────────────────┐
40
+ │ Model (Loaded Once) │
41
+ ├────────────────────────────────────────────────┤
42
+ │ Context │
43
+ │ ┌──────────────┐ ┌──────────────┐ │
44
+ │ │ Sequence 1 │ │ Sequence 2 │ │
45
+ │ │ │ │ │ │
46
+ │ │ Conversation │ │ Conversation │ │
47
+ │ │ History A │ │ History B │ │
48
+ │ └──────────────┘ └──────────────┘ │
49
+ └────────────────────────────────────────────────┘
50
+ ```
51
+
52
+ **Key insights:**
53
+ - Model weights are shared (memory efficient)
54
+ - Each sequence has independent history
55
+ - Sequences can process in parallel
56
+ - Both use the same underlying model
57
+
58
+ ## How Parallel Processing Works
59
+
60
+ ### Promise.all Pattern
61
+
62
+ JavaScript's `Promise.all()` enables concurrent execution:
63
+
64
+ ```
65
+ Sequential:
66
+ ────────────────────────────────────
67
+ await fn1(); // Wait 2s
68
+ await fn2(); // Wait 2s more
69
+ Total: 4s
70
+
71
+ Parallel:
72
+ ────────────────────────────────────
73
+ await Promise.all([
74
+ fn1(), // Start immediately
75
+ fn2() // Start immediately (don't wait!)
76
+ ]);
77
+ Total: 2s (whichever finishes last)
78
+ ```
79
+
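The same contrast can be demonstrated with plain timers, independent of any model (the 200 ms delays stand in for LLM calls):

```javascript
// Fake "model calls" that each resolve after ~200 ms
const fakePrompt = (answer) =>
  new Promise((resolve) => setTimeout(() => resolve(answer), 200));

const start = Date.now();
const [a1, a2] = await Promise.all([
  fakePrompt("answer 1"), // starts immediately
  fakePrompt("answer 2")  // starts immediately, does not wait for the first
]);
const elapsed = Date.now() - start;

console.log(a1, a2, `${elapsed}ms`); // elapsed is ~200 ms, not ~400 ms
```

Swap `fakePrompt` for `session.prompt` on separate sequences and the timing behavior is the same.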
80
+ ### Execution Timeline
81
+
82
+ ```
83
+ Time → 0s 1s 2s 3s 4s
84
+ │ │ │ │ │
85
+ Seq 1: ├───────Processing───────┤
86
+ │ └─ Response 1
87
+
88
+ Seq 2: ├───────Processing───────┤
89
+ └─ Response 2
90
+
91
+ Both complete at ~2s instead of 4s!
92
+ ```
93
+
94
+ ## GPU Batch Processing
95
+
96
+ ### Why Batching Matters
97
+
98
+ Modern GPUs process multiple operations efficiently:
99
+
100
+ ```
101
+ Without Batching (Inefficient)
102
+ ──────────────────────────────
103
+ GPU: [Token 1] ... wait ...
104
+ GPU: [Token 2] ... wait ...
105
+ GPU: [Token 3] ... wait ...
106
+ └─ GPU underutilized
107
+
108
+ With Batching (Efficient)
109
+ ─────────────────────────
110
+ GPU: [Tokens 1-1024] ← Full batch
111
+ └─ GPU fully utilized!
112
+ ```
113
+
114
+ **batchSize parameter**: Controls how many tokens are processed together.
115
+
116
+ ### Trade-offs
117
+
118
+ ```
119
+ Small Batch (e.g., 128) Large Batch (e.g., 2048)
120
+ ─────────────────────── ────────────────────────
121
+ ✓ Lower memory ✓ Better GPU utilization
122
+ ✓ More flexible ✓ Faster throughput
123
+ ✗ Slower throughput ✗ Higher memory usage
124
+ ✗ GPU underutilized ✗ May exceed VRAM
125
+ ```
126
+
127
+ **Sweet spot**: Usually 512-1024 for consumer GPUs.
128
+
129
+ ## Architecture Patterns
130
+
131
+ ### Pattern 1: Multi-User Service
132
+
133
+ ```
134
+ ┌─────────┐ ┌─────────┐ ┌─────────┐
135
+ │ User A │ │ User B │ │ User C │
136
+ └────┬────┘ └────┬────┘ └────┬────┘
137
+ │ │ │
138
+ └────────────┼────────────┘
139
+
140
+ ┌────────────────┐
141
+ │ Load Balancer │
142
+ └────────────────┘
143
+
144
+ ┌────────────┼────────────┐
145
+ ↓ ↓ ↓
146
+ ┌─────────┐ ┌─────────┐ ┌─────────┐
147
+ │ Seq 1 │ │ Seq 2 │ │ Seq 3 │
148
+ └─────────┘ └─────────┘ └─────────┘
149
+ └────────────┼────────────┘
150
+
151
+ ┌────────────────┐
152
+ │ Shared Model │
153
+ └────────────────┘
154
+ ```
155
+
156
+ ### Pattern 2: Multi-Agent System
157
+
158
+ ```
159
+ ┌──────────────┐
160
+ │ Task │
161
+ └──────┬───────┘
162
+
163
+ ┌────────┼────────┐
164
+ ↓ ↓ ↓
165
+ ┌────────┐ ┌──────┐ ┌──────────┐
166
+ │Planner │ │Critic│ │ Executor │
167
+ │ Agent │ │Agent │ │ Agent │
168
+ └───┬────┘ └──┬───┘ └────┬─────┘
169
+ │ │ │
170
+ └─────────┼──────────┘
171
+
172
+ (All run in parallel)
173
+ ```
174
+
175
+ ### Pattern 3: Pipeline Processing
176
+
177
+ ```
178
+ Input Queue: [Task1, Task2, Task3, ...]
179
+
180
+ ┌───────────────┐
181
+ │ Dispatcher │
182
+ └───────────────┘
183
+
184
+ ┌───────────┼───────────┐
185
+ ↓ ↓ ↓
186
+ Sequence 1 Sequence 2 Sequence 3
187
+ ↓ ↓ ↓
188
+ └───────────┼───────────┘
189
+
190
+ Output: [R1, R2, R3]
191
+ ```
192
+
193
+ ## Resource Management
194
+
195
+ ### Memory Allocation
196
+
197
+ Each sequence consumes memory:
198
+
199
+ ```
200
+ ┌──────────────────────────────────┐
201
+ │ Total VRAM: 8GB │
202
+ ├──────────────────────────────────┤
203
+ │ Model Weights: 4.0 GB │
204
+ │ Context Base: 1.0 GB │
205
+ │ Sequence 1 (KV Cache): 0.8 GB │
206
+ │ Sequence 2 (KV Cache): 0.8 GB │
207
+ │ Sequence 3 (KV Cache): 0.8 GB │
208
+ │ Overhead: 0.6 GB │
209
+ ├──────────────────────────────────┤
210
+ │ Total Used: 8.0 GB │
211
+ │ Remaining: 0.0 GB │
212
+ └──────────────────────────────────┘
213
+ Maximum capacity!
214
+ ```
215
+
216
+ **Formula**:
217
+ ```
218
+ Required VRAM = Model + Context + (NumSequences × KVCache)
219
+ ```
220
+
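The formula, filled in with the numbers from the table above (overhead added as its own term):

```javascript
// VRAM budget from the table, in GB
const modelWeights = 4.0;
const contextBase = 1.0;
const numSequences = 3;
const kvCachePerSequence = 0.8;
const overhead = 0.6;

// Required VRAM = Model + Context + (NumSequences × KVCache) + Overhead
const requiredVram =
  modelWeights + contextBase + numSequences * kvCachePerSequence + overhead;

console.log(requiredVram.toFixed(1)); // "8.0" — exactly the 8 GB card
```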
221
+ ### Finding Optimal Sequence Count
222
+
223
+ ```
224
+ Too Few (1-2) Optimal (4-8) Too Many (16+)
225
+ ───────────── ───────────── ──────────────
226
+ GPU underutilized Balanced use Memory overflow
227
+ ↓ ↓ ↓
228
+ Slow throughput Best performance Thrashing/crashes
229
+ ```
230
+
231
+ **Test your system**:
232
+ 1. Start with 2 sequences
233
+ 2. Monitor VRAM usage
234
+ 3. Increase until performance plateaus
235
+ 4. Back off if memory issues occur
236
+
237
+ ## Real-World Scenarios
238
+
239
+ ### Scenario 1: Chatbot Service
240
+
241
+ ```
242
+ Challenge: 100 users, each waiting 2s per response
243
+ Sequential: 100 × 2s = 200s (3.3 minutes!)
244
+ Parallel (10 seq): 10 batches × 2s = 20s
245
+ 10x speedup!
246
+ ```
247
+
248
+ ### Scenario 2: Batch Analysis
249
+
250
+ ```
251
+ Task: Analyze 1000 documents
252
+ Sequential: 1000 × 3s = 50 minutes
253
+ Parallel (8 seq): 125 batches × 3s = 6.25 minutes
254
+ 8x speedup!
255
+ ```
256
+
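The arithmetic behind the batch-analysis scenario, as a quick sanity check:

```javascript
// 1000 documents, 8 parallel sequences, 3 seconds per document
const documents = 1000;
const sequences = 8;
const secondsPerDocument = 3;

const batches = Math.ceil(documents / sequences);                 // 125
const parallelMinutes = (batches * secondsPerDocument) / 60;      // 6.25
const sequentialMinutes = (documents * secondsPerDocument) / 60;  // 50

console.log(sequentialMinutes / parallelMinutes); // 8 — the claimed speedup
```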
257
+ ### Scenario 3: Multi-Agent Collaboration
258
+
259
+ ```
260
+ Agents: Planner, Analyzer, Executor (all needed)
261
+ Sequential: Wait for each → Slow pipeline
262
+ Parallel: All work together → Fast decision-making
263
+ ```
264
+
265
+ ## Limitations & Considerations
266
+
267
+ ### 1. Context Capacity Sharing
268
+
269
+ ```
270
+ Problem: Sequences share total context space
271
+ ───────────────────────────────────────────
272
+ Total context: 4096 tokens
273
+ 2 sequences: Each gets ~2048 tokens max
274
+ 4 sequences: Each gets ~1024 tokens max
275
+
276
+ More sequences = Less history per sequence!
277
+ ```
278
+
279
+ ### 2. CPU vs GPU Parallelism
280
+
281
+ ```
282
+ With GPU: CPU Only:
283
+ True parallel processing Interleaved processing
284
+ Multiple CUDA streams Single thread context-switching
285
+ (Still helps throughput!)
286
+ ```
287
+
288
+ ### 3. Not Always Faster
289
+
290
+ ```
291
+ When parallel helps: When it doesn't:
292
+ • Independent requests • Dependent requests (must wait)
293
+ • I/O-bound operations • Very short prompts (overhead)
294
+ • Multiple users • Single sequential conversation
295
+ ```
296
+
297
+ ## Best Practices
298
+
299
+ ### 1. Design for Independence
300
+ ```
301
+ ✓ Good: Separate user conversations
302
+ ✓ Good: Independent analysis tasks
303
+ ✗ Bad: Sequential reasoning steps (use ReAct instead)
304
+ ```
305
+
306
+ ### 2. Monitor Resources
307
+ ```
308
+ Track:
309
+ • VRAM usage per sequence
310
+ • Processing time per request
311
+ • Queue depths
312
+ • Error rates
313
+ ```
314
+
315
+ ### 3. Implement Graceful Degradation
316
+ ```
317
+ if (vramExceeded) {
318
+ reduceSequenceCount();
319
+ // or queue requests instead
320
+ }
321
+ ```
322
+
323
+ ### 4. Handle Errors Properly
324
+ ```javascript
325
+ try {
326
+ const results = await Promise.all([...]);
327
+ } catch (error) {
328
+ // One failure doesn't crash all sequences
329
+ handlePartialResults();
330
+ }
331
+ ```
332
+
333
+ ## Comparison: Evolution of Performance
334
+
335
+ ```
336
+ Stage Requests/Min Pattern
337
+ ───────────────── ───────────── ───────────────
338
+ 1. Basic (intro) 30 Sequential
339
+ 2. Batch (this) 120 4 sequences
340
+ 3. Load balanced 240 8 sequences + queue
341
+ 4. Distributed 1000+ Multiple machines
342
+ ```
343
+
344
+ ## Key Takeaways
345
+
346
+ 1. **Parallelism is essential** for production AI agent systems
347
+ 2. **Sequences share model** but maintain independent state
348
+ 3. **Promise.all** enables concurrent JavaScript execution
349
+ 4. **Batch size** affects GPU utilization and throughput
350
+ 5. **Memory is the limit** - more sequences need more VRAM
351
+ 6. **Not magic** - only helps with independent tasks
352
+
353
+ ## Practical Formula
354
+
355
+ ```
356
+ Speedup = min(
357
+ Number_of_Sequences,
358
+ Available_VRAM / Memory_Per_Sequence,
359
+ GPU_Compute_Limit
360
+ )
361
+ ```
362
+
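Plugged in with hypothetical numbers (all three inputs are assumptions you would measure on your own hardware):

```javascript
const numberOfSequences = 4;
const availableVramGb = 8;        // assumed card size
const memoryPerSequenceGb = 1.5;  // assumed KV-cache cost per sequence
const gpuComputeLimit = 6;        // assumed concurrent-stream ceiling

const speedup = Math.min(
  numberOfSequences,
  Math.floor(availableVramGb / memoryPerSequenceGb),
  gpuComputeLimit
);

console.log(speedup); // 4 — bounded by the sequence count in this case
```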
363
+ Typically: 2-10x speedup for well-designed systems.
364
+
365
+ This technique is foundational for building scalable agent architectures that can handle real-world workloads efficiently.
examples/05_batch/batch.js ADDED
@@ -0,0 +1,60 @@
1
+ import {getLlama, LlamaChatSession} from "node-llama-cpp";
2
+ import path from "path";
3
+ import {fileURLToPath} from "url";
4
+
5
+ /**
6
+ * Asynchronous execution improves performance in GAIA benchmarks,
7
+ * multi-agent applications, and other high-throughput scenarios.
8
+ */
9
+
10
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
11
+ const modelPath = path.join(
12
+ __dirname,
13
+ '..',
14
+ '..',
15
+ 'models',
16
+ 'DeepSeek-R1-0528-Qwen3-8B-Q6_K.gguf'
17
+ )
18
+
19
+ const llama = await getLlama({
20
+ logLevel: 'error'
21
+ });
22
+ const model = await llama.loadModel({modelPath});
23
+ const context = await model.createContext({
24
+ sequences: 2,
25
+ batchSize: 1024 // The number of tokens that can be processed at once by the GPU.
26
+ });
27
+
28
+ const sequence1 = context.getSequence();
29
+ const sequence2 = context.getSequence();
30
+
31
+ const session1 = new LlamaChatSession({
32
+ contextSequence: sequence1
33
+ });
34
+ const session2 = new LlamaChatSession({
35
+ contextSequence: sequence2
36
+ });
37
+
38
+ const q1 = "Hi there, how are you?";
39
+ const q2 = "How much is 6+6?";
40
+
41
+ console.log('Batching started...')
42
+ const [
43
+ a1,
44
+ a2
45
+ ] = await Promise.all([
46
+ session1.prompt(q1),
47
+ session2.prompt(q2)
48
+ ]);
49
+
50
+ console.log("User: " + q1);
51
+ console.log("AI: " + a1);
52
+
53
+ console.log("User: " + q2);
54
+ console.log("AI: " + a2);
55
+
56
+ session1.dispose();
57
+ session2.dispose();
58
+ context.dispose();
59
+ model.dispose();
60
+ llama.dispose();
examples/06_coding/CODE.md ADDED
@@ -0,0 +1,380 @@
1
+ # Code Explanation: coding.js
2
+
3
+ This file demonstrates **streaming responses** with token limits and real-time output, showing how to get immediate feedback from the LLM as it generates text.
4
+
5
+ ## Step-by-Step Code Breakdown
6
+
7
+ ### 1. Import and Setup (Lines 1-8)
8
+ ```javascript
9
+ import {
10
+ getLlama,
11
+ HarmonyChatWrapper,
12
+ LlamaChatSession,
13
+ } from "node-llama-cpp";
14
+ import {fileURLToPath} from "url";
15
+ import path from "path";
16
+
17
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
18
+ ```
19
+ - Standard setup for LLM interaction
20
+ - **HarmonyChatWrapper**: A chat format wrapper for models that use the Harmony format (more on this below)
21
+
22
+ ### 2. Understanding the Harmony Chat Format
23
+
24
+ #### What is Harmony?
25
+ Harmony is a structured message format used for multi-role chat interactions designed by OpenAI for their gpt-oss models. It's not just a prompt format - it's a complete rethinking of how models should structure their outputs, especially for complex reasoning and tool use.
26
+
27
+ #### Harmony Format Structure
28
+
29
+ The format uses special tokens and syntax to define roles such as `system`, `developer`, `user`, `assistant`, and `tool`, as well as output "channels" (`analysis`, `commentary`, `final`) that let the model reason internally, call tools, and produce clean user-facing responses.
30
+
31
+ **Basic message structure:**
32
+ ```
33
+ <|start|>ROLE<|message|>CONTENT<|end|>
34
+ <|start|>assistant<|channel|>CHANNEL<|message|>CONTENT<|end|>
35
+ ```
36
+
37
+ **The five roles in hierarchy order** (system > developer > user > assistant > tool):
38
+
39
+ 1. **system**: Global identity, guardrails, and model configuration
40
+ 2. **developer**: Product policy and style instructions (what you typically think of as "system prompt")
41
+ 3. **user**: User messages and queries
42
+ 4. **assistant**: Model responses
43
+ 5. **tool**: Tool execution results
44
+
45
+ **The three output channels:**
46
+
47
+ 1. **analysis**: Private chain-of-thought reasoning not shown to users
48
+ 2. **commentary**: Tool calling preambles and process updates
49
+ 3. **final**: Clean user-facing responses
50
+
51
+ **Example of Harmony in action:**
52
+ ```
53
+ <|start|>system<|message|>You are a helpful assistant.<|end|>
54
+ <|start|>developer<|message|>Always be concise.<|end|>
55
+ <|start|>user<|message|>What time is it?<|end|>
56
+ <|start|>assistant<|channel|>commentary<|message|>{"tool_use": {"name": "get_current_time", "arguments": {}}}<|end|>
57
+ <|start|>tool<|message|>{"time": "2025-10-25T13:47:00Z"}<|end|>
58
+ <|start|>assistant<|channel|>final<|message|>The current time is 1:47 PM UTC.<|end|>
59
+ ```
60
+
61
+ #### Why Use Harmony?
62
+
63
+ Harmony separates how the model thinks, what actions it takes, and what finally goes to the user, resulting in cleaner tool use, safer defaults for UI, and better observability. For this coding example:
64
+
65
+ - The `final` channel keeps the user-facing answer separate from internal reasoning
66
+ - The structured format helps the model follow instructions more reliably
67
+ - The role hierarchy prevents instruction conflicts
68
+
69
+ **Important Note**: Models need to be specifically trained or fine-tuned to produce Harmony output correctly. You can't just apply this format to any model: models not trained on Harmony (such as Apertus) may be confused by the structure. The HarmonyChatWrapper in node-llama-cpp applies the formatting automatically, but only models trained on it, like gpt-oss, will follow it reliably.
70
+
71
+
72
+ ### 3. Load Model (Lines 10-18)
73
+ ```javascript
74
+ const llama = await getLlama();
75
+ const model = await llama.loadModel({
76
+ modelPath: path.join(
77
+ __dirname,
78
+ "../",
79
+ "models",
80
+ "hf_giladgd_gpt-oss-20b.MXFP4.gguf"
81
+ )
82
+ });
83
+ ```
84
+ - Uses **gpt-oss-20b**: A 20 billion parameter model
85
+ - **MXFP4**: Mixed precision 4-bit quantization for smaller size
86
+ - Larger model = better code explanations
87
+
88
+ ### 4. Create Context and Session (Lines 19-22)
89
+ ```javascript
90
+ const context = await model.createContext();
91
+ const session = new LlamaChatSession({
92
+ chatWrapper: new HarmonyChatWrapper(),
93
+ contextSequence: context.getSequence(),
94
+ });
95
+ ```
96
+ Session setup with the HarmonyChatWrapper, which formats all messages in the Harmony structure the gpt-oss model expects; no custom system prompt is provided.
97
+
98
+ ### 5. Define the Question (Line 24)
99
+ ```javascript
100
+ const q1 = `What is hoisting in JavaScript? Explain with examples.`;
101
+ ```
102
+ A technical programming question that requires detailed explanation.
103
+
104
+ ### 6. Display Context Size (Line 26)
105
+ ```javascript
106
+ console.log('context.contextSize', context.contextSize)
107
+ ```
108
+ - Shows the maximum context window size
109
+ - Helps understand memory limitations
110
+ - Useful for debugging
111
+
112
+ ### 7. Streaming Prompt Execution (Lines 28-36)
113
+ ```javascript
114
+ const a1 = await session.prompt(q1, {
115
+ // Tip: let the lib choose or cap reasonably; using the whole context size can be wasteful
116
+ maxTokens: 2000,
117
+
118
+ // Fires as soon as the first characters arrive
119
+ onTextChunk: (text) => {
120
+ process.stdout.write(text); // optional: live print
121
+ },
122
+ });
123
+ ```
124
+
125
+ **Key parameters:**
126
+
127
+ **maxTokens: 2000**
128
+ - Limits response length to 2000 tokens (~1500 words)
129
+ - Prevents runaway generation
130
+ - Saves time and compute
131
+ - Without limit: model uses entire context
132
+
133
+ **onTextChunk callback**
134
+ - Fires **as each token is generated**
135
+ - Receives text as it's produced
136
+ - `process.stdout.write()`: Prints without newlines
137
+ - Creates real-time "typing" effect
138
+
139
+ ### How Streaming Works
140
+
141
+ ```
142
+ Without streaming:
143
+ User → [Wait 10 seconds...] → Complete response appears
144
+
145
+ With streaming:
146
+ User → [Token 1] → [Token 2] → [Token 3] → ... → Complete
147
+ "What" "is" "hoisting"
148
+ (Immediate feedback!)
149
+ ```
150
+
151
+ ### 8. Display Final Answer (Line 38)
152
+ ```javascript
153
+ console.log("\n\nFinal answer:\n", a1);
154
+ ```
155
+ - Prints the complete response again
156
+ - Useful for logging or verification
157
+ - Shows full text after streaming
158
+
159
+ ### 9. Cleanup (Lines 41-44)
160
+ ```javascript
161
+ session.dispose()
162
+ context.dispose()
163
+ model.dispose()
164
+ llama.dispose()
165
+ ```
166
+ Standard resource cleanup.
167
+
168
+ ## Key Concepts Demonstrated
169
+
170
+ ### 1. Streaming Responses
171
+
172
+ **Why streaming matters:**
173
+ - **Better UX**: Users see progress immediately
174
+ - **Early termination**: Can stop if response is off-track
175
+ - **Perceived speed**: Feels faster than waiting
176
+ - **Debugging**: See generation in real-time
177
+
178
+ **Comparison:**
179
+ ```
180
+ Non-streaming: Streaming:
181
+ ═══════════════ ═══════════════
182
+ Request sent Request sent
183
+ [10s wait...] "What" (0.1s)
184
+ Complete response "is" (0.2s)
185
+ "hoisting" (0.3s)
186
+ ... continues
187
+ (Same total time, better experience!)
188
+ ```
189
+
190
+ ### 2. Token Limits
191
+
192
+ **maxTokens controls generation length:**
193
+
194
+ ```
195
+ No limit: With limit (2000):
196
+ ───────── ─────────────────
197
+ May generate forever Stops at 2000 tokens
198
+ Uses entire context Saves computation
199
+ Unpredictable cost Predictable cost
200
+ ```
201
+
202
+ **Token approximation:**
203
+ - 1 token ≈ 0.75 words (English)
204
+ - 2000 tokens ≈ 1500 words
205
+ - 4-5 paragraphs of detailed explanation
206
+
207
+ ### 3. Real-Time Feedback Pattern
208
+
209
+ The `onTextChunk` callback enables:
210
+ ```javascript
211
+ onTextChunk: (text) => {
212
+ // Do anything with each chunk:
213
+ process.stdout.write(text); // Console output
214
+ // socket.emit('chunk', text); // WebSocket to client
215
+ // buffer += text; // Accumulate for processing
216
+ // analyzePartial(text); // Real-time analysis
217
+ }
218
+ ```
219
+
220
+ ### 4. Context Size Awareness
221
+
222
+ ```javascript
223
+ console.log('context.contextSize', context.contextSize)
224
+ ```
225
+
226
+ Shows model's memory capacity:
227
+ - Small models: 2048-4096 tokens
228
+ - Medium models: 8192-16384 tokens
229
+ - Large models: 32768+ tokens
230
+
231
+ **Why it matters:**
232
+ ```
233
+ Context Size: 4096 tokens
234
+ Prompt: 100 tokens
235
+ Max response: 2000 tokens
236
+ History: Up to 1996 tokens
237
+ ```
238
+
239
+ ## Use Cases
240
+
241
+ ### 1. Code Explanations (This Example)
242
+ ```javascript
243
+ prompt: "Explain hoisting in JavaScript"
244
+ → Streams detailed explanation with examples
245
+ ```
246
+
247
+ ### 2. Long-Form Content Generation
248
+ ```javascript
249
+ prompt: "Write a blog post about AI agents"
250
+ maxTokens: 3000
251
+ → Streams article as it's written
252
+ ```
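A minimal sketch of the SSE direction. Only the `data: ...\n\n` frame shape is fixed by the server-sent-events format; the Express-style wiring in the comments is illustrative, not part of any library shown here:

```javascript
// SSE frames are "data: <payload>\n\n"; JSON-encoding each chunk keeps
// newlines inside tokens from prematurely terminating a frame
const sseFrame = (text) => `data: ${JSON.stringify(text)}\n\n`;

// Hypothetical wiring inside an HTTP route handler:
//   res.setHeader("Content-Type", "text/event-stream");
//   await session.prompt(question, {
//     onTextChunk: (text) => res.write(sseFrame(text))
//   });
//   res.end();

console.log(sseFrame("Hoisting is\n")); // data: "Hoisting is\n"
```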
253
+
254
+ ### 3. Interactive Tutoring
255
+ ```javascript
256
+ // User sees explanation being built
257
+ prompt: "Teach me about closures"
258
+ onTextChunk: (text) => displayToUser(text)
259
+ ```
260
+
261
+ ### 4. Web Applications
262
+ ```javascript
263
+ // Server-Sent Events or WebSocket
264
+ onTextChunk: (text) => {
265
+ websocket.send(text); // Send to browser
266
+ }
267
+ ```
268
+
269
+ ## Performance Considerations
270
+
271
+ ### Token Generation Speed
272
+
273
+ Depends on:
274
+ - **Model size**: Larger = slower per token
275
+ - **Hardware**: GPU > CPU
276
+ - **Quantization**: Lower bits = faster
277
+ - **Context length**: Longer context = slower
278
+
279
+ **Typical speeds:**
280
+ ```
281
+ Model Size GPU (RTX 4090) CPU (M2 Max)
282
+ ────────── ────────────── ────────────
283
+ 1.7B 50-80 tok/s 15-25 tok/s
284
+ 8B 20-35 tok/s 5-10 tok/s
285
+ 20B 10-15 tok/s 2-4 tok/s
286
+ ```
287
+
288
+ ### When to Use maxTokens
289
+
290
+ ```
291
+ ✓ Use maxTokens when:
292
+ • Response length is predictable
293
+ • You want to save computation
294
+ • Testing/debugging
295
+ • API rate limiting
296
+
297
+ ✗ Don't limit when:
298
+ • Need complete answer
299
+ • Length varies greatly
300
+ • Using stop sequences instead
301
+ ```
302
+
303
+ ## Advanced Streaming Patterns
304
+
305
+ ### Pattern 1: Progressive Enhancement
306
+ ```javascript
307
+ let buffer = '';
308
+ onTextChunk: (text) => {
309
+ buffer += text;
310
+ if (buffer.includes('\n\n')) {
311
+ // Complete paragraph ready
312
+ processParagraph(buffer);
313
+ buffer = '';
314
+ }
315
+ }
316
+ ```
317
+
318
+ ### Pattern 2: Early Stopping
319
+ ```javascript
320
+ let isRelevant = true;
321
+ onTextChunk: (text) => {
322
+ if (text.includes('irrelevant_keyword')) {
323
+ isRelevant = false;
324
+ // Stop generation (would need additional API)
325
+ }
326
+ }
327
+ ```
328
+
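The "additional API" the comment alludes to is an abort signal. Here is a self-contained sketch of the early-stopping pattern against a fake token stream; with node-llama-cpp you would pass the same signal to `session.prompt()` via its abort-signal option (check your version's docs for the exact option name):

```javascript
const controller = new AbortController();

// Fake streaming generator standing in for model token generation
async function fakeStream(tokens, { signal, onTextChunk }) {
  const produced = [];
  for (const token of tokens) {
    if (signal.aborted) break; // generation halts here once aborted
    onTextChunk(token);
    produced.push(token);
  }
  return produced.join(" ");
}

const answer = await fakeStream(
  ["Closures", "are", "irrelevant_keyword", "never-emitted"],
  {
    signal: controller.signal,
    onTextChunk: (text) => {
      if (text.includes("irrelevant_keyword")) controller.abort();
    }
  }
);

console.log(answer); // "Closures are irrelevant_keyword"
```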
329
+ ### Pattern 3: Multi-Consumer
330
+ ```javascript
331
+ onTextChunk: (text) => {
332
+ console.log(text); // Console
333
+ logFile.write(text); // File
334
+ websocket.send(text); // Client
335
+ analyzer.process(text); // Analysis
336
+ }
337
+ ```
338
+
339
+ ## Expected Output
340
+
341
+ When run, you'll see:
342
+ 1. Context size logged (e.g., "context.contextSize 32768")
343
+ 2. Streaming response appearing token-by-token
344
+ 3. Complete final answer printed again
345
+
346
+ Example output flow:
347
+ ```
348
+ context.contextSize 32768
349
+ Hoisting is a JavaScript mechanism where variables and function
350
+ declarations are moved to the top of their scope before code
351
+ execution. For example:
352
+
353
+ console.log(x); // undefined (not an error!)
354
+ var x = 5;
355
+
356
+ This works because...
357
+ [continues streaming...]
358
+
359
+ Final answer:
360
+ [Complete response printed again]
361
+ ```
362
+
363
+ ## Why This Matters for AI Agents
364
+
365
+ ### User Experience
366
+ - Real-time agents feel more responsive
367
+ - Users can interrupt if going wrong direction
368
+ - Better for conversational interfaces
369
+
370
+ ### Resource Management
371
+ - Token limits prevent runaway generation
372
+ - Predictable costs and timing
373
+ - Can cancel expensive operations early
374
+
375
+ ### Integration Patterns
376
+ - Web UIs show "typing" effect
377
+ - CLIs display progressive output
378
+ - APIs stream to clients efficiently
379
+
380
+ This pattern is essential for production agent systems where user experience and resource control matter.
examples/06_coding/CONCEPT.md ADDED
@@ -0,0 +1,400 @@
1
+ # Concept: Streaming & Response Control
2
+
3
+ ## Overview
4
+
5
+ This example demonstrates **streaming responses** and **token limits**, two essential techniques for building responsive AI agents with controlled output.
6
+
7
+ ## The Streaming Problem
8
+
9
+ ### Traditional (Non-Streaming) Approach
10
+
11
+ ```
12
+ User sends prompt
13
+
14
+ [Wait 10 seconds...]
15
+
16
+ Complete response appears all at once
17
+ ```
18
+
19
+ **Problems:**
20
+ - Poor user experience (long wait)
21
+ - No progress indication
22
+ - Can't interrupt bad responses
23
+ - Feels unresponsive
24
+
25
+ ### Streaming Approach (This Example)
26
+
27
+ ```
28
+ User sends prompt
29
+
30
+ "Hoisting" (0.1s) → User sees first word!
31
+
32
+ "is a" (0.2s) → More text appears
33
+
34
+ "JavaScript" (0.3s) → Continuous feedback
35
+
36
+ [Continues token by token...]
37
+ ```
38
+
39
+ **Benefits:**
40
+ - Immediate feedback
41
+ - Progress visible
42
+ - Can interrupt early
43
+ - Feels interactive
44
+
45
+ ## How Streaming Works
46
+
47
+ ### Token-by-Token Generation
48
+
49
+ LLMs generate one token at a time internally. Streaming exposes this:
50
+
51
+ ```
52
+ Internal LLM Process:
53
+ ┌─────────────────────────────────────┐
54
+ │ Token 1: "Hoisting" │
55
+ │ Token 2: "is" │
56
+ │ Token 3: "a" │
57
+ │ Token 4: "JavaScript" │
58
+ │ Token 5: "mechanism" │
59
+ │ ... │
60
+ └─────────────────────────────────────┘
61
+
62
+ Without Streaming: With Streaming:
63
+ Wait for all tokens Emit each token immediately
64
+ └─→ Buffer → Return └─→ Callback → Display
65
+ ```
66
+
67
+ ### The onTextChunk Callback
68
+
69
+ ```
70
+ ┌────────────────────────────────────┐
71
+ │ Model Generation │
72
+ └────────────┬───────────────────────┘
73
+
74
+ ┌────────┴─────────┐
75
+ │ Each new token │
76
+ └────────┬─────────┘
77
+
78
+ ┌────────────────────┐
79
+ │ onTextChunk(text) │ ← Your callback
80
+ └────────┬───────────┘
81
+
82
+ Your code processes it:
83
+ • Display to user
84
+ • Send over network
85
+ • Log to file
86
+ • Analyze content
87
+ ```
88
+
89
+ ## Token Limits: maxTokens
90
+
91
+ ### Why Limit Output?
92
+
93
+ Without limits, models might generate:
94
+ ```
95
+ User: "Explain hoisting"
96
+ Model: [Generates 10,000 words including:
97
+ - Complete JavaScript history
98
+ - Every edge case
99
+ - Unrelated examples
100
+ - Never stops...]
101
+ ```
102
+
103
+ With limits:
104
+ ```
105
+ User: "Explain hoisting"
106
+ Model: [Generates ~1500 words
107
+ - Core concept
108
+ - Key examples
109
+ - Stops at 2000 tokens]
110
+ ```
111
+
112
+ ### Token Budgeting
113
+
114
+ ```
115
+ Context Window: 4096 tokens
116
+ ├─ System Prompt: 200 tokens
117
+ ├─ User Message: 100 tokens
118
+ ├─ Response (maxTokens): 2000 tokens
119
+ └─ Remaining for history: 1796 tokens
120
+
121
+ Total used: 2300 tokens
122
+ Available: 1796 tokens for future conversation
123
+ ```
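The same budgeting is plain arithmetic and can be sketched in a few lines of JavaScript (token counts are hard-coded here to mirror the diagram; in a real app you would measure each part with the model's tokenizer):

```javascript
// How much of the context window remains for future turns once the
// prompt and the response cap (maxTokens) are accounted for.
function remainingForHistory({contextSize, systemTokens, userTokens, maxTokens}) {
    return contextSize - (systemTokens + userTokens + maxTokens);
}

const remaining = remainingForHistory({
    contextSize: 4096,
    systemTokens: 200,
    userTokens: 100,
    maxTokens: 2000
});
console.log(remaining); // 1796
```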
124
+
125
+ ### Cost vs Quality
126
+
127
+ ```
128
+ Token Limit Output Quality Use Case
129
+ ─────────── ─────────────── ─────────────────
130
+ 100 Brief, may be cut Quick answers
131
+ 500 Concise but complete Short explanations
132
+ 2000 (example) Detailed Full explanations
133
+ No limit Risk of rambling When length unknown
134
+ ```
135
+
136
+ ## Real-Time Applications
137
+
138
+ ### Pattern 1: Interactive CLI
139
+
140
+ ```
141
+ User: "Explain closures"
142
+
143
+ Terminal: "A closure is a function..."
144
+ (Appears word by word, like typing)
145
+
146
+ User sees progress, knows it's working
147
+ ```
148
+
149
+ ### Pattern 2: Web Application
150
+
151
+ ```
152
+ Browser Server
153
+ │ │
154
+ ├─── Send prompt ────────→│
155
+ │ │
156
+ │←── Chunk 1: "Closures"──┤
157
+ │ (Display immediately) │
158
+ │ │
159
+ │←── Chunk 2: "are"───────┤
160
+ │ (Append to display) │
161
+ │ │
162
+ │←── Chunk 3: "functions"─┤
163
+ │ (Keep appending...) │
164
+ ```
165
+
166
+ Implementation:
167
+ - Server-Sent Events (SSE)
168
+ - WebSockets
169
+ - HTTP streaming
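A minimal Server-Sent Events sketch using only Node's built-in `http` module; the `session.prompt(...)` wiring is shown as a comment because it assumes a configured node-llama-cpp session, and the three hard-coded chunks stand in for streamed output:

```javascript
import http from "node:http";

// SSE frames each message as "data: <payload>\n\n"
function toSseEvent(chunk) {
    return `data: ${JSON.stringify(chunk)}\n\n`;
}

const server = http.createServer((req, res) => {
    res.writeHead(200, {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        Connection: "keep-alive"
    });
    // Real wiring would forward each streamed chunk:
    // await session.prompt(query, {onTextChunk: (text) => res.write(toSseEvent(text))});
    for (const chunk of ["Closures", " are", " functions"]) {
        res.write(toSseEvent(chunk));
    }
    res.end();
});

// server.listen(3000); // browser side: new EventSource("/").onmessage = ...
```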
170
+
171
+ ### Pattern 3: Multi-Consumer
172
+
173
+ ```
174
+ onTextChunk(text)
175
+
176
+ ┌───────┼───────┐
177
+ ↓ ↓ ↓
178
+ Console WebSocket Log File
179
+ Display → Client → Storage
180
+ ```
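The fan-out above can be sketched as a tiny combinator; the WebSocket sink is left as a comment since it assumes a connected client:

```javascript
// Fan one onTextChunk stream out to several consumers.
function fanOut(...consumers) {
    return (chunk) => consumers.forEach((consume) => consume(chunk));
}

const logBuffer = [];
const onTextChunk = fanOut(
    (chunk) => process.stdout.write(chunk), // console display
    (chunk) => logBuffer.push(chunk)        // log storage
    // (chunk) => websocket.send(chunk)     // hypothetical network sink
);

onTextChunk("Hello, ");
onTextChunk("world");
console.log("\nlogged:", logBuffer.join("")); // logged: Hello, world
```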
181
+
182
+ ## Performance Characteristics
183
+
184
+ ### Latency vs Throughput
185
+
186
+ ```
187
+ Time to First Token (TTFT):
188
+ ├─ Small model (1.7B): ~100ms
189
+ ├─ Medium model (8B): ~200ms
190
+ └─ Large model (20B): ~500ms
191
+
192
+ Tokens Per Second:
193
+ ├─ Small model: 50-80 tok/s
194
+ ├─ Medium model: 20-35 tok/s
195
+ └─ Large model: 10-15 tok/s
196
+
197
+ User Experience:
198
+ TTFT < 500ms → Feels instant
199
+ Tok/s > 20 → Reads naturally
200
+ ```
201
+
202
+ ### Resource Trade-offs
203
+
204
+ ```
205
+ Model Size Memory Speed Quality
206
+ ────────── ──────── ───── ───────
207
+ 1.7B ~2GB Fast Good
208
+ 8B ~6GB Medium Better
209
+ 20B ~12GB Slower Best
210
+ ```
211
+
212
+ ## Advanced Concepts
213
+
214
+ ### Buffering Strategies
215
+
216
+ **No Buffer (Immediate)**
217
+ ```
218
+ Every token → callback → display
219
+ └─ Smoothest UX but more overhead
220
+ ```
221
+
222
+ **Line Buffer**
223
+ ```
224
+ Accumulate until newline → flush
225
+ └─ Better for paragraph-based output
226
+ ```
227
+
228
+ **Time Buffer**
229
+ ```
230
+ Accumulate for 50ms → flush batch
231
+ └─ Reduces callback frequency
232
+ ```
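A minimal time-buffer sketch (the 50ms interval is illustrative; tune it to your UI). It cuts callback frequency without changing the final text:

```javascript
// Accumulate chunks and flush at most once per interval.
function createTimeBuffer(flush, intervalMs = 50) {
    let pending = "";
    let timer = null;
    return {
        push(chunk) {
            pending += chunk;
            if (timer === null) {
                timer = setTimeout(() => {
                    flush(pending);
                    pending = "";
                    timer = null;
                }, intervalMs);
            }
        },
        end() { // call after generation finishes to flush any trailing text
            if (timer !== null) clearTimeout(timer);
            if (pending !== "") flush(pending);
            pending = "";
            timer = null;
        }
    };
}

// Usage: pass buffer.push as the onTextChunk callback
const chunks = [];
const buffer = createTimeBuffer((text) => chunks.push(text));
buffer.push("Hoisting ");
buffer.push("is a mechanism");
buffer.end();
console.log(chunks.join("")); // Hoisting is a mechanism
```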
233
+
234
+ ### Early Stopping
235
+
236
+ ```
237
+ Generation in progress:
238
+ "The answer is clearly... wait, actually..."
239
+
240
+ onTextChunk detects issue
241
+
242
+ Stop generation
243
+
244
+ "Let me reconsider"
245
+ ```
246
+
247
+ Useful for:
248
+ - Detecting off-topic responses
249
+ - Safety filters
250
+ - Relevance checking
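One way to sketch this: run a cheap filter over the accumulated partial response and abort when it trips. The commented wiring assumes node-llama-cpp's `signal`/`stopOnAbortSignal` prompt options; verify them against the library version you use:

```javascript
// Stand-in relevance filter; swap in whatever safety/relevance check you need.
const offTopicMarkers = ["wait, actually", "unrelated note"];

function shouldStop(partialResponse) {
    const lower = partialResponse.toLowerCase();
    return offTopicMarkers.some((marker) => lower.includes(marker));
}

// const controller = new AbortController();
// let partial = "";
// const answer = await session.prompt(query, {
//     signal: controller.signal,
//     stopOnAbortSignal: true, // return partial text instead of throwing
//     onTextChunk: (text) => {
//         partial += text;
//         if (shouldStop(partial)) controller.abort();
//     }
// });

console.log(shouldStop("The answer is clearly... wait, actually...")); // true
```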
251
+
252
+ ### Progressive Enhancement
253
+
254
+ ```
255
+ Partial Response Analysis:
256
+ ┌─────────────────────────────────┐
257
+ │ "To implement this feature..." │
258
+ │ │
259
+ │ ← Already useful information │
260
+ │ │
261
+ │ "...you'll need: 1) Node.js" │
262
+ │ │
263
+ │ ← Can start acting on this │
264
+ │ │
265
+ │ "2) Express framework" │
266
+ └─────────────────────────────────┘
267
+
268
+ Agent can begin working before response completes!
269
+ ```
270
+
271
+ ## Context Size Awareness
272
+
273
+ ### Why It Matters
274
+
275
+ ```
276
+ ┌────────────────────────────────┐
277
+ │ Context Window (4096) │
278
+ ├────────────────────────────────┤
279
+ │ System Prompt 200 tokens │
280
+ │ Conversation History 1000 │
281
+ │ Current Prompt 100 │
282
+ │ Response Space 2796 │
283
+ └────────────────────────────────┘
284
+
285
+ If maxTokens > 2796:
286
+ └─→ Error or truncation!
287
+ ```
288
+
289
+ ### Dynamic Adjustment
290
+
291
+ ```
292
+ Available = contextSize - (prompt + history)
293
+
294
+ if (maxTokens > available) {
295
+ maxTokens = available;
296
+ // or clear old history
297
+ }
298
+ ```
299
+
300
+ ## Streaming in Agent Architectures
301
+
302
+ ### Simple Agent
303
+
304
+ ```
305
+ User → LLM (streaming) → Display
306
+ └─ onTextChunk shows progress
307
+ ```
308
+
309
+ ### Multi-Step Agent
310
+
311
+ ```
312
+ Step 1: Plan (stream) → Show thinking
313
+ Step 2: Act (stream) → Show action
314
+ Step 3: Result (stream) → Show outcome
315
+ └─ User sees agent's process
316
+ ```
317
+
318
+ ### Collaborative Agents
319
+
320
+ ```
321
+ Agent A (streaming) ──┐
322
+ ├─→ Coordinator → User
323
+ Agent B (streaming) ──┘
324
+ └─ Both stream simultaneously
325
+ ```
326
+
327
+ ## Best Practices
328
+
329
+ ### 1. Always Set maxTokens
330
+
331
+ ```
332
+ ✓ Good:
333
+ session.prompt(query, { maxTokens: 2000 })
334
+
335
+ ✗ Risky:
336
+ session.prompt(query)
337
+ └─ May use entire context!
338
+ ```
339
+
340
+ ### 2. Handle Partial Updates
341
+
342
+ ```
343
+ let fullResponse = '';
344
+ onTextChunk: (chunk) => {
345
+ fullResponse += chunk;
346
+ display(chunk); // Show immediately
347
+ // response still streaming; not yet complete
348
+ }
349
+ // After completion:
350
+ saveToDatabase(fullResponse);
351
+ ```
352
+
353
+ ### 3. Provide Feedback
354
+
355
+ ```
356
+ onTextChunk: (chunk) => {
357
+ if (firstChunk) {
358
+ showLoadingDone();
359
+ firstChunk = false;
360
+ }
361
+ appendToDisplay(chunk);
362
+ }
363
+ ```
364
+
365
+ ### 4. Monitor Performance
366
+
367
+ ```
368
+ const startTime = Date.now();
369
+ let tokenCount = 0;
370
+
371
+ onTextChunk: (chunk) => {
372
+ tokenCount += estimateTokens(chunk);
373
+ const elapsed = (Date.now() - startTime) / 1000;
374
+ const tokensPerSecond = tokenCount / elapsed;
375
+ updateMetrics(tokensPerSecond);
376
+ }
377
+ ```
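The `estimateTokens` helper above is left undefined; a common heuristic (roughly 4 characters per token for English text) is good enough for dashboards, though tokenizing with the model itself gives exact counts:

```javascript
// Rough token estimate: ~4 characters per token for English text.
// For exact counts, use the model tokenizer (e.g. model.tokenize(text).length).
function estimateTokens(text) {
    return Math.ceil(text.length / 4);
}

console.log(estimateTokens("Hoisting is a JavaScript mechanism")); // 9
```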
378
+
379
+ ## Key Takeaways
380
+
381
+ 1. **Streaming improves UX**: Users see progress immediately
382
+ 2. **maxTokens controls cost**: Prevents runaway generation
383
+ 3. **Token-by-token generation**: LLMs produce one token at a time
384
+ 4. **onTextChunk callback**: Your hook into the generation process
385
+ 5. **Context awareness matters**: Monitor available space
386
+ 6. **Essential for production**: Real-time systems need streaming
387
+
388
+ ## Comparison
389
+
390
+ ```
391
+ Feature intro.js coding.js (this)
392
+ ──────────────── ───────── ─────────────────
393
+ Streaming ✗ ✓
394
+ Token limit ✗ ✓ (2000)
395
+ Real-time output ✗ ✓
396
+ Progress visible ✗ ✓
397
+ User control ✗ ✓
398
+ ```
399
+
400
+ This pattern is foundational for building responsive, user-friendly AI agent interfaces.
examples/06_coding/coding.js ADDED
@@ -0,0 +1,47 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import {
2
+ getLlama,
3
+ HarmonyChatWrapper,
4
+ LlamaChatSession,
5
+ } from "node-llama-cpp";
6
+ import {fileURLToPath} from "url";
7
+ import path from "path";
8
+
9
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
10
+
11
+ const llama = await getLlama();
12
+ const model = await llama.loadModel({
13
+ modelPath: path.join(
14
+ __dirname,
15
+ '..',
16
+ '..',
17
+ 'models',
18
+ 'hf_giladgd_gpt-oss-20b.MXFP4.gguf'
19
+ )
20
+ });
21
+ const context = await model.createContext();
22
+ const session = new LlamaChatSession({
23
+ chatWrapper: new HarmonyChatWrapper(),
24
+ contextSequence: context.getSequence(),
25
+ });
26
+
27
+ const q1 = `What is hoisting in JavaScript? Explain with examples.`;
28
+
29
+ console.log('context.contextSize', context.contextSize)
30
+
31
+ const a1 = await session.prompt(q1, {
32
+ // Tip: let the lib choose or cap reasonably; using the whole context size can be wasteful
33
+ maxTokens: 2000,
34
+
35
+ // Fires as soon as the first characters arrive
36
+ onTextChunk: (text) => {
37
+ process.stdout.write(text); // optional: live print
38
+ },
39
+ });
40
+
41
+ console.log("\n\nFinal answer:\n", a1);
42
+
43
+
44
+ session.dispose()
45
+ context.dispose()
46
+ model.dispose()
47
+ llama.dispose()
examples/07_simple-agent/CODE.md ADDED
@@ -0,0 +1,368 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Code Explanation: simple-agent.js
2
+
3
+ This file demonstrates **function calling** - the core feature that transforms an LLM from a text generator into an agent that can take actions using tools.
4
+
5
+ ## Step-by-Step Code Breakdown
6
+
7
+ ### 1. Import and Setup (Lines 1-7)
8
+ ```javascript
9
+ import {defineChatSessionFunction, getLlama, LlamaChatSession} from "node-llama-cpp";
10
+ import {fileURLToPath} from "url";
11
+ import path from "path";
12
+ import {PromptDebugger} from "../../helper/prompt-debugger.js";
13
+
14
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
15
+ const debug = false;
16
+ ```
17
+ - **defineChatSessionFunction**: Key import for creating callable functions
18
+ - **PromptDebugger**: Helper for debugging prompts (covered at the end)
19
+ - **debug**: Controls verbose logging
20
+
21
+ ### 2. Initialize and Load Model (Lines 9-19)
22
+ ```javascript
23
+ const llama = await getLlama({debug});
24
+ const model = await llama.loadModel({
25
+ modelPath: path.join(
26
+ __dirname,
27
+ "..", "..",
28
+ "models",
29
+ "Qwen3-1.7B-Q8_0.gguf"
30
+ )
31
+ });
32
+ const context = await model.createContext({contextSize: 2000});
33
+ ```
34
+ - Uses Qwen3-1.7B model (good for function calling)
35
+ - Sets context size to 2000 tokens explicitly
36
+
37
+ ### 3. System Prompt for Time Conversion (Lines 21-24)
38
+ ```javascript
39
+ const systemPrompt = `You are a professional chronologist who standardizes time representations across different systems.
40
+
41
+ Always convert times from 12-hour format (e.g., "1:46:36 PM") to 24-hour format (e.g., "13:46") without seconds
42
+ before returning them.`;
43
+ ```
44
+
45
+ **Purpose:**
46
+ - Defines agent's role and behavior
47
+ - Instructs on output format (24-hour, no seconds)
48
+ - Ensures consistency in time representation
49
+
50
+ ### 4. Create Session (Lines 26-29)
51
+ ```javascript
52
+ const session = new LlamaChatSession({
53
+ contextSequence: context.getSequence(),
54
+ systemPrompt,
55
+ });
56
+ ```
57
+ Standard session with system prompt.
58
+
59
+ ### 5. Define a Tool Function (Lines 31-40)
60
+ ```javascript
61
+ const getCurrentTime = defineChatSessionFunction({
62
+ description: "Get the current time",
63
+ params: {
64
+ type: "object",
65
+ properties: {}
66
+ },
67
+ async handler() {
68
+ return new Date().toLocaleTimeString();
69
+ }
70
+ });
71
+ ```
72
+
73
+ **Breaking it down:**
74
+
75
+ **description:**
76
+ - Tells the LLM what this function does
77
+ - LLM reads this to decide when to call it
78
+
79
+ **params:**
80
+ - Defines function parameters (JSON Schema format)
81
+ - Empty `properties: {}` means no parameters needed
82
+ - Type must be "object" even if no properties
83
+
84
+ **handler:**
85
+ - The actual JavaScript function that executes
86
+ - Returns current time as string (e.g., "1:46:36 PM")
87
+ - Can be async (use await inside)
88
+
89
+ ### How Function Calling Works
90
+
91
+ ```
92
+ 1. User asks: "What time is it?"
93
+ 2. LLM reads:
94
+ - System prompt
95
+ - Available functions (getCurrentTime)
96
+ - Function description
97
+ 3. LLM decides: "I should call getCurrentTime()"
98
+ 4. Library executes: handler()
99
+ 5. Handler returns: "1:46:36 PM"
100
+ 6. LLM receives result as "tool output"
101
+ 7. LLM processes: Converts to 24-hour format per system prompt
102
+ 8. LLM responds: "13:46"
103
+ ```
104
+
105
+ ### 6. Register Functions (Line 42)
106
+ ```javascript
107
+ const functions = {getCurrentTime};
108
+ ```
109
+ - Creates object with all available functions
110
+ - Multiple functions: `{getCurrentTime, getWeather, calculate, ...}`
111
+ - LLM can choose which function(s) to call
112
+
113
+ ### 7. Define User Prompt (Line 43)
114
+ ```javascript
115
+ const prompt = `What time is it right now?`;
116
+ ```
117
+ A question that requires using the tool.
118
+
119
+ ### 8. Execute with Functions (Lines 46-47)
120
+ ```javascript
121
+ const a1 = await session.prompt(prompt, {functions});
122
+ console.log("AI: " + a1);
123
+ ```
124
+ - **{functions}** makes tools available to the LLM
125
+ - LLM will automatically call getCurrentTime if needed
126
+ - Response includes tool result processed by LLM
127
+
128
+ ### 9. Debug Prompt Context (Lines 50-56)
129
+ ```javascript
130
+ const promptDebugger = new PromptDebugger({
131
+ outputDir: './logs',
132
+ filename: 'qwen_prompts.txt',
133
+ includeTimestamp: true,
134
+ appendMode: false
135
+ });
136
+ await promptDebugger.debugContextState({session, model});
137
+ ```
138
+
139
+ **What this does:**
140
+ - Saves the entire prompt sent to the model
141
+ - Shows exactly what the LLM sees (including function definitions)
142
+ - Useful for debugging why model does/doesn't call functions
143
+ - Writes to `./logs/qwen_prompts_[timestamp].txt`
144
+
145
+ ### 10. Cleanup (Lines 59-62)
146
+ ```javascript
147
+ session.dispose()
148
+ context.dispose()
149
+ model.dispose()
150
+ llama.dispose()
151
+ ```
152
+ Standard cleanup.
153
+
154
+ ## Key Concepts Demonstrated
155
+
156
+ ### 1. Function Calling (Tool Use)
157
+
158
+ This is what makes it an "agent":
159
+ ```
160
+ Without tools: With tools:
161
+ LLM → Text only LLM → Can take actions
162
+
163
+ Call functions
164
+ Access data
165
+ Execute code
166
+ ```
167
+
168
+ ### 2. Function Definition Pattern
169
+
170
+ ```javascript
171
+ defineChatSessionFunction({
172
+ description: "What the function does", // LLM reads this
173
+ params: { // Expected parameters
174
+ type: "object",
175
+ properties: {
176
+ paramName: {
177
+ type: "string",
178
+ description: "What this param is for"
179
+ }
180
+ },
181
+ required: ["paramName"]
182
+ },
183
+ handler: async (params) => { // Your code
184
+ // Do something with params
185
+ return result;
186
+ }
187
+ });
188
+ ```
189
+
190
+ ### 3. JSON Schema for Parameters
191
+
192
+ Uses standard JSON Schema:
193
+ ```javascript
194
+ // No parameters
195
+ properties: {}
196
+
197
+ // One string parameter
198
+ properties: {
199
+ city: {
200
+ type: "string",
201
+ description: "City name"
202
+ }
203
+ }
204
+
205
+ // Multiple parameters
206
+ properties: {
207
+ a: { type: "number" },
208
+ b: { type: "number" }
209
+ },
210
+ required: ["a", "b"]
211
+ ```
212
+
213
+ ### 4. Agent Decision Making
214
+
215
+ ```
216
+ User: "What time is it?"
217
+
218
+ LLM thinks:
219
+ "I need current time"
220
+ "I see function: getCurrentTime"
221
+ "Description matches what I need"
222
+
223
+ LLM outputs special format:
224
+ {function_call: "getCurrentTime"}
225
+
226
+ Library intercepts and runs handler()
227
+
228
+ Handler returns: "1:46:36 PM"
229
+
230
+ LLM receives: Tool result
231
+
232
+ LLM applies system prompt:
233
+ Convert to 24-hour format
234
+
235
+ Final answer: "13:46"
236
+ ```
237
+
238
+ ## Use Cases
239
+
240
+ ### 1. Information Retrieval
241
+ ```javascript
242
+ const getWeather = defineChatSessionFunction({
243
+ description: "Get weather for a city",
244
+ params: {
245
+ type: "object",
246
+ properties: {
247
+ city: { type: "string" }
248
+ }
249
+ },
250
+ handler: async ({city}) => {
251
+ return await fetchWeather(city);
252
+ }
253
+ });
254
+ ```
255
+
256
+ ### 2. Calculations
257
+ ```javascript
258
+ const calculate = defineChatSessionFunction({
259
+ description: "Perform arithmetic calculation",
260
+ params: {
261
+ type: "object",
262
+ properties: {
263
+ expression: { type: "string" }
264
+ }
265
+ },
266
+ handler: async ({expression}) => {
267
+ return eval(expression); // Unsafe on untrusted input; prefer a math-expression parser
268
+ }
269
+ });
270
+ ```
271
+
272
+ ### 3. Data Access
273
+ ```javascript
274
+ const queryDatabase = defineChatSessionFunction({
275
+ description: "Query user database",
276
+ params: {
277
+ type: "object",
278
+ properties: {
279
+ userId: { type: "string" }
280
+ }
281
+ },
282
+ handler: async ({userId}) => {
283
+ return await db.users.findById(userId);
284
+ }
285
+ });
286
+ ```
287
+
288
+ ### 4. External APIs
289
+ ```javascript
290
+ const searchWeb = defineChatSessionFunction({
291
+ description: "Search the web",
292
+ params: {
293
+ type: "object",
294
+ properties: {
295
+ query: { type: "string" }
296
+ }
297
+ },
298
+ handler: async ({query}) => {
299
+ return await googleSearch(query);
300
+ }
301
+ });
302
+ ```
303
+
304
+ ## Expected Output
305
+
306
+ When run:
307
+ ```
308
+ AI: 13:46
309
+ ```
310
+
311
+ The LLM:
312
+ 1. Called getCurrentTime() internally
313
+ 2. Got "1:46:36 PM"
314
+ 3. Converted to 24-hour format
315
+ 4. Removed seconds
316
+ 5. Returned "13:46"
317
+
318
+ ## Debugging with PromptDebugger
319
+
320
+ The debug output shows the full prompt including function schemas:
321
+ ```
322
+ System: You are a professional chronologist...
323
+
324
+ Functions available:
325
+ - getCurrentTime: Get the current time
326
+ Parameters: (none)
327
+
328
+ User: What time is it right now?
329
+ ```
330
+
331
+ This helps debug:
332
+ - Did the model see the function?
333
+ - Was the description clear?
334
+ - Did parameters match expectations?
335
+
336
+ ## Why This Matters for AI Agents
337
+
338
+ ### Agents = LLMs + Tools
339
+
340
+ ```
341
+ LLM alone: LLM + Tools:
342
+ ├─ Generate text ├─ Generate text
343
+ └─ That's it ├─ Access real data
344
+ ├─ Perform calculations
345
+ ├─ Call APIs
346
+ ├─ Execute actions
347
+ └─ Interact with world
348
+ ```
349
+
350
+ ### Foundation for Complex Agents
351
+
352
+ This simple example is the foundation for:
353
+ - **Research agents**: Search web, read documents
354
+ - **Coding agents**: Run code, check errors
355
+ - **Personal assistants**: Calendar, email, reminders
356
+ - **Analysis agents**: Query databases, compute statistics
357
+
358
+ All start with basic function calling!
359
+
360
+ ## Best Practices
361
+
362
+ 1. **Clear descriptions**: LLM uses these to decide when to call
363
+ 2. **Type safety**: Use JSON Schema properly
364
+ 3. **Error handling**: Handler should catch errors
365
+ 4. **Return strings**: LLM processes text best
366
+ 5. **Keep functions focused**: One clear purpose per function
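Practice #3 (error handling) in isolation: `fetchWeather` is a hypothetical stub standing in for any external call that may fail, and `weatherHandler` has the same shape as a `defineChatSessionFunction` handler:

```javascript
// Stub external call; in a real tool this would hit an API and might throw.
async function fetchWeather(city) {
    throw new Error(`no weather service configured for ${city}`);
}

// Catch failures and return a readable message, so the LLM can explain the
// problem to the user instead of the function call crashing the session.
async function weatherHandler({city}) {
    try {
        return await fetchWeather(city);
    } catch (error) {
        return `Weather lookup failed: ${error.message}`;
    }
}

console.log(await weatherHandler({city: "Paris"}));
// Weather lookup failed: no weather service configured for Paris
```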
367
+
368
+ This is the minimum viable agent: one LLM + one tool + proper configuration.
examples/07_simple-agent/CONCEPT.md ADDED
@@ -0,0 +1,69 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Concept: Function Calling & Tool Use
2
+
3
+ ## Overview
4
+
5
+ Function calling transforms LLMs from text generators into agents that can take actions and interact with the world.
6
+
7
+ ## What Makes an Agent?
8
+
9
+ ```
10
+ Text Generator Agent
11
+ ────────────── ──────
12
+ LLM → Text only LLM + Tools → Can act
13
+ ```
14
+
15
+ **Function calling** lets the LLM invoke predefined functions to access data or perform actions it cannot do alone.
16
+
17
+ ## The Core Idea
18
+
19
+ ```
20
+ User: "What time is it?"
21
+
22
+ LLM thinks: "I need current time"
23
+
24
+ LLM calls: getCurrentTime()
25
+
26
+ Tool returns: "1:46:36 PM"
27
+
28
+ LLM responds: "It's 13:46"
29
+ ```
30
+
31
+ This is agency - the ability to DO, not just SAY.
32
+
33
+ ## How It Works
34
+
35
+ ### 1. Function Definition
36
+ ```javascript
37
+ getCurrentTime = {
38
+ description: "Get the current time",
39
+ handler: () => new Date().toLocaleTimeString()
40
+ }
41
+ ```
42
+
43
+ ### 2. LLM Sees Available Tools
44
+ ```
45
+ Available functions:
46
+ - getCurrentTime: "Get the current time"
47
+ - getWeather: "Get weather for a city"
48
+ - calculate: "Perform math"
49
+ ```
50
+
51
+ ### 3. LLM Decides When to Use
52
+ ```
53
+ "What time?" → getCurrentTime() ✓
54
+ "What's 5+5?" → calculate() ✓
55
+ "Tell a joke" → No tool needed
56
+ ```
57
+
58
+ ## Real-World Applications
59
+
60
+ **Personal Assistant**: Calendar, email, reminders
61
+ **Research Agent**: Web search, document reading
62
+ **Coding Assistant**: File operations, code execution
63
+ **Data Analyst**: Database queries, calculations
64
+
65
+ ## Key Takeaway
66
+
67
+ Function calling is THE feature that enables AI agents. Without it, LLMs can only talk. With it, they can act.
68
+
69
+ This is the foundation of all modern agent systems.
examples/07_simple-agent/simple-agent.js ADDED
@@ -0,0 +1,62 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import {defineChatSessionFunction, getLlama, LlamaChatSession} from "node-llama-cpp";
2
+ import {fileURLToPath} from "url";
3
+ import path from "path";
4
+ import {PromptDebugger} from "../../helper/prompt-debugger.js";
5
+
6
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
7
+ const debug = false;
8
+
9
+ const llama = await getLlama({debug});
10
+ const model = await llama.loadModel({
11
+ modelPath: path.join(
12
+ __dirname,
13
+ '..',
14
+ '..',
15
+ 'models',
16
+ 'Qwen3-1.7B-Q8_0.gguf'
17
+ )
18
+ });
19
+ const context = await model.createContext({contextSize: 2000});
20
+
21
+ const systemPrompt = `You are a professional chronologist who standardizes time representations across different systems.
22
+
23
+ Always convert times from 12-hour format (e.g., "1:46:36 PM") to 24-hour format (e.g., "13:46") without seconds
24
+ before returning them.`;
25
+
26
+ const session = new LlamaChatSession({
27
+ contextSequence: context.getSequence(),
28
+ systemPrompt,
29
+ });
30
+
31
+ const getCurrentTime = defineChatSessionFunction({
32
+ description: "Get the current time",
33
+ params: {
34
+ type: "object",
35
+ properties: {}
36
+ },
37
+ async handler() {
38
+ return new Date().toLocaleTimeString();
39
+ }
40
+ });
41
+
42
+ const functions = {getCurrentTime};
43
+ const prompt = `What time is it right now?`;
44
+
45
+ // Execute the prompt
46
+ const a1 = await session.prompt(prompt, {functions});
47
+ console.log("AI: " + a1);
48
+
49
+ // Debug after the prompt execution
50
+ const promptDebugger = new PromptDebugger({
51
+ outputDir: './logs',
52
+ filename: 'qwen_prompts.txt',
53
+ includeTimestamp: true, // adds timestamp to filename
54
+ appendMode: false // overwrites file each time
55
+ });
56
+ await promptDebugger.debugContextState({session, model});
57
+
58
+ // Clean up
59
+ session.dispose()
60
+ context.dispose()
61
+ model.dispose()
62
+ llama.dispose()
examples/08_simple-agent-with-memory/CODE.md ADDED
@@ -0,0 +1,247 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Code Explanation: simple-agent-with-memory.js
2
+
3
+ This example extends the simple agent with **persistent memory**, enabling it to remember information across sessions while intelligently avoiding duplicate saves.
4
+
5
+ ## Key Components
6
+
7
+ ### 1. MemoryManager Import
8
+ ```javascript
9
+ import {MemoryManager} from "./memory-manager.js";
10
+ ```
11
+ Custom class for persisting agent memories to JSON files with unified memory storage.
12
+
13
+ ### 2. Initialize Memory Manager
14
+ ```javascript
15
+ const memoryManager = new MemoryManager('./agent-memory.json');
16
+ const memorySummary = await memoryManager.getMemorySummary();
17
+ ```
18
+ - Loads existing memories from file
19
+ - Generates formatted summary for system prompt
20
+ - Handles migration from old memory schemas
21
+
22
+ ### 3. Memory-Aware System Prompt with Reasoning
23
+ ```javascript
24
+ const systemPrompt = `
25
+ You are a helpful assistant with long-term memory.
26
+
27
+ Before calling any function, always follow this reasoning process:
28
+
29
+ 1. **Compare** new user statements against existing memories below.
30
+ 2. **If the same key and value already exist**, do NOT call saveMemory again.
31
+ - Instead, simply acknowledge the known information.
32
+ - Example: if the user says "My name is Malua" and memory already says "user_name: Malua", reply "Yes, I remember your name is Malua."
33
+ 3. **If the user provides an updated value** (e.g., "I actually prefer sushi now"),
34
+ then call saveMemory once to update the value.
35
+ 4. **Only call saveMemory for genuinely new information.**
36
+
37
+ When saving new data, call saveMemory with structured fields:
38
+ - type: "fact" or "preference"
39
+ - key: short descriptive identifier (e.g., "user_name", "favorite_food")
40
+ - value: the specific information (e.g., "Malua", "chinua")
41
+
42
+ Examples:
43
+ saveMemory({ type: "fact", key: "user_name", value: "Malua" })
44
+ saveMemory({ type: "preference", key: "favorite_food", value: "chinua" })
45
+
46
+ ${memorySummary}
47
+ `;
48
+ ```
49
+
50
+ **What this does:**
51
+ - Includes existing memories in the prompt
52
+ - Provides explicit reasoning guidelines to prevent duplicate saves
53
+ - Teaches the agent to compare before saving
54
+ - Instructs when to update vs. acknowledge existing data
55
+
56
+ ### 4. saveMemory Function
57
+ ```javascript
58
+ const saveMemory = defineChatSessionFunction({
59
+ description: "Save important information to long-term memory (user preferences, facts, personal details)",
60
+ params: {
61
+ type: "object",
62
+ properties: {
63
+ type: {
64
+ type: "string",
65
+ enum: ["fact", "preference"]
66
+ },
67
+ key: { type: "string" },
68
+ value: { type: "string" }
69
+ },
70
+ required: ["type", "key", "value"]
71
+ },
72
+ async handler({ type, key, value }) {
73
+ await memoryManager.addMemory({ type, key, value });
74
+ return `Memory saved: ${key} = ${value}`;
75
+ }
76
+ });
77
+ ```
78
+
79
+ **What it does:**
80
+ - Uses structured key-value format for all memories
81
+ - Saves both facts and preferences with the same method
82
+ - Automatically handles duplicates (updates if value changes)
83
+ - Persists to JSON file
84
+ - Returns confirmation message
85
+
86
+ **Parameter Structure:**
87
+ - `type`: Either "fact" or "preference"
88
+ - `key`: Short identifier (e.g., "user_name", "favorite_food")
89
+ - `value`: The actual information (e.g., "Alex", "pizza")
90
+
91
+ ### 5. Example Conversation
92
+ ```javascript
93
+ const prompt1 = "Hi! My name is Alex and I love pizza.";
94
+ const response1 = await session.prompt(prompt1, {functions});
95
+ // Agent calls saveMemory twice:
96
+ // - saveMemory({ type: "fact", key: "user_name", value: "Alex" })
97
+ // - saveMemory({ type: "preference", key: "favorite_food", value: "pizza" })
98
+
99
+ const prompt2 = "What's my favorite food?";
100
+ const response2 = await session.prompt(prompt2, {functions});
101
+ // Agent recalls from memory: "Pizza"
102
+ ```
103
+
104
+ ## How Memory Works
105
+
106
+ ### Flow Diagram
107
+ ```
108
+ Session 1:
109
+ User: "My name is Alex and I love pizza"
110
+
111
+ Agent calls: saveMemory({ type: "fact", key: "user_name", value: "Alex" })
112
+ Agent calls: saveMemory({ type: "preference", key: "favorite_food", value: "pizza" })
113
+
114
+ Saved to: agent-memory.json
115
+
116
+ Session 2 (after restart):
117
+ 1. Load memories from agent-memory.json
118
+ 2. Add to system prompt
119
+ 3. Agent sees: "user_name: Alex" and "favorite_food: pizza"
120
+ 4. Can use this information in responses
121
+
122
+ Session 3:
123
+ User: "My name is Alex"
124
+
125
+ Agent compares: user_name already = "Alex"
126
+
127
+ No function call! Just acknowledges: "Yes, I remember your name is Alex."
128
+ ```
129
+
130
+ ## The MemoryManager Class
131
+
132
+ Located in `memory-manager.js`:
133
+ ```javascript
134
+ class MemoryManager {
135
+ async loadMemories() // Load from JSON (handles schema migration)
136
+ async saveMemories() // Write to JSON
137
+ async addMemory() // Unified method for all memory types
138
+ async getMemorySummary() // Format memories for system prompt
139
+ extractKey() // Helper for migration
140
+ extractValue() // Helper for migration
141
+ }
142
+ ```
143
+
144
+ **Benefits:**
145
+ - Single unified method for all memory types
146
+ - Automatic duplicate detection and prevention
147
+ - Automatic value updates when information changes
148
+
149
+ ## Key Concepts
150
+
151
+ ### 1. Structured Memory Format
152
+ All memories now use a consistent structure:
153
+ ```javascript
154
+ {
155
+ type: "fact" | "preference",
156
+ key: "user_name", // Identifier
157
+ value: "Alex", // The actual data
158
+ source: "user", // Where it came from
159
+ timestamp: "2025-10-29..." // When it was saved/updated
160
+ }
161
+ ```
162
+
163
+ ### 2. Intelligent Duplicate Prevention
164
+ The agent is trained to:
165
+ - **Compare** before saving
166
+ - **Skip** if data is identical
167
+ - **Update** if value changed
168
+ - **Acknowledge** existing memories instead of re-saving
169
+
170
+ ### 3. Persistent State
171
+ - Memories survive script restarts
172
+ - Stored in JSON file with metadata
173
+ - Loaded at startup and injected into prompt
174
+
175
+ ### 4. Memory Integration in System Prompt
176
+ Memories are automatically formatted and injected:
177
+ ```
178
+ === LONG-TERM MEMORY ===
179
+
180
+ Known Facts:
181
+ - user_name: Alex
182
+ - location: Paris
183
+
184
+ User Preferences:
185
+ - favorite_food: pizza
186
+ - preferred_language: French
187
+ ```
188
+
189
+ ## Why This Matters
190
+
191
+ **Without memory:** Agent starts fresh every time, asks same questions repeatedly
192
+
193
+ **With basic memory:** Agent remembers, but may save duplicates wastefully
194
+
195
+ **With smart memory:** Agent remembers AND avoids redundant saves by reasoning first
196
+
197
+ This enables:
198
+ - **Personalized responses** based on user history
199
+ - **Efficient memory usage** (no duplicate entries)
200
+ - **Natural conversations** that feel continuous
201
+ - **Stateful agents** that maintain context
202
+ - **Automatic updates** when information changes
203
+
204
+ ## Expected Output
205
+
206
+ **First run:**
207
+ ```
208
+ User: "Hi! My name is Alex and I love pizza."
209
+ AI: "Nice to meet you, Alex! I've noted that you love pizza."
210
+ [Calls saveMemory twice - new information saved]
211
+ ```
212
+
213
+ **Second run (after restart):**
214
+ ```
215
+ User: "What's my favorite food?"
216
+ AI: "Your favorite food is pizza! You mentioned that you love it."
217
+ [No function calls - recalls from loaded memory]
218
+ ```
219
+
220
+ **Third run (duplicate statement):**
221
+ ```
222
+ User: "My name is Alex."
223
+ AI: "Yes, I remember your name is Alex!"
224
+ [No function call - recognizes duplicate, just acknowledges]
225
+ ```
226
+
227
+ **Fourth run (updated information):**
228
+ ```
229
+ User: "I actually prefer sushi now."
230
+ AI: "Got it! I've updated your favorite food to sushi."
231
+ [Calls saveMemory once - updates existing value]
232
+ ```
233
+
234
+ ## Reasoning Process
235
+
236
+ The system prompt explicitly guides the agent through this decision tree:
237
+ ```
238
+ New user statement
239
+
240
+ Compare to existing memories
241
+
242
+ ├─→ Exact match? → Acknowledge only (no save)
243
+ ├─→ Updated value? → Save to update
244
+ └─→ New information? → Save as new
245
+ ```
246
+
247
+ This reasoning-first approach makes the agent more intelligent and efficient with memory operations!
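The branches of this decision tree can be sketched as a small standalone helper. This is a hypothetical illustration (in the example itself the equivalent logic lives in `MemoryManager.addMemory`):

```javascript
// Sketch of the compare-before-save decision tree. `memories` is an array
// shaped like the entries in agent-memory.json ({type, key, value, ...}).
function decideMemoryAction(memories, {type, key, value}) {
    const existing = memories.find(m => m.type === type && m.key === key);
    if (!existing) return "save-new";                    // new information → save
    if (existing.value !== value) return "save-update";  // changed value → update
    return "acknowledge";                                // exact match → no save
}

const memories = [{type: "fact", key: "user_name", value: "Alex"}];
console.log(decideMemoryAction(memories, {type: "fact", key: "user_name", value: "Alex"}));  // acknowledge
console.log(decideMemoryAction(memories, {type: "fact", key: "user_name", value: "Bob"}));   // save-update
console.log(decideMemoryAction(memories, {type: "fact", key: "location", value: "Paris"}));  // save-new
```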
examples/08_simple-agent-with-memory/CONCEPT.md ADDED
@@ -0,0 +1,249 @@
+ # Concept: Persistent Memory & State Management
+
+ ## Overview
+
+ Adding persistent memory transforms agents from stateless responders into systems that can maintain context and relationships across sessions.
+
+ ## The Memory Problem
+
+ ```
+ Without Memory             With Memory
+ ──────────────             ─────────────
+ Session 1:                 Session 1:
+ "I'm Alex"                 "I'm Alex" → Saved
+ "I love pizza"             "I love pizza" → Saved
+
+ Session 2:                 Session 2:
+ "What's my name?"          "What's my name?"
+ "I don't know"             "Alex!" ✓
+ ```
+
+ ## Architecture
+
+ ```
+ ┌─────────────────────────────────┐
+ │          Agent Session          │
+ ├─────────────────────────────────┤
+ │  System Prompt                  │
+ │  + Loaded Memories              │
+ │  + saveMemory Tool              │
+ └────────┬────────────────────────┘
+          │
+          ▼
+ ┌─────────────────────────────────┐
+ │         Memory Manager          │
+ ├─────────────────────────────────┤
+ │  • Load from storage            │
+ │  • Save to storage              │
+ │  • Format for prompt            │
+ └────────┬────────────────────────┘
+          │
+          ▼
+ ┌─────────────────────────────────┐
+ │       Persistent Storage        │
+ │       (agent-memory.json)       │
+ └─────────────────────────────────┘
+ ```
+
+ ## How It Works
+
+ ### 1. Startup
+ ```
+ 1. Load agent-memory.json
+ 2. Extract facts and preferences
+ 3. Add to system prompt
+ 4. Agent "remembers" past information
+ ```
+
+ ### 2. During Conversation
+ ```
+ User shares information
+         ↓
+ Agent recognizes important fact
+         ↓
+ Agent calls saveMemory()
+         ↓
+ Saved to JSON file
+         ↓
+ Available in future sessions
+ ```
+
+ ### 3. Memory Types
+
+ **Facts**: General information
+ ```json
+ {
+   "memories": [
+     {
+       "type": "fact",
+       "key": "user_name",
+       "value": "Alex",
+       "source": "user",
+       "timestamp": "2025-10-29T11:22:57.372Z"
+     }
+   ]
+ }
+ ```
+
+ **Preferences**:
+ ```json
+ {
+   "memories": [
+     {
+       "type": "preference",
+       "key": "favorite_food",
+       "value": "pizza",
+       "source": "user",
+       "timestamp": "2025-10-29T11:22:58.022Z"
+     }
+   ]
+ }
+ ```
+
+ ## Memory Integration Pattern
+
+ ### System Prompt Enhancement
+ ```
+ Base Prompt:
+ "You are a helpful assistant."
+
+ Enhanced with Memory:
+ "You are a helpful assistant with long-term memory.
+
+ === LONG-TERM MEMORY ===
+ Known Facts:
+ - User's name is Alex
+ - User loves pizza"
+ ```
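A minimal sketch of this enhancement step (hypothetical helper; the example's `MemoryManager.getMemorySummary` produces the same kind of summary):

```javascript
// Build an enhanced system prompt from a base prompt plus stored memories.
function buildSystemPrompt(basePrompt, memories) {
    const facts = memories.filter(m => m.type === "fact");
    const prefs = memories.filter(m => m.type === "preference");

    let summary = "\n=== LONG-TERM MEMORY ===\n";
    if (facts.length > 0)
        summary += "\nKnown Facts:\n" + facts.map(f => `- ${f.key}: ${f.value}`).join("\n") + "\n";
    if (prefs.length > 0)
        summary += "\nUser Preferences:\n" + prefs.map(p => `- ${p.key}: ${p.value}`).join("\n") + "\n";

    return basePrompt + "\n" + summary;
}

const prompt = buildSystemPrompt("You are a helpful assistant.", [
    {type: "fact", key: "user_name", value: "Alex"},
    {type: "preference", key: "favorite_food", value: "pizza"}
]);
console.log(prompt);
```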
+
+ ### Tool-Assisted Saving
+ ```
+ Agent decides when to save:
+ User: "My favorite color is blue"
+         ↓
+ Agent: "I should remember this"
+         ↓
+ Calls: saveMemory(type="preference", key="color", content="blue")
+ ```
+
+ ## Real-World Applications
+
+ **Personal Assistant**
+ - Remember appointments, preferences, contacts
+ - Personalized responses based on history
+
+ **Customer Service**
+ - Past interactions and issues
+ - Customer preferences and context
+
+ **Learning Tutor**
+ - Student progress and weak areas
+ - Adapted teaching based on history
+
+ **Healthcare Assistant**
+ - Medical history
+ - Medication reminders
+ - Health tracking
+
+ ## Memory Strategies
+
+ ### 1. Episodic Memory
+ Store specific events and conversations:
+ ```
+ - "On 2025-01-15, user asked about Python"
+ - "User struggled with async concepts"
+ ```
+
+ ### 2. Semantic Memory
+ Store facts and knowledge:
+ ```
+ - "User is a software engineer"
+ - "User prefers TypeScript over JavaScript"
+ ```
+
+ ### 3. Procedural Memory
+ Store how-to information:
+ ```
+ - "User's workflow: design → code → test"
+ - "User's preferred tools: VS Code, Git"
+ ```
+
+ ## Challenges & Solutions
+
+ ### Challenge 1: Memory Bloat
+ **Problem**: Too many memories slow down the agent
+ **Solution**:
+ - Importance scoring
+ - Periodic cleanup
+ - Summary compression
+
+ ### Challenge 2: Conflicting Information
+ **Problem**: "User likes pizza" vs "User is vegan"
+ **Solution**:
+ - Timestamps for recency
+ - Explicit updates
+ - Conflict resolution logic
+
+ ### Challenge 3: Privacy
+ **Problem**: Sensitive information in memory
+ **Solution**:
+ - Encryption at rest
+ - Access controls
+ - Expiration policies
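The recency and expiration ideas above can be combined in a single pass. This is a hypothetical sketch, not part of the example code:

```javascript
// Keep only the newest entry per type:key, dropping entries older than maxAgeMs.
function resolveMemories(memories, maxAgeMs = Infinity, now = Date.now()) {
    const latest = new Map();
    for (const m of memories) {
        const ts = Date.parse(m.timestamp);
        if (now - ts > maxAgeMs) continue;  // expired → drop
        const id = `${m.type}:${m.key}`;
        const prev = latest.get(id);
        if (!prev || ts > Date.parse(prev.timestamp)) latest.set(id, m);  // newer entry wins
    }
    return [...latest.values()];
}

const resolved = resolveMemories([
    {type: "preference", key: "favorite_food", value: "pizza", timestamp: "2025-01-01T00:00:00Z"},
    {type: "preference", key: "favorite_food", value: "sushi", timestamp: "2025-06-01T00:00:00Z"}
]);
console.log(resolved[0].value); // sushi
```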
+
+ ## Key Concepts
+
+ ### 1. Persistence
+ Memory survives:
+ - Application restarts
+ - System reboots
+ - Time gaps
+
+ ### 2. Context Augmentation
+ Memories enhance the system prompt:
+ ```
+ Prompt = Base + Memories + User Input
+ ```
+
+ ### 3. Agent-Driven Storage
+ The agent decides what to remember:
+ ```
+ Important? → Save
+ Trivial?   → Ignore
+ ```
+
+ ## Evolution Path
+
+ ```
+ 1. Stateless          → Each interaction independent
+ 2. Session memory     → Remember during conversation
+ 3. Persistent memory  → Remember across sessions
+ 4. Distributed memory → Share across instances
+ 5. Semantic search    → Find relevant memories
+ ```
+
+ ## Best Practices
+
+ 1. **Structure memory**: Use types (facts, preferences, events)
+ 2. **Add timestamps**: Know when information was saved
+ 3. **Enable updates**: Allow overwriting old information
+ 4. **Implement search**: Find relevant memories efficiently
+ 5. **Monitor size**: Prevent unbounded growth
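For the last point, a simple cap on entry count keeps growth bounded by dropping the oldest entries first (a hypothetical sketch; the example code does not implement this):

```javascript
// Keep only the `maxEntries` most recent memories (newest first).
function capMemories(memories, maxEntries) {
    return [...memories]
        .sort((a, b) => Date.parse(b.timestamp) - Date.parse(a.timestamp))
        .slice(0, maxEntries);
}

const capped = capMemories([
    {key: "a", timestamp: "2025-01-01T00:00:00Z"},
    {key: "b", timestamp: "2025-03-01T00:00:00Z"},
    {key: "c", timestamp: "2025-02-01T00:00:00Z"}
], 2);
console.log(capped.map(m => m.key)); // [ 'b', 'c' ]
```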
+
+ ## Comparison
+
+ ```
+ Feature                Simple Agent    Memory Agent
+ ───────────────────    ─────────────   ──────────────
+ Remembers names        ✗               ✓
+ Recalls preferences    ✗               ✓
+ Personalization        ✗               ✓
+ Context continuity     ✗               ✓
+ Cross-session state    ✗               ✓
+ ```
+
+ ## Key Takeaway
+
+ Memory transforms agents from tools into assistants. They can build relationships, provide personalized experiences, and maintain context over time.
+
+ This is essential for production AI agent systems.
examples/08_simple-agent-with-memory/agent-memory.json ADDED
@@ -0,0 +1,19 @@
+ {
+   "memories": [
+     {
+       "type": "fact",
+       "key": "user_name",
+       "value": "Alex",
+       "source": "user",
+       "timestamp": "2025-11-05T20:24:58.220Z"
+     },
+     {
+       "type": "preference",
+       "key": "favorite_food",
+       "value": "pizza",
+       "source": "user",
+       "timestamp": "2025-11-05T20:24:58.848Z"
+     }
+   ],
+   "conversationHistory": []
+ }
examples/08_simple-agent-with-memory/memory-manager.js ADDED
@@ -0,0 +1,137 @@
+ import fs from 'fs/promises';
+ import path from 'path';
+ import {fileURLToPath} from 'url';
+
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
+
+ export class MemoryManager {
+     constructor(memoryFileName = './memory.json') {
+         this.memoryFilePath = path.resolve(__dirname, memoryFileName);
+     }
+
+     async loadMemories() {
+         try {
+             const data = await fs.readFile(this.memoryFilePath, 'utf-8');
+             const json = JSON.parse(data);
+
+             // 🔧 Migrate old schema if needed
+             if (!json.memories) {
+                 const upgraded = {memories: [], conversationHistory: []};
+
+                 if (Array.isArray(json.facts)) {
+                     for (const f of json.facts) {
+                         upgraded.memories.push({
+                             type: 'fact',
+                             key: this.extractKey(f.content),
+                             value: this.extractValue(f.content),
+                             source: 'migration',
+                             timestamp: f.timestamp || new Date().toISOString()
+                         });
+                     }
+                 }
+
+                 if (json.preferences && typeof json.preferences === 'object') {
+                     for (const [key, val] of Object.entries(json.preferences)) {
+                         upgraded.memories.push({
+                             type: 'preference',
+                             key,
+                             value: this.extractValue(val),
+                             source: 'migration',
+                             timestamp: new Date().toISOString()
+                         });
+                     }
+                 }
+
+                 await this.saveMemories(upgraded);
+                 return upgraded;
+             }
+
+             if (!Array.isArray(json.memories)) json.memories = [];
+             if (!Array.isArray(json.conversationHistory)) json.conversationHistory = [];
+
+             return json;
+         } catch {
+             return {memories: [], conversationHistory: []};
+         }
+     }
+
+     async saveMemories(memories) {
+         await fs.writeFile(this.memoryFilePath, JSON.stringify(memories, null, 2));
+     }
+
+     // Add or update memory without duplicates
+     async addMemory({type, key, value, source = 'user'}) {
+         const data = await this.loadMemories();
+
+         // Normalize for comparison
+         const normType = type.trim().toLowerCase();
+         const normKey = key.trim().toLowerCase();
+         const normValue = value.trim();
+
+         // Check if same key+type already exists
+         const existingIndex = data.memories.findIndex(
+             m => m.type === normType && m.key.toLowerCase() === normKey
+         );
+
+         if (existingIndex >= 0) {
+             const existing = data.memories[existingIndex];
+             // Update value if changed
+             if (existing.value !== normValue) {
+                 existing.value = normValue;
+                 existing.timestamp = new Date().toISOString();
+                 existing.source = source;
+                 console.log(`Updated memory: ${normKey} → ${normValue}`);
+             } else {
+                 console.log(`Skipped duplicate memory: ${normKey}`);
+             }
+         } else {
+             // Add new memory
+             data.memories.push({
+                 type: normType,
+                 key: normKey,
+                 value: normValue,
+                 source,
+                 timestamp: new Date().toISOString()
+             });
+             console.log(`Added memory: ${normKey} = ${normValue}`);
+         }
+
+         await this.saveMemories(data);
+     }
+
+     async getMemorySummary() {
+         const data = await this.loadMemories();
+         const facts = Array.isArray(data.memories)
+             ? data.memories.filter(m => m.type === 'fact')
+             : [];
+         const prefs = Array.isArray(data.memories)
+             ? data.memories.filter(m => m.type === 'preference')
+             : [];
+
+         let summary = "\n=== LONG-TERM MEMORY ===\n";
+
+         if (facts.length > 0) {
+             summary += "\nKnown Facts:\n";
+             for (const f of facts) summary += `- ${f.key}: ${f.value}\n`;
+         }
+
+         if (prefs.length > 0) {
+             summary += "\nUser Preferences:\n";
+             for (const p of prefs) summary += `- ${p.key}: ${p.value}\n`;
+         }
+
+         return summary;
+     }
+
+     extractKey(content) {
+         if (typeof content !== 'string') return 'unknown';
+         const [key] = content.split(':').map(s => s.trim());
+         return key || 'unknown';
+     }
+
+     extractValue(content) {
+         if (typeof content !== 'string') return '';
+         const parts = content.split(':').map(s => s.trim());
+         return parts.length > 1 ? parts.slice(1).join(':') : content;
+     }
+ }
examples/08_simple-agent-with-memory/simple-agent-with-memory.js ADDED
@@ -0,0 +1,93 @@
+ import {defineChatSessionFunction, getLlama, LlamaChatSession} from "node-llama-cpp";
+ import {fileURLToPath} from "url";
+ import path from "path";
+ import {MemoryManager} from "./memory-manager.js";
+
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
+
+ const llama = await getLlama({debug: false});
+ const model = await llama.loadModel({
+     modelPath: path.join(
+         __dirname,
+         '..',
+         '..',
+         'models',
+         'Qwen3-1.7B-Q8_0.gguf'
+     )
+ });
+ const context = await model.createContext({contextSize: 2000});
+
+ // Initialize memory manager
+ const memoryManager = new MemoryManager('./agent-memory.json');
+
+ // Load existing memories and add to system prompt
+ const memorySummary = await memoryManager.getMemorySummary();
+
+ const systemPrompt = `
+ You are a helpful assistant with long-term memory.
+
+ Before calling any function, always follow this reasoning process:
+
+ 1. **Compare** new user statements against existing memories below.
+ 2. **If the same key and value already exist**, do NOT call saveMemory again.
+    - Instead, simply acknowledge the known information.
+    - Example: if the user says "My name is Malua" and memory already says "user_name: Malua", reply "Yes, I remember your name is Malua."
+ 3. **If the user provides an updated value** (e.g., "I actually prefer sushi now"),
+    then call saveMemory once to update the value.
+ 4. **Only call saveMemory for genuinely new information.**
+
+ When saving new data, call saveMemory with structured fields:
+ - type: "fact" or "preference"
+ - key: short descriptive identifier (e.g., "user_name", "favorite_food")
+ - value: the specific information (e.g., "Malua", "chinua")
+
+ Examples:
+ saveMemory({ type: "fact", key: "user_name", value: "Malua" })
+ saveMemory({ type: "preference", key: "favorite_food", value: "chinua" })
+
+ ${memorySummary}
+ `;
+
+ const session = new LlamaChatSession({
+     contextSequence: context.getSequence(),
+     systemPrompt,
+ });
+
+ // Function to save memories
+ const saveMemory = defineChatSessionFunction({
+     description: "Save important information to long-term memory (user preferences, facts, personal details)",
+     params: {
+         type: "object",
+         properties: {
+             type: {
+                 type: "string",
+                 enum: ["fact", "preference"]
+             },
+             key: {type: "string"},
+             value: {type: "string"}
+         },
+         required: ["type", "key", "value"]
+     },
+     async handler({type, key, value}) {
+         await memoryManager.addMemory({type, key, value});
+         return `Memory saved: ${key} = ${value}`;
+     }
+ });
+
+ const functions = {saveMemory};
+
+ // Example conversation
+ const prompt1 = "Hi! My name is Alex and I love pizza.";
+ const response1 = await session.prompt(prompt1, {functions});
+ console.log("AI: " + response1);
+
+ // Later conversation (even after restarting the script)
+ const prompt2 = "What's my favorite food?";
+ const response2 = await session.prompt(prompt2, {functions});
+ console.log("AI: " + response2);
+
+ // Clean up
+ session.dispose();
+ context.dispose();
+ model.dispose();
+ llama.dispose();
examples/09_react-agent/CODE.md ADDED
@@ -0,0 +1,278 @@
+ # Code Explanation: react-agent.js
+
+ This example implements the **ReAct pattern** (Reasoning + Acting), a powerful approach for multi-step problem-solving with tools.
+
+ ## What is ReAct?
+
+ ReAct = **Rea**soning + **Act**ing
+
+ The agent alternates between:
+ 1. **Thinking** (reasoning about what to do)
+ 2. **Acting** (using tools)
+ 3. **Observing** (seeing tool results)
+ 4. Repeat until the problem is solved
+
+ ## Key Components
+
+ ### 1. ReAct System Prompt (Lines 20-52)
+ ```javascript
+ const systemPrompt = `You are a mathematical assistant that uses the ReAct approach.
+
+ CRITICAL: You must follow this EXACT pattern:
+
+ Thought: [Explain what calculation you need]
+ Action: [Call ONE tool]
+ Observation: [Wait for result]
+ Thought: [Analyze result]
+ Action: [Call another tool if needed]
+ ...
+ Thought: [Once you have all information]
+ Answer: [Final answer and STOP]
+ ```
+
+ **Key instructions:**
+ - Explicit step-by-step pattern
+ - One tool call at a time
+ - Continue until final answer
+ - Stop after "Answer:"
+
+ ### 2. Calculator Tools (Lines 60-159)
+
+ Four basic math operations:
+ ```javascript
+ const add = defineChatSessionFunction({...});
+ const multiply = defineChatSessionFunction({...});
+ const subtract = defineChatSessionFunction({...});
+ const divide = defineChatSessionFunction({...});
+ ```
+
+ Each tool:
+ - Takes two numbers (a, b)
+ - Performs the operation
+ - Logs the call
+ - Returns the result as a string
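The elided `{...}` for one of these tools looks roughly like the following sketch. It mirrors the `defineChatSessionFunction` shape used elsewhere in this repo, shown here as a plain object so the handler can run standalone:

```javascript
// Shape of one calculator tool. In react-agent.js this object is passed to
// defineChatSessionFunction; the handler logic is the same either way.
const multiply = {
    description: "Multiply two numbers",
    params: {
        type: "object",
        properties: {
            a: {type: "number"},
            b: {type: "number"}
        }
    },
    async handler({a, b}) {
        console.log(`🔧 TOOL CALLED: multiply(${a}, ${b})`);
        const result = a * b;
        console.log(`📊 RESULT: ${result}`);
        return String(result); // the result goes back to the model as a string
    }
};

console.log(await multiply.handler({a: 15, b: 8})); // 120
```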
+
+ ### 3. ReAct Agent Loop (Lines 164-212)
+
+ ```javascript
+ async function reactAgent(userPrompt, maxIterations = 10) {
+     let iteration = 0;
+     let fullResponse = "";
+
+     while (iteration < maxIterations) {
+         iteration++;
+         let currentChunk = "";
+
+         // Prompt the LLM
+         const response = await session.prompt(
+             iteration === 1 ? userPrompt : "Continue your reasoning.",
+             {
+                 functions,
+                 maxTokens: 300,
+                 onTextChunk: (chunk) => {
+                     process.stdout.write(chunk); // Stream output
+                     currentChunk += chunk;
+                 }
+             }
+         );
+
+         fullResponse += currentChunk;
+
+         // Check if final answer reached
+         if (response.toLowerCase().includes("answer:")) {
+             return fullResponse;
+         }
+     }
+ }
+ ```
+
+ **How it works:**
+ 1. Loop up to maxIterations times
+ 2. On the first iteration: send the user's question
+ 3. On subsequent iterations: ask to continue
+ 4. Stream output in real time
+ 5. Stop when "Answer:" appears
+ 6. Return the full reasoning trace
+
+ ### 4. Example Query (Lines 215-220)
+
+ ```javascript
+ const queries = [
+     `A store sells 15 items Monday at $8 each, 20 items Tuesday at $8 each,
+     10 items Wednesday at $8 each. What's the average items per day and total revenue?`
+ ];
+ ```
+
+ Complex problem requiring multiple calculations:
+ - 15 × 8
+ - 20 × 8
+ - 10 × 8
+ - Sum results
+ - Calculate average
+ - Format answer
+
+ ## The ReAct Flow
+
+ ### Example Execution
+
+ ```
+ USER: "A store sells 15 items at $8 each and 20 items at $8 each. Total revenue?"
+
+ Iteration 1:
+ Thought: First I need to calculate 15 × 8
+ Action: multiply(15, 8)
+ Observation: 120
+
+ Iteration 2:
+ Thought: Now I need to calculate 20 × 8
+ Action: multiply(20, 8)
+ Observation: 160
+
+ Iteration 3:
+ Thought: Now I need to add both results
+ Action: add(120, 160)
+ Observation: 280
+
+ Iteration 4:
+ Thought: I have the total revenue
+ Answer: The total revenue is $280
+ ```
+
+ **The loop stops** because "Answer:" was detected.
+
+ ## Why ReAct Works
+
+ ### Traditional Approach (Fails)
+ ```
+ User: "Complex math problem"
+ LLM: [Tries to calculate in its head]
+ → Often wrong due to arithmetic errors
+ ```
+
+ ### ReAct Approach (Succeeds)
+ ```
+ User: "Complex math problem"
+ LLM: "I need to calculate X"
+ → Calls calculator tool
+ → Gets accurate result
+ → Uses result for next step
+ → Continues until solved
+ ```
+
+ ## Key Concepts
+
+ ### 1. Explicit Reasoning
+ The agent must "show its work":
+ ```
+ Thought: What do I need to do?
+ Action: Do it
+ Observation: What happened?
+ ```
+
+ ### 2. Tool Use at Each Step
+ ```
+ Don't calculate: 15 × 8 = 120 (may be wrong)
+ Do calculate: multiply(15, 8) → 120 (always correct)
+ ```
+
+ ### 3. Iterative Problem Solving
+ ```
+ Complex Problem → Break into steps → Solve each step → Combine results
+ ```
+
+ ### 4. Self-Correction
+ The agent can observe bad results and try again:
+ ```
+ Thought: That doesn't look right
+ Action: Let me recalculate
+ ```
+
+ ## Debug Output
+
+ The code includes PromptDebugger (lines 228-234):
+ ```javascript
+ const promptDebugger = new PromptDebugger({
+     outputDir: './logs',
+     filename: 'react_calculator.txt',
+     includeTimestamp: true
+ });
+ await promptDebugger.debugContextState({session, model});
+ ```
+
+ Saves the complete prompt history to logs for debugging.
+
+ ## Expected Output
+
+ ```
+ ========================================================
+ USER QUESTION: [Problem statement]
+ ========================================================
+
+ --- Iteration 1 ---
+ Thought: First I need to multiply 15 by 8
+ Action: multiply(15, 8)
+
+ 🔧 TOOL CALLED: multiply(15, 8)
+ 📊 RESULT: 120
+
+ Observation: 120
+
+ --- Iteration 2 ---
+ Thought: Now I need to multiply 20 by 8
+ Action: multiply(20, 8)
+
+ 🔧 TOOL CALLED: multiply(20, 8)
+ 📊 RESULT: 160
+
+ ... continues ...
+
+ --- Iteration N ---
+ Thought: I have all the information
+ Answer: [Final answer]
+
+ ========================================================
+ FINAL ANSWER REACHED
+ ========================================================
+ ```
+
+ ## Why This Matters
+
+ ### Enables Complex Tasks
+ - Multi-step reasoning
+ - Accurate calculations
+ - Self-correction
+ - Transparent process
+
+ ### Foundation of Modern Agents
+ This pattern powers:
+ - LangChain agents
+ - AutoGPT
+ - BabyAGI
+ - Most production agent frameworks
+
+ ### Observable Reasoning
+ Unlike "black box" LLMs, you see:
+ - What the agent is thinking
+ - Which tools it uses
+ - Why it makes decisions
+ - Where it might fail
+
+ ## Best Practices
+
+ 1. **Clear system prompt**: Define the exact pattern
+ 2. **One tool per action**: Don't combine operations
+ 3. **Limit iterations**: Prevent infinite loops
+ 4. **Stream output**: Show progress
+ 5. **Debug thoroughly**: Use PromptDebugger
+
+ ## Comparison
+
+ ```
+ Simple Agent             vs   ReAct Agent
+ ─────────────────────────────────────────
+ Single prompt/response        Multi-step iteration
+ One tool call (maybe)         Multiple tool calls
+ No visible reasoning          Explicit reasoning
+ Works for simple tasks        Handles complex problems
+ ```
+
+ This is the state-of-the-art pattern for building capable AI agents!
examples/09_react-agent/CONCEPT.md ADDED
@@ -0,0 +1,372 @@
+ # Concept: ReAct Pattern for AI Agents
+
+ ## What is ReAct?
+
+ **ReAct** (Reasoning + Acting) is a framework that combines:
+ - **Reasoning**: Thinking through problems step-by-step
+ - **Acting**: Using tools to accomplish subtasks
+ - **Observing**: Learning from tool results
+
+ This creates agents that can solve complex, multi-step problems reliably.
+
+ ## The Core Pattern
+
+ ```
+         ┌─────────────┐
+         │   Problem   │
+         └──────┬──────┘
+                │
+                ▼
+ ┌─────────────────────────────────────┐
+ │             ReAct Loop              │
+ │                                     │
+ │  ┌──────────────────────────────┐   │
+ │  │ 1. THOUGHT                   │   │
+ │  │ "What do I need to do?"      │   │
+ │  └─────────────┬────────────────┘   │
+ │                ▼                    │
+ │  ┌──────────────────────────────┐   │
+ │  │ 2. ACTION                    │   │
+ │  │ Call tool with parameters    │   │
+ │  └─────────────┬────────────────┘   │
+ │                ▼                    │
+ │  ┌──────────────────────────────┐   │
+ │  │ 3. OBSERVATION               │   │
+ │  │ Receive tool result          │   │
+ │  └─────────────┬────────────────┘   │
+ │                │                    │
+ │                └──► Repeat or       │
+ │                     Final Answer    │
+ └─────────────────────────────────────┘
+ ```
+
+ ## Why ReAct Matters
+
+ ### Traditional LLMs Struggle With:
+ 1. **Complex calculations** - arithmetic errors
+ 2. **Multi-step problems** - lose track of progress
+ 3. **Using tools** - don't know when/how
+ 4. **Explaining decisions** - black box reasoning
+
+ ### ReAct Solves This:
+ 1. **Reliable calculations** - delegates to tools
+ 2. **Structured progress** - explicit steps
+ 3. **Tool orchestration** - knows when to use what
+ 4. **Transparent reasoning** - visible thought process
+
+ ## The Three Components
+
+ ### 1. Thought (Reasoning)
+
+ The agent reasons about:
+ - What information is needed
+ - Which tool to use
+ - Whether the result makes sense
+ - What to do next
+
+ Example:
+ ```
+ Thought: I need to calculate 15 × 8 to find revenue
+ ```
+
+ ### 2. Action (Tool Use)
+
+ The agent calls a tool with specific parameters:
+
+ Example:
+ ```
+ Action: multiply(15, 8)
+ ```
+
+ ### 3. Observation (Learning)
+
+ The agent receives and interprets the tool result:
+
+ Example:
+ ```
+ Observation: 120
+ ```
+
+ ## Complete Example
+
+ ```
+ Problem: "If 15 items cost $8 each and 20 items cost $8 each,
+ what's the total revenue?"
+
+ Thought: First I need to calculate revenue from 15 items
+ Action: multiply(15, 8)
+ Observation: 120
+
+ Thought: Now I need revenue from 20 items
+ Action: multiply(20, 8)
+ Observation: 160
+
+ Thought: Now I add both revenues
+ Action: add(120, 160)
+ Observation: 280
+
+ Thought: I have the final answer
+ Answer: The total revenue is $280
+ ```
+
+ ## Key Benefits
+
+ ### 1. Reliability
+ - Tools provide accurate results
+ - No arithmetic mistakes
+ - Verifiable calculations
+
+ ### 2. Transparency
+ - See each reasoning step
+ - Understand decision-making
+ - Debug easily
+
+ ### 3. Scalability
+ - Handle complex problems
+ - Break into manageable steps
+ - Add more tools as needed
+
+ ### 4. Flexibility
+ - Works with any tools
+ - Adapts to problem complexity
+ - Self-corrects when needed
+
+ ## Comparison with Other Approaches
+
+ ### Zero-Shot Prompting
+ ```
+ User: "Calculate 15×8 + 20×8"
+ LLM: "The answer is 279" ❌ Wrong!
+ ```
+ **Problem**: The LLM calculates in its head and makes errors
+
+ ### Chain-of-Thought
+ ```
+ User: "Calculate 15×8 + 20×8"
+ LLM: "Let me think step by step:
+ 15×8 = 120
+ 20×8 = 160
+ 120+160 = 279" ❌ Still wrong!
+ ```
+ **Problem**: Shows work but still miscalculates
+
+ ### ReAct (This Implementation)
+ ```
+ User: "Calculate 15×8 + 20×8"
+ Agent:
+ Thought: Calculate 15×8
+ Action: multiply(15, 8)
+ Observation: 120
+
+ Thought: Calculate 20×8
+ Action: multiply(20, 8)
+ Observation: 160
+
+ Thought: Add results
+ Action: add(120, 160)
+ Observation: 280
+
+ Answer: 280 ✅ Correct!
+ ```
+ **Success**: Uses tools, gets accurate results
+
+ ## Architecture Diagram
+
+ ```
+ ┌──────────────────────────────────────┐
+ │            User Question             │
+ └──────────────┬───────────────────────┘
+                │
+                ▼
+ ┌──────────────────────────────────────┐
+ │        LLM with ReAct Prompt         │
+ │                                      │
+ │    "Think, Act, Observe pattern"     │
+ └──────┬───────────────────────────────┘
+        │
+        ├──► Generates: "Thought: ..."
+        │
+        ├──► Generates: "Action: tool(params)"
+        │         │
+        │         ▼
+        │    ┌─────────────────┐
+        │    │  Tool Executor  │
+        │    │                 │
+        │    │  - multiply()   │
+        │    │  - add()        │
+        │    │  - divide()     │
+        │    │  - subtract()   │
+        │    └─────────┬───────┘
+        │              │
+        │              ▼
+        └───────── "Observation: result"
+                       │
+                       ├──► Next iteration or Final Answer
+                       ▼
+ ┌──────────────────────────────────────┐
+ │             Final Answer             │
+ └──────────────────────────────────────┘
+ ```
+
+ ## Implementation Strategies
+
+ ### 1. Explicit Pattern Enforcement
+
+ Force the LLM to follow the structure:
+ ```javascript
+ systemPrompt: `CRITICAL: Follow this EXACT pattern:
+ Thought: [reasoning]
+ Action: [tool call]
+ Observation: [result]
+ ...
+ Answer: [final answer]`
+ ```
+
+ ### 2. Iteration Control
+
+ Prevent infinite loops:
+ ```javascript
+ maxIterations = 10 // Safety limit
+ ```
+
+ ### 3. Streaming Output
+
+ Show progress in real time:
+ ```javascript
+ onTextChunk: (chunk) => {
+     process.stdout.write(chunk);
+ }
+ ```
+
+ ### 4. Answer Detection
+
+ Know when to stop:
+ ```javascript
+ if (response.includes("Answer:")) {
+     return fullResponse; // Done!
+ }
+ ```
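Putting the four strategies together, the control flow can be exercised without a model at all. The sketch below is hypothetical: it replaces the LLM (in the real example, a LlamaChatSession streaming responses) with scripted turns so the iterate-until-"Answer:" loop is runnable anywhere:

```javascript
// Scripted "model" output, one string per iteration.
const scriptedTurns = [
    "Thought: Calculate 15 × 8\nAction: multiply(15, 8)\nObservation: 120",
    "Thought: Calculate 20 × 8\nAction: multiply(20, 8)\nObservation: 160",
    "Thought: Add the results\nAction: add(120, 160)\nObservation: 280",
    "Thought: I have all the information\nAnswer: The total revenue is $280"
];

async function reactLoop(nextTurn, maxIterations = 10) {
    let fullResponse = "";
    for (let i = 0; i < maxIterations; i++) {
        const response = await nextTurn(i);  // stands in for session.prompt(...)
        if (response == null) break;         // model produced nothing
        fullResponse += response + "\n";
        if (response.toLowerCase().includes("answer:")) break;  // final answer reached
    }
    return fullResponse;
}

const trace = await reactLoop(async (i) => scriptedTurns[i]);
console.log(trace.trim().split("\n").pop()); // Answer: The total revenue is $280
```

The same loop works with a real model by swapping the `nextTurn` callback for a `session.prompt` call, exactly as the agent loop in react-agent.js does.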
250
+
251
+ ## Real-World Applications
252
+
253
+ ### 1. Math & Science
254
+ - Complex calculations
255
+ - Multi-step derivations
256
+ - Unit conversions
257
+
258
+ ### 2. Data Analysis
259
+ - Query databases
260
+ - Process results
261
+ - Generate reports
262
+
263
+ ### 3. Research Assistants
264
+ - Search multiple sources
265
+ - Synthesize information
266
+ - Cite sources
267
+
268
+ ### 4. Coding Agents
269
+ - Read code
270
+ - Run tests
271
+ - Fix bugs
272
+ - Refactor
273
+
274
+ ### 5. Customer Support
275
+ - Query knowledge base
276
+ - Check order status
277
+ - Process refunds
278
+ - Escalate issues
279
+
280
+ ## Limitations & Considerations
281
+
282
+ ### 1. Iteration Cost
283
+ Each thought/action/observation cycle costs tokens and time.
284
+
285
+ **Solution**: Use efficient models, limit iterations
286
+
287
+ ### 2. Tool Quality
288
+ ReAct is only as good as its tools.
289
+
290
+ **Solution**: Build robust, well-tested tools
291
+
292
+ ### 3. Prompt Engineering
293
+ System prompt must be very clear.
294
+
295
+ **Solution**: Test extensively, iterate on prompt
296
+
297
+ ### 4. Error Handling
298
+ Tools can fail or return unexpected results.
299
+
300
+ **Solution**: Add error handling, validation
301
+
302
+ ## Advanced Patterns
303
+
304
+ ### Self-Correction
305
+ ```
306
+ Thought: That result seems wrong
307
+ Action: verify(previous_result)
308
+ Observation: Error detected
309
+ Thought: Let me recalculate
310
+ Action: multiply(15, 8) # Try again
311
+ ```
312
+

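The self-correction transcript above can be expressed as a verify-and-retry wrapper. This is a sketch: `flakyMultiply` is a contrived tool that returns a wrong result on its first call, purely to trigger the correction path.

```javascript
// Self-correction sketch: verify a tool result and redo the step on failure.
let attempts = 0;
const flakyMultiply = (a, b) => {
    attempts++;
    return attempts === 1 ? a + b : a * b; // first call returns a wrong result
};
const verify = (a, b, result) => result === a * b;

function multiplyWithSelfCheck(a, b, maxRetries = 2) {
    for (let i = 0; i <= maxRetries; i++) {
        const result = flakyMultiply(a, b);       // Action
        if (verify(a, b, result)) return result;  // Observation: looks right
        console.log("Thought: that result seems wrong, recalculating");
    }
    throw new Error("Verification kept failing");
}

const checked = multiplyWithSelfCheck(15, 8);
console.log(checked); // 120
```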
313
+ ### Meta-Reasoning
314
+ ```
315
+ Thought: I've used 5 iterations, I should finish soon
316
+ Action: summarize_progress()
317
+ Observation: Still need to add final numbers
318
+ Thought: One more step should do it
319
+ ```
320
+
321
+ ### Dynamic Tool Selection
322
+ ```
323
+ Thought: This is a division problem
324
+ Action: divide(10, 2) # Chooses right tool
325
+
326
+ Thought: Now I need to add
327
+ Action: add(5, 3) # Switches tools
328
+ ```
329
+
330
+ ## Research Origins
331
+
332
+ ReAct was introduced in:
333
+ > **"ReAct: Synergizing Reasoning and Acting in Language Models"**
334
+ > Yao et al., 2022
335
+ > Paper: https://arxiv.org/abs/2210.03629
336
+
337
+ Key insight: Combining reasoning traces with task-specific actions creates more powerful agents than either alone.
338
+
339
+ ## Modern Frameworks Using ReAct
340
+
341
+ 1. **LangChain** - AgentExecutor with ReAct
342
+ 2. **AutoGPT** - Autonomous task execution
343
+ 3. **BabyAGI** - Task management system
344
+ 4. **GPT Engineer** - Code generation
345
+ 5. **ChatGPT Plugins** - Tool-using chatbots
346
+
347
+ ## Why Learn This Pattern?
348
+
349
+ ### 1. Foundation of Modern Agents
350
+ Nearly all production agent systems use ReAct or similar patterns.
351
+
352
+ ### 2. Understandable AI
353
+ Unlike black-box models, you see exactly what's happening.
354
+
355
+ ### 3. Extensible
356
+ Easy to add new tools and capabilities.
357
+
358
+ ### 4. Debuggable
359
+ When things go wrong, you can see where and why.
360
+
361
+ ### 5. Production-Ready
362
+ This pattern scales from demos to real applications.
363
+
364
+ ## Summary
365
+
366
+ ReAct transforms LLMs from:
367
+ - **Brittle calculators** → Reliable problem solvers
368
+ - **Black boxes** → Transparent reasoners
369
+ - **Single-shot answerers** → Iterative thinkers
370
+ - **Isolated models** → Tool-using agents
371
+
372
+ It's the bridge between language models and autonomous agents that can actually accomplish complex tasks reliably.
examples/09_react-agent/react-agent.js ADDED
@@ -0,0 +1,241 @@
1
+ import {defineChatSessionFunction, getLlama, LlamaChatSession} from "node-llama-cpp";
2
+ import {fileURLToPath} from "url";
3
+ import path from "path";
4
+ import {PromptDebugger} from "../../helper/prompt-debugger.js";
5
+
6
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
7
+ const debug = false;
8
+
9
+ const llama = await getLlama({debug});
10
+ const model = await llama.loadModel({
11
+ modelPath: path.join(
12
+ __dirname,
13
+ '..',
14
+ '..',
15
+ 'models',
16
+ 'hf_giladgd_gpt-oss-20b.MXFP4.gguf'
17
+ )
18
+ });
19
+ const context = await model.createContext({contextSize: 2000});
20
+
21
+ // ReAct-style system prompt for mathematical reasoning
22
+ const systemPrompt = `You are a mathematical assistant that uses the ReAct (Reasoning + Acting) approach.
23
+
24
+ CRITICAL: You must follow this EXACT pattern for every problem:
25
+
26
+ Thought: [Explain what calculation you need to do next and why]
27
+ Action: [Call ONE tool with specific numbers]
28
+ Observation: [Wait for the tool result]
29
+ Thought: [Analyze the result and decide next step]
30
+ Action: [Call another tool if needed]
31
+ Observation: [Wait for the tool result]
32
+ ... (repeat as many times as needed)
33
+ Thought: [Once you have ALL the information needed to answer the question]
34
+ Answer: [Give the final answer and STOP]
35
+
36
+ RULES:
37
+ 1. Only write "Answer:" when you have the complete final answer to the user's question
38
+ 2. After writing "Answer:", DO NOT continue calculating or thinking
39
+ 3. Break complex problems into the smallest possible steps
40
+ 4. Use tools for ALL calculations - never calculate in your head
41
+ 5. Each Action should call exactly ONE tool
42
+
43
+ EXAMPLE:
44
+ User: "What is 5 + 3, then multiply that by 2?"
45
+
46
+ Thought: First I need to add 5 and 3
47
+ Action: add(5, 3)
48
+ Observation: 8
49
+ Thought: Now I need to multiply that result by 2
50
+ Action: multiply(8, 2)
51
+ Observation: 16
52
+ Thought: I now have the final result
53
+ Answer: 16`;
54
+
55
+ const session = new LlamaChatSession({
56
+ contextSequence: context.getSequence(),
57
+ systemPrompt,
58
+ });
59
+
60
+ // Simple calculator tools that force step-by-step reasoning
61
+ const add = defineChatSessionFunction({
62
+ description: "Add two numbers together",
63
+ params: {
64
+ type: "object",
65
+ properties: {
66
+ a: {
67
+ type: "number",
68
+ description: "First number"
69
+ },
70
+ b: {
71
+ type: "number",
72
+ description: "Second number"
73
+ }
74
+ },
75
+ required: ["a", "b"]
76
+ },
77
+ async handler(params) {
78
+ const result = params.a + params.b;
79
+ console.log(`\n 🔧 TOOL CALLED: add(${params.a}, ${params.b})`);
80
+ console.log(` 📊 RESULT: ${result}\n`);
81
+ return result.toString();
82
+ }
83
+ });
84
+
85
+ const multiply = defineChatSessionFunction({
86
+ description: "Multiply two numbers together",
87
+ params: {
88
+ type: "object",
89
+ properties: {
90
+ a: {
91
+ type: "number",
92
+ description: "First number"
93
+ },
94
+ b: {
95
+ type: "number",
96
+ description: "Second number"
97
+ }
98
+ },
99
+ required: ["a", "b"]
100
+ },
101
+ async handler(params) {
102
+ const result = params.a * params.b;
103
+ console.log(`\n 🔧 TOOL CALLED: multiply(${params.a}, ${params.b})`);
104
+ console.log(` 📊 RESULT: ${result}\n`);
105
+ return result.toString();
106
+ }
107
+ });
108
+
109
+ const subtract = defineChatSessionFunction({
110
+ description: "Subtract second number from first number",
111
+ params: {
112
+ type: "object",
113
+ properties: {
114
+ a: {
115
+ type: "number",
116
+ description: "Number to subtract from"
117
+ },
118
+ b: {
119
+ type: "number",
120
+ description: "Number to subtract"
121
+ }
122
+ },
123
+ required: ["a", "b"]
124
+ },
125
+ async handler(params) {
126
+ const result = params.a - params.b;
127
+ console.log(`\n 🔧 TOOL CALLED: subtract(${params.a}, ${params.b})`);
128
+ console.log(` 📊 RESULT: ${result}\n`);
129
+ return result.toString();
130
+ }
131
+ });
132
+
133
+ const divide = defineChatSessionFunction({
134
+ description: "Divide first number by second number",
135
+ params: {
136
+ type: "object",
137
+ properties: {
138
+ a: {
139
+ type: "number",
140
+ description: "Dividend (number to be divided)"
141
+ },
142
+ b: {
143
+ type: "number",
144
+ description: "Divisor (number to divide by)"
145
+ }
146
+ },
147
+ required: ["a", "b"]
148
+ },
149
+ async handler(params) {
150
+ if (params.b === 0) {
151
+ console.log(`\n 🔧 TOOL CALLED: divide(${params.a}, ${params.b})`);
152
+ console.log(` ❌ ERROR: Division by zero\n`);
153
+ return "Error: Cannot divide by zero";
154
+ }
155
+ const result = params.a / params.b;
156
+ console.log(`\n 🔧 TOOL CALLED: divide(${params.a}, ${params.b})`);
157
+ console.log(` 📊 RESULT: ${result}\n`);
158
+ return result.toString();
159
+ }
160
+ });
161
+
162
+ const functions = {add, multiply, subtract, divide};
163
+
164
+ // ReAct Agent execution loop with proper output handling
165
+ async function reactAgent(userPrompt, maxIterations = 10) {
166
+ console.log("\n" + "=".repeat(70));
167
+ console.log("USER QUESTION:", userPrompt);
168
+ console.log("=".repeat(70) + "\n");
169
+
170
+ let iteration = 0;
171
+ let fullResponse = "";
172
+
173
+ while (iteration < maxIterations) {
174
+ iteration++;
175
+ console.log(`--- Iteration ${iteration} ---`);
176
+
177
+ // Prompt with onTextChunk to capture streaming output
178
+ let currentChunk = "";
179
+ const response = await session.prompt(
180
+ iteration === 1 ? userPrompt : "Continue your reasoning. What's the next step?",
181
+ {
182
+ functions,
183
+ maxTokens: 300,
184
+ onTextChunk: (chunk) => {
185
+ // Print each chunk as it arrives
186
+ process.stdout.write(chunk);
187
+ currentChunk += chunk;
188
+ }
189
+ }
190
+ );
191
+
192
+ console.log(); // New line after streaming
193
+
194
+ fullResponse += currentChunk;
195
+
196
+ // If no output was generated in this iteration, something's wrong
197
+ if (!currentChunk.trim() && !response.trim()) {
198
+ console.log(" (No output generated this iteration)\n");
199
+ }
200
+
201
+ // Check if we have a final answer
202
+ if (response.toLowerCase().includes("answer:") ||
203
+ fullResponse.toLowerCase().includes("answer:")) {
204
+ console.log("\n" + "=".repeat(70));
205
+ console.log("FINAL ANSWER REACHED");
206
+ console.log("=".repeat(70));
207
+ return fullResponse;
208
+ }
209
+ }
210
+
211
+ console.log("\n⚠️ Max iterations reached without final answer");
212
+ return fullResponse || "Could not complete reasoning within iteration limit.";
213
+ }
214
+
215
+ // Test queries that require multi-step reasoning
216
+ const queries = [
217
+ // "If I buy 3 apples at $2 each and 4 oranges at $3 each, how much do I spend in total?",
218
+ // "Calculate: (15 + 7) × 3 - 10",
219
+ //"A pizza costs $20. If 4 friends split it equally, how much does each person pay?",
220
+ "A store sells 15 items on Monday at $8 each, 20 items on Tuesday at $8 each, and 10 items on Wednesday at $8 each. What's the average number of items sold per day, and what's the total revenue?",
221
+ ];
222
+
223
+ for (const query of queries) {
224
+ await reactAgent(query, 3);
225
+ console.log("\n");
226
+ }
227
+
228
+ // Debug
229
+ const promptDebugger = new PromptDebugger({
230
+ outputDir: './logs',
231
+ filename: 'react_calculator.txt',
232
+ includeTimestamp: true,
233
+ appendMode: false
234
+ });
235
+ await promptDebugger.debugContextState({session, model});
236
+
237
+ // Clean up
238
+ session.dispose();
239
+ context.dispose();
240
+ model.dispose();
241
+ llama.dispose();
examples/10_aot-agent/CODE.md ADDED
@@ -0,0 +1,178 @@
1
+ # Code Explanation: aot-agent.js
2
+
3
+ This example demonstrates the **Atom of Thought** prompting pattern using a mathematical calculator as the domain.
4
+
5
+ ## Three-Phase Architecture
6
+
7
+ ### Phase 1: Planning (LLM)
8
+ ```javascript
9
+ async function generatePlan(userPrompt) {
10
+ const grammar = await llama.createGrammarForJsonSchema(planSchema);
11
+ const planText = await session.prompt(userPrompt, { grammar });
12
+ return grammar.parse(planText);
13
+ }
14
+ ```
15
+
16
+ **Key points:**
17
+ - LLM outputs **structured JSON** (enforced by grammar)
18
+ - LLM does NOT execute calculations
19
+ - Each atom represents one operation
20
+ - Dependencies are explicit (`dependsOn` array)
21
+
22
+ **Example output:**
23
+ ```json
24
+ {
25
+ "atoms": [
26
+ {"id": 1, "kind": "tool", "name": "add", "input": {"a": 15, "b": 7}},
27
+ {"id": 2, "kind": "tool", "name": "multiply", "input": {"a": "<result_of_1>", "b": 3}},
28
+ {"id": 3, "kind": "tool", "name": "subtract", "input": {"a": "<result_of_2>", "b": 10}},
29
+ {"id": 4, "kind": "final", "name": "report", "dependsOn": [3]}
30
+ ]
31
+ }
32
+ ```
33
+
34
+ ### Phase 2: Validation (System)
35
+ ```javascript
36
+ function validatePlan(plan) {
37
+ const allowedTools = new Set(Object.keys(tools));
38
+
39
+ for (const atom of plan.atoms) {
40
+ if (ids.has(atom.id)) throw new Error(`Duplicate ID`);
41
+ if (atom.kind === "tool" && !allowedTools.has(atom.name)) {
42
+ throw new Error(`Unknown tool: ${atom.name}`);
43
+ }
44
+ }
45
+ }
46
+ ```
47
+
48
+ **Validates:**
49
+ - No duplicate atom IDs
50
+ - Only allowed tools are referenced
51
+ - Dependencies make sense
52
+ - JSON structure is correct
53
+
54
+ ### Phase 3: Execution (System)
55
+ ```javascript
56
+ function executePlan(plan) {
57
+ const state = {};
58
+
59
+ for (const atom of sortedAtoms) {
60
+ // Resolve dependencies
61
+ let resolvedInput = {};
62
+ for (const [key, value] of Object.entries(atom.input)) {
63
+ if (value.startsWith('<result_of_')) {
64
+ const refId = parseInt(value.match(/\d+/)[0]);
65
+ resolvedInput[key] = state[refId];
66
+ }
67
+ }
68
+
69
+ // Execute
70
+ state[atom.id] = tools[atom.name](resolvedInput.a, resolvedInput.b);
71
+ }
72
+ }
73
+ ```
74
+
75
+ **Key behaviors:**
76
+ - Executes atoms in order (sorted by ID)
77
+ - Resolves `<result_of_N>` references from state
78
+ - Each atom stores its result in `state[atom.id]`
79
+ - Execution is **deterministic** (same plan + same state = same result)
80
+
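The behaviors above can be demonstrated end-to-end without an LLM. The following sketch uses mock tool implementations and the same `<result_of_N>` reference convention as the example plan; it is a standalone illustration, not the project's executor.

```javascript
// Self-contained executor sketch: runs an AoT plan deterministically.
const tools = {
    add: (a, b) => a + b,
    multiply: (a, b) => a * b,
    subtract: (a, b) => a - b
};

const plan = {
    atoms: [
        {id: 1, kind: "tool", name: "add", input: {a: 15, b: 7}, dependsOn: []},
        {id: 2, kind: "tool", name: "multiply", input: {a: "<result_of_1>", b: 3}, dependsOn: [1]},
        {id: 3, kind: "tool", name: "subtract", input: {a: "<result_of_2>", b: 10}, dependsOn: [2]}
    ]
};

function executePlan(plan) {
    const state = {};
    for (const atom of [...plan.atoms].sort((x, y) => x.id - y.id)) {
        const resolved = {};
        for (const [key, value] of Object.entries(atom.input)) {
            resolved[key] = (typeof value === "string" && value.startsWith("<result_of_"))
                ? state[parseInt(value.match(/\d+/)[0])] // reference → prior result
                : value;                                 // literal number
        }
        state[atom.id] = tools[atom.name](resolved.a, resolved.b);
    }
    return state;
}

const state = executePlan(plan);
console.log(state); // { '1': 22, '2': 66, '3': 56 }
```

Running the same plan twice always yields the same state map, which is exactly the determinism property claimed above.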
81
+ ## Why This Matters
82
+
83
+ ### Comparison with ReAct
84
+
85
+ | Aspect | ReAct | Atom of Thought |
86
+ |--------|-------|-----------------|
87
+ | **Planning** | Implicit (in LLM reasoning) | Explicit (JSON structure) |
88
+ | **Execution** | LLM decides next step | System follows plan |
89
+ | **Validation** | None | Before execution |
90
+ | **Debugging** | Hard (trace through text) | Easy (inspect atoms) |
91
+ | **Testing** | Hard (mock LLM) | Easy (test executor) |
92
+ | **Failures** | May hallucinate | Fail at specific atom |
93
+
94
+ ### Benefits
95
+
96
+ 1. **No hidden reasoning**: Every operation is an explicit atom
97
+ 2. **Testable**: Execute plan without LLM involvement
98
+ 3. **Debuggable**: Know exactly which atom failed
99
+ 4. **Auditable**: Plan is a data structure, not text
100
+ 5. **Deterministic**: Same input = same output (given same plan)
101
+
102
+ ## Tool Implementation
103
+
104
+ Tools are **pure functions** with no side effects:
105
+ ```javascript
106
+ const tools = {
107
+ add: (a, b) => {
108
+ const result = a + b;
109
+ console.log(`EXECUTING: add(${a}, ${b}) = ${result}`);
110
+ return result;
111
+ },
112
+ // ... more tools
113
+ };
114
+ ```
115
+
116
+ **Why pure functions?**
117
+ - Easy to test
118
+ - Easy to replay
119
+ - No hidden state
120
+ - Composable
121
+
122
+ ## State Flow
123
+ ```
124
+ User Question
125
+
126
+ [LLM generates plan]
127
+
128
+ {atoms: [...]} ← JSON plan
129
+
130
+ [System validates]
131
+
132
+ Plan valid
133
+
134
+ [System executes atom 1] → state[1] = result
135
+
136
+ [System executes atom 2] → state[2] = result (uses state[1])
137
+
138
+ [System executes atom 3] → state[3] = result (uses state[2])
139
+
140
+ Final Answer
141
+ ```
142
+
143
+ ## Error Handling
144
+ ```javascript
145
+ // Atom validation fails → re-prompt LLM
146
+ validatePlan(plan); // throws if invalid
147
+
148
+ // Tool execution fails → stop at that atom
149
+ if (b === 0) throw new Error("Division by zero");
150
+
151
+ // Dependency missing → clear error message
152
+ if (!(depId in state)) {
153
+ throw new Error(`Atom ${atom.id} depends on incomplete atom ${depId}`);
154
+ }
155
+ ```
156
+
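The "dependency missing" case can be checked up front, before any tool runs. A minimal sketch (atom shape as in the examples above; atoms assumed listed in execution order):

```javascript
// Validation sketch: every dependency must reference an earlier, existing atom.
function validateDependencies(plan) {
    const seen = new Set();
    for (const atom of plan.atoms) {
        for (const depId of atom.dependsOn ?? []) {
            if (!seen.has(depId)) {
                throw new Error(`Atom ${atom.id} depends on unknown or later atom ${depId}`);
            }
        }
        seen.add(atom.id);
    }
    return true;
}

const good = {atoms: [
    {id: 1, dependsOn: []},
    {id: 2, dependsOn: [1]}
]};
const bad = {atoms: [
    {id: 1, dependsOn: [99]} // refers to an atom that never exists
]};

console.log(validateDependencies(good)); // true
let error = null;
try { validateDependencies(bad); } catch (e) { error = e.message; }
console.log(error); // Atom 1 depends on unknown or later atom 99
```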
157
+ ## When to Use AoT
158
+
159
+ ✅ **Use AoT when:**
160
+ - Execution must be auditable
161
+ - Failures must be recoverable
162
+ - Multiple steps with dependencies
163
+ - Testing is important
164
+ - Compliance matters
165
+
166
+ ❌ **Don't use AoT when:**
167
+ - Single-step tasks
168
+ - Creative/exploratory tasks
169
+ - Brainstorming
170
+ - Natural conversation
171
+
172
+ ## Extension Ideas
173
+
174
+ 1. **Add compensation atoms** for rollback
175
+ 2. **Add retry logic** per atom
176
+ 3. **Parallelize independent atoms** (atoms with no shared dependencies)
177
+ 4. **Persist plan** for debugging
178
+ 5. **Visualize atom graph** (dependency tree)
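Extension idea 2 (per-atom retry) can be sketched as a thin wrapper around any tool function. The retry count and the contrived `flakyAdd` tool here are illustrative only:

```javascript
// Per-atom retry sketch: re-run a failing tool a bounded number of times.
function withRetry(fn, maxRetries = 2) {
    return (...args) => {
        let lastError;
        for (let attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return fn(...args);
            } catch (e) {
                lastError = e;
                console.log(`Attempt ${attempt + 1} failed: ${e.message}`);
            }
        }
        throw lastError; // retries exhausted → fail cleanly at this atom
    };
}

// A contrived flaky tool that succeeds on its second call.
let calls = 0;
const flakyAdd = (a, b) => {
    calls++;
    if (calls < 2) throw new Error("transient failure");
    return a + b;
};

const reliableAdd = withRetry(flakyAdd);
const sum = reliableAdd(15, 7);
console.log(sum); // 22
```

Because tools are pure functions, wrapping them this way changes no other behavior: a retried atom still produces the same result it would have on a clean first run.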
examples/10_aot-agent/CONCEPT.md ADDED
@@ -0,0 +1,265 @@
1
+ # Concept: Atom of Thought (AoT) Pattern for AI Agents
2
+
3
+ ## The Core Idea
4
+
5
+ **Atom of Thought = "SQL for Reasoning"**
6
+
7
+ Just as SQL breaks complex data operations into atomic, composable statements, AoT breaks reasoning into minimal, executable steps.
8
+
9
+ ## What is an Atom?
10
+
11
+ An atom is the **smallest unit of reasoning** that:
12
+ 1. Expresses exactly **one** idea
13
+ 2. Can be **validated independently**
14
+ 3. Can be **executed deterministically**
15
+ 4. **Cannot hide** a mistake
16
+
17
+ ### Examples
18
+
19
+ ❌ **Not atomic** (compound statement):
20
+ ```
21
+ "Search for rooms in Graz and filter by capacity"
22
+ ```
23
+
24
+ ✅ **Atomic** (separate steps):
25
+ ```
26
+ 1. Search for rooms in Graz
27
+ 2. Filter rooms by minimum capacity of 30
28
+ ```
29
+
30
+ ## The Three Layers
31
+ ```
32
+ ┌─────────────────────────────────┐
33
+ │ LLM (Planning Layer) │
34
+ │ - Proposes atomic plan │
35
+ │ - Does NOT execute │
36
+ └─────────────────────────────────┘
37
+
38
+ ┌─────────────────────────────────┐
39
+ │ Validator (Safety Layer) │
40
+ │ - Checks plan structure │
41
+ │ - Validates dependencies │
42
+ └─────────────────────────────────┘
43
+
44
+ ┌─────────────────────────────────┐
45
+ │ Executor (Execution Layer) │
46
+ │ - Runs atoms deterministically│
47
+ │ - Manages state │
48
+ └─────────────────────────────────┘
49
+ ```
50
+
51
+ ## Why Separation Matters
52
+
53
+ ### Traditional LLM Approach (ReAct)
54
+ ```
55
+ LLM thinks → LLM acts → LLM thinks → LLM acts
56
+ ```
57
+ **Problem:** Execution logic lives inside the model (black box)
58
+
59
+ ### Atom of Thought Approach
60
+ ```
61
+ LLM plans → System validates → System executes
62
+ ```
63
+ **Benefit:** Execution logic lives in code (white box)
64
+
65
+ ## Mental Model
66
+
67
+ Think of AoT as the difference between:
68
+
69
+ | Cooking | Programming |
70
+ |---------|------------|
71
+ | **Recipe** (AoT plan) | **Algorithm** |
72
+ | "Boil water" | `boilWater()` |
73
+ | "Add pasta" | `addPasta()` |
74
+ | "Cook 8 minutes" | `cook(8)` |
75
+
76
+ vs.
77
+
78
+ | Improvising | Natural Language |
79
+ |-------------|------------------|
80
+ | "Make dinner" | "Figure it out" |
81
+ | (figure it out) | (hallucinate) |
82
+
83
+ ## The Atom Structure
84
+ ```javascript
85
+ {
86
+ "id": 2,
87
+ "kind": "tool", // tool | decision | final
88
+ "name": "multiply", // operation name
89
+ "input": { // explicit inputs
90
+ "a": "<result_of_1>", // reference to previous result
91
+ "b": 3
92
+ },
93
+ "dependsOn": [1] // must wait for atom 1
94
+ }
95
+ ```
96
+
97
+ **Why this structure?**
98
+ - `id`: Establishes order
99
+ - `kind`: Categorizes operation type
100
+ - `name`: References executable function
101
+ - `input`: Makes data flow explicit
102
+ - `dependsOn`: Declares dependencies
103
+
104
+ ## Dependency Graph
105
+
106
+ Atoms form a **directed acyclic graph (DAG)**:
107
+ ```
108
+ ┌─────┐
109
+ │ 1 │ add(15, 7)
110
+ └──┬──┘
111
+
112
+ ┌──▼──┐
113
+ │ 2 │ multiply(result_1, 3)
114
+ └──┬──┘
115
+
116
+ ┌──▼──┐
117
+ │ 3 │ subtract(result_2, 10)
118
+ └──┬──┘
119
+
120
+ ┌──▼──┐
121
+ │ 4 │ final
122
+ └─────┘
123
+ ```
124
+
125
+ **Properties:**
126
+ - Can be executed in topological order
127
+ - Can parallelize independent branches
128
+ - Failures stop at failed node
129
+ - Easy to visualize and debug
130
+
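A topological order for such a DAG can be computed with Kahn's algorithm: repeatedly take the atoms whose dependencies are all satisfied. This sketch assumes the atom shape used above and breaks ties by `id`:

```javascript
// Topological ordering of atoms (Kahn's algorithm). Atoms with no
// unsatisfied dependencies become "ready"; a stuck state means a cycle.
function topologicalOrder(atoms) {
    const pending = new Map(atoms.map(a => [a.id, new Set(a.dependsOn ?? [])]));
    const order = [];
    while (pending.size > 0) {
        const ready = [...pending.entries()]
            .filter(([, deps]) => deps.size === 0)
            .map(([id]) => id)
            .sort((a, b) => a - b);
        if (ready.length === 0) throw new Error("Cycle detected in atom graph");
        for (const id of ready) {
            order.push(id);
            pending.delete(id);
            for (const deps of pending.values()) deps.delete(id); // mark satisfied
        }
    }
    return order;
}

const atoms = [
    {id: 4, dependsOn: [2, 3]},
    {id: 2, dependsOn: [1]},
    {id: 3, dependsOn: [1]},
    {id: 1, dependsOn: []}
];
console.log(topologicalOrder(atoms)); // [ 1, 2, 3, 4 ]
```

Note that atoms 2 and 3 become ready in the same round: those are exactly the independent branches that could run in parallel.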
131
+ ## State Management
132
+ ```javascript
133
+ const state = {};
134
+
135
+ // After atom 1
136
+ state[1] = 22; // result of add(15, 7)
137
+
138
+ // After atom 2
139
+ state[2] = 66; // result of multiply(22, 3)
140
+
141
+ // After atom 3
142
+ state[3] = 56; // result of subtract(66, 10)
143
+ ```
144
+
145
+ **State is:**
146
+ - Explicit (key-value map)
147
+ - Immutable per atom (no overwrites)
148
+ - Traceable (full history)
149
+ - Inspectable (debugging)
150
+
151
+ ## Comparison: AoT vs ReAct
152
+
153
+ ### Question: "What is (15 + 7) × 3 - 10?"
154
+
155
+ #### ReAct Output (text):
156
+ ```
157
+ Thought: I need to add 15 and 7 first
158
+ Action: add(15, 7)
159
+ Observation: 22
160
+ Thought: Now multiply by 3
161
+ Action: multiply(22, 3)
162
+ Observation: 66
163
+ Thought: Finally subtract 10
164
+ Action: subtract(66, 10)
165
+ Observation: 56
166
+ Answer: 56
167
+ ```
168
+
169
+ #### AoT Output (JSON):
170
+ ```json
171
+ {
172
+ "atoms": [
173
+ {"id": 1, "kind": "tool", "name": "add", "input": {"a": 15, "b": 7}},
174
+ {"id": 2, "kind": "tool", "name": "multiply", "input": {"a": "<result_of_1>", "b": 3}, "dependsOn": [1]},
175
+ {"id": 3, "kind": "tool", "name": "subtract", "input": {"a": "<result_of_2>", "b": 10}, "dependsOn": [2]},
176
+ {"id": 4, "kind": "final", "name": "report", "dependsOn": [3]}
177
+ ]
178
+ }
179
+ ```
180
+
181
+ ### Key Differences
182
+
183
+ | Aspect | ReAct | AoT |
184
+ |--------|-------|-----|
185
+ | **Format** | Natural language | Structured data |
186
+ | **Validation** | Impossible | Before execution |
187
+ | **Testing** | Mock entire LLM | Test executor independently |
188
+ | **Debugging** | Read through text | Inspect atom N |
189
+ | **Replay** | Re-run entire conversation | Re-run from any atom |
190
+ | **Audit trail** | Conversational history | Data structure |
191
+
192
+ ## When AoT Shines
193
+
194
+ ### ✅ Perfect for:
195
+ - **Multi-step workflows** (booking, pipelines)
196
+ - **API orchestration** (call A, then B with A's result)
197
+ - **Financial transactions** (auditable, reversible)
198
+ - **Compliance-sensitive systems** (every step logged)
199
+ - **Production agents** (failures must be clean)
200
+
201
+ ### ❌ Not ideal for:
202
+ - **Creative writing**
203
+ - **Open-ended exploration**
204
+ - **Brainstorming**
205
+ - **Single-step queries**
206
+
207
+ ## Real-World Analogy
208
+
209
+ **ReAct is like a chef improvising:**
210
+ - Flexible
211
+ - Creative
212
+ - Hard to replicate exactly
213
+ - Mistakes hidden in process
214
+
215
+ **AoT is like following a recipe:**
216
+ - Repeatable
217
+ - Testable
218
+ - Step X failed? Start from step X-1
219
+ - Every ingredient and action is explicit
220
+
221
+ ## The Hidden Benefit: Debuggability
222
+
223
+ When something goes wrong:
224
+
225
+ **ReAct:**
226
+ ```
227
+ "The model said something weird in iteration 7"
228
+ → Re-read entire conversation
229
+ → Guess where it went wrong
230
+ → Hope it doesn't happen again
231
+ ```
232
+
233
+ **AoT:**
234
+ ```
235
+ "Atom 3 failed with 'Division by zero'"
236
+ → Look at atom 3's inputs
237
+ → Check where those inputs came from (atom 1, 2)
238
+ → Fix tool or add validation
239
+ → Re-run from atom 3
240
+ ```
241
+
242
+ ## Implementation Checklist
243
+
244
+ ✅ **LLM side:**
245
+ - [ ] System prompt enforces JSON output
246
+ - [ ] Grammar constrains to valid schema
247
+ - [ ] Atoms are minimal (one operation each)
248
+ - [ ] Dependencies are explicit
249
+
250
+ ✅ **System side:**
251
+ - [ ] Validator checks tool names
252
+ - [ ] Validator checks dependencies
253
+ - [ ] Executor resolves references
254
+ - [ ] Executor is deterministic
255
+ - [ ] State is immutable
256
+
257
+ ## The Bottom Line
258
+
259
+ **ReAct asks:**
260
+ "What would an intelligent agent say next?"
261
+
262
+ **AoT asks:**
263
+ "What is the minimal, executable plan?"
264
+
265
+ For production systems, you want the second question.
examples/10_aot-agent/aot-agent.js ADDED
@@ -0,0 +1,416 @@
1
+ import { getLlama, LlamaChatSession } from "node-llama-cpp";
2
+ import { fileURLToPath } from "url";
3
+ import path from "path";
4
+ import { PromptDebugger } from "../../helper/prompt-debugger.js";
5
+ import { JsonParser } from "../../helper/json-parser.js";
6
+
7
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
8
+ const debug = false;
9
+
10
+ const llama = await getLlama({ debug });
11
+ const model = await llama.loadModel({
12
+ modelPath: path.join(
13
+ __dirname,
14
+ '..',
15
+ '..',
16
+ 'models',
17
+ 'Qwen3-1.7B-Q8_0.gguf'
18
+ )
19
+ });
20
+ const context = await model.createContext({ contextSize: 2000 });
21
+
22
+ // Atom of Thought system prompt - LLM only plans, doesn't execute
23
+ const systemPrompt = `You are a mathematical planning assistant using Atom of Thought methodology.
24
+
25
+ CRITICAL RULES:
26
+ 1. Extract every number from the user's question and put it in the "input" field.
27
+ 2. Each atom expresses EXACTLY ONE operation: add, subtract, multiply, divide.
28
+ 3. NEVER combine operations in one atom. For example, "(5 + 3) × 2" → must be TWO atoms: one for add, one for multiply.
29
+ 4. The "final" atom reports only the result of the last computational atom; it must NOT have its own input. Do not include an "input" field in final atoms.
30
+ 5. Use "<result_of_N>" to reference previous atom results; never invent calculations in the final atom.
31
+ 6. Output ONLY valid JSON matching the schema, with no explanation or extra text.
32
+
33
+ CORRECT EXAMPLE for "What is (15 + 7) × 3 - 10?":
34
+ {
35
+ "atoms": [
36
+ {"id": 1, "kind": "tool", "name": "add", "input": {"a": 15, "b": 7}, "dependsOn": []},
37
+ {"id": 2, "kind": "tool", "name": "multiply", "input": {"a": "<result_of_1>", "b": 3}, "dependsOn": [1]},
38
+ {"id": 3, "kind": "tool", "name": "subtract", "input": {"a": "<result_of_2>", "b": 10}, "dependsOn": [2]},
39
+ {"id": 4, "kind": "final", "name": "report", "dependsOn": [3]}
40
+ ]
41
+ }
42
+
43
+ WRONG EXAMPLES:
44
+ - Empty input: {"input": {}}
45
+ - Missing numbers: {"input": {"a": "<result_of_1>"}}
46
+ - Combined operations: "add then multiply" → must be TWO atoms
47
+ - Final atom with input: {"kind": "final", "input": {"a": 5}} is INVALID
48
+
49
+ Available tools: add, subtract, multiply, divide
50
+ - Each tool requires: {"a": <number or reference>, "b": <number or reference>}
51
+ - kind options: "tool", "decision", "final"
52
+ - dependsOn: array of atom IDs that must complete first
53
+
54
+ Always extract the actual numbers from the question and put them in the input fields! Never combine operations or invent calculations in final atoms.`;
55
+
56
+ // Define JSON schema for plan validation
57
+ const planSchema = {
58
+ type: "object",
59
+ properties: {
60
+ atoms: {
61
+ type: "array",
62
+ items: {
63
+ type: "object",
64
+ properties: {
65
+ id: { type: "number" },
66
+ kind: { enum: ["tool", "decision", "final"] },
67
+ name: { type: "string" },
68
+ input: {
69
+ type: "object",
70
+ properties: {
71
+ a: {
72
+ oneOf: [
73
+ { type: "number" },
74
+ { type: "string", pattern: "^<result_of_\\d+>$" }
75
+ ]
76
+ },
77
+ b: {
78
+ oneOf: [
79
+ { type: "number" },
80
+ { type: "string", pattern: "^<result_of_\\d+>$" }
81
+ ]
82
+ }
83
+ }
84
+ },
85
+ dependsOn: {
86
+ type: "array",
87
+ items: { type: "number" }
88
+ }
89
+ },
90
+ required: ["id", "kind", "name"]
91
+ }
92
+ }
93
+ },
94
+ required: ["atoms"]
95
+ };
96
+
97
+ const session = new LlamaChatSession({
98
+ contextSequence: context.getSequence(),
99
+ systemPrompt,
100
+ });
101
+
102
+ // Tool implementations (pure functions, deterministic)
103
+ const tools = {
104
+ add: (a, b) => {
105
+ const result = a + b;
106
+ console.log(`EXECUTING: add(${a}, ${b}) = ${result}`);
107
+ return result;
108
+ },
109
+
110
+ subtract: (a, b) => {
111
+ const result = a - b;
112
+ console.log(`EXECUTING: subtract(${a}, ${b}) = ${result}`);
113
+ return result;
114
+ },
115
+
116
+ multiply: (a, b) => {
117
+ const result = a * b;
118
+ console.log(`EXECUTING: multiply(${a}, ${b}) = ${result}`);
119
+ return result;
120
+ },
121
+
122
+ divide: (a, b) => {
123
+ if (b === 0) {
124
+ console.log(`ERROR: divide(${a}, ${b}) - Division by zero`);
125
+ throw new Error("Division by zero");
126
+ }
127
+ const result = a / b;
128
+ console.log(`EXECUTING: divide(${a}, ${b}) = ${result}`);
129
+ return result;
130
+ }
131
+ };
132
+
133
+ // Decision handlers (for complex logic)
134
+ const decisions = {
135
+ average: (values) => {
136
+ const sum = values.reduce((acc, v) => acc + v, 0);
137
+ const avg = sum / values.length;
138
+ console.log(`DECISION: average([${values}]) = ${avg}`);
139
+ return avg;
140
+ },
141
+
142
+ chooseCheapest: (values) => {
143
+ const min = Math.min(...values);
144
+ console.log(`DECISION: chooseCheapest([${values}]) = ${min}`);
145
+ return min;
146
+ }
147
+ };
148
+
149
+ // Phase 1: LLM generates atomic plan
150
+ async function generatePlan(userPrompt) {
151
+ console.log("\n" + "=".repeat(70));
152
+ console.log("PHASE 1: PLANNING (LLM generates atomic plan)");
153
+ console.log("=".repeat(70));
154
+ console.log("USER QUESTION:", userPrompt);
155
+ console.log("-".repeat(70) + "\n");
156
+
157
+ const grammar = await llama.createGrammarForJsonSchema(planSchema);
158
+
159
+ // Add reminder about extracting numbers
160
+ const enhancedPrompt = `${userPrompt}
161
+
162
+ Remember: Extract the actual numbers from this question and put them in the input fields!`;
163
+
164
+ const planText = await session.prompt(enhancedPrompt, {
165
+ grammar,
166
+ maxTokens: 1000
167
+ });
168
+
169
+ let plan;
170
+ try {
171
+ // Use the robust JSON parser
172
+ plan = JsonParser.parse(planText, {
173
+ debug: debug,
174
+ expectObject: true,
175
+ repairAttempts: true
176
+ });
177
+
178
+ // Validate the plan structure
179
+ JsonParser.validatePlan(plan, debug);
180
+
181
+ // Pretty print the plan
182
+ if (debug) {
183
+ JsonParser.prettyPrint(plan);
184
+ } else {
185
+ console.log("GENERATED PLAN:");
186
+ console.log(JSON.stringify(plan, null, 2));
187
+ console.log();
188
+ }
189
+ } catch (error) {
190
+ console.error("Failed to parse plan:", error.message);
191
+ console.log("\nRaw LLM output:");
192
+ console.log(planText);
193
+ throw error;
194
+ }
195
+
196
+ return plan;
197
+ }
198
+
199
+ // Phase 2: System validates plan
200
+ function validatePlan(plan) {
201
+ console.log("\n" + "=".repeat(70));
202
+ console.log("PHASE 2: VALIDATION (System checks plan)");
203
+ console.log("=".repeat(70) + "\n");
204
+
205
+ const allowedTools = new Set(Object.keys(tools));
206
+ const allowedDecisions = new Set(Object.keys(decisions));
207
+ const ids = new Set();
208
+
209
+ for (const atom of plan.atoms) {
210
+ // Check for duplicate IDs
211
+ if (ids.has(atom.id)) {
212
+ throw new Error(`Validation failed: Duplicate atom ID ${atom.id}`);
213
+ }
214
+ ids.add(atom.id);
215
+
216
+ // Check tool names
217
+ if (atom.kind === "tool" && !allowedTools.has(atom.name)) {
218
+ throw new Error(`Validation failed: Unknown tool "${atom.name}" in atom ${atom.id}`);
219
+ }
220
+
221
+ // Check decision names
222
+ if (atom.kind === "decision" && !allowedDecisions.has(atom.name)) {
223
+ throw new Error(`Validation failed: Unknown decision "${atom.name}" in atom ${atom.id}`);
224
+ }
225
+
226
+ // NEW: Validate tool inputs have actual values
227
+ if (atom.kind === "tool") {
228
+ if (!atom.input || typeof atom.input !== 'object') {
229
+ throw new Error(
230
+ `Validation failed: Tool atom ${atom.id} (${atom.name}) must have an input object\n` +
231
+ ` Current: ${JSON.stringify(atom.input)}`
232
+ );
233
+ }
234
+
235
+ // Check if a and b are present
236
+ if (atom.input.a === undefined || atom.input.b === undefined) {
237
+ throw new Error(
238
+ `Validation failed: Tool atom ${atom.id} (${atom.name}) missing required parameters\n` +
239
+ ` Expected: {"a": <number or reference>, "b": <number or reference>}\n` +
240
+ ` Current: ${JSON.stringify(atom.input)}\n` +
241
+ ` Tip: The LLM must extract numbers from the user's question`
242
+ );
243
+ }
244
+
245
+ // For first operations, ensure we have concrete numbers (not references)
246
+ if (atom.dependsOn.length === 0) {
247
+ const hasConcreteNumbers =
248
+ (typeof atom.input.a === 'number') &&
249
+ (typeof atom.input.b === 'number');
250
+
251
+ if (!hasConcreteNumbers) {
252
+ throw new Error(
253
+ `Validation failed: First atom ${atom.id} must have concrete numbers\n` +
254
+ ` Expected: {"a": <number>, "b": <number>}\n` +
255
+ ` Current: ${JSON.stringify(atom.input)}\n` +
256
+ ` The LLM failed to extract numbers from the question`
257
+ );
258
+ }
259
+ }
260
+ }
261
+
262
+ // Check dependencies exist
263
+ if (atom.dependsOn) {
264
+ for (const depId of atom.dependsOn) {
265
+ if (!ids.has(depId) && depId < atom.id) {
266
+ console.warn(`Warning: atom ${atom.id} depends on ${depId} which hasn't been validated yet`);
267
+ }
268
+ }
269
+ }
270
+
271
+ console.log(`Atom ${atom.id} (${atom.kind}:${atom.name}) validated`);
272
+ }
273
+
274
+ console.log("\nPlan validation successful\n");
275
+ return true;
276
+ }
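The duplicate-ID and allowed-name checks above can be exercised in isolation; a minimal standalone sketch (the tool names here are illustrative, not the agent's real registry):

```javascript
// Standalone sketch of the duplicate-ID and unknown-tool checks in validatePlan.
const allowedTools = new Set(["add", "multiply", "divide"]); // illustrative names
const ids = new Set();
const atoms = [
  { id: 1, kind: "tool", name: "add" },
  { id: 1, kind: "tool", name: "fly" } // duplicate id AND unknown tool
];
const errors = [];
for (const atom of atoms) {
  if (ids.has(atom.id)) errors.push(`Duplicate atom ID ${atom.id}`);
  ids.add(atom.id);
  if (atom.kind === "tool" && !allowedTools.has(atom.name)) {
    errors.push(`Unknown tool "${atom.name}" in atom ${atom.id}`);
  }
}
console.log(errors.length); // 2
```

The real `validatePlan` throws on the first violation instead of collecting all of them, which keeps a bad plan from reaching the execution phase.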
277
+
278
+ // Phase 3: System executes plan deterministically
279
+ function executePlan(plan) {
280
+ console.log("\n" + "=".repeat(70));
281
+ console.log("PHASE 3: EXECUTION (System runs atoms)");
282
+ console.log("=".repeat(70) + "\n");
283
+
284
+ const state = {};
285
+ const sortedAtoms = [...plan.atoms].sort((a, b) => a.id - b.id);
286
+
287
+ for (const atom of sortedAtoms) {
288
+ console.log(`\nExecuting atom ${atom.id} (${atom.kind}:${atom.name})`);
289
+
290
+ // Check dependencies
291
+ if (atom.dependsOn && atom.dependsOn.length > 0) {
292
+ const missingDeps = atom.dependsOn.filter(id => !(id in state));
293
+ if (missingDeps.length > 0) {
294
+ throw new Error(`Atom ${atom.id} depends on incomplete atoms: ${missingDeps}`);
295
+ }
296
+ console.log(`Dependencies satisfied: ${atom.dependsOn.join(', ')}`);
297
+ }
298
+
299
+ // Resolve input values (replace <result_of_N> references)
300
+ let resolvedInput = { a: undefined, b: undefined };
301
+ if (atom.input) {
302
+ // Deep clone to avoid mutations
303
+ resolvedInput = JSON.parse(JSON.stringify(atom.input));
304
+
305
+ for (const [key, value] of Object.entries(resolvedInput)) {
306
+ if (typeof value === 'string' && value.startsWith('<result_of_')) {
307
+ const refId = parseInt(value.match(/\d+/)[0]);
308
+
309
+ if (!(refId in state)) {
310
+ throw new Error(
311
+ `Atom ${atom.id} references <result_of_${refId}> but atom ${refId} hasn't executed yet`
312
+ );
313
+ }
314
+
315
+ resolvedInput[key] = state[refId];
316
+ console.log(`Resolved ${key}: ${value} → ${state[refId]}`);
317
+ }
318
+ }
319
+ }
320
+
321
+ // Execute based on kind
322
+ if (atom.kind === "tool") {
323
+ const tool = tools[atom.name];
324
+ if (!tool) {
325
+ throw new Error(`Tool not found: ${atom.name}`);
326
+ }
327
+
328
+ // Show input before execution
329
+ console.log(`Input: a=${resolvedInput.a}, b=${resolvedInput.b}`);
330
+
331
+ // Safety check
332
+ if (resolvedInput.a === undefined || resolvedInput.b === undefined) {
333
+ throw new Error(
334
+ `Cannot execute ${atom.name}: undefined input values\n` +
335
+ ` This means the LLM didn't extract numbers from your question.\n` +
336
+ ` Original input: ${JSON.stringify(atom.input)}`
337
+ );
338
+ }
339
+
340
+ state[atom.id] = tool(resolvedInput.a, resolvedInput.b);
341
+ }
342
+ else if (atom.kind === "decision") {
343
+ const decision = decisions[atom.name];
344
+ if (!decision) {
345
+ throw new Error(`Decision not found: ${atom.name}`);
346
+ }
347
+
348
+ // Collect results from dependencies
349
+ const depResults = atom.dependsOn.map(id => state[id]);
350
+ state[atom.id] = decision(depResults);
351
+ }
352
+ else if (atom.kind === "final") {
353
+ const finalValue = state[atom.dependsOn[0]];
354
+ console.log(`\n FINAL RESULT: ${finalValue}`);
355
+ state[atom.id] = finalValue;
356
+ }
357
+ }
358
+
359
+ return state;
360
+ }
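The `<result_of_N>` reference resolution inside `executePlan` can be sketched on its own:

```javascript
// Standalone sketch of the <result_of_N> resolution step in executePlan:
const state = { 1: 20 };                        // atom 1 already produced 20
const input = { a: "<result_of_1>", b: 3 };
const resolved = { ...input };
for (const [key, value] of Object.entries(resolved)) {
  if (typeof value === "string" && value.startsWith("<result_of_")) {
    const refId = parseInt(value.match(/\d+/)[0]); // pull the atom id out of the placeholder
    resolved[key] = state[refId];                  // substitute the stored result
  }
}
console.log(resolved); // { a: 20, b: 3 }
```

Because atoms execute in id order and dependencies are checked first, every placeholder is guaranteed to find its value in `state` by the time it is resolved.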
361
+
362
+ // Main AoT Agent execution
363
+ async function aotAgent(userPrompt) {
364
+ try {
365
+ // Phase 1: Plan
366
+ const plan = await generatePlan(userPrompt);
367
+
368
+ // Phase 2: Validate
369
+ validatePlan(plan);
370
+
371
+ // Phase 3: Execute
372
+ const result = executePlan(plan);
373
+
374
+ console.log("\n" + "=".repeat(70));
375
+ console.log("EXECUTION COMPLETE");
376
+ console.log("=".repeat(70));
377
+
378
+ // Find final atom
379
+ const finalAtom = plan.atoms.find(a => a.kind === "final");
380
+ if (finalAtom) {
381
+ console.log(`\nANSWER: ${result[finalAtom.id]}\n`);
382
+ }
383
+
384
+ return result;
385
+ } catch (error) {
386
+ console.error("\nEXECUTION FAILED:", error.message);
387
+ throw error;
388
+ }
389
+ }
390
+
391
+ // Test queries
392
+ const queries = [
393
+ // "What is (15 + 7) multiplied by 3 minus 10?",
394
+ // "A pizza costs 20 dollars. If 4 friends split it equally, how much does each person pay?",
395
+ "Calculate: 100 divided by 5, then add 3, then multiply by 2",
396
+ ];
397
+
398
+ for (const query of queries) {
399
+ await aotAgent(query);
400
+ console.log("\n");
401
+ }
402
+
403
+ // Debug
404
+ const promptDebugger = new PromptDebugger({
405
+ outputDir: './logs',
406
+ filename: 'aot_calculator.txt',
407
+ includeTimestamp: true,
408
+ appendMode: false
409
+ });
410
+ await promptDebugger.debugContextState({ session, model });
411
+
412
+ // Clean up
413
+ session.dispose();
414
+ context.dispose();
415
+ model.dispose();
416
+ llama.dispose();
helper/json-parser.js ADDED
@@ -0,0 +1,282 @@
1
+ /**
2
+ * Robust JSON parser for LLM outputs
3
+ * Handles common issues like:
4
+ * - Missing opening/closing braces
5
+ * - Markdown code blocks
6
+ * - Extra text before/after JSON
7
+ * - Escaped quotes
8
+ * - Trailing commas
9
+ */
10
+
11
+ export class JsonParser {
12
+ /**
13
+ * Extract and parse JSON from potentially messy LLM output
14
+ * @param {string} text - Raw text from LLM
15
+ * @param {object} options - Parsing options
16
+ * @returns {object} Parsed JSON object
17
+ */
18
+ static parse(text, options = {}) {
19
+ const {
20
+ debug = false,
21
+ expectArray = false,
22
+ expectObject = true,
23
+ repairAttempts = true
24
+ } = options;
25
+
26
+ if (debug) {
27
+ console.log("\nRAW LLM OUTPUT:");
28
+ console.log("-".repeat(70));
29
+ console.log(text);
30
+ console.log("-".repeat(70) + "\n");
31
+ }
32
+
33
+ // Step 1: Clean the text
34
+ let cleaned = this.cleanText(text, debug);
35
+
36
+ // Step 2: Extract JSON
37
+ let extracted = this.extractJson(cleaned, expectArray, expectObject, debug);
38
+
39
+ // Step 3: Attempt to parse
40
+ try {
41
+ const parsed = JSON.parse(extracted);
42
+ if (debug) console.log("Successfully parsed JSON\n");
43
+ return parsed;
44
+ } catch (firstError) {
45
+ if (debug) {
46
+ console.log("First parse attempt failed:", firstError.message);
47
+ }
48
+
49
+ if (!repairAttempts) {
50
+ throw new Error(`JSON parse failed: ${firstError.message}\n\nExtracted text:\n${extracted}`);
51
+ }
52
+
53
+ // Step 4: Attempt repairs
54
+ return this.attemptRepairs(extracted, debug);
55
+ }
56
+ }
57
+
58
+ /**
59
+ * Clean text from common LLM artifacts
60
+ */
61
+ static cleanText(text, debug = false) {
62
+ let cleaned = text;
63
+
64
+ // Remove markdown code blocks
65
+ cleaned = cleaned.replace(/```json\s*/gi, '');
66
+ cleaned = cleaned.replace(/```\s*/g, '');
67
+
68
+ // Remove common prefixes
69
+ cleaned = cleaned.replace(/^(Here's the plan:|JSON output:|Plan:|Output:)\s*/i, '');
70
+
71
+ // Trim whitespace
72
+ cleaned = cleaned.trim();
73
+
74
+ if (debug && cleaned !== text) {
75
+ console.log("Cleaned text (removed markdown/prefixes)\n");
76
+ }
77
+
78
+ return cleaned;
79
+ }
80
+
81
+ /**
82
+ * Extract JSON from text (handles text before/after JSON)
83
+ */
84
+ static extractJson(text, expectArray = false, expectObject = true, debug = false) {
85
+ // Try to find JSON boundaries
86
+ const startChar = expectArray ? '[' : '{';
87
+ const endChar = expectArray ? ']' : '}';
88
+
89
+ const startIdx = text.indexOf(startChar);
90
+ const lastIdx = text.lastIndexOf(endChar);
91
+
92
+ if (startIdx === -1 || lastIdx === -1 || startIdx >= lastIdx) {
93
+ if (debug) {
94
+ console.log(`Could not find valid ${startChar}...${endChar} boundaries`);
95
+ console.log(`Start index: ${startIdx}, End index: ${lastIdx}`);
96
+ }
97
+
98
+ // Maybe it's missing braces - try to add them
99
+ if (expectObject && !text.trim().startsWith('{')) {
100
+ const withBraces = '{' + text.trim() + '}';
101
+ if (debug) console.log("Added missing opening brace");
102
+ return withBraces;
103
+ }
104
+
105
+ return text;
106
+ }
107
+
108
+ const extracted = text.substring(startIdx, lastIdx + 1);
109
+
110
+ if (debug && extracted !== text) {
111
+ console.log("Extracted JSON from surrounding text:");
112
+ console.log(extracted.substring(0, 100) + (extracted.length > 100 ? '...' : ''));
113
+ console.log();
114
+ }
115
+
116
+ return extracted;
117
+ }
118
+
119
+ /**
120
+ * Attempt various repair strategies
121
+ */
122
+ static attemptRepairs(jsonString, debug = false) {
123
+ const repairs = [
124
+ // Repair 1: Remove trailing commas
125
+ (str) => {
126
+ const fixed = str.replace(/,(\s*[}\]])/g, '$1');
127
+ if (debug && fixed !== str) console.log("Repair 1: Removed trailing commas");
128
+ return fixed;
129
+ },
130
+
131
+ // Repair 2: Fix missing quotes around property names
132
+ (str) => {
133
+ const fixed = str.replace(/([{,]\s*)([a-zA-Z_][a-zA-Z0-9_]*)\s*:/g, '$1"$2":');
134
+ if (debug && fixed !== str) console.log("Repair 2: Added quotes around property names");
135
+ return fixed;
136
+ },
137
+
138
+ // Repair 3: Fix single quotes to double quotes
139
+ (str) => {
140
+ const fixed = str.replace(/'/g, '"');
141
+ if (debug && fixed !== str) console.log("Repair 3: Converted single quotes to double quotes");
142
+ return fixed;
143
+ },
144
+
145
+ // Repair 4: Add missing closing braces
146
+ (str) => {
147
+ const openBraces = (str.match(/{/g) || []).length;
148
+ const closeBraces = (str.match(/}/g) || []).length;
149
+ if (openBraces > closeBraces) {
150
+ const fixed = str + '}'.repeat(openBraces - closeBraces);
151
+ if (debug) console.log(`Repair 4: Added ${openBraces - closeBraces} missing closing brace(s)`);
152
+ return fixed;
153
+ }
154
+ return str;
155
+ },
156
+
157
+ // Repair 5: Add missing closing brackets
158
+ (str) => {
159
+ const openBrackets = (str.match(/\[/g) || []).length;
160
+ const closeBrackets = (str.match(/]/g) || []).length;
161
+ if (openBrackets > closeBrackets) {
162
+ const fixed = str + ']'.repeat(openBrackets - closeBrackets);
163
+ if (debug) console.log(`Repair 5: Added ${openBrackets - closeBrackets} missing closing bracket(s)`);
164
+ return fixed;
165
+ }
166
+ return str;
167
+ },
168
+
169
+ // Repair 6: Unescape quotes (note: also strips backslashes from legitimately escaped quotes inside string values)
170
+ (str) => {
171
+ const fixed = str.replace(/\\"/g, '"');
172
+ if (debug && fixed !== str) console.log("Repair 6: Fixed escaped quotes");
173
+ return fixed;
174
+ },
175
+
176
+ // Repair 7: Remove control characters
177
+ (str) => {
178
+ // eslint-disable-next-line no-control-regex
179
+ const fixed = str.replace(/[\x00-\x1F\x7F]/g, '');
180
+ if (debug && fixed !== str) console.log("Repair 7: Removed control characters");
181
+ return fixed;
182
+ }
183
+ ];
184
+
185
+ let current = jsonString;
186
+
187
+ // Try each repair in sequence
188
+ for (const repair of repairs) {
189
+ current = repair(current);
190
+ }
191
+
192
+ // Try parsing after all repairs
193
+ try {
194
+ const parsed = JSON.parse(current);
195
+ if (debug) console.log("Successfully parsed after repairs\n");
196
+ return parsed;
197
+ } catch (error) {
198
+ // Last resort: try to extract just the atoms array if it's there
199
+ const atomsMatch = current.match(/"atoms"\s*:\s*(\[[\s\S]*\])/);
200
+ if (atomsMatch) {
201
+ try {
202
+ const atomsOnly = { atoms: JSON.parse(atomsMatch[1]) };
203
+ if (debug) console.log("Extracted and parsed atoms array\n");
204
+ return atomsOnly;
205
+ } catch (innerError) {
206
+ // Fall through to final error
207
+ }
208
+ }
209
+
210
+ // If all repairs fail, throw detailed error
211
+ throw new Error(
212
+ `JSON parse failed after all repair attempts.\n\n` +
213
+ `Original error: ${error.message}\n\n` +
214
+ `Attempted repairs:\n${current.substring(0, 500)}${current.length > 500 ? '...' : ''}\n\n` +
215
+ `Tip: Check if the LLM is following the JSON schema correctly.`
216
+ );
217
+ }
218
+ }
219
+
220
+ /**
221
+ * Validate parsed plan structure
222
+ */
223
+ static validatePlan(plan, debug = false) {
224
+ if (!plan || typeof plan !== 'object') {
225
+ throw new Error('Plan must be an object');
226
+ }
227
+
228
+ if (!Array.isArray(plan.atoms)) {
229
+ throw new Error('Plan must have an "atoms" array');
230
+ }
231
+
232
+ if (plan.atoms.length === 0) {
233
+ throw new Error('Plan must have at least one atom');
234
+ }
235
+
236
+ for (const atom of plan.atoms) {
237
+ if (typeof atom.id !== 'number') {
238
+ throw new Error(`Atom missing or invalid id: ${JSON.stringify(atom)}`);
239
+ }
240
+
241
+ if (!atom.kind || !['tool', 'decision', 'final'].includes(atom.kind)) {
242
+ throw new Error(`Atom ${atom.id} has invalid kind: ${atom.kind}`);
243
+ }
244
+
245
+ if (!atom.name || typeof atom.name !== 'string') {
246
+ throw new Error(`Atom ${atom.id} missing or invalid name`);
247
+ }
248
+
249
+ if (atom.dependsOn && !Array.isArray(atom.dependsOn)) {
250
+ throw new Error(`Atom ${atom.id} dependsOn must be an array`);
251
+ }
252
+ }
253
+
254
+ if (debug) {
255
+ console.log(`Plan structure validated: ${plan.atoms.length} atoms\n`);
256
+ }
257
+
258
+ return true;
259
+ }
260
+
261
+ /**
262
+ * Pretty print plan for debugging
263
+ */
264
+ static prettyPrint(plan) {
265
+ console.log("\nPLAN STRUCTURE:");
266
+ console.log("=".repeat(70));
267
+
268
+ for (const atom of plan.atoms) {
269
+ const deps = atom.dependsOn && atom.dependsOn.length > 0
270
+ ? ` (depends on: ${atom.dependsOn.join(', ')})`
271
+ : '';
272
+
273
+ console.log(` ${atom.id}. [${atom.kind}] ${atom.name}${deps}`);
274
+
275
+ if (atom.input && Object.keys(atom.input).length > 0) {
276
+ console.log(` Input: ${JSON.stringify(atom.input)}`);
277
+ }
278
+ }
279
+
280
+ console.log("=".repeat(70) + "\n");
281
+ }
282
+ }
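Two of the steps `JsonParser.parse` bundles — prefix stripping and the trailing-comma repair — can be demonstrated inline, without importing the class:

```javascript
// Dependency-free demo of cleanText's prefix stripping plus Repair 1
// (JsonParser.parse runs these same steps, among others):
const raw = 'Here\'s the plan: {"atoms": [{"id": 1, "kind": "tool",}]}';
let s = raw.replace(/^(Here's the plan:|JSON output:|Plan:|Output:)\s*/i, "").trim();
s = s.replace(/,(\s*[}\]])/g, "$1"); // Repair 1: drop trailing commas
const plan = JSON.parse(s);
console.log(plan.atoms[0].kind); // "tool"
```

Either fix alone would leave `JSON.parse` failing; applying the repairs in sequence is what makes the parser tolerant of messy LLM output.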
helper/prompt-debugger.js ADDED
@@ -0,0 +1,350 @@
1
+ import {LlamaText} from "node-llama-cpp";
2
+ import path from "path";
3
+ import fs from "fs/promises";
4
+
5
+ /**
6
+ * Output types for debugging
7
+ */
8
+ const OutputTypes = {
9
+ EXACT_PROMPT: 'exactPrompt',
10
+ CONTEXT_STATE: 'contextState',
11
+ STRUCTURED: 'structured'
12
+ };
13
+
14
+ /**
15
+ * Helper class for debugging and logging LLM prompts
16
+ */
17
+ export class PromptDebugger {
18
+ constructor(options = {}) {
19
+ this.outputDir = options.outputDir || './';
20
+ this.filename = options.filename;
21
+ this.includeTimestamp = options.includeTimestamp ?? false;
22
+ this.appendMode = options.appendMode ?? false;
23
+ // Configure which outputs to include
24
+ this.outputTypes = options.outputTypes || [OutputTypes.EXACT_PROMPT];
25
+ // Ensure outputTypes is always an array
26
+ if (!Array.isArray(this.outputTypes)) {
27
+ this.outputTypes = [this.outputTypes];
28
+ }
29
+ }
30
+
31
+ /**
32
+ * Captures only the exact prompt (user input + system + functions)
33
+ * @param {Object} params
34
+ * @param {Object} params.session - The chat session
35
+ * @param {string} params.prompt - The user prompt
36
+ * @param {string} params.systemPrompt - System prompt (optional)
37
+ * @param {Object} params.functions - Available functions (optional)
38
+ * @returns {Object} The exact prompt data
39
+ */
40
+ captureExactPrompt(params) {
41
+ const { session, prompt, systemPrompt, functions } = params;
42
+
43
+ const chatWrapper = session.chatWrapper;
44
+
45
+ // Build minimal history for exact prompt
46
+ const history = [{ type: 'user', text: prompt }];
47
+
48
+ if (systemPrompt) {
49
+ history.unshift({ type: 'system', text: systemPrompt });
50
+ }
51
+
52
+ // Generate the context state with just the current prompt
53
+ const state = chatWrapper.generateContextState({
54
+ chatHistory: history,
55
+ availableFunctions: functions,
56
+ systemPrompt: systemPrompt
57
+ });
58
+
59
+ const formattedPrompt = state.contextText.toString();
60
+
61
+ return {
62
+ exactPrompt: formattedPrompt,
63
+ timestamp: new Date().toISOString(),
64
+ prompt,
65
+ systemPrompt,
66
+ functions: functions ? Object.keys(functions) : []
67
+ };
68
+ }
69
+
70
+ /**
71
+ * Captures the full context state (includes assistant responses)
72
+ * @param {Object} params
73
+ * @param {Object} params.session - The chat session
74
+ * @param {Object} params.model - The loaded model
75
+ * @returns {Object} The context state data
76
+ */
77
+ captureContextState(params) {
78
+ const { session, model } = params;
79
+
80
+ // Get the actual context from the session after responses
81
+ const contextState = model.detokenize(session.sequence.contextTokens, true);
82
+
83
+ return {
84
+ contextState,
85
+ timestamp: new Date().toISOString(),
86
+ tokenCount: session.sequence.contextTokens.length
87
+ };
88
+ }
89
+
90
+ /**
91
+ * Captures the structured token representation
92
+ * @param {Object} params
93
+ * @param {Object} params.session - The chat session
94
+ * @param {Object} params.model - The loaded model
95
+ * @returns {Object} The structured token data
96
+ */
97
+ captureStructured(params) {
98
+ const { session, model } = params;
99
+
100
+ const structured = LlamaText.fromTokens(model.tokenizer, session.sequence.contextTokens);
101
+
102
+ return {
103
+ structured,
104
+ timestamp: new Date().toISOString(),
105
+ tokenCount: session.sequence.contextTokens.length
106
+ };
107
+ }
108
+
109
+ /**
110
+ * Captures all configured output types
111
+ * @param {Object} params - Contains all possible parameters
112
+ * @returns {Object} Combined captured data based on configuration
113
+ */
114
+ captureAll(params) {
115
+ const result = {
116
+ timestamp: new Date().toISOString()
117
+ };
118
+
119
+ if (this.outputTypes.includes(OutputTypes.EXACT_PROMPT)) {
120
+ const exactData = this.captureExactPrompt(params);
121
+ result.exactPrompt = exactData.exactPrompt;
122
+ result.prompt = exactData.prompt;
123
+ result.systemPrompt = exactData.systemPrompt;
124
+ result.functions = exactData.functions;
125
+ }
126
+
127
+ if (this.outputTypes.includes(OutputTypes.CONTEXT_STATE)) {
128
+ const contextData = this.captureContextState(params);
129
+ result.contextState = contextData.contextState;
130
+ result.contextTokenCount = contextData.tokenCount;
131
+ }
132
+
133
+ if (this.outputTypes.includes(OutputTypes.STRUCTURED)) {
134
+ const structuredData = this.captureStructured(params);
135
+ result.structured = structuredData.structured;
136
+ result.structuredTokenCount = structuredData.tokenCount;
137
+ }
138
+
139
+ return result;
140
+ }
141
+
142
+ /**
143
+ * Formats the captured data based on configuration
144
+ * @param {Object} capturedData - Data from capture methods
145
+ * @returns {string} Formatted output
146
+ */
147
+ formatOutput(capturedData) {
148
+ let output = `\n========== PROMPT DEBUG OUTPUT ==========\n`;
149
+ output += `Timestamp: ${capturedData.timestamp}\n`;
150
+
151
+ if (capturedData.prompt) {
152
+ output += `Original Prompt: ${capturedData.prompt}\n`;
153
+ }
154
+
155
+ if (capturedData.systemPrompt) {
156
+ output += `System Prompt: ${capturedData.systemPrompt.substring(0, 50)}...\n`;
157
+ }
158
+
159
+ if (capturedData.functions && capturedData.functions.length > 0) {
160
+ output += `Functions: ${capturedData.functions.join(', ')}\n`;
161
+ }
162
+
163
+ if (capturedData.exactPrompt) {
164
+ output += `\n=== EXACT PROMPT ===\n`;
165
+ output += capturedData.exactPrompt;
166
+ output += `\n`;
167
+ }
168
+
169
+ if (capturedData.contextState) {
170
+ output += `Token Count: ${capturedData.contextTokenCount || 'N/A'}\n`;
171
+
172
+ output += `\n=== CONTEXT STATE ===\n`;
173
+ output += capturedData.contextState;
174
+ output += `\n`;
175
+ }
176
+
177
+ if (capturedData.structured) {
178
+ output += `\n=== STRUCTURED ===\n`;
179
+ output += `Token Count: ${capturedData.structuredTokenCount || 'N/A'}\n`;
180
+ output += JSON.stringify(capturedData.structured, null, 2);
181
+ output += `\n`;
182
+ }
183
+
184
+ output += `==========================================\n`;
185
+ return output;
186
+ }
187
+
188
+ /**
189
+ * Saves data to file
190
+ * @param {Object} capturedData - Data to save
191
+ * @param {null} customFilename - Optional custom filename
192
+ */
193
+ async saveToFile(capturedData, customFilename = null) {
194
+ const content = this.formatOutput(capturedData);
195
+
196
+ let filename = customFilename || this.filename;
197
+
198
+ if (this.includeTimestamp) {
199
+ const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
200
+ const ext = path.extname(filename);
201
+ const base = path.basename(filename, ext);
202
+ filename = `${base}_${timestamp}${ext}`;
203
+ }
204
+
205
+ const filepath = path.join(this.outputDir, filename);
206
+
207
+ if (this.appendMode) {
208
+ await fs.appendFile(filepath, content, 'utf8');
209
+ } else {
210
+ await fs.writeFile(filepath, content, 'utf8');
211
+ }
212
+
213
+ console.log(`Prompt debug output written to ${filepath}`);
214
+ return filepath;
215
+ }
216
+
217
+ /**
218
+ * Debug exact prompt only - minimal params needed
219
+ * @param {Object} params - session, prompt, systemPrompt (optional), functions (optional)
220
+ * @param {string|null} customFilename - Optional custom filename
221
+ */
222
+ async debugExactPrompt(params, customFilename = null) {
223
+ const oldOutputTypes = this.outputTypes;
224
+ this.outputTypes = [OutputTypes.EXACT_PROMPT];
225
+ const capturedData = this.captureAll(params);
226
+ const filepath = await this.saveToFile(capturedData, customFilename);
227
+ this.outputTypes = oldOutputTypes;
228
+ return { capturedData, filepath };
229
+ }
230
+
231
+ /**
232
+ * Debug context state only - needs session and model
233
+ * @param {Object} params - session, model
234
+ * @param {string|null} customFilename - Optional custom filename
235
+ */
236
+ async debugContextState(params, customFilename = null) {
237
+ const oldOutputTypes = this.outputTypes;
238
+ this.outputTypes = [OutputTypes.CONTEXT_STATE];
239
+ const capturedData = this.captureAll(params);
240
+ const filepath = await this.saveToFile(capturedData, customFilename);
241
+ this.outputTypes = oldOutputTypes;
242
+ return { capturedData, filepath };
243
+ }
244
+
245
+ /**
246
+ * Debug structured only - needs session and model
247
+ * @param {Object} params - session, model
248
+ * @param {string|null} customFilename - Optional custom filename
249
+ */
250
+ async debugStructured(params, customFilename = null) {
251
+ const oldOutputTypes = this.outputTypes;
252
+ this.outputTypes = [OutputTypes.STRUCTURED];
253
+ const capturedData = this.captureAll(params);
254
+ const filepath = await this.saveToFile(capturedData, customFilename);
255
+ this.outputTypes = oldOutputTypes;
256
+ return { capturedData, filepath };
257
+ }
258
+
259
+ /**
260
+ * Debug with configured output types
261
+ * @param {Object} params - All parameters (session, model, prompt, etc.)
262
+ * @param {string|null} customFilename - Optional custom filename
263
+ */
264
+ async debug(params, customFilename = null) {
265
+ const capturedData = this.captureAll(params);
266
+ //const filepath = await this.saveToFile(capturedData, customFilename);
267
+ return { capturedData };
268
+ }
269
+
270
+ /**
271
+ * Log to console only
272
+ * @param {Object} params - Parameters based on configured output types
273
+ */
274
+ logToConsole(params) {
275
+ const capturedData = this.captureAll(params);
276
+ console.log(this.formatOutput(capturedData));
277
+ return capturedData;
278
+ }
279
+
280
+ /**
281
+ * Log exact prompt to console
282
+ */
283
+ logExactPrompt(params) {
284
+ const capturedData = this.captureExactPrompt(params);
285
+ console.log(this.formatOutput(capturedData));
286
+ return capturedData;
287
+ }
288
+
289
+ /**
290
+ * Log context state to console
291
+ */
292
+ logContextState(params) {
293
+ const capturedData = this.captureContextState(params);
294
+ console.log(this.formatOutput(capturedData));
295
+ return capturedData;
296
+ }
297
+
298
+ /**
299
+ * Log structured to console
300
+ */
301
+ logStructured(params) {
302
+ const capturedData = this.captureStructured(params);
303
+ console.log(this.formatOutput(capturedData));
304
+ return capturedData;
305
+ }
306
+ }
307
+
308
+ /**
309
+ * Quick function to debug exact prompt only
310
+ */
311
+ async function debugExactPrompt(params, options = {}) {
312
+ const promptDebugger = new PromptDebugger({
313
+ ...options,
314
+ outputTypes: [OutputTypes.EXACT_PROMPT]
315
+ });
316
+ return await promptDebugger.debug(params);
317
+ }
318
+
319
+ /**
320
+ * Quick function to debug context state only
321
+ */
322
+ async function debugContextState(params, options = {}) {
323
+ const promptDebugger = new PromptDebugger({
324
+ ...options,
325
+ outputTypes: [OutputTypes.CONTEXT_STATE]
326
+ });
327
+ return await promptDebugger.debug(params);
328
+ }
329
+
330
+ /**
331
+ * Quick function to debug structured only
332
+ */
333
+ async function debugStructured(params, options = {}) {
334
+ const promptDebugger = new PromptDebugger({
335
+ ...options,
336
+ outputTypes: [OutputTypes.STRUCTURED]
337
+ });
338
+ return await promptDebugger.debug(params);
339
+ }
340
+
341
+ /**
342
+ * Quick function to debug all outputs
343
+ */
344
+ async function debugAll(params, options = {}) {
345
+ const promptDebugger = new PromptDebugger({
346
+ ...options,
347
+ outputTypes: [OutputTypes.EXACT_PROMPT, OutputTypes.CONTEXT_STATE, OutputTypes.STRUCTURED]
348
+ });
349
+ return await promptDebugger.debug(params);
350
+ }
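The `includeTimestamp` filename construction in `saveToFile` can be sketched standalone; this uses plain string operations in place of `path.extname`/`path.basename` to keep the sketch dependency-free:

```javascript
// Mirrors saveToFile's timestamped-filename construction (includeTimestamp: true):
const filename = "aot_calculator.txt";
const stamp = new Date().toISOString().replace(/[:.]/g, "-"); // ':' and '.' are invalid in many filesystems
const dot = filename.lastIndexOf(".");
const outName = `${filename.slice(0, dot)}_${stamp}${filename.slice(dot)}`;
console.log(outName); // e.g. aot_calculator_2025-01-01T00-00-00-000Z.txt
```

Sanitizing the ISO timestamp this way yields one unique log file per run while keeping the original extension.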
logs/.gitkeep ADDED
File without changes
package-lock.json ADDED
The diff for this file is too large to render. See raw diff
 
package.json ADDED
@@ -0,0 +1,18 @@
1
+ {
2
+ "name": "ai-agents",
3
+ "version": "1.0.0",
4
+ "description": "",
5
+ "main": "index.js",
6
+ "scripts": {
7
+ "test": "echo \"Error: no test specified\" && exit 1"
8
+ },
9
+ "type": "module",
10
+ "keywords": [],
11
+ "author": "",
12
+ "license": "ISC",
13
+ "dependencies": {
14
+ "dotenv": "^17.2.3",
15
+ "node-llama-cpp": "^3.14.0",
16
+ "openai": "^6.7.0"
17
+ }
18
+ }
run_classifier.js ADDED
@@ -0,0 +1,349 @@
1
+ /**
2
+ * Part 1 Capstone Solution: Smart Email Classifier
3
+ *
4
+ * Build an AI system that organizes your inbox by classifying emails into categories.
5
+ *
6
+ * Skills Used:
7
+ * - Runnables for processing pipeline
8
+ * - Messages for structured classification
9
+ * - LLM wrapper for flexible model switching
10
+ * - Context for classification history
11
+ *
12
+ * Difficulty: ⭐⭐☆☆☆
13
+ */
14
+
15
+ import { SystemMessage, HumanMessage, Runnable, LlamaCppLLM } from './src/index.js';
16
+ import { BaseCallback } from './src/utils/callbacks.js';
17
+ import { readFileSync } from 'fs';
18
+
19
+ // ============================================================================
20
+ // EMAIL CLASSIFICATION CATEGORIES
21
+ // ============================================================================
22
+
23
+ const CATEGORIES = {
24
+ SPAM: 'Spam',
25
+ INVOICE: 'Invoice',
26
+ MEETING: 'Meeting Request',
27
+ URGENT: 'Urgent',
28
+ PERSONAL: 'Personal',
29
+ OTHER: 'Other'
30
+ };
31
+
32
+ // ============================================================================
33
+ // Email Parser Runnable
34
+ // ============================================================================
35
+
36
+ /**
37
+ * Parses raw email text into structured format
38
+ *
39
+ * Input: { subject: string, body: string, from: string }
40
+ * Output: { subject, body, from, timestamp }
41
+ */
42
+ class EmailParserRunnable extends Runnable {
43
+ async _call(input, config) {
44
+ // Validate required fields
45
+ if (!input.subject || !input.body || !input.from) {
46
+ throw new Error('Email must have subject, body, and from fields');
47
+ }
48
+
49
+ // Parse and structure the email
50
+ return {
51
+ subject: input.subject.trim(),
52
+ body: input.body.trim(),
53
+ from: input.from.trim(),
54
+ timestamp: new Date().toISOString()
55
+ };
56
+ }
57
+ }

// ============================================================================
// Email Classifier Runnable
// ============================================================================

/**
 * Classifies an email using the LLM.
 *
 * Input:  { subject, body, from, timestamp }
 * Output: { ...email, category, confidence, reason }
 */
class EmailClassifierRunnable extends Runnable {
  constructor(llm) {
    super();
    this.llm = llm;
  }

  async _call(input, config) {
    // Build the classification prompt
    const messages = this._buildPrompt(input);

    // Call the LLM
    const response = await this.llm.invoke(messages, config);

    // Parse the LLM response
    const classification = this._parseClassification(response.content);

    // Return the email with its classification attached
    return {
      ...input,
      category: classification.category,
      confidence: classification.confidence,
      reason: classification.reason
    };
  }

  _buildPrompt(email) {
    const systemPrompt = new SystemMessage(`You are an email classification assistant. Your task is to classify emails into one of these categories:

Categories:
- Spam: Unsolicited promotional emails, advertisements with excessive punctuation/caps, phishing attempts, scams
- Invoice: Bills, payment requests, financial documents, receipts
- Meeting Request: Meeting invitations, calendar requests, scheduling, availability inquiries
- Urgent: Time-sensitive matters requiring immediate attention, security alerts, critical notifications
- Personal: Personal correspondence from friends/family (look for personal tone and familiar email addresses)
- Other: Legitimate newsletters, updates, informational content, everything else that doesn't fit above

Important distinctions:
- Legitimate newsletters (tech updates, subscriptions) should be "Other", not Spam
- Spam has excessive punctuation (!!!, ALL CAPS), pushy language, or suspicious intent
- Personal emails have familiar sender addresses and a casual tone

Respond in this exact JSON format:
{
  "category": "Category Name",
  "confidence": 0.95,
  "reason": "Brief explanation"
}

Confidence should be between 0 and 1.`);

    const userPrompt = new HumanMessage(`Classify this email:

From: ${email.from}
Subject: ${email.subject}
Body: ${email.body}

Provide your classification in JSON format.`);

    return [systemPrompt, userPrompt];
  }

  _parseClassification(response) {
    try {
      // Try to find JSON in the response
      const jsonMatch = response.match(/\{[\s\S]*\}/);
      if (!jsonMatch) {
        throw new Error('No JSON found in response');
      }

      const parsed = JSON.parse(jsonMatch[0]);

      // Validate the parsed response
      if (!parsed.category || parsed.confidence === undefined || !parsed.reason) {
        throw new Error('Invalid classification format');
      }

      // Ensure confidence is a number between 0 and 1
      const confidence = Math.max(0, Math.min(1, parseFloat(parsed.confidence)));

      return {
        category: parsed.category,
        confidence: confidence,
        reason: parsed.reason
      };
    } catch (error) {
      // Fallback classification if parsing fails
      console.warn('Failed to parse LLM response, using fallback:', error.message);
      return {
        category: CATEGORIES.OTHER,
        confidence: 0.5,
        reason: 'Failed to parse classification'
      };
    }
  }
}

// ============================================================================
// Classification History Callback
// ============================================================================

/**
 * Tracks classification history using callbacks.
 */
class ClassificationHistoryCallback extends BaseCallback {
  constructor() {
    super();
    this.history = [];
  }

  async onEnd(runnable, output, config) {
    // Only track EmailClassifierRunnable results
    if (runnable.name === 'EmailClassifierRunnable' && output.category) {
      this.history.push({
        timestamp: output.timestamp,
        from: output.from,
        subject: output.subject,
        category: output.category,
        confidence: output.confidence,
        reason: output.reason
      });
    }
  }

  getHistory() {
    return this.history;
  }

  getStatistics() {
    if (this.history.length === 0) {
      return {
        total: 0,
        byCategory: {},
        averageConfidence: 0
      };
    }

    // Count by category
    const byCategory = {};
    let totalConfidence = 0;

    for (const entry of this.history) {
      byCategory[entry.category] = (byCategory[entry.category] || 0) + 1;
      totalConfidence += entry.confidence;
    }

    return {
      total: this.history.length,
      byCategory: byCategory,
      averageConfidence: totalConfidence / this.history.length
    };
  }

  printHistory() {
    console.log('\n📧 Classification History:');
    console.log('─'.repeat(70));

    for (const entry of this.history) {
      console.log(`\n✉️  From: ${entry.from}`);
      console.log(`   Subject: ${entry.subject}`);
      console.log(`   Category: ${entry.category}`);
      console.log(`   Confidence: ${(entry.confidence * 100).toFixed(1)}%`);
      console.log(`   Reason: ${entry.reason}`);
    }
  }

  printStatistics() {
    const stats = this.getStatistics();

    console.log('\n📊 Classification Statistics:');
    console.log('─'.repeat(70));
    console.log(`Total Emails: ${stats.total}\n`);

    if (stats.total > 0) {
      console.log('By Category:');
      for (const [category, count] of Object.entries(stats.byCategory)) {
        const percentage = ((count / stats.total) * 100).toFixed(1);
        console.log(`  ${category}: ${count} (${percentage}%)`);
      }

      console.log(`\nAverage Confidence: ${(stats.averageConfidence * 100).toFixed(1)}%`);
    }
  }
}

// ============================================================================
// Email Classification Pipeline
// ============================================================================

/**
 * Complete pipeline: Parse → Classify → Store
 */
class EmailClassificationPipeline {
  constructor(llm) {
    this.parser = new EmailParserRunnable();
    this.classifier = new EmailClassifierRunnable(llm);
    this.historyCallback = new ClassificationHistoryCallback();

    // Build the pipeline: parser -> classifier
    this.pipeline = this.parser.pipe(this.classifier);
  }

  async classify(email) {
    // Run the email through the pipeline with the history callback attached
    const config = {
      callbacks: [this.historyCallback]
    };

    return await this.pipeline.invoke(email, config);
  }

  getHistory() {
    return this.historyCallback.getHistory();
  }

  getStatistics() {
    return this.historyCallback.getStatistics();
  }

  printHistory() {
    this.historyCallback.printHistory();
  }

  printStatistics() {
    this.historyCallback.printStatistics();
  }
}

// ============================================================================
// TEST DATA
// ============================================================================

const TEST_EMAILS = JSON.parse(
  readFileSync(new URL('./test-emails.json', import.meta.url), 'utf-8')
);

// ============================================================================
// MAIN FUNCTION
// ============================================================================

async function main() {
  console.log('=== Part 1 Capstone: Smart Email Classifier ===\n');

  // Initialize the LLM
  const llm = new LlamaCppLLM({
    modelPath: './models/Qwen3-1.7B-Q8_0.gguf', // Adjust to your model
    temperature: 0.1, // Low temperature for consistent classification
    maxTokens: 200
  });

  // Create the classification pipeline
  const pipeline = new EmailClassificationPipeline(llm);

  console.log('📬 Processing emails...\n');

  // Classify each test email
  for (const email of TEST_EMAILS) {
    try {
      const result = await pipeline.classify(email);

      console.log(`✉️  Email from: ${result.from}`);
      console.log(`   Subject: ${result.subject}`);
      console.log(`   Category: ${result.category}`);
      console.log(`   Confidence: ${(result.confidence * 100).toFixed(1)}%`);
      console.log(`   Reason: ${result.reason}\n`);
    } catch (error) {
      console.error(`❌ Error classifying email from ${email.from}:`, error.message);
    }
  }

  // Print history and statistics
  pipeline.printHistory();
  pipeline.printStatistics();

  // Cleanup
  await llm.dispose();

  console.log('\n✓ Capstone Project Complete!');
}

// Run the project
main().catch(console.error);
secrets.local.md ADDED
@@ -0,0 +1,22 @@
# 🔐 ANTIGRAVITY SECURE VAULT (LOCAL ONLY)

> [!CAUTION]
> **NEVER PUSH THIS FILE TO GITHUB.**
> This file contains sensitive information. Git has been configured to ignore it.

## 🔑 GITHUB TOKENS
- **dahanhstudio**: `ghp_[REDACTED]`
- **NungLon01**: `ghp_[REDACTED]`
- **lenzcomvth**: `ghp_[REDACTED]`

---

## ☁️ CLOUDFLARE TOKENS
- **API Token (Workers)**: `[REDACTED]`

---

## 🤗 HUGGINGFACE TOKEN
- **lenzcom account**: `hf_[REDACTED]`

---

*Generated by Antigravity Secure Protocol*