# Code Explanation: intro.js

This file demonstrates the most basic interaction with a local LLM (Large Language Model) using node-llama-cpp.

## Step-by-Step Code Breakdown

### 1. Import Required Modules
```javascript
import {
    getLlama,
    LlamaChatSession,
} from "node-llama-cpp";
import {fileURLToPath} from "url";
import path from "path";
```
- **getLlama**: Main function to initialize the llama.cpp runtime
- **LlamaChatSession**: Class for managing chat conversations with the model
- **fileURLToPath** and **path**: Standard Node.js modules for handling file paths

### 2. Set Up Directory Path
```javascript
const __dirname = path.dirname(fileURLToPath(import.meta.url));
```
- Since ES modules don't have `__dirname` by default, we create it manually
- This gives us the directory path of the current file
- Needed to locate the model file relative to this script
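To make the conversion concrete, here is a stdlib-only sketch of the same derivation. It uses a hard-coded stand-in URL in place of `import.meta.url` (the path is illustrative, not a real file):

```javascript
import {fileURLToPath} from "url";
import path from "path";

// Stand-in for import.meta.url (illustrative path, not a real file).
const moduleUrl = "file:///home/user/project/src/intro.js";

// fileURLToPath() turns the file: URL into a plain filesystem path,
// and path.dirname() drops the filename, leaving the directory.
const dirname = path.dirname(fileURLToPath(moduleUrl));
console.log(dirname); // → /home/user/project/src
```

In a real module you would pass `import.meta.url` directly instead of the hard-coded string.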

### 3. Initialize Llama Runtime
```javascript
const llama = await getLlama();
```
- Creates the main llama.cpp instance
- This initializes the underlying C++ runtime for model inference
- Must be done before loading any models

### 4. Load the Model
```javascript
const model = await llama.loadModel({
    modelPath: path.join(
        __dirname,
        "../",
        "models",
        "Qwen3-1.7B-Q8_0.gguf"
    )
});
```
- Loads a quantized model file (GGUF format)
- **Qwen3-1.7B-Q8_0.gguf**: a 1.7-billion-parameter model, quantized to 8-bit precision
- The model is stored in the `models` folder at the repository root
- Loading the model into memory takes a few seconds


### 5. Create a Context
```javascript
const context = await model.createContext();
```
- A **context** represents the model's working memory
- It holds the conversation history and current state
- Has a fixed size limit (default: the model's maximum context size)
- All prompts and responses are stored in this context
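If memory is tight, the context can usually be made smaller than the model's maximum. A hedged sketch, assuming node-llama-cpp is installed, `model` was loaded as in step 4, and `createContext` accepts a `contextSize` option (as in the v3 API); not run here, since it needs the native library and a local model file:

```javascript
// Sketch: cap the context at 2048 tokens instead of the model's maximum,
// trading conversation length for lower RAM/VRAM usage.
const smallContext = await model.createContext({
    contextSize: 2048
});
```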



### 6. Create a Chat Session
```javascript
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
});
```
- **LlamaChatSession**: High-level API for chat-style interactions
- Uses a sequence from the context to maintain conversation state
- Automatically handles prompt formatting and response parsing



### 7. Define the Prompt
```javascript
const prompt = `do you know node-llama-cpp`;
```
- Simple question to test whether the model knows about the library we're using
- This will be sent to the model for processing



### 8. Send Prompt and Get Response
```javascript
const a1 = await session.prompt(prompt);
console.log("AI: " + a1);
```
- **session.prompt()**: Sends the prompt to the model and waits for the complete response
- The model generates a response based on its training
- We log the response to the console with an "AI:" prefix
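`session.prompt()` resolves only once the full answer has been generated. node-llama-cpp can also stream the response as it is produced; a hedged sketch assuming the v3 `onTextChunk` prompt option and the `session` from step 6 (not run here, since it needs the native library and a local model):

```javascript
// Sketch: print pieces of the answer as they arrive instead of
// waiting for the whole response.
const answer = await session.prompt(prompt, {
    onTextChunk(chunk) {
        process.stdout.write(chunk); // stream each chunk as it is generated
    }
});
```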



### 9. Clean Up Resources
```javascript
session.dispose();
context.dispose();
model.dispose();
llama.dispose();
```
- **Important**: Always dispose of resources when done
- Frees up memory and GPU resources
- Prevents memory leaks in long-running applications
- Must be done in this order (session → context → model → llama)
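Because `session.prompt()` can throw (for example, if generation fails or memory runs out), cleanup is often wrapped in `try`/`finally` so the dispose calls run either way. A hedged sketch of that pattern using the same API calls as above, with `modelPath` assumed to be defined as in step 4 (not run here):

```javascript
// Sketch: guarantee cleanup even if prompting fails.
const llama = await getLlama();
const model = await llama.loadModel({modelPath});
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});
try {
    console.log("AI: " + await session.prompt("do you know node-llama-cpp"));
} finally {
    // Dispose in reverse order of creation: session → context → model → llama
    session.dispose();
    context.dispose();
    model.dispose();
    llama.dispose();
}
```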



## Key Concepts Demonstrated

1. **Basic LLM initialization**: Loading a model and creating an inference context
2. **Simple prompting**: Sending a question and receiving a response
3. **Resource management**: Proper cleanup of allocated resources



## Expected Output

When you run this script, you should see output like:

```
AI: Yes, I'm familiar with node-llama-cpp. It's a Node.js binding for llama.cpp...
```

The exact response will vary based on the model's training data and generation parameters.