# Code Explanation: OpenAI Intro

This guide walks through each example in `openai-intro.js`, explaining how to work with OpenAI's API from the ground up.

## Requirements

Before running this example, you’ll need an OpenAI account, an API key, and a valid billing method.

### Get API Key

https://platform.openai.com/api-keys

### Add Billing Method

https://platform.openai.com/settings/organization/billing/overview

### Configure environment variables

```bash
cp .env.example .env
```
Then edit `.env` and add your actual API key.
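
After editing, `.env` should contain a single line assigning your key (the value shown here is a placeholder, not a real key):

```bash
# .env — never commit this file to version control
OPENAI_API_KEY=sk-...your-key-here...
```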

## Setup and Initialization

```javascript
import OpenAI from 'openai';
import 'dotenv/config';

const client = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
});
```

**What's happening:**
- `import OpenAI from 'openai'` - Import the official OpenAI SDK for Node.js
- `import 'dotenv/config'` - Load environment variables from `.env` file
- `new OpenAI({...})` - Create a client instance that handles API authentication and requests
- `process.env.OPENAI_API_KEY` - Your API key from platform.openai.com (never hardcode this!)

**Why it matters:** The client object is your interface to OpenAI's models. All API calls go through this client.
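
A missing key only surfaces as a confusing 401 error at request time, so it can help to fail fast at startup. A minimal sketch — `requireApiKey` is a hypothetical helper, not part of the SDK:

```javascript
// Hypothetical helper: fail fast with a clear message if the key is missing.
function requireApiKey(env = process.env) {
    const key = env.OPENAI_API_KEY;
    if (!key) {
        throw new Error('OPENAI_API_KEY is not set. Copy .env.example to .env and add your key.');
    }
    return key;
}
```

Call it once when constructing the client, e.g. `new OpenAI({ apiKey: requireApiKey() })`.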

---

## Example 1: Basic Chat Completion

```javascript
const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
        { role: 'user', content: 'What is node-llama-cpp?' }
    ],
});

console.log(response.choices[0].message.content);
```

**What's happening:**
- `chat.completions.create()` - The primary method for sending messages to ChatGPT models
- `model: 'gpt-4o'` - Specifies which model to use (gpt-4o is OpenAI's flagship multimodal model at the time of writing)
- `messages` array - Contains the conversation history
- `role: 'user'` - Indicates this message comes from the user (you)
- `response.choices[0]` - The API returns an array of possible responses; we take the first one
- `message.content` - The actual text response from the AI

**Response structure:**
```javascript
{
  id: 'chatcmpl-...',
  object: 'chat.completion',
  created: 1234567890,
  model: 'gpt-4o',
  choices: [
    {
      index: 0,
      message: {
        role: 'assistant',
        content: 'node-llama-cpp is a...'
      },
      finish_reason: 'stop'
    }
  ],
  usage: {
    prompt_tokens: 10,
    completion_tokens: 50,
    total_tokens: 60
  }
}
```

---

## Example 2: System Prompts

```javascript
const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
        { role: 'system', content: 'You are a coding assistant that talks like a pirate.' },
        { role: 'user', content: 'Explain what async/await does in JavaScript.' }
    ],
});
```

**What's happening:**
- `role: 'system'` - Special message type that sets the AI's behavior and personality
- System messages are processed first and influence all subsequent responses
- The model will maintain this behavior throughout the conversation

**Why it matters:** System prompts are how you specialize AI behavior. They're the foundation of creating focused agents with specific roles (translator, coder, analyst, etc.).

**Key insight:** Same model + different system prompts = completely different agents!
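
That insight fits in a tiny helper (hypothetical, for illustration): the same user prompt paired with different system prompts yields differently-behaved agents.

```javascript
// Hypothetical helper: pair a system prompt with a user prompt.
function buildMessages(systemPrompt, userPrompt) {
    return [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: userPrompt },
    ];
}

// Same question, two very different "agents":
const pirate = buildMessages('You are a coding assistant that talks like a pirate.', 'Explain closures.');
const formal = buildMessages('You are a concise technical writer.', 'Explain closures.');
```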

---

## Example 3: Temperature Control

```javascript
// Focused response
const focusedResponse = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
    temperature: 0.2,
});

// Creative response
const creativeResponse = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
    temperature: 1.5,
});
```

**What's happening:**
- `temperature` - Controls randomness in the output (range: 0.0 to 2.0)
- **Low temperature (0.0 - 0.3):**
    - More focused and deterministic
    - Same input → similar output
    - Best for: factual answers, code generation, data extraction
- **Medium temperature (0.7 - 1.0):**
    - Balanced creativity and coherence
    - Default for most use cases
- **High temperature (1.2 - 2.0):**
    - More creative and varied
    - Same input → very different outputs
    - Best for: creative writing, brainstorming, story generation

**Real-world usage:**
- Code completion: temperature 0.2
- Customer support: temperature 0.5
- Creative content: temperature 1.2

---

## Example 4: Conversation Context

```javascript
const messages = [
    { role: 'system', content: 'You are a helpful coding tutor.' },
    { role: 'user', content: 'What is a Promise in JavaScript?' },
];

const response1 = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: messages,
});

// Add AI response to history
messages.push(response1.choices[0].message);

// Add follow-up question
messages.push({ role: 'user', content: 'Can you show me a simple example?' });

// Second request with full context
const response2 = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: messages,
});
```

**What's happening:**
- OpenAI models are **stateless** - they don't remember previous conversations
- We maintain context by sending the entire conversation history with each request
- Each request is independent; you must include all relevant messages

**Message order in the array:**
1. System prompt (optional, but recommended first)
2. Previous user message
3. Previous assistant response
4. Current user message

**Why it matters:** This is how chatbots remember context. The full conversation is sent every time.

**Performance consideration:**
- More messages = more tokens = higher cost
- Longer conversations eventually hit token limits
- Real applications need conversation trimming or summarization strategies
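
One simple trimming strategy keeps the system prompt and only the most recent messages. A minimal sketch (real applications often summarize older turns instead of dropping them):

```javascript
// Keep all system messages plus the last `maxMessages` non-system messages.
function trimHistory(messages, maxMessages = 10) {
    const system = messages.filter(m => m.role === 'system');
    const rest = messages.filter(m => m.role !== 'system');
    return [...system, ...rest.slice(-maxMessages)];
}
```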

---

## Example 5: Streaming Responses

```javascript
const stream = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
        { role: 'user', content: 'Write a haiku about programming.' }
    ],
    stream: true,  // Enable streaming
});

for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
}
```

**What's happening:**
- `stream: true` - Instead of waiting for the complete response, receive it token-by-token
- `for await...of` - Iterate over the stream as chunks arrive
- `delta.content` - Each chunk contains a small piece of text (often just a word or partial word)
- `process.stdout.write()` - Write without newline to display text progressively
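
If you also need the complete reply afterwards (for logging, or to push back into the conversation history), accumulate the deltas while printing. A sketch assuming the chunk shape shown above:

```javascript
// Print each chunk as it arrives and return the assembled reply.
async function collectStream(stream) {
    let full = '';
    for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content || '';
        full += content;
        process.stdout.write(content);
    }
    return full;
}
```

The returned string can then be appended to the `messages` array as an assistant message.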

**Streaming vs. Non-streaming:**

**Non-streaming (default):**
```
[Request sent]
[Wait 5 seconds...]
[Full response arrives]
```

**Streaming:**
```
[Request sent]
[chunk arrives: "Once"]  → Once
[chunk arrives: " upon"] → Once upon
[chunk arrives: " a"]    → Once upon a
[chunk arrives: " time"] → Once upon a time
...
```

**Why it matters:**
- Better user experience (immediate feedback)
- Appears faster even though total time is similar
- Essential for real-time chat interfaces
- Allows early processing/display of partial results

**When to use streaming:**
- Interactive chat applications
- Long-form content generation
- When user experience matters more than simplicity

**When to NOT use streaming:**
- Simple scripts or automation
- When you need the complete response before processing
- Batch processing

---

## Example 6: Token Usage

```javascript
const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
        { role: 'user', content: 'Explain recursion in 3 sentences.' }
    ],
    max_tokens: 100,
});

console.log("Token usage:");
console.log("- Prompt tokens: " + response.usage.prompt_tokens);
console.log("- Completion tokens: " + response.usage.completion_tokens);
console.log("- Total tokens: " + response.usage.total_tokens);
```

**What's happening:**
- `max_tokens` - Limits the length of the AI's response
- `response.usage` - Contains token consumption details
- **Prompt tokens:** Your input (messages you sent)
- **Completion tokens:** AI's output (the response)
- **Total tokens:** Sum of both (what you're billed for)

**Understanding tokens:**
- Tokens ≠ words
- 1 token ≈ 0.75 words (in English)
- "hello" = 1 token
- "chatbot" = 2 tokens ("chat" + "bot")
- Punctuation counts toward the total; spaces are usually absorbed into adjacent tokens
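
For quick budgeting you can use the common heuristic of roughly 4 characters per token for English text. This is an approximation only — use a real tokenizer (such as `tiktoken`) when accuracy matters:

```javascript
// Rough heuristic only — real counts come from the model's tokenizer.
function estimateTokens(text) {
    return Math.ceil(text.length / 4);
}
```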

**Why it matters:**
1. **Cost control:** You pay per token
2. **Context limits:** Models have maximum token limits (e.g., gpt-4o: 128,000 tokens)
3. **Response control:** Use `max_tokens` to prevent overly long responses

**Practical limits:**
```javascript
// Prevent runaway responses
max_tokens: 150,  // ~100 words

// Brief responses
max_tokens: 50,   // ~35 words

// Longer content
max_tokens: 1000, // ~750 words
```

**Cost estimation (approximate):**
- GPT-4o: $5 per 1M input tokens, $15 per 1M output tokens
- GPT-3.5-turbo: $0.50 per 1M input tokens, $1.50 per 1M output tokens
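
Those rates make it easy to estimate spend directly from the `usage` object. The helper below is a sketch; rates are passed as parameters (USD per 1M tokens) because pricing changes — check OpenAI's pricing page for current values:

```javascript
// Estimate cost in USD from a usage object (rates are USD per 1M tokens).
function estimateCostUSD(usage, inputPerM, outputPerM) {
    return (usage.prompt_tokens / 1e6) * inputPerM
         + (usage.completion_tokens / 1e6) * outputPerM;
}

// Example with the approximate GPT-4o rates above ($5 in / $15 out):
const cost = estimateCostUSD({ prompt_tokens: 10, completion_tokens: 50 }, 5, 15);
```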

---

## Example 7: Model Comparison

```javascript
// GPT-4o - Most capable
const gpt4Response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
});

// GPT-3.5-turbo - Faster and cheaper
const gpt35Response = await client.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }],
});
```

**Available models:**

| Model | Best For | Speed | Cost | Context Window |
|-------|----------|-------|------|----------------|
| `gpt-4o` | Complex tasks, reasoning, accuracy | Medium | $$$ | 128K tokens |
| `gpt-4o-mini` | Balanced performance/cost | Fast | $$ | 128K tokens |
| `gpt-3.5-turbo` | Simple tasks, high volume | Very Fast | $ | 16K tokens |

**Choosing the right model:**
- **Use GPT-4o when:**
    - Complex reasoning required
    - High accuracy is critical
    - Working with code or technical content
    - Quality > speed/cost

- **Use GPT-4o-mini when:**
    - Need good performance at lower cost
    - Most general-purpose tasks

- **Use GPT-3.5-turbo when:**
    - Simple classification or extraction
    - High-volume, low-complexity tasks
    - Speed is critical
    - Budget constraints

**Pro tip:** Start with gpt-4o for development, then evaluate if cheaper models work for your use case.

---

## Error Handling

```javascript
try {
    await basicCompletion();
} catch (error) {
    console.error("Error:", error.message);
    if (error.message.includes('API key')) {
        console.error("\nMake sure to set your OPENAI_API_KEY in a .env file");
    }
}
```

**Common errors:**
- `401 Unauthorized` - Invalid or missing API key
- `429 Too Many Requests` - Rate limit exceeded
- `500 Internal Server Error` - OpenAI service issue
- `Context length exceeded` - Too many tokens in conversation

**Best practices:**
- Always use try-catch with async calls
- Check error types and provide helpful messages
- Implement retry logic for transient failures
- Monitor token usage to avoid limit errors
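
The retry advice can be sketched as a small wrapper with exponential backoff. This is a hypothetical helper, not part of the SDK; it assumes errors carry a numeric `status` (production code might also honor the `Retry-After` header):

```javascript
// Retry transient failures (429 and 5xx) with exponential backoff.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await fn();
        } catch (error) {
            const transient = error.status === 429 ||
                (error.status >= 500 && error.status < 600);
            if (!transient || attempt >= retries) throw error;
            // Wait 1x, 2x, 4x... the base delay between attempts.
            await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt));
        }
    }
}
```

Usage might look like `await withRetry(() => client.chat.completions.create({ ... }))`.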

---

## Key Takeaways

1. **Stateless Nature:** Models don't remember. You send full context each time.
2. **Message Roles:** `system` (behavior), `user` (input), `assistant` (AI response)
3. **Temperature:** Controls creativity (0 = focused, 2 = creative)
4. **Streaming:** Better UX for real-time applications
5. **Token Management:** Monitor usage for cost and limits
6. **Model Selection:** Choose based on task complexity and budget