# Code Explanation: OpenAI Intro

This guide walks through each example in `openai-intro.js`, explaining how to work with OpenAI's API from the ground up.

## Requirements

Before running this example, you'll need an OpenAI account, an API key, and a valid billing method.

### Get API Key

https://platform.openai.com/api-keys

### Add Billing Method

https://platform.openai.com/settings/organization/billing/overview

### Configure environment variables

```bash
cp .env.example .env
```

Then edit `.env` and add your actual API key.

## Setup and Initialization

```javascript
import OpenAI from 'openai';
import 'dotenv/config';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
```

**What's happening:**

- `import OpenAI from 'openai'` - Import the official OpenAI SDK for Node.js
- `import 'dotenv/config'` - Load environment variables from the `.env` file
- `new OpenAI({...})` - Create a client instance that handles API authentication and requests
- `process.env.OPENAI_API_KEY` - Your API key from platform.openai.com (never hardcode this!)

**Why it matters:** The client object is your interface to OpenAI's models. All API calls go through this client.

---

## Example 1: Basic Chat Completion

```javascript
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'What is node-llama-cpp?' }
  ],
});

console.log(response.choices[0].message.content);
```

**What's happening:**

- `chat.completions.create()` - The primary method for sending messages to chat models
- `model: 'gpt-4o'` - Specifies which model to use (gpt-4o is a fast, highly capable multimodal model)
- `messages` array - Contains the conversation history
- `role: 'user'` - Indicates this message comes from the user (you)
- `response.choices[0]` - The API returns an array of possible responses; we take the first one
- `message.content` - The actual text response from the AI

**Response structure:**

```javascript
{
  id: 'chatcmpl-...',
  object: 'chat.completion',
  created: 1234567890,
  model: 'gpt-4o',
  choices: [
    {
      index: 0,
      message: { role: 'assistant', content: 'node-llama-cpp is a...' },
      finish_reason: 'stop'
    }
  ],
  usage: { prompt_tokens: 10, completion_tokens: 50, total_tokens: 60 }
}
```

---

## Example 2: System Prompts

```javascript
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a coding assistant that talks like a pirate.' },
    { role: 'user', content: 'Explain what async/await does in JavaScript.' }
  ],
});
```

**What's happening:**

- `role: 'system'` - Special message type that sets the AI's behavior and personality
- System messages are processed first and influence all subsequent responses
- The model will maintain this behavior throughout the conversation

**Why it matters:** System prompts are how you specialize AI behavior. They're the foundation of creating focused agents with specific roles (translator, coder, analyst, etc.).

**Key insight:** Same model + different system prompts = completely different agents!
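Since the same model behaves like a different agent under each system prompt, it can help to wrap the prompt in a small factory that builds the `messages` array. A minimal sketch, assuming a hypothetical `makeAgent` helper (not part of the SDK or the original script):

```javascript
// Build a reusable "agent" by pairing a fixed system prompt with
// per-call user messages. This only constructs the messages array;
// pass the result to client.chat.completions.create().
function makeAgent(systemPrompt) {
  return function buildMessages(userContent, history = []) {
    return [
      { role: 'system', content: systemPrompt },
      ...history,
      { role: 'user', content: userContent },
    ];
  };
}

// Two different agents from the same model:
const pirateCoder = makeAgent('You are a coding assistant that talks like a pirate.');
const strictReviewer = makeAgent('You are a terse senior code reviewer.');

const pirateMessages = pirateCoder('Explain what async/await does in JavaScript.');
// pirateMessages[0].role === 'system', pirateMessages[1].role === 'user'
```

The factory keeps the behavior-defining prompt in one place, so every request for that agent starts from the same system message.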
---

## Example 3: Temperature Control

```javascript
// Focused response
const focusedResponse = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
  temperature: 0.2,
});

// Creative response
const creativeResponse = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
  temperature: 1.5,
});
```

**What's happening:**

- `temperature` - Controls randomness in the output (range: 0.0 to 2.0)
- **Low temperature (0.0 - 0.3):**
  - More focused and deterministic
  - Same input → similar output
  - Best for: factual answers, code generation, data extraction
- **Medium temperature (0.7 - 1.0):**
  - Balanced creativity and coherence
  - Default for most use cases
- **High temperature (1.2 - 2.0):**
  - More creative and varied
  - Same input → very different outputs
  - Best for: creative writing, brainstorming, story generation

**Real-world usage:**

- Code completion: temperature 0.2
- Customer support: temperature 0.5
- Creative content: temperature 1.2

---

## Example 4: Conversation Context

```javascript
const messages = [
  { role: 'system', content: 'You are a helpful coding tutor.' },
  { role: 'user', content: 'What is a Promise in JavaScript?' },
];

const response1 = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: messages,
});

// Add AI response to history
messages.push(response1.choices[0].message);

// Add follow-up question
messages.push({ role: 'user', content: 'Can you show me a simple example?' });

// Second request with full context
const response2 = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: messages,
});
```

**What's happening:**

- OpenAI models are **stateless** - they don't remember previous conversations
- We maintain context by sending the entire conversation history with each request
- Each request is independent; you must include all relevant messages

**Message order in the array:**

1. System prompt (optional, but recommended first)
2. Previous user message
3. Previous assistant response
4. Current user message

**Why it matters:** This is how chatbots remember context. The full conversation is sent every time.

**Performance considerations:**

- More messages = more tokens = higher cost
- Longer conversations eventually hit token limits
- Real applications need conversation trimming or summarization strategies

---

## Example 5: Streaming Responses

```javascript
const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Write a haiku about programming.' }
  ],
  stream: true, // Enable streaming
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}
```

**What's happening:**

- `stream: true` - Instead of waiting for the complete response, receive it token by token
- `for await...of` - Iterate over the stream as chunks arrive
- `delta.content` - Each chunk contains a small piece of text (often just a word or partial word)
- `process.stdout.write()` - Write without a newline to display text progressively

**Streaming vs. non-streaming:**

Non-streaming (default):

```
[Request sent]
[Wait 5 seconds...]
[Full response arrives]
```

Streaming:

```
[Request sent]
Once  [chunk arrives: "Once"]
 upon [chunk arrives: " upon"]
 a    [chunk arrives: " a"]
 time [chunk arrives: " time"]
...
```

**Why it matters:**

- Better user experience (immediate feedback)
- Appears faster even though total time is similar
- Essential for real-time chat interfaces
- Allows early processing/display of partial results

**When to use streaming:**

- Interactive chat applications
- Long-form content generation
- When user experience matters more than simplicity

**When NOT to use streaming:**

- Simple scripts or automation
- When you need the complete response before processing
- Batch processing

---

## Example 6: Token Usage

```javascript
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Explain recursion in 3 sentences.' }
  ],
  max_tokens: 100,
});

console.log("Token usage:");
console.log("- Prompt tokens: " + response.usage.prompt_tokens);
console.log("- Completion tokens: " + response.usage.completion_tokens);
console.log("- Total tokens: " + response.usage.total_tokens);
```

**What's happening:**

- `max_tokens` - Limits the length of the AI's response
- `response.usage` - Contains token consumption details
  - **Prompt tokens:** Your input (the messages you sent)
  - **Completion tokens:** The AI's output (the response)
  - **Total tokens:** Sum of both (what you're billed for)

**Understanding tokens:**

- Tokens ≠ words
- 1 token ≈ 0.75 words (in English)
- "hello" = 1 token
- "chatbot" = 2 tokens ("chat" + "bot")
- Punctuation and spaces count as tokens

**Why it matters:**

1. **Cost control:** You pay per token
2. **Context limits:** Models have maximum token limits (e.g., gpt-4o: 128,000 tokens)
3. **Response control:** Use `max_tokens` to prevent overly long responses

**Practical limits:**

```javascript
// Prevent runaway responses
max_tokens: 150,  // ~100 words

// Brief responses
max_tokens: 50,   // ~35 words

// Longer content
max_tokens: 1000, // ~750 words
```

**Cost estimation (approximate; check OpenAI's pricing page for current rates):**

- GPT-4o: $5 per 1M input tokens, $15 per 1M output tokens
- GPT-3.5-turbo: $0.50 per 1M input tokens, $1.50 per 1M output tokens

---

## Example 7: Model Comparison

```javascript
// GPT-4o - Most capable
const gpt4Response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
});

// GPT-3.5-turbo - Faster and cheaper
const gpt35Response = await client.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: prompt }],
});
```

**Available models:**

| Model | Best For | Speed | Cost | Context Window |
|-------|----------|-------|------|----------------|
| `gpt-4o` | Complex tasks, reasoning, accuracy | Medium | $$$ | 128K tokens |
| `gpt-4o-mini` | Balanced performance/cost | Fast | $$ | 128K tokens |
| `gpt-3.5-turbo` | Simple tasks, high volume | Very fast | $ | 16K tokens |

**Choosing the right model:**

- **Use GPT-4o when:**
  - Complex reasoning is required
  - High accuracy is critical
  - Working with code or technical content
  - Quality > speed/cost
- **Use GPT-4o-mini when:**
  - You need good performance at lower cost
  - Most general-purpose tasks
- **Use GPT-3.5-turbo when:**
  - Simple classification or extraction
  - High-volume, low-complexity tasks
  - Speed is critical
  - Budget constraints

**Pro tip:** Start with gpt-4o for development, then evaluate whether cheaper models work for your use case.
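The per-model prices quoted above can be turned into a rough per-request cost estimate using `response.usage`. A sketch, with the `PRICING` table and `estimateCost` helper as illustrative assumptions (prices change; always check OpenAI's pricing page):

```javascript
// Approximate USD cost per 1M tokens, from the figures quoted in this guide.
// These numbers go stale; treat them as placeholders, not a source of truth.
const PRICING = {
  'gpt-4o':        { input: 5.0, output: 15.0 },
  'gpt-3.5-turbo': { input: 0.5, output: 1.5 },
};

// Estimate the dollar cost of one request from its usage report.
function estimateCost(model, usage) {
  const price = PRICING[model];
  if (!price) throw new Error(`No pricing data for ${model}`);
  return (
    (usage.prompt_tokens / 1_000_000) * price.input +
    (usage.completion_tokens / 1_000_000) * price.output
  );
}

// Example: 10 prompt tokens + 50 completion tokens on gpt-4o
// → 10/1M × $5 + 50/1M × $15 = $0.0008
const cost = estimateCost('gpt-4o', { prompt_tokens: 10, completion_tokens: 50 });
```

Logging this alongside `response.usage` makes it easy to spot which prompts dominate your bill.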
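The conversation-trimming strategy mentioned in Example 4 can be sketched in a few lines. This version budgets by message count for simplicity; a real implementation would budget by tokens. The `trimHistory` helper is illustrative, not part of the SDK:

```javascript
// Keep a conversation under a rough message budget: always retain the
// system prompt (if present), then the most recent messages.
function trimHistory(messages, maxMessages) {
  if (messages.length <= maxMessages) return messages;
  const [first, ...rest] = messages;
  if (first.role === 'system') {
    // Reserve one slot for the system prompt, keep the newest of the rest.
    return [first, ...rest.slice(rest.length - (maxMessages - 1))];
  }
  return messages.slice(messages.length - maxMessages);
}

// Before each request: client.chat.completions.create({
//   model: 'gpt-4o', messages: trimHistory(messages, 20) });
```

Dropping the oldest turns loses information; summarizing them into a single message is the usual next step when that matters.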
---

## Error Handling

```javascript
try {
  await basicCompletion();
} catch (error) {
  console.error("Error:", error.message);
  if (error.message.includes('API key')) {
    console.error("\nMake sure to set your OPENAI_API_KEY in a .env file");
  }
}
```

**Common errors:**

- `401 Unauthorized` - Invalid or missing API key
- `429 Too Many Requests` - Rate limit exceeded
- `500 Internal Server Error` - OpenAI service issue
- `Context length exceeded` - Too many tokens in the conversation

**Best practices:**

- Always use try/catch with async calls
- Check error types and provide helpful messages
- Implement retry logic for transient failures
- Monitor token usage to avoid limit errors

---

## Key Takeaways

1. **Stateless nature:** Models don't remember. You send the full context each time.
2. **Message roles:** `system` (behavior), `user` (input), `assistant` (AI response)
3. **Temperature:** Controls creativity (0 = focused, 2 = creative)
4. **Streaming:** Better UX for real-time applications
5. **Token management:** Monitor usage for cost and limits
6. **Model selection:** Choose based on task complexity and budget
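The retry logic recommended under Error Handling can be sketched with exponential backoff. `retryWithBackoff` is an illustrative helper, not part of the OpenAI SDK; note that only transient errors (like 429 or 500) are worth retrying, while a 401 means the API key itself is wrong:

```javascript
// Small promise-based sleep helper.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry a flaky async call, doubling the delay after each failure.
async function retryWithBackoff(fn, maxAttempts = 3, baseDelayMs = 500) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt); // 500ms, 1s, 2s, ...
      }
    }
  }
  throw lastError; // all attempts failed
}

// Usage (illustrative):
// const response = await retryWithBackoff(() =>
//   client.chat.completions.create({ model: 'gpt-4o', messages }));
```

A production version would also inspect the error before retrying, so that non-transient failures fail fast instead of burning attempts.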