# Code Explanation: OpenAI Intro

This guide walks through each example in `openai-intro.js`, explaining how to work with OpenAI's API from the ground up.

## Requirements

Before running this example, you'll need an OpenAI account, an API key, and a valid billing method.

### Get API Key

https://platform.openai.com/api-keys

### Add Billing Method

https://platform.openai.com/settings/organization/billing/overview

### Configure environment variables

```bash
cp .env.example .env
```

Then edit `.env` and add your actual API key.

## Setup and Initialization

```javascript
import OpenAI from 'openai';
import 'dotenv/config';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
```

**What's happening:**

- `import OpenAI from 'openai'` - Import the official OpenAI SDK for Node.js
- `import 'dotenv/config'` - Load environment variables from the `.env` file
- `new OpenAI({...})` - Create a client instance that handles API authentication and requests
- `process.env.OPENAI_API_KEY` - Your API key from platform.openai.com (never hardcode this!)

**Why it matters:** The client object is your interface to OpenAI's models. All API calls go through this client.

---

## Example 1: Basic Chat Completion

```javascript
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'What is node-llama-cpp?' }
  ],
});

console.log(response.choices[0].message.content);
```

**What's happening:**

- `chat.completions.create()` - The primary method for sending messages to chat models
- `model: 'gpt-4o'` - Specifies which model to use (gpt-4o is a fast, highly capable multimodal model)
- `messages` array - Contains the conversation history
- `role: 'user'` - Indicates this message comes from the user (you)
- `response.choices[0]` - The API returns an array of possible responses; we take the first one
- `message.content` - The actual text response from the AI

**Response structure:**

```javascript
{
  id: 'chatcmpl-...',
  object: 'chat.completion',
  created: 1234567890,
  model: 'gpt-4o',
  choices: [
    {
      index: 0,
      message: { role: 'assistant', content: 'node-llama-cpp is a...' },
      finish_reason: 'stop'
    }
  ],
  usage: { prompt_tokens: 10, completion_tokens: 50, total_tokens: 60 }
}
```

---

## Example 2: System Prompts

```javascript
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a coding assistant that talks like a pirate.' },
    { role: 'user', content: 'Explain what async/await does in JavaScript.' }
  ],
});
```

**What's happening:**

- `role: 'system'` - Special message type that sets the AI's behavior and personality
- System messages are processed first and influence all subsequent responses
- The model will maintain this behavior throughout the conversation

**Why it matters:** System prompts are how you specialize AI behavior. They're the foundation of creating focused agents with specific roles (translator, coder, analyst, etc.).

**Key insight:** Same model + different system prompts = completely different agents!
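Since the same model behaves like a different agent under each system prompt, it can help to wrap the prompt in a small factory that builds the `messages` array. A minimal sketch, assuming a hypothetical `makeAgent` helper (not part of the SDK or the original script):

```javascript
// Build a reusable "agent" by pairing a fixed system prompt with
// per-call user messages. This only constructs the messages array;
// pass the result to client.chat.completions.create().
function makeAgent(systemPrompt) {
  return function buildMessages(userContent, history = []) {
    return [
      { role: 'system', content: systemPrompt },
      ...history,
      { role: 'user', content: userContent },
    ];
  };
}

// Two different agents from the same model:
const pirateCoder = makeAgent('You are a coding assistant that talks like a pirate.');
const strictReviewer = makeAgent('You are a terse senior code reviewer.');

const pirateMessages = pirateCoder('Explain what async/await does in JavaScript.');
// pirateMessages[0].role === 'system', pirateMessages[1].role === 'user'
```

The factory keeps the behavior-defining prompt in one place, so every request for that agent starts from the same system message.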
---

## Example 3: Temperature Control

```javascript
// Focused response
const focusedResponse = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
  temperature: 0.2,
});

// Creative response
const creativeResponse = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
  temperature: 1.5,
});
```

**What's happening:**

- `temperature` - Controls randomness in the output (range: 0.0 to 2.0)
- **Low temperature (0.0 - 0.3):**
  - More focused and deterministic
  - Same input → similar output
  - Best for: factual answers, code generation, data extraction
- **Medium temperature (0.7 - 1.0):**
  - Balanced creativity and coherence
  - Default for most use cases
- **High temperature (1.2 - 2.0):**
  - More creative and varied
  - Same input → very different outputs
  - Best for: creative writing, brainstorming, story generation

**Real-world usage:**

- Code completion: temperature 0.2
- Customer support: temperature 0.5
- Creative content: temperature 1.2

---

## Example 4: Conversation Context

```javascript
const messages = [
  { role: 'system', content: 'You are a helpful coding tutor.' },
  { role: 'user', content: 'What is a Promise in JavaScript?' },
];

const response1 = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: messages,
});

// Add AI response to history
messages.push(response1.choices[0].message);

// Add follow-up question
messages.push({ role: 'user', content: 'Can you show me a simple example?' });

// Second request with full context
const response2 = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: messages,
});
```

**What's happening:**

- OpenAI models are **stateless** - they don't remember previous conversations
- We maintain context by sending the entire conversation history with each request
- Each request is independent; you must include all relevant messages

**Message order in the array:**

1. System prompt (optional, but recommended first)
2. Previous user message
3. Previous assistant response
4. Current user message

**Why it matters:** This is how chatbots remember context. The full conversation is sent every time.

**Performance considerations:**

- More messages = more tokens = higher cost
- Longer conversations eventually hit token limits
- Real applications need conversation trimming or summarization strategies

---

## Example 5: Streaming Responses

```javascript
const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Write a haiku about programming.' }
  ],
  stream: true, // Enable streaming
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}
```

**What's happening:**

- `stream: true` - Instead of waiting for the complete response, receive it token by token
- `for await...of` - Iterate over the stream as chunks arrive
- `delta.content` - Each chunk contains a small piece of text (often just a word or partial word)
- `process.stdout.write()` - Write without a newline to display text progressively

**Streaming vs. non-streaming:**

Non-streaming (default):

```
[Request sent]
[Wait 5 seconds...]
[Full response arrives]
```

Streaming:

```
[Request sent]
Once  [chunk arrives: "Once"]
 upon [chunk arrives: " upon"]
 a    [chunk arrives: " a"]
 time [chunk arrives: " time"]
...
```

**Why it matters:**

- Better user experience (immediate feedback)
- Appears faster even though total time is similar
- Essential for real-time chat interfaces
- Allows early processing/display of partial results

**When to use streaming:**

- Interactive chat applications
- Long-form content generation
- When user experience matters more than simplicity

**When NOT to use streaming:**

- Simple scripts or automation
- When you need the complete response before processing
- Batch processing

---

## Example 6: Token Usage

```javascript
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Explain recursion in 3 sentences.' }
  ],
  max_tokens: 100,
});

console.log("Token usage:");
console.log("- Prompt tokens: " + response.usage.prompt_tokens);
console.log("- Completion tokens: " + response.usage.completion_tokens);
console.log("- Total tokens: " + response.usage.total_tokens);
```

**What's happening:**

- `max_tokens` - Limits the length of the AI's response
- `response.usage` - Contains token consumption details
  - **Prompt tokens:** Your input (the messages you sent)
  - **Completion tokens:** The AI's output (the response)
  - **Total tokens:** Sum of both (what you're billed for)

**Understanding tokens:**

- Tokens ≠ words
- 1 token ≈ 0.75 words (in English)
- "hello" = 1 token
- "chatbot" = 2 tokens ("chat" + "bot")
- Punctuation and spaces count as tokens

**Why it matters:**

1. **Cost control:** You pay per token
2. **Context limits:** Models have maximum token limits (e.g., gpt-4o: 128,000 tokens)
3. **Response control:** Use `max_tokens` to prevent overly long responses

**Practical limits:**

```javascript
// Prevent runaway responses
max_tokens: 150,  // ~100 words

// Brief responses
max_tokens: 50,   // ~35 words

// Longer content
max_tokens: 1000, // ~750 words
```

**Cost estimation (approximate; check OpenAI's pricing page for current rates):**

- GPT-4o: $5 per 1M input tokens, $15 per 1M output tokens
- GPT-3.5-turbo: $0.50 per 1M input tokens, $1.50 per 1M output tokens

---

## Example 7: Model Comparison

```javascript
// GPT-4o - Most capable
const gpt4Response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
});

// GPT-3.5-turbo - Faster and cheaper
const gpt35Response = await client.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: prompt }],
});
```

**Available models:**

| Model | Best For | Speed | Cost | Context Window |
|-------|----------|-------|------|----------------|
| `gpt-4o` | Complex tasks, reasoning, accuracy | Medium | $$$ | 128K tokens |
| `gpt-4o-mini` | Balanced performance/cost | Fast | $$ | 128K tokens |
| `gpt-3.5-turbo` | Simple tasks, high volume | Very fast | $ | 16K tokens |

**Choosing the right model:**

- **Use GPT-4o when:**
  - Complex reasoning is required
  - High accuracy is critical
  - Working with code or technical content
  - Quality > speed/cost
- **Use GPT-4o-mini when:**
  - You need good performance at lower cost
  - Most general-purpose tasks
- **Use GPT-3.5-turbo when:**
  - Simple classification or extraction
  - High-volume, low-complexity tasks
  - Speed is critical
  - Budget constraints

**Pro tip:** Start with gpt-4o for development, then evaluate whether cheaper models work for your use case.
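The per-model prices quoted above can be turned into a rough per-request cost estimate using `response.usage`. A sketch, with the `PRICING` table and `estimateCost` helper as illustrative assumptions (prices change; always check OpenAI's pricing page):

```javascript
// Approximate USD cost per 1M tokens, from the figures quoted in this guide.
// These numbers go stale; treat them as placeholders, not a source of truth.
const PRICING = {
  'gpt-4o':        { input: 5.0, output: 15.0 },
  'gpt-3.5-turbo': { input: 0.5, output: 1.5 },
};

// Estimate the dollar cost of one request from its usage report.
function estimateCost(model, usage) {
  const price = PRICING[model];
  if (!price) throw new Error(`No pricing data for ${model}`);
  return (
    (usage.prompt_tokens / 1_000_000) * price.input +
    (usage.completion_tokens / 1_000_000) * price.output
  );
}

// Example: 10 prompt tokens + 50 completion tokens on gpt-4o
// → 10/1M × $5 + 50/1M × $15 = $0.0008
const cost = estimateCost('gpt-4o', { prompt_tokens: 10, completion_tokens: 50 });
```

Logging this alongside `response.usage` makes it easy to spot which prompts dominate your bill.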
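The conversation-trimming strategy mentioned in Example 4 can be sketched in a few lines. This version budgets by message count for simplicity; a real implementation would budget by tokens. The `trimHistory` helper is illustrative, not part of the SDK:

```javascript
// Keep a conversation under a rough message budget: always retain the
// system prompt (if present), then the most recent messages.
function trimHistory(messages, maxMessages) {
  if (messages.length <= maxMessages) return messages;
  const [first, ...rest] = messages;
  if (first.role === 'system') {
    // Reserve one slot for the system prompt, keep the newest of the rest.
    return [first, ...rest.slice(rest.length - (maxMessages - 1))];
  }
  return messages.slice(messages.length - maxMessages);
}

// Before each request: client.chat.completions.create({
//   model: 'gpt-4o', messages: trimHistory(messages, 20) });
```

Dropping the oldest turns loses information; summarizing them into a single message is the usual next step when that matters.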
---

## Error Handling

```javascript
try {
  await basicCompletion();
} catch (error) {
  console.error("Error:", error.message);
  if (error.message.includes('API key')) {
    console.error("\nMake sure to set your OPENAI_API_KEY in a .env file");
  }
}
```

**Common errors:**

- `401 Unauthorized` - Invalid or missing API key
- `429 Too Many Requests` - Rate limit exceeded
- `500 Internal Server Error` - OpenAI service issue
- `Context length exceeded` - Too many tokens in the conversation

**Best practices:**

- Always use try/catch with async calls
- Check error types and provide helpful messages
- Implement retry logic for transient failures
- Monitor token usage to avoid limit errors

---

## Key Takeaways

1. **Stateless nature:** Models don't remember. You send the full context each time.
2. **Message roles:** `system` (behavior), `user` (input), `assistant` (AI response)
3. **Temperature:** Controls creativity (0 = focused, 2 = creative)
4. **Streaming:** Better UX for real-time applications
5. **Token management:** Monitor usage for cost and limits
6. **Model selection:** Choose based on task complexity and budget
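The retry logic recommended under Error Handling can be sketched with exponential backoff. `retryWithBackoff` is an illustrative helper, not part of the OpenAI SDK; note that only transient errors (like 429 or 500) are worth retrying, while a 401 means the API key itself is wrong:

```javascript
// Small promise-based sleep helper.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry a flaky async call, doubling the delay after each failure.
async function retryWithBackoff(fn, maxAttempts = 3, baseDelayMs = 500) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt); // 500ms, 1s, 2s, ...
      }
    }
  }
  throw lastError; // all attempts failed
}

// Usage (illustrative):
// const response = await retryWithBackoff(() =>
//   client.chat.completions.create({ model: 'gpt-4o', messages }));
```

A production version would also inspect the error before retrying, so that non-transient failures fail fast instead of burning attempts.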