# Code Explanation: OpenAI Intro

This guide walks through each example in `openai-intro.js`, explaining how to work with OpenAI's API from the ground up.

## Requirements

Before running this example, you’ll need an OpenAI account, an API key, and a valid billing method.

### Get API Key

https://platform.openai.com/api-keys

### Add Billing Method

https://platform.openai.com/settings/organization/billing/overview

### Configure Environment Variables

```bash
cp .env.example .env
```

Then edit `.env` and set `OPENAI_API_KEY` to your actual API key.

## Setup and Initialization

```javascript
import OpenAI from 'openai';
import 'dotenv/config';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
```

**What's happening:**

- `import OpenAI from 'openai'` - Import the official OpenAI SDK for Node.js
- `import 'dotenv/config'` - Load environment variables from the `.env` file into `process.env`
- `new OpenAI({...})` - Create a client instance that handles API authentication and requests
- `process.env.OPENAI_API_KEY` - Your API key from platform.openai.com (never hardcode this!). If you omit `apiKey`, the SDK reads `OPENAI_API_KEY` from the environment by default.

**Module format:** The `import` statements, and `await` used outside an async function (as the snippets below are written), require an ES module. Add `"type": "module"` to `package.json` or use a `.mjs` file extension.

**Why it matters:** The client object is your interface to OpenAI's models. All API calls go through this client.
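
The SDK already throws at construction time when it can't find a key, but its message knows nothing about this project's `.env.example`. Below is a minimal sketch of a project-specific check. It assumes a hypothetical `requireApiKey` helper, which is not part of the SDK, that reads the same `OPENAI_API_KEY` variable:

```javascript
// Hypothetical helper (not part of the openai SDK): fail fast with a message
// that points at this project's setup steps when the key is missing or blank.
function requireApiKey(env = process.env) {
  const key = env.OPENAI_API_KEY?.trim();
  if (!key) {
    throw new Error('OPENAI_API_KEY is not set. Copy .env.example to .env and add your key.');
  }
  return key;
}

// Usage: const client = new OpenAI({ apiKey: requireApiKey() });
```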

---

## Example 1: Basic Chat Completion

```javascript
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'What is node-llama-cpp?' }
  ],
});

console.log(response.choices[0].message.content);
```

**What's happening:**

- `chat.completions.create()` - The Chat Completions method, the primary way to send messages to OpenAI's chat models
- `model: 'gpt-4o'` - Specifies which model to use (gpt-4o is a capable general-purpose model, but OpenAI releases newer models regularly; check the models page for current options)
- `messages` array - Contains the conversation history
- `role: 'user'` - Indicates this message comes from the user (you)
- `response.choices[0]` - The API returns an array of candidate responses. With the default `n: 1` it holds exactly one, so we take the first.
- `message.content` - The actual text response from the AI

**Response structure:**

```javascript
{
  id: 'chatcmpl-...',
  object: 'chat.completion',
  created: 1234567890,
  model: 'gpt-4o',
  choices: [
    {
      index: 0,
      message: {
        role: 'assistant',
        content: 'node-llama-cpp is a...'
      },
      finish_reason: 'stop'
    }
  ],
  usage: {
    prompt_tokens: 10,
    completion_tokens: 50,
    total_tokens: 60
  }
}
```
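
Given the structure above, a small helper can pull the commonly used fields out in one place. This is a sketch: `getReply` is a hypothetical name, and `sample` mirrors the structure shown rather than coming from a real API call.

```javascript
// Hypothetical helper: extract the commonly used fields from a chat completion
// object shaped like the response structure above.
function getReply(response) {
  const choice = response.choices[0];
  return {
    text: choice.message.content,
    finishReason: choice.finish_reason, // 'stop' means the model finished normally
    totalTokens: response.usage?.total_tokens ?? null,
  };
}

const sample = {
  choices: [
    {
      index: 0,
      message: { role: 'assistant', content: 'node-llama-cpp is a...' },
      finish_reason: 'stop',
    },
  ],
  usage: { prompt_tokens: 10, completion_tokens: 50, total_tokens: 60 },
};

const reply = getReply(sample);
console.log(reply.finishReason, reply.totalTokens, reply.text);
```

With a real call, pass the `response` from Example 1 instead of `sample`.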

---

## Example 2: System Prompts

```javascript
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a coding assistant that talks like a pirate.' },
    { role: 'user', content: 'Explain what async/await does in JavaScript.' }
  ],
});
```

**What's happening:**

- `role: 'system'` - Special message type that sets the AI's behavior and personality
- System messages are placed first and influence all subsequent responses
- The model maintains this behavior only as long as the system message is included in every request (the API keeps no memory between calls; see Example 4)

**Why it matters:** System prompts are how you specialize AI behavior. They're the foundation of creating focused agents with specific roles (translator, coder, analyst, etc.).

**Key insight:** Same model + different system prompts = completely different agents!
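
The key insight above can be sketched as a small factory that binds a system prompt to a model. `createAgent` is a hypothetical helper, not an SDK function. It accepts any object with the same `chat.completions.create` method as the client from the setup section.

```javascript
// Hypothetical helper: an "agent" here is just a system prompt bound to a model.
// `client` is any object exposing chat.completions.create (e.g. the OpenAI client).
function createAgent(client, systemPrompt, model = 'gpt-4o') {
  return async function ask(question) {
    const response = await client.chat.completions.create({
      model,
      messages: [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: question },
      ],
    });
    return response.choices[0].message.content;
  };
}

// Usage with the client from the setup section:
// const pirate = createAgent(client, 'You are a coding assistant that talks like a pirate.');
// const translator = createAgent(client, 'You translate English text into French.');
// console.log(await pirate('Explain what async/await does in JavaScript.'));
```

Because each call resends the system prompt, the "agent" is pure configuration, and swapping the prompt swaps the agent.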

---

## Example 3: Temperature Control

```javascript
const prompt = 'Write a one-line slogan for a coffee shop.'; // illustrative prompt; any string works

// Focused response
const focusedResponse = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
  temperature: 0.2,
});

// Creative response
const creativeResponse = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
  temperature: 1.5,
});
```

**What's happening:**

- `temperature` - Controls randomness in the output (range: 0.0 to 2.0; the API default is 1)
- **Low temperature (0.0 - 0.3):**
  - More focused and closer to deterministic (even 0 does not guarantee identical outputs)
  - Same input → similar output
  - Best for: factual answers, code generation, data extraction
- **Medium temperature (0.7 - 1.0):**
  - Balanced creativity and coherence
  - Suitable for most use cases; the default (1) falls in this range
- **High temperature (1.2 - 2.0):**
  - More creative and varied
  - Same input → very different outputs; values near 2.0 often drift into incoherent text
  - Best for: creative writing, brainstorming, story generation

**Real-world usage:**

- Code completion: temperature 0.2
- Customer support: temperature 0.5
- Creative content: temperature 1.2
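
One way to keep these settings consistent across a codebase is a preset table plus a request builder that validates the range. This is a sketch. `TEMPERATURE_PRESETS` and `buildChatRequest` are hypothetical names, and the preset values are simply the ones listed above.

```javascript
// Starting points taken from the "Real-world usage" list; tune for your own app.
const TEMPERATURE_PRESETS = Object.freeze({
  code: 0.2,
  support: 0.5,
  creative: 1.2,
});

// Hypothetical helper: build Chat Completions parameters, rejecting
// temperatures outside the 0-2 range the API accepts.
function buildChatRequest(prompt, temperature, model = 'gpt-4o') {
  if (typeof temperature !== 'number' || Number.isNaN(temperature) || temperature < 0 || temperature > 2) {
    throw new RangeError(`temperature must be between 0 and 2, got ${temperature}`);
  }
  return {
    model,
    messages: [{ role: 'user', content: prompt }],
    temperature,
  };
}

// Usage:
// const response = await client.chat.completions.create(
//   buildChatRequest('Suggest a name for a CLI tool.', TEMPERATURE_PRESETS.code),
// );
```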

---

## Example 4: Conversation Context

```javascript
const messages = [
  { role: 'system', content: 'You are a helpful coding tutor.' },
  { role: 'user', content: 'What is a Promise in JavaScript?' },
];

const response1 = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: messages,
});

// Add AI response to history
messages.push(response1.choices[0].message);

// Add follow-up question
messages.push({ role: 'user', content: 'Can you show me a simple example?' });

// Second request with full context
const response2 = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: messages,
});
```

**What's happening:**

- The Chat Completions API is **stateless** - the model doesn't remember previous requests
- We maintain context by sending the entire conversation history with each request
- Each request is independent; you must include all relevant messages

**Message order in the array:**

1. System prompt (optional, but recommended first)
2. Previous user message
3. Previous assistant response
4. Current user message

**Why it matters:** This is how chatbots remember context. The full conversation is sent every time.
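
You can wrap the push-send-push pattern above in a helper so that every turn updates the history the same way. This is a minimal sketch. `sendMessage` is a hypothetical name, and `client` can be the real client or any object with the same `chat.completions.create` method.

```javascript
// Hypothetical helper: one conversational turn. The full history is sent,
// and both the user message and the assistant reply are appended only after
// the request succeeds, so a failed call leaves `history` unchanged.
async function sendMessage(client, history, userText, model = 'gpt-4o') {
  const userMessage = { role: 'user', content: userText };
  const response = await client.chat.completions.create({
    model,
    messages: [...history, userMessage],
  });
  const reply = response.choices[0].message;
  history.push(userMessage, reply);
  return reply.content;
}

// Usage:
// const history = [{ role: 'system', content: 'You are a helpful coding tutor.' }];
// await sendMessage(client, history, 'What is a Promise in JavaScript?');
// await sendMessage(client, history, 'Can you show me a simple example?');
```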

**Performance consideration:**

- More messages = more tokens = higher cost
- Longer conversations eventually exceed the model's context window (its maximum token limit)
- Real applications need conversation trimming or summarization strategies
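
The simplest trimming strategy is a sliding window: keep the system prompt and drop the oldest turns. This is a sketch. It assumes a hypothetical `trimHistory` helper that counts messages rather than tokens. A production version would budget by tokens, and summarization is a separate, more involved strategy.

```javascript
// Hypothetical helper: sliding-window trimming. Returns all system messages
// first, followed by the most recent `maxRecent` non-system messages.
function trimHistory(messages, maxRecent) {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  // slice(-0) would return the whole array, so handle 0 explicitly
  const recent = maxRecent > 0 ? rest.slice(-maxRecent) : [];
  return [...system, ...recent];
}
```

Send the trimmed array as `messages`, and keep the full history separately if you need it for logging.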

---

## Example 5: Streaming Responses

```javascript
const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Write a haiku about programming.' }
  ],
  stream: true, // Enable streaming
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}
```

**What's happening:**

- `stream: true` - Instead of waiting for the complete response, receive it incrementally as tokens are generated
- `for await...of` - Iterate over the stream as chunks arrive
- `delta.content` - Each chunk contains a small piece of text (often just a word or part of a word). The `?.` and `|| ''` guard against chunks that carry no text, such as the final chunk, whose `delta` is empty.
- `process.stdout.write()` - Write without a newline to display text progressively
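
Often you want to display text as it arrives and also keep the full reply, for example to append it to a conversation history. Below is a sketch of a hypothetical `collectStream` helper. It works with the SDK's stream or any async iterable of chunks shaped like the ones above. The optional chaining also tolerates chunks with an empty `choices` array, which the API sends at the end when `stream_options: { include_usage: true }` is set.

```javascript
// Hypothetical helper: hand each text delta to `onText` as it arrives
// (stdout by default) and return the complete reply once the stream ends.
async function collectStream(stream, onText = (text) => process.stdout.write(text)) {
  let full = '';
  for await (const chunk of stream) {
    // Chunks without text (an empty delta, or an empty `choices` array) are skipped.
    const text = chunk.choices[0]?.delta?.content ?? '';
    if (text) {
      onText(text);
      full += text;
    }
  }
  return full;
}

// Usage with the `stream` from the example above:
// const haiku = await collectStream(stream); // printed live, then returned as one string
```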

**Streaming vs. Non-streaming:**

**Non-streaming (default):**

```
[Request sent]
[Wait 5 seconds...]
[Full response arrives]
```

**Streaming:**

```
[Request sent]
Once [chunk arrives: "Once"]
upon [chunk arrives: " upon"]
a [chunk arrives: " a"]
time [chunk arrives: " time"]
...
```

**Why it matters:**

- Better user experience (immediate feedback)
- Appears faster even though total time is similar
- Essential for real-time chat interfaces
- Allows early processing/display of partial results

**When to use streaming:**

- Interactive chat applications
- Long-form content generation
- When user experience matters more than simplicity

**When to NOT use streaming:**

- Simple scripts or automation
- When you need the complete response before processing
- Batch processing

---

## Example 6: Token Usage

```javascript
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Explain recursion in 3 sentences.' }
  ],
  max_tokens: 100,
});

console.log("Token usage:");
console.log("- Prompt tokens: " + response.usage.prompt_tokens);
console.log("- Completion tokens: " + response.usage.completion_tokens);
console.log("- Total tokens: " + response.usage.total_tokens);
```

**What's happening:**

- `max_tokens` - Caps the number of tokens the AI can generate in its response. If the cap is hit, the text is cut off and `finish_reason` is `'length'`. (OpenAI now documents `max_completion_tokens` as the replacement for `max_tokens`; gpt-4o accepts both, but o-series reasoning models require the newer name.)
- `response.usage` - Contains token consumption details
  - **Prompt tokens:** Your input (messages you sent)
  - **Completion tokens:** AI's output (the response)
  - **Total tokens:** Sum of both (both count toward your bill, but input and output tokens are priced at different rates)

**Understanding tokens:**

- Tokens ≠ words
- 1 token ≈ 0.75 words (in English)
- "hello" = 1 token
- Longer or less common words may be split into several tokens (e.g. "chatbot" might become "chat" + "bot"; the exact split depends on the model's tokenizer)
- Punctuation marks usually get their own tokens, and a leading space is typically merged into the next word's token (like " upon" in the streaming example)
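
To budget before sending a request, you can turn the rule of thumb above into a rough estimator. This is only an approximation for English text, and `estimateTokens` is a hypothetical helper. For exact counts, use a real tokenizer (such as OpenAI's tiktoken) or read `response.usage` after the call.

```javascript
// Hypothetical helper: rough token estimate from the "1 token ≈ 0.75 words"
// rule of thumb. A real tokenizer will give different, exact numbers.
function estimateTokens(text) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.ceil(words / 0.75);
}

console.log(estimateTokens('Explain recursion in 3 sentences.')); // prints 7 (5 words / 0.75, rounded up)
```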

**Why it matters:**

1. **Cost control:** You pay per token
2. **Context limits:** Models have maximum token limits, shared between input and output (e.g., gpt-4o: 128,000 tokens)
3. **Response control:** Use `max_tokens` to prevent overly long responses

**Practical limits:**

```javascript
// Prevent runaway responses
const capped = { max_tokens: 150 }; // ~110 words

// Brief responses
const brief = { max_tokens: 50 }; // ~35 words

// Longer content
const longForm = { max_tokens: 1000 }; // ~750 words

// Spread a preset into the request parameters:
// await client.chat.completions.create({ model: 'gpt-4o', messages, ...capped });
```
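
A capped response is cut off rather than shortened gracefully, so it's worth checking why generation stopped. Below is a sketch with a hypothetical `isTruncated` helper that reads the `finish_reason` field from the response structure in Example 1.

```javascript
// Hypothetical helper: true when generation stopped because it hit the
// token cap (max_tokens / max_completion_tokens) instead of finishing naturally.
function isTruncated(response) {
  return response.choices[0]?.finish_reason === 'length';
}

// Usage:
// if (isTruncated(response)) {
//   console.warn('Response was cut off; raise max_tokens or ask for a shorter answer.');
// }
```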
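
**Cost estimation (approximate; prices change, so check OpenAI's pricing page before budgeting):**

- GPT-4o: $2.50 per 1M input tokens, $10 per 1M output tokens (it launched at $5 / $15)
- GPT-4o-mini: $0.15 per 1M input tokens, $0.60 per 1M output tokens
- GPT-3.5-turbo: $0.50 per 1M input tokens, $1.50 per 1M output tokens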
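
Combining `response.usage` with per-token prices gives a per-request cost estimate. This is a sketch: `estimateCostUSD` is a hypothetical helper. It takes prices as arguments rather than hard-coding them, because prices change. The usage example below uses illustrative prices of $2.50 / $10 per 1M tokens.

```javascript
// Hypothetical helper: estimate the USD cost of one request from its usage
// object and per-million-token prices.
function estimateCostUSD(usage, { inputPerMillion, outputPerMillion }) {
  const inputCost = (usage.prompt_tokens / 1_000_000) * inputPerMillion;
  const outputCost = (usage.completion_tokens / 1_000_000) * outputPerMillion;
  return inputCost + outputCost;
}

// Usage with the sample usage object from Example 1:
const cost = estimateCostUSD(
  { prompt_tokens: 10, completion_tokens: 50, total_tokens: 60 },
  { inputPerMillion: 2.5, outputPerMillion: 10 },
);
console.log(`$${cost.toFixed(6)}`); // $0.000525
```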

---

## Example 7: Model Comparison

```javascript
const prompt = 'Summarize what a REST API is in two sentences.'; // illustrative prompt; any string works

// GPT-4o - Most capable of the models shown here
const gpt4Response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
});

// GPT-3.5-turbo - Faster and cheaper than gpt-4o (an older model)
const gpt35Response = await client.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: prompt }],
});
```

**Available models:**

| Model | Best For | Speed | Cost | Context Window |
|-------|----------|-------|------|----------------|
| `gpt-4o` | Complex tasks, reasoning, accuracy | Medium | $$$ | 128K tokens |
| `gpt-4o-mini` | Balanced performance/cost | Fast | $ | 128K tokens |
| `gpt-3.5-turbo` | Simple tasks, high volume (older model) | Very Fast | $$ | 16K tokens |

**Choosing the right model:**

- **Use GPT-4o when:**
  - Complex reasoning required
  - High accuracy is critical
  - Working with code or technical content
  - Quality > speed/cost
- **Use GPT-4o-mini when:**
  - Need good performance at lower cost
  - Most general-purpose tasks
  - Budget constraints (it is cheaper per token than GPT-3.5-turbo)
- **Use GPT-3.5-turbo when:**
  - Simple classification or extraction
  - High-volume, low-complexity tasks
  - Speed is critical
  - Maintaining an existing integration built on it (for new projects, gpt-4o-mini is both cheaper and more capable)

**Pro tip:** Start with gpt-4o for development, then evaluate if cheaper models work for your use case.
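
To act on the pro tip, send the same prompt to several models and compare the answers, latency, and token usage side by side. Below is a sketch with a hypothetical `compareModels` helper. Requests run one after another so that their timings don't interfere. Latency is wall-clock time from `Date.now()`, so it includes network delay.

```javascript
// Hypothetical helper: run one prompt against several models sequentially and
// collect each model's answer, wall-clock latency, and total token usage.
async function compareModels(client, prompt, models) {
  const results = [];
  for (const model of models) {
    const start = Date.now();
    const response = await client.chat.completions.create({
      model,
      messages: [{ role: 'user', content: prompt }],
    });
    results.push({
      model,
      ms: Date.now() - start,
      totalTokens: response.usage?.total_tokens,
      text: response.choices[0].message.content,
    });
  }
  return results;
}

// Usage:
// console.table(await compareModels(client, prompt, ['gpt-4o', 'gpt-4o-mini', 'gpt-3.5-turbo']));
```

A single run is noisy. Repeat the comparison on a handful of representative prompts before drawing conclusions about quality or speed.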

---

## Error Handling

```javascript
try {
  await basicCompletion(); // e.g. Example 1 wrapped in an async function
} catch (error) {
  console.error("Error:", error.message);
  // API errors from the SDK carry the HTTP status code on `error.status`
  if (error.status === 401) {
    console.error("\nMake sure OPENAI_API_KEY in your .env file is set and valid");
  }
}
```

**Common errors:**

- `401 Unauthorized` - Invalid or missing API key
- `429 Too Many Requests` - Rate limit exceeded, or your account has run out of quota/credits (check billing)
- `500 Internal Server Error` (and other 5xx codes) - OpenAI service issue
- `400 Bad Request` with code `context_length_exceeded` - Too many tokens in the conversation

**Best practices:**

- Always use try-catch with async calls
- Check error types and provide helpful messages
- Implement retry logic for transient failures (429 rate limits and 5xx errors)
- Monitor token usage to avoid limit errors
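
The official SDK already retries some failures on its own. Connection errors, 429s, and 5xx responses are retried twice by default, and you can change this with the client's `maxRetries` option, so check that before adding your own layer. For cases where you want explicit control, here is a sketch of exponential backoff. `withRetry` is a hypothetical helper, and it treats 429 and 5xx statuses as transient.

```javascript
// Hypothetical helper: retry transient failures with exponential backoff.
// 429 and 5xx are treated as transient; anything else (e.g. 401) is rethrown at once.
// `retries` is the number of extra attempts after the first one.
// Note: a 429 caused by exhausted quota will not clear up by retrying.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const status = error?.status;
      const transient = status === 429 || (status >= 500 && status < 600);
      if (!transient || attempt >= retries) throw error;
      // With the defaults, wait 500 ms, 1 s, 2 s, ... before the next attempt.
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage:
// const response = await withRetry(() =>
//   client.chat.completions.create({ model: 'gpt-4o', messages }),
// );
```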

---

## Key Takeaways

1. **Stateless Nature:** Models don't remember. You send full context each time.
2. **Message Roles:** `system` (behavior), `user` (input), `assistant` (AI response)
3. **Temperature:** Controls creativity (0 = focused, 2 = creative)
4. **Streaming:** Better UX for real-time applications
5. **Token Management:** Monitor usage for cost and limits
6. **Model Selection:** Choose based on task complexity and budget