# Code Explanation: OpenAI Intro
This guide walks through each example in `openai-intro.js`, explaining how to work with OpenAI's API from the ground up.
## Requirements
Before running this example, you’ll need an OpenAI account, an API key, and a valid billing method.
### Get API Key
https://platform.openai.com/api-keys
### Add Billing Method
https://platform.openai.com/settings/organization/billing/overview
### Configure environment variables
```bash
cp .env.example .env
```
Then edit `.env` and add your actual API key.
## Setup and Initialization
```javascript
import OpenAI from 'openai';
import 'dotenv/config';
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
```
**What's happening:**
- `import OpenAI from 'openai'` - Import the official OpenAI SDK for Node.js
- `import 'dotenv/config'` - Load environment variables from `.env` file
- `new OpenAI({...})` - Create a client instance that handles API authentication and requests
- `process.env.OPENAI_API_KEY` - Your API key from platform.openai.com (never hardcode this!)
**Why it matters:** The client object is your interface to OpenAI's models. All API calls go through this client.
---
## Example 1: Basic Chat Completion
```javascript
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'What is node-llama-cpp?' }
  ],
});
console.log(response.choices[0].message.content);
```
**What's happening:**
- `chat.completions.create()` - The primary method for sending messages to ChatGPT models
- `model: 'gpt-4o'` - Specifies which model to use (gpt-4o is one of OpenAI's most capable general-purpose models)
- `messages` array - Contains the conversation history
- `role: 'user'` - Indicates this message comes from the user (you)
- `response.choices[0]` - The API returns an array of possible responses; we take the first one
- `message.content` - The actual text response from the AI
**Response structure:**
```javascript
{
  id: 'chatcmpl-...',
  object: 'chat.completion',
  created: 1234567890,
  model: 'gpt-4o',
  choices: [
    {
      index: 0,
      message: {
        role: 'assistant',
        content: 'node-llama-cpp is a...'
      },
      finish_reason: 'stop'
    }
  ],
  usage: {
    prompt_tokens: 10,
    completion_tokens: 50,
    total_tokens: 60
  }
}
```
---
## Example 2: System Prompts
```javascript
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a coding assistant that talks like a pirate.' },
    { role: 'user', content: 'Explain what async/await does in JavaScript.' }
  ],
});
```
**What's happening:**
- `role: 'system'` - Special message type that sets the AI's behavior and personality
- System messages are processed first and influence all subsequent responses
- The model will maintain this behavior throughout the conversation
**Why it matters:** System prompts are how you specialize AI behavior. They're the foundation of creating focused agents with specific roles (translator, coder, analyst, etc.).
**Key insight:** Same model + different system prompts = completely different agents!
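That insight can be captured in a small helper. The factory below is a hypothetical sketch (not part of `openai-intro.js`): only the system message differs between the two "agents", while the model and user question stay the same.

```javascript
// Hypothetical factory: same model + different system prompts = different agents.
// Only the system message changes between agents.
function makeAgentMessages(systemPrompt, userContent) {
  return [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: userContent },
  ];
}

// Two "agents" built from the same user question:
const pirateCoder = makeAgentMessages(
  'You are a coding assistant that talks like a pirate.',
  'Explain closures.'
);
const formalTutor = makeAgentMessages(
  'You are a formal computer science tutor.',
  'Explain closures.'
);
```

Pass either array as `messages` to `client.chat.completions.create()` and the same model responds in completely different voices.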
---
## Example 3: Temperature Control
```javascript
// Focused response
const focusedResponse = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
  temperature: 0.2,
});

// Creative response
const creativeResponse = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
  temperature: 1.5,
});
```
**What's happening:**
- `temperature` - Controls randomness in the output (range: 0.0 to 2.0)
- **Low temperature (0.0 - 0.3):**
- More focused and deterministic
- Same input → similar output
- Best for: factual answers, code generation, data extraction
- **Medium temperature (0.7 - 1.0):**
- Balanced creativity and coherence
- Default for most use cases
- **High temperature (1.2 - 2.0):**
- More creative and varied
- Same input → very different outputs
- Best for: creative writing, brainstorming, story generation
**Real-world usage:**
- Code completion: temperature 0.2
- Customer support: temperature 0.5
- Creative content: temperature 1.2
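Those recommendations can be encoded as a preset table. The task names and fallback value below are assumptions for illustration, not part of the OpenAI SDK:

```javascript
// Hypothetical presets based on the guidance above.
const TEMPERATURE_PRESETS = {
  code: 0.2,     // deterministic, repeatable output
  support: 0.5,  // consistent answers, slightly varied phrasing
  creative: 1.2, // varied, exploratory output
};

// Fall back to a balanced default for unlisted tasks.
function temperatureFor(task) {
  return TEMPERATURE_PRESETS[task] ?? 0.7;
}
```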
---
## Example 4: Conversation Context
```javascript
const messages = [
  { role: 'system', content: 'You are a helpful coding tutor.' },
  { role: 'user', content: 'What is a Promise in JavaScript?' },
];

const response1 = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: messages,
});

// Add AI response to history
messages.push(response1.choices[0].message);

// Add follow-up question
messages.push({ role: 'user', content: 'Can you show me a simple example?' });

// Second request with full context
const response2 = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: messages,
});
```
**What's happening:**
- OpenAI models are **stateless** - they don't remember previous conversations
- We maintain context by sending the entire conversation history with each request
- Each request is independent; you must include all relevant messages
**Message order in the array:**
1. System prompt (optional; if present, it should come first)
2. Previous user message
3. Previous assistant response
4. Current user message
**Why it matters:** This is how chatbots remember context. The full conversation is sent every time.
**Performance consideration:**
- More messages = more tokens = higher cost
- Longer conversations eventually hit token limits
- Real applications need conversation trimming or summarization strategies
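One simple trimming strategy, sketched below as an assumption (it is not part of `openai-intro.js`): keep every system message but only the most recent N conversational turns.

```javascript
// Trim a conversation history: keep system messages, drop all but the
// last `maxRecent` user/assistant messages.
function trimHistory(messages, maxRecent = 10) {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  return [...system, ...rest.slice(-maxRecent)];
}
```

Call `trimHistory(messages)` before each API request to bound token usage; summarization is the heavier-weight alternative when older context still matters.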
---
## Example 5: Streaming Responses
```javascript
const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Write a haiku about programming.' }
  ],
  stream: true, // Enable streaming
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}
```
**What's happening:**
- `stream: true` - Instead of waiting for the complete response, receive it token-by-token
- `for await...of` - Iterate over the stream as chunks arrive
- `delta.content` - Each chunk contains a small piece of text (often just a word or partial word)
- `process.stdout.write()` - Write without newline to display text progressively
**Streaming vs. Non-streaming:**
**Non-streaming (default):**
```
[Request sent]
[Wait 5 seconds...]
[Full response arrives]
```
**Streaming:**
```
[Request sent]
Once [chunk arrives: "Once"]
upon [chunk arrives: " upon"]
a [chunk arrives: " a"]
time [chunk arrives: " time"]
...
```
**Why it matters:**
- Better user experience (immediate feedback)
- Appears faster even though total time is similar
- Essential for real-time chat interfaces
- Allows early processing/display of partial results
**When to use streaming:**
- Interactive chat applications
- Long-form content generation
- When user experience matters more than simplicity
**When to NOT use streaming:**
- Simple scripts or automation
- When you need the complete response before processing
- Batch processing
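A common middle ground is to display chunks as they arrive while also accumulating the full reply for later processing. The sketch below uses a fake stream in place of the real `stream: true` result so it runs standalone; the chunk shape matches the `delta.content` structure shown above.

```javascript
// Stand-in for the async iterable returned when `stream: true` is set.
async function* fakeStream() {
  for (const text of ['Once', ' upon', ' a', ' time']) {
    yield { choices: [{ delta: { content: text } }] };
  }
}

// Print each chunk progressively AND return the complete text.
async function collectStream(stream) {
  let full = '';
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content); // progressive display
    full += content;               // keep the full reply for later use
  }
  return full;
}
```

Swap `fakeStream()` for the real stream object and the loop body is identical to Example 5.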
---
## Example 6: Token Usage
```javascript
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Explain recursion in 3 sentences.' }
  ],
  max_tokens: 100,
});

console.log("Token usage:");
console.log("- Prompt tokens: " + response.usage.prompt_tokens);
console.log("- Completion tokens: " + response.usage.completion_tokens);
console.log("- Total tokens: " + response.usage.total_tokens);
```
**What's happening:**
- `max_tokens` - Limits the length of the AI's response
- `response.usage` - Contains token consumption details
- **Prompt tokens:** Your input (messages you sent)
- **Completion tokens:** AI's output (the response)
- **Total tokens:** Sum of both (what you're billed for)
**Understanding tokens:**
- Tokens ≠ words
- 1 token ≈ 0.75 words in English
- "hello" = 1 token
- "chatbot" may split into 2 tokens ("chat" + "bot")
- Punctuation often forms its own tokens; spaces are usually attached to the word that follows them
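For a quick budget check before sending a request, the ~4-characters-per-token rule of thumb can be coded directly. This is a rough heuristic, not a real tokenizer (use a tokenizer library such as tiktoken for exact counts):

```javascript
// Rough estimate: English text averages about 4 characters per token.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}
```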
**Why it matters:**
1. **Cost control:** You pay per token
2. **Context limits:** Models have maximum token limits (e.g., gpt-4o: 128,000 tokens)
3. **Response control:** Use `max_tokens` to prevent overly long responses
**Practical limits:**
```javascript
// Prevent runaway responses
max_tokens: 150, // ~100 words

// Brief responses
max_tokens: 50, // ~35 words

// Longer content
max_tokens: 1000, // ~750 words
```
**Cost estimation (approximate):**
- GPT-4o: $5 per 1M input tokens, $15 per 1M output tokens
- GPT-3.5-turbo: $0.50 per 1M input tokens, $1.50 per 1M output tokens
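Combining these rates with `response.usage` gives a per-request cost estimate. The price table below copies the approximate figures above; check current OpenAI pricing before relying on it:

```javascript
// Approximate USD prices per 1M tokens (assumed from the figures above).
const PRICES_PER_MILLION = {
  'gpt-4o': { input: 5.0, output: 15.0 },
  'gpt-3.5-turbo': { input: 0.5, output: 1.5 },
};

// Estimate the cost of one request from its `usage` object.
function estimateCostUSD(model, usage) {
  const p = PRICES_PER_MILLION[model];
  if (!p) return null; // unknown model
  return (usage.prompt_tokens * p.input + usage.completion_tokens * p.output) / 1e6;
}
```

For example, `estimateCostUSD('gpt-4o', response.usage)` on the Example 1 response structure (10 prompt + 50 completion tokens) comes to well under a cent.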
---
## Example 7: Model Comparison
```javascript
// GPT-4o - Most capable
const gpt4Response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
});

// GPT-3.5-turbo - Faster and cheaper
const gpt35Response = await client.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: prompt }],
});
```
**Available models:**
| Model | Best For | Speed | Cost | Context Window |
|-------|----------|-------|------|----------------|
| `gpt-4o` | Complex tasks, reasoning, accuracy | Medium | $$$ | 128K tokens |
| `gpt-4o-mini` | Balanced performance/cost | Fast | $$ | 128K tokens |
| `gpt-3.5-turbo` | Simple tasks, high volume | Very Fast | $ | 16K tokens |
**Choosing the right model:**
- **Use GPT-4o when:**
- Complex reasoning required
- High accuracy is critical
- Working with code or technical content
- Quality > speed/cost
- **Use GPT-4o-mini when:**
- Need good performance at lower cost
- Most general-purpose tasks
- **Use GPT-3.5-turbo when:**
- Simple classification or extraction
- High-volume, low-complexity tasks
- Speed is critical
- Budget constraints
**Pro tip:** Start with gpt-4o for development, then evaluate if cheaper models work for your use case.
---
## Error Handling
```javascript
try {
  await basicCompletion();
} catch (error) {
  console.error("Error:", error.message);
  if (error.message.includes('API key')) {
    console.error("\nMake sure to set your OPENAI_API_KEY in a .env file");
  }
}
```
**Common errors:**
- `401 Unauthorized` - Invalid or missing API key
- `429 Too Many Requests` - Rate limit exceeded
- `500 Internal Server Error` - OpenAI service issue
- `Context length exceeded` - Too many tokens in conversation
**Best practices:**
- Always use try-catch with async calls
- Check error types and provide helpful messages
- Implement retry logic for transient failures
- Monitor token usage to avoid limit errors
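Retry logic for transient failures (429s and 500s) can be sketched as a small wrapper. This helper is an assumption for illustration, not part of the SDK; it retries any async function with exponential backoff:

```javascript
// Retry an async function with exponential backoff.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= retries) throw error; // give up after the last attempt
      const delay = baseDelayMs * 2 ** attempt; // 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Usage: `await withRetry(() => client.chat.completions.create({ ... }))`. A production version would also inspect the error and retry only on rate-limit or server errors, not on a bad API key.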
---
## Key Takeaways
1. **Stateless Nature:** Models don't remember. You send full context each time.
2. **Message Roles:** `system` (behavior), `user` (input), `assistant` (AI response)
3. **Temperature:** Controls creativity (0 = focused, 2 = creative)
4. **Streaming:** Better UX for real-time applications
5. **Token Management:** Monitor usage for cost and limits
6. **Model Selection:** Choose based on task complexity and budget