# Developer Guide

Complete guide for integrating Rox AI into your applications.

**Base URL**: `https://Rox-Turbo-API.hf.space`

## Table of Contents

1. [Quick Start](#quick-start)
2. [Authentication](#authentication)
3. [Making Requests](#making-requests)
4. [Streaming Responses](#streaming-responses)
5. [Model Selection](#model-selection)
6. [Parameters](#parameters)
7. [Conversation Management](#conversation-management)
8. [Error Handling](#error-handling)
9. [Best Practices](#best-practices)
10. [Code Examples](#code-examples)
11. [OpenAI SDK Compatibility](#openai-sdk-compatibility)

---

## Quick Start

Send your first request in under 30 seconds.

### cURL

```bash
curl -X POST https://Rox-Turbo-API.hf.space/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```

### Python

```python
import requests

response = requests.post(
    'https://Rox-Turbo-API.hf.space/chat',
    json={'messages': [{'role': 'user', 'content': 'Hello'}]}
)
print(response.json()['content'])
```

### JavaScript

```javascript
const response = await fetch('https://Rox-Turbo-API.hf.space/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'Hello' }]
  })
});
const data = await response.json();
console.log(data.content);
```

---

## Authentication

No API key required. All endpoints are publicly accessible.

---

## Making Requests

### Request Format

All endpoints accept POST requests with JSON body.

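
The JSON body can be assembled and range-checked on the client before it is sent. The following is a minimal sketch, not part of the API: the helper name `build_payload` is hypothetical, and the non-empty check on `messages` is an assumption. The field names, defaults, and ranges are the ones listed in this section.

```python
def build_payload(messages, temperature=0.7, top_p=0.95,
                  max_tokens=8192, stream=False):
    """Hypothetical helper: validate fields and return the JSON body as a dict."""
    # Assumption: an empty messages array is treated as invalid.
    if not messages:
        raise ValueError('messages must be a non-empty list')
    for m in messages:
        # Roles are the three values shown in the Message Object definition.
        if m.get('role') not in ('user', 'assistant', 'system'):
            raise ValueError(f"invalid role: {m.get('role')!r}")
        if not isinstance(m.get('content'), str):
            raise ValueError('content must be a string')
    # Ranges are taken from the Optional Fields list.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError('temperature must be between 0.0 and 2.0')
    if not 0.0 <= top_p <= 1.0:
        raise ValueError('top_p must be between 0.0 and 1.0')
    if not 1 <= max_tokens <= 32768:
        raise ValueError('max_tokens must be between 1 and 32768')
    return {
        'messages': messages,
        'temperature': temperature,
        'top_p': top_p,
        'max_tokens': max_tokens,
        'stream': bool(stream),
    }


payload = build_payload([{'role': 'user', 'content': 'Hello'}], temperature=0.2)
```

The returned dict can be passed as the `json=` argument of `requests.post`. Validating locally only saves a round trip; the server remains the authority on what it accepts.
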
**Required Fields:**

- `messages`: Array of message objects

**Optional Fields:**

- `temperature`: Float (0.0 - 2.0, default: 0.7)
- `top_p`: Float (0.0 - 1.0, default: 0.95)
- `max_tokens`: Integer (1 - 32768, default: 8192)
- `stream`: Boolean (default: false)

### Message Object

```json
{
  "role": "user" | "assistant" | "system",
  "content": "message text"
}
```

### Complete Request Example

```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is AI?"}
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 8192,
  "stream": false
}
```

### Response Format

**Standard Response:**

```json
{
  "content": "AI stands for Artificial Intelligence..."
}
```

**Streaming Response:**

```
data: {"content": "AI"}
data: {"content": " stands"}
data: {"content": " for"}
data: [DONE]
```

---

## Streaming Responses

Streaming provides real-time token-by-token responses for better user experience.

### When to Use Streaming

- Long-form content generation
- Interactive chat applications
- Real-time feedback requirements
- Improved perceived performance

### Python Implementation

```python
import requests
import json

def stream_chat(message, model='chat'):
    response = requests.post(
        f'https://Rox-Turbo-API.hf.space/{model}',
        json={
            'messages': [{'role': 'user', 'content': message}],
            'stream': True
        },
        stream=True
    )

    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data: '):
                data = line[6:]
                if data == '[DONE]':
                    break
                try:
                    parsed = json.loads(data)
                    if 'content' in parsed:
                        print(parsed['content'], end='', flush=True)
                        yield parsed['content']
                except json.JSONDecodeError:
                    pass

# Usage
for token in stream_chat('Tell me a story'):
    pass  # Tokens printed in real-time
```

### JavaScript Implementation

```javascript
async function streamChat(message, model = 'chat') {
  const response = await fetch(`https://Rox-Turbo-API.hf.space/${model}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      messages: [{ role: 'user', content: message }],
      stream: true
    })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let fullContent = '';
  let buffer = ''; // Holds a partial line when a chunk ends mid-line

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // Keep the incomplete last line for the next chunk

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6).trim();
        if (data === '[DONE]') return fullContent;

        try {
          const parsed = JSON.parse(data);
          if (parsed.content) {
            fullContent += parsed.content;
            console.log(parsed.content); // Process each token
          }
        } catch (e) {}
      }
    }
  }

  return fullContent;
}

// Usage
await streamChat('Tell me a story');
```

### Node.js Implementation

```javascript
const https = require('https');

function streamChat(message, model = 'chat') {
  const data = JSON.stringify({
    messages: [{ role: 'user', content: message }],
    stream: true
  });

  const options = {
    hostname: 'Rox-Turbo-API.hf.space',
    path: `/${model}`,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Byte length, not string length: the two differ for non-ASCII text
      'Content-Length': Buffer.byteLength(data)
    }
  };

  const req = https.request(options, (res) => {
    res.setEncoding('utf8'); // Decode multi-byte characters safely across chunks
    let buffer = ''; // Holds a partial line when a chunk ends mid-line

    res.on('data', (chunk) => {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // Keep the incomplete last line for the next chunk

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6).trim();
          if (data === '[DONE]') return;

          try {
            const parsed = JSON.parse(data);
            if (parsed.content) {
              process.stdout.write(parsed.content);
            }
          } catch (e) {}
        }
      }
    });
  });

  req.write(data);
  req.end();
}

// Usage
streamChat('Tell me a story');
```

---

## Model Selection

Choose the right model for your use case.

### Available Models

| Model | Endpoint | Best For | Speed | Quality |
|-------|----------|----------|-------|---------|
| Rox Core | `/chat` | General conversation | Medium | High |
| Rox 2.1 Turbo | `/turbo` | Quick responses | Fast | Good |
| Rox 3.5 Coder | `/coder` | Code generation | Medium | High |
| Rox 4.5 Turbo | `/turbo45` | Fast reasoning | Fast | High |
| Rox 5 Ultra | `/ultra` | Complex tasks | Slow | Highest |
| Rox 6 Dyno | `/dyno` | Long context | Medium | High |
| Rox 7 Coder | `/coder7` | Advanced coding | Medium | Highest |
| Rox Vision Max | `/vision` | Visual tasks | Medium | High |

### Model Selection Guide

```python
def select_model(task_type):
    models = {
        'chat': 'chat',             # General conversation
        'quick': 'turbo',           # Fast responses
        'code': 'coder',            # Code generation
        'reasoning': 'turbo45',     # Fast reasoning
        'complex': 'ultra',         # Complex tasks, highest quality
        'long': 'dyno',             # Long documents
        'advanced_code': 'coder7',  # Advanced coding
        'vision': 'vision'          # Visual tasks
    }
    return models.get(task_type, 'chat')

# Usage
# ask_rox is not defined in this guide; substitute any request helper with
# this signature, such as safe_request from the Error Handling section.
model = select_model('code')
response = ask_rox('Write a function', model=model)
```

---

## Parameters

### temperature

Controls randomness in responses.

**Range**: 0.0 to 2.0

**Default**: 0.7

- **0.0 - 0.3**: Deterministic, focused (math, facts, code)
- **0.4 - 0.8**: Balanced (general conversation)
- **0.9 - 2.0**: Creative, varied (stories, brainstorming)

```python
import requests

# Any model endpoint works here; the remaining parameter snippets reuse url
url = 'https://Rox-Turbo-API.hf.space/chat'

# Factual response
response = requests.post(url, json={
    'messages': [{'role': 'user', 'content': 'What is 2+2?'}],
    'temperature': 0.2
})

# Creative response
response = requests.post(url, json={
    'messages': [{'role': 'user', 'content': 'Write a poem'}],
    'temperature': 1.5
})
```

### top_p

Controls diversity via nucleus sampling.

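
To make the effect of `top_p` concrete, here is a minimal sketch of nucleus sampling as the technique is generally described: keep the smallest set of highest-probability tokens whose cumulative probability reaches `top_p`, renormalize, and sample from that set. It illustrates the general idea only; the function names and the toy distribution are hypothetical, and nothing here describes how the Rox models implement sampling on the server.

```python
import random


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def sample(probs, top_p):
    """Draw one token from the nucleus selected by top_p."""
    nucleus = top_p_filter(probs, top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]


# Toy next-token distribution (made-up numbers for illustration)
probs = {'the': 0.5, 'a': 0.3, 'cat': 0.15, 'zebra': 0.05}

narrow = top_p_filter(probs, 0.5)  # only the single most likely token survives
wide = top_p_filter(probs, 0.9)    # the unlikely tail ('zebra') is cut off
print(sorted(narrow), sorted(wide))
```

A lower `top_p` shrinks the candidate set, which is why small values give narrower, more focused output and values near 1.0 leave almost the whole distribution available.
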
**Range**: 0.0 to 1.0

**Default**: 0.95

- **0.1 - 0.5**: Narrow, focused
- **0.6 - 0.9**: Balanced
- **0.9 - 1.0**: Diverse

```python
response = requests.post(url, json={
    'messages': [{'role': 'user', 'content': 'Tell me about AI'}],
    'top_p': 0.9
})
```

### max_tokens

Maximum tokens in response.

**Range**: 1 to 32768

**Default**: 8192

Token estimation: ~1 token = 0.75 words

```python
# Short response
response = requests.post(url, json={
    'messages': [{'role': 'user', 'content': 'Brief summary'}],
    'max_tokens': 100
})

# Long response
response = requests.post(url, json={
    'messages': [{'role': 'user', 'content': 'Detailed explanation'}],
    'max_tokens': 4096
})
```

### stream

Enable streaming responses.

**Type**: Boolean

**Default**: false

```python
response = requests.post(url, json={
    'messages': [{'role': 'user', 'content': 'Hello'}],
    'stream': True
}, stream=True)
```

---

## Conversation Management

### Single Turn

```python
def ask_once(question):
    response = requests.post(
        'https://Rox-Turbo-API.hf.space/chat',
        json={'messages': [{'role': 'user', 'content': question}]}
    )
    return response.json()['content']
```

### Multi-Turn Conversation

```python
class Conversation:
    def __init__(self, model='chat', system_prompt=None):
        self.model = model
        self.messages = []
        if system_prompt:
            self.messages.append({'role': 'system', 'content': system_prompt})

    def ask(self, message):
        self.messages.append({'role': 'user', 'content': message})

        response = requests.post(
            f'https://Rox-Turbo-API.hf.space/{self.model}',
            json={'messages': self.messages}
        )

        reply = response.json()['content']
        self.messages.append({'role': 'assistant', 'content': reply})
        return reply

    def clear(self):
        system_msg = [m for m in self.messages if m['role'] == 'system']
        self.messages = system_msg

# Usage
conv = Conversation(system_prompt='You are a helpful assistant')
print(conv.ask('Hello'))
print(conv.ask('What is AI?'))
print(conv.ask('Tell me more'))
```

### JavaScript Conversation Manager

```javascript
class Conversation {
  constructor(model = 'chat', systemPrompt = null) {
    this.model = model;
    this.messages = [];
    if (systemPrompt) {
      this.messages.push({ role: 'system', content: systemPrompt });
    }
  }

  async ask(message) {
    this.messages.push({ role: 'user', content: message });

    const response = await fetch(`https://Rox-Turbo-API.hf.space/${this.model}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages: this.messages })
    });

    const data = await response.json();
    const reply = data.content;
    this.messages.push({ role: 'assistant', content: reply });
    return reply;
  }

  clear() {
    const systemMsg = this.messages.filter(m => m.role === 'system');
    this.messages = systemMsg;
  }
}

// Usage
const conv = new Conversation('chat', 'You are a helpful assistant');
console.log(await conv.ask('Hello'));
console.log(await conv.ask('What is AI?'));
```

### System Prompts

System prompts define the assistant's behavior.

```python
def ask_with_personality(message, personality):
    system_prompts = {
        'professional': 'You are a professional business consultant.',
        'casual': 'You are a friendly, casual assistant.',
        'technical': 'You are a technical expert. Be precise and detailed.',
        'creative': 'You are a creative writer. Be imaginative and expressive.'
    }

    messages = [
        {'role': 'system', 'content': system_prompts.get(personality, '')},
        {'role': 'user', 'content': message}
    ]

    response = requests.post(
        'https://Rox-Turbo-API.hf.space/chat',
        json={'messages': messages}
    )
    return response.json()['content']

# Usage
answer = ask_with_personality('Explain AI', 'technical')
```

---

## Error Handling

### Basic Error Handling

```python
def safe_request(message, model='chat'):
    try:
        response = requests.post(
            f'https://Rox-Turbo-API.hf.space/{model}',
            json={'messages': [{'role': 'user', 'content': message}]},
            timeout=30
        )
        response.raise_for_status()
        return response.json()['content']
    except requests.exceptions.Timeout:
        return "Request timed out. Please try again."
    except requests.exceptions.HTTPError as e:
        return f"HTTP error: {e.response.status_code}"
    except requests.exceptions.RequestException as e:
        return f"Request failed: {str(e)}"
    except KeyError:
        return "Invalid response format"
```

### Advanced Error Handling with Retry

```python
import time

import requests

def request_with_retry(message, model='chat', max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                f'https://Rox-Turbo-API.hf.space/{model}',
                json={'messages': [{'role': 'user', 'content': message}]},
                timeout=30
            )
            response.raise_for_status()
            return response.json()['content']
        except requests.exceptions.RequestException:
            if attempt == max_retries - 1:
                raise
            wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s, ...
            time.sleep(wait_time)
```

### JavaScript Error Handling

```javascript
async function safeRequest(message, model = 'chat') {
  try {
    const response = await fetch(`https://Rox-Turbo-API.hf.space/${model}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        messages: [{ role: 'user', content: message }]
      })
    });

    if (!response.ok) {
      throw new Error(`HTTP ${response.status}: ${response.statusText}`);
    }

    const data = await response.json();
    return data.content;
  } catch (error) {
    console.error('Request failed:', error);
    throw error;
  }
}
```

---

## Best Practices

### 1. Use Appropriate Models

Choose models based on your needs:

- Use `turbo` for simple, fast responses
- Use `coder` for code-related tasks
- Use `ultra` for complex reasoning
- Use `dyno` for long documents

### 2. Optimize Parameters

```python
# For factual questions
params = {'temperature': 0.2, 'max_tokens': 500}

# For creative tasks
params = {'temperature': 1.2, 'max_tokens': 2000}

# For code generation
params = {'temperature': 0.3, 'max_tokens': 4096}
```

### 3. Manage Context Length

```python
def trim_conversation(messages, max_messages=10):
    """Keep only recent messages to manage context"""
    system_msgs = [m for m in messages if m['role'] == 'system']
    other_msgs = [m for m in messages if m['role'] != 'system']
    return system_msgs + other_msgs[-max_messages:]
```

### 4. Implement Caching

```python
from functools import lru_cache

import requests

@lru_cache(maxsize=100)
def cached_request(message, model):
    # lru_cache keys on the (message, model) arguments, so a repeated
    # question to the same model returns the stored reply without a request.
    response = requests.post(
        f'https://Rox-Turbo-API.hf.space/{model}',
        json={'messages': [{'role': 'user', 'content': message}]}
    )
    return response.json()['content']

def ask_with_cache(message, model='chat'):
    return cached_request(message, model)
```

### 5. Rate Limiting

```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_requests=10, time_window=60):
        self.max_requests = max_requests
        self.time_window = time_window
        self.requests = deque()

    def wait_if_needed(self):
        now = time.time()

        # Remove old requests
        while self.requests and now - self.requests[0] > self.time_window:
            self.requests.popleft()

        # Wait if at limit
        if len(self.requests) >= self.max_requests:
            sleep_time = self.time_window - (now - self.requests[0])
            if sleep_time > 0:
                time.sleep(sleep_time)

        # Record the time the request is actually sent (after any sleep)
        self.requests.append(time.time())

limiter = RateLimiter(10, 60)

def rate_limited_request(message):
    limiter.wait_if_needed()
    # ask_rox is not defined in this guide; substitute any request helper,
    # such as safe_request from the Error Handling section.
    return ask_rox(message)
```

### 6. Streaming for Long Responses

Use streaming for responses over 500 tokens to improve user experience.

### 7. Error Recovery

```python
def robust_request(message, model='chat'):
    # Try the requested model first, then fall back to the others
    defaults = ['chat', 'turbo', 'coder']
    fallback_models = [model] + [m for m in defaults if m != model]

    for fallback_model in fallback_models:
        try:
            return request_with_retry(message, fallback_model)
        except Exception:
            if fallback_model == fallback_models[-1]:
                raise
```

---

## Code Examples

### Complete Chatbot (Python)

```python
import requests
import json

class RoxChatbot:
    def __init__(self, model='chat', system_prompt=None):
        self.model = model
        self.base_url = 'https://Rox-Turbo-API.hf.space'
        self.conversation = []

        if system_prompt:
            self.conversation.append({
                'role': 'system',
                'content': system_prompt
            })

    def chat(self, message, stream=False):
        self.conversation.append({'role': 'user', 'content': message})

        if stream:
            return self._stream_chat()
        else:
            return self._standard_chat()

    def _standard_chat(self):
        response = requests.post(
            f'{self.base_url}/{self.model}',
            json={'messages': self.conversation}
        )
        reply = response.json()['content']
        self.conversation.append({'role': 'assistant', 'content': reply})
        return reply

    def _stream_chat(self):
        response = requests.post(
            f'{self.base_url}/{self.model}',
            json={'messages': self.conversation, 'stream': True},
            stream=True
        )

        full_content = ''
        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith('data: '):
                    data = line[6:]
                    if data == '[DONE]':
                        break
                    try:
                        parsed = json.loads(data)
                        if 'content' in parsed:
                            full_content += parsed['content']
                            print(parsed['content'], end='', flush=True)
                    except json.JSONDecodeError:
                        pass

        print()  # New line after streaming
        self.conversation.append({'role': 'assistant', 'content': full_content})
        return full_content

    def clear(self):
        system_msgs = [m for m in self.conversation if m['role'] == 'system']
        self.conversation = system_msgs

# Usage
bot = RoxChatbot(system_prompt='You are a helpful assistant')
print(bot.chat('Hello'))
print(bot.chat('What is AI?'))
bot.chat('Tell me a story', stream=True)
```

### Complete Chatbot (JavaScript)

```javascript
class RoxChatbot {
  constructor(model = 'chat', systemPrompt = null) {
    this.model = model;
    this.baseUrl = 'https://Rox-Turbo-API.hf.space';
    this.conversation = [];

    if (systemPrompt) {
      this.conversation.push({ role: 'system', content: systemPrompt });
    }
  }

  async chat(message, stream = false) {
    this.conversation.push({ role: 'user', content: message });

    if (stream) {
      return await this._streamChat();
    } else {
      return await this._standardChat();
    }
  }

  async _standardChat() {
    const response = await fetch(`${this.baseUrl}/${this.model}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages: this.conversation })
    });

    const data = await response.json();
    const reply = data.content;
    this.conversation.push({ role: 'assistant', content: reply });
    return reply;
  }

  async _streamChat() {
    const response = await fetch(`${this.baseUrl}/${this.model}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages: this.conversation, stream: true })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let fullContent = '';
    let buffer = ''; // Holds a partial line when a chunk ends mid-line

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop(); // Keep the incomplete last line for the next chunk

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6).trim();
          if (data === '[DONE]') break;

          try {
            const parsed = JSON.parse(data);
            if (parsed.content) {
              fullContent += parsed.content;
              process.stdout.write(parsed.content); // Node.js only
            }
          } catch (e) {}
        }
      }
    }

    console.log();
    this.conversation.push({ role: 'assistant', content: fullContent });
    return fullContent;
  }

  clear() {
    const systemMsgs = this.conversation.filter(m => m.role === 'system');
    this.conversation = systemMsgs;
  }
}

// Usage
const bot = new RoxChatbot('chat', 'You are a helpful assistant');
console.log(await bot.chat('Hello'));
console.log(await bot.chat('What is AI?'));
await bot.chat('Tell me a story', true);
```

---

## OpenAI SDK Compatibility

Rox AI is compatible with the OpenAI SDK.

### Python with OpenAI SDK

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://Rox-Turbo-API.hf.space",
    api_key="not-needed"  # No API key required
)

# Standard request
response = client.chat.completions.create(
    model="chat",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

# Streaming request
stream = client.chat.completions.create(
    model="chat",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)
```

### JavaScript with OpenAI SDK

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://Rox-Turbo-API.hf.space',
  apiKey: 'not-needed'
});

// Standard request
const response = await client.chat.completions.create({
  model: 'chat',
  messages: [{ role: 'user', content: 'Hello' }]
});
console.log(response.choices[0].message.content);

// Streaming request
const stream = await client.chat.completions.create({
  model: 'chat',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true
});

for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
```

---

## Additional Resources

- [API Reference](API_REFERENCE.md) - Complete API documentation
- [Code Examples](CODE.md) - Ready-to-use code snippets
- [Model Guide](MODELS.md) - Detailed model information

---

Built by Mohammad Faiz