Developer Guide
Complete guide for integrating Rox AI into your applications.
Base URL: https://Rox-Turbo-API.hf.space
Table of Contents
- Quick Start
- Authentication
- Making Requests
- Streaming Responses
- Model Selection
- Parameters
- Conversation Management
- Error Handling
- Best Practices
- Code Examples
- OpenAI SDK Compatibility
Quick Start
Send your first request in under 30 seconds.
cURL
curl -X POST https://Rox-Turbo-API.hf.space/chat \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"Hello"}]}'
Python
import requests
response = requests.post(
'https://Rox-Turbo-API.hf.space/chat',
json={'messages': [{'role': 'user', 'content': 'Hello'}]}
)
print(response.json()['content'])
JavaScript
const response = await fetch('https://Rox-Turbo-API.hf.space/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
messages: [{ role: 'user', content: 'Hello' }]
})
});
const data = await response.json();
console.log(data.content);
Authentication
No API key required. All endpoints are publicly accessible.
Making Requests
Request Format
All endpoints accept POST requests with a JSON body.
Required Fields:
messages: Array of message objects
Optional Fields:
temperature: Float (0.0 - 2.0, default: 0.7)
top_p: Float (0.0 - 1.0, default: 0.95)
max_tokens: Integer (1 - 32768, default: 8192)
stream: Boolean (default: false)
Message Object
{
"role": "user" | "assistant" | "system",
"content": "message text"
}
Complete Request Example
{
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "What is AI?"}
],
"temperature": 0.7,
"top_p": 0.95,
"max_tokens": 8192,
"stream": false
}
Response Format
Standard Response:
{
"content": "AI stands for Artificial Intelligence..."
}
Streaming Response:
data: {"content": "AI"}
data: {"content": " stands"}
data: {"content": " for"}
data: [DONE]
Streaming Responses
Streaming provides real-time token-by-token responses for better user experience.
When to Use Streaming
- Long-form content generation
- Interactive chat applications
- Real-time feedback requirements
- Improved perceived performance
Python Implementation
import requests
import json
def stream_chat(message, model='chat'):
response = requests.post(
f'https://Rox-Turbo-API.hf.space/{model}',
json={
'messages': [{'role': 'user', 'content': message}],
'stream': True
},
stream=True
)
for line in response.iter_lines():
if line:
line = line.decode('utf-8')
if line.startswith('data: '):
data = line[6:]
if data == '[DONE]':
break
try:
parsed = json.loads(data)
if 'content' in parsed:
print(parsed['content'], end='', flush=True)
yield parsed['content']
except json.JSONDecodeError:
pass
# Usage
for token in stream_chat('Tell me a story'):
pass # Tokens printed in real-time
JavaScript Implementation
async function streamChat(message, model = 'chat') {
const response = await fetch(`https://Rox-Turbo-API.hf.space/${model}`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
messages: [{ role: 'user', content: message }],
stream: true
})
});
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let fullContent = '';
  let buffer = ''; // Holds any partial SSE line that spans chunk boundaries
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // Keep the trailing (possibly incomplete) line
    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6).trim();
        if (data === '[DONE]') return fullContent; // Stream finished
        try {
          const parsed = JSON.parse(data);
          if (parsed.content) {
            fullContent += parsed.content;
            console.log(parsed.content); // Process each token
          }
        } catch (e) {
          // Ignore keep-alives and malformed lines
        }
      }
    }
  }
  return fullContent;
}
// Usage
await streamChat('Tell me a story');
Node.js Implementation
const https = require('https');
function streamChat(message, model = 'chat') {
const data = JSON.stringify({
messages: [{ role: 'user', content: message }],
stream: true
});
  const options = {
    hostname: 'Rox-Turbo-API.hf.space',
    path: `/${model}`,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(data) // Byte length, not character count
    }
  };
  const req = https.request(options, (res) => {
    let buffer = ''; // Holds any partial SSE line that spans chunk boundaries
    res.on('data', (chunk) => {
      buffer += chunk.toString();
      const lines = buffer.split('\n');
      buffer = lines.pop(); // Keep the trailing (possibly incomplete) line
      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const payload = line.slice(6).trim();
          if (payload === '[DONE]') return;
          try {
            const parsed = JSON.parse(payload);
            if (parsed.content) {
              process.stdout.write(parsed.content);
            }
          } catch (e) {
            // Ignore keep-alives and malformed lines
          }
        }
      }
    });
  });
req.write(data);
req.end();
}
// Usage
streamChat('Tell me a story');
Model Selection
Choose the right model for your use case.
Available Models
| Model | Endpoint | Best For | Speed | Quality |
|---|---|---|---|---|
| Rox Core | /chat | General conversation | Medium | High |
| Rox 2.1 Turbo | /turbo | Quick responses | Fast | Good |
| Rox 3.5 Coder | /coder | Code generation | Medium | High |
| Rox 4.5 Turbo | /turbo45 | Fast reasoning | Fast | High |
| Rox 5 Ultra | /ultra | Complex tasks | Slow | Highest |
| Rox 6 Dyno | /dyno | Long context | Medium | High |
| Rox 7 Coder | /coder7 | Advanced coding | Medium | Highest |
| Rox Vision Max | /vision | Visual tasks | Medium | High |
Model Selection Guide
def select_model(task_type):
models = {
'chat': 'chat', # General conversation
'quick': 'turbo', # Fast responses
'code': 'coder', # Code generation
'reasoning': 'turbo45', # Complex reasoning
'complex': 'ultra', # Highest quality
'long': 'dyno', # Long documents
'advanced_code': 'coder7',# Advanced coding
'vision': 'vision' # Visual tasks
}
return models.get(task_type, 'chat')
# Usage (ask_rox stands for any request helper, e.g. ask_once in Conversation Management)
model = select_model('code')
response = ask_rox('Write a function', model=model)
Parameters
temperature
Controls randomness in responses.
Range: 0.0 to 2.0
Default: 0.7
- 0.0 - 0.3: Deterministic, focused (math, facts, code)
- 0.4 - 0.8: Balanced (general conversation)
- 0.9 - 2.0: Creative, varied (stories, brainstorming)
# Factual response
response = requests.post(url, json={
'messages': [{'role': 'user', 'content': 'What is 2+2?'}],
'temperature': 0.2
})
# Creative response
response = requests.post(url, json={
'messages': [{'role': 'user', 'content': 'Write a poem'}],
'temperature': 1.5
})
top_p
Controls diversity via nucleus sampling.
Range: 0.0 to 1.0
Default: 0.95
- 0.1 - 0.5: Narrow, focused
- 0.6 - 0.9: Balanced
- 0.9 - 1.0: Diverse
response = requests.post(url, json={
'messages': [{'role': 'user', 'content': 'Tell me about AI'}],
'top_p': 0.9
})
max_tokens
Maximum tokens in response.
Range: 1 to 32768
Default: 8192
Token estimation: ~1 token = 0.75 words
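A rough helper based on this ratio (purely illustrative; the model's actual tokenizer will count differently):
def estimate_tokens(text):
    # Heuristic only: ~0.75 words per token, so tokens ≈ words / 0.75
    return int(len(text.split()) / 0.75)

print(estimate_tokens('How many tokens is this sentence?'))  # ~8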
# Short response
response = requests.post(url, json={
'messages': [{'role': 'user', 'content': 'Brief summary'}],
'max_tokens': 100
})
# Long response
response = requests.post(url, json={
'messages': [{'role': 'user', 'content': 'Detailed explanation'}],
'max_tokens': 4096
})
stream
Enable streaming responses.
Type: Boolean
Default: false
response = requests.post(url, json={
'messages': [{'role': 'user', 'content': 'Hello'}],
'stream': True
}, stream=True)
Conversation Management
Single Turn
def ask_once(question):
response = requests.post(
'https://Rox-Turbo-API.hf.space/chat',
json={'messages': [{'role': 'user', 'content': question}]}
)
return response.json()['content']
Multi-Turn Conversation
class Conversation:
def __init__(self, model='chat', system_prompt=None):
self.model = model
self.messages = []
if system_prompt:
self.messages.append({'role': 'system', 'content': system_prompt})
def ask(self, message):
self.messages.append({'role': 'user', 'content': message})
response = requests.post(
f'https://Rox-Turbo-API.hf.space/{self.model}',
json={'messages': self.messages}
)
reply = response.json()['content']
self.messages.append({'role': 'assistant', 'content': reply})
return reply
def clear(self):
system_msg = [m for m in self.messages if m['role'] == 'system']
self.messages = system_msg
# Usage
conv = Conversation(system_prompt='You are a helpful assistant')
print(conv.ask('Hello'))
print(conv.ask('What is AI?'))
print(conv.ask('Tell me more'))
JavaScript Conversation Manager
class Conversation {
constructor(model = 'chat', systemPrompt = null) {
this.model = model;
this.messages = [];
if (systemPrompt) {
this.messages.push({ role: 'system', content: systemPrompt });
}
}
async ask(message) {
this.messages.push({ role: 'user', content: message });
const response = await fetch(`https://Rox-Turbo-API.hf.space/${this.model}`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ messages: this.messages })
});
const data = await response.json();
const reply = data.content;
this.messages.push({ role: 'assistant', content: reply });
return reply;
}
clear() {
const systemMsg = this.messages.filter(m => m.role === 'system');
this.messages = systemMsg;
}
}
// Usage
const conv = new Conversation('chat', 'You are a helpful assistant');
console.log(await conv.ask('Hello'));
console.log(await conv.ask('What is AI?'));
System Prompts
System prompts define the assistant's behavior.
def ask_with_personality(message, personality):
system_prompts = {
'professional': 'You are a professional business consultant.',
'casual': 'You are a friendly, casual assistant.',
'technical': 'You are a technical expert. Be precise and detailed.',
'creative': 'You are a creative writer. Be imaginative and expressive.'
}
messages = [
{'role': 'system', 'content': system_prompts.get(personality, '')},
{'role': 'user', 'content': message}
]
response = requests.post(
'https://Rox-Turbo-API.hf.space/chat',
json={'messages': messages}
)
return response.json()['content']
# Usage
answer = ask_with_personality('Explain AI', 'technical')
Error Handling
Basic Error Handling
def safe_request(message, model='chat'):
try:
response = requests.post(
f'https://Rox-Turbo-API.hf.space/{model}',
json={'messages': [{'role': 'user', 'content': message}]},
timeout=30
)
response.raise_for_status()
return response.json()['content']
except requests.exceptions.Timeout:
return "Request timed out. Please try again."
except requests.exceptions.HTTPError as e:
return f"HTTP error: {e.response.status_code}"
except requests.exceptions.RequestException as e:
return f"Request failed: {str(e)}"
except KeyError:
return "Invalid response format"
Advanced Error Handling with Retry
import time
def request_with_retry(message, model='chat', max_retries=3):
for attempt in range(max_retries):
try:
response = requests.post(
f'https://Rox-Turbo-API.hf.space/{model}',
json={'messages': [{'role': 'user', 'content': message}]},
timeout=30
)
response.raise_for_status()
return response.json()['content']
except requests.exceptions.RequestException as e:
if attempt == max_retries - 1:
raise
wait_time = 2 ** attempt # Exponential backoff
time.sleep(wait_time)
JavaScript Error Handling
async function safeRequest(message, model = 'chat') {
try {
const response = await fetch(`https://Rox-Turbo-API.hf.space/${model}`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
messages: [{ role: 'user', content: message }]
})
});
if (!response.ok) {
throw new Error(`HTTP ${response.status}: ${response.statusText}`);
}
const data = await response.json();
return data.content;
} catch (error) {
console.error('Request failed:', error);
throw error;
}
}
Best Practices
1. Use Appropriate Models
Choose models based on your needs:
- Use `turbo` for simple, fast responses
- Use `coder` for code-related tasks
- Use `ultra` for complex reasoning
- Use `dyno` for long documents
2. Optimize Parameters
# For factual questions
params = {'temperature': 0.2, 'max_tokens': 500}
# For creative tasks
params = {'temperature': 1.2, 'max_tokens': 2000}
# For code generation
params = {'temperature': 0.3, 'max_tokens': 4096}
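These presets can be merged straight into the request body (a sketch; url is the endpoint as in the earlier examples):
params = {'temperature': 0.3, 'max_tokens': 4096}
response = requests.post(url, json={
    'messages': [{'role': 'user', 'content': 'Write a sorting function'}],
    **params  # Unpack the preset into the request body
})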
3. Manage Context Length
def trim_conversation(messages, max_messages=10):
"""Keep only recent messages to manage context"""
system_msgs = [m for m in messages if m['role'] == 'system']
other_msgs = [m for m in messages if m['role'] != 'system']
return system_msgs + other_msgs[-max_messages:]
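For example, trimming before each turn (a sketch reusing the Conversation class from Conversation Management):
conv = Conversation(system_prompt='You are a helpful assistant')
# ... after many turns ...
conv.messages = trim_conversation(conv.messages, max_messages=10)
reply = conv.ask('Summarize our discussion')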
4. Implement Caching
import requests
from functools import lru_cache

@lru_cache(maxsize=100)
def cached_request(message, model):
    # lru_cache keys on the arguments, so repeated prompts skip the network call
    response = requests.post(
        f'https://Rox-Turbo-API.hf.space/{model}',
        json={'messages': [{'role': 'user', 'content': message}]}
    )
    return response.json()['content']

def ask_with_cache(message, model='chat'):
    return cached_request(message, model)
5. Rate Limiting
import time
from collections import deque
class RateLimiter:
def __init__(self, max_requests=10, time_window=60):
self.max_requests = max_requests
self.time_window = time_window
self.requests = deque()
def wait_if_needed(self):
now = time.time()
# Remove old requests
while self.requests and now - self.requests[0] > self.time_window:
self.requests.popleft()
# Wait if at limit
if len(self.requests) >= self.max_requests:
sleep_time = self.time_window - (now - self.requests[0])
if sleep_time > 0:
time.sleep(sleep_time)
self.requests.append(now)
limiter = RateLimiter(10, 60)
def rate_limited_request(message):
limiter.wait_if_needed()
    return ask_rox(message)  # ask_rox: any request helper, e.g. ask_once
6. Streaming for Long Responses
Use streaming for responses expected to exceed roughly 500 tokens, so tokens appear as they are generated instead of after a long wait.
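One way to apply this rule of thumb is to choose streaming from the requested length (a minimal sketch; stream_chat and ask_once are the helpers defined earlier):
def smart_request(message, model='chat', max_tokens=8192):
    # Stream when the response may be long; otherwise one blocking request
    if max_tokens > 500:
        return ''.join(stream_chat(message, model))
    return ask_once(message)  # ask_once (defined above) targets /chat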
7. Error Recovery
def robust_request(message, model='chat'):
    # Try the requested model first, then fall back to alternatives
    fallback_models = [model] + [m for m in ('turbo', 'coder') if m != model]
    for fallback_model in fallback_models:
        try:
            return request_with_retry(message, fallback_model)
        except Exception:
            if fallback_model == fallback_models[-1]:
                raise
Code Examples
Complete Chatbot (Python)
import requests
import json
class RoxChatbot:
def __init__(self, model='chat', system_prompt=None):
self.model = model
self.base_url = 'https://Rox-Turbo-API.hf.space'
self.conversation = []
if system_prompt:
self.conversation.append({
'role': 'system',
'content': system_prompt
})
def chat(self, message, stream=False):
self.conversation.append({'role': 'user', 'content': message})
if stream:
return self._stream_chat()
else:
return self._standard_chat()
def _standard_chat(self):
response = requests.post(
f'{self.base_url}/{self.model}',
json={'messages': self.conversation}
)
reply = response.json()['content']
self.conversation.append({'role': 'assistant', 'content': reply})
return reply
def _stream_chat(self):
response = requests.post(
f'{self.base_url}/{self.model}',
json={'messages': self.conversation, 'stream': True},
stream=True
)
full_content = ''
for line in response.iter_lines():
if line:
line = line.decode('utf-8')
if line.startswith('data: '):
data = line[6:]
if data == '[DONE]':
break
try:
parsed = json.loads(data)
if 'content' in parsed:
full_content += parsed['content']
print(parsed['content'], end='', flush=True)
except json.JSONDecodeError:
pass
print() # New line after streaming
self.conversation.append({'role': 'assistant', 'content': full_content})
return full_content
def clear(self):
system_msgs = [m for m in self.conversation if m['role'] == 'system']
self.conversation = system_msgs
# Usage
bot = RoxChatbot(system_prompt='You are a helpful assistant')
print(bot.chat('Hello'))
print(bot.chat('What is AI?'))
bot.chat('Tell me a story', stream=True)
Complete Chatbot (JavaScript)
class RoxChatbot {
constructor(model = 'chat', systemPrompt = null) {
this.model = model;
this.baseUrl = 'https://Rox-Turbo-API.hf.space';
this.conversation = [];
if (systemPrompt) {
this.conversation.push({ role: 'system', content: systemPrompt });
}
}
async chat(message, stream = false) {
this.conversation.push({ role: 'user', content: message });
if (stream) {
return await this._streamChat();
} else {
return await this._standardChat();
}
}
async _standardChat() {
const response = await fetch(`${this.baseUrl}/${this.model}`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ messages: this.conversation })
});
const data = await response.json();
const reply = data.content;
this.conversation.push({ role: 'assistant', content: reply });
return reply;
}
async _streamChat() {
const response = await fetch(`${this.baseUrl}/${this.model}`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
messages: this.conversation,
stream: true
})
});
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let fullContent = '';
    let buffer = ''; // Holds any partial SSE line that spans chunk boundaries
    let finished = false;
    while (!finished) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop(); // Keep the trailing (possibly incomplete) line
      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6).trim();
          if (data === '[DONE]') { finished = true; break; }
          try {
            const parsed = JSON.parse(data);
            if (parsed.content) {
              fullContent += parsed.content;
              process.stdout.write(parsed.content);
            }
          } catch (e) {
            // Ignore keep-alives and malformed lines
          }
        }
      }
    }
console.log();
this.conversation.push({ role: 'assistant', content: fullContent });
return fullContent;
}
clear() {
const systemMsgs = this.conversation.filter(m => m.role === 'system');
this.conversation = systemMsgs;
}
}
// Usage
const bot = new RoxChatbot('chat', 'You are a helpful assistant');
console.log(await bot.chat('Hello'));
console.log(await bot.chat('What is AI?'));
await bot.chat('Tell me a story', true);
OpenAI SDK Compatibility
Rox AI is compatible with the OpenAI SDK.
Python with OpenAI SDK
from openai import OpenAI
client = OpenAI(
base_url="https://Rox-Turbo-API.hf.space",
api_key="not-needed" # No API key required
)
# Standard request
response = client.chat.completions.create(
model="chat",
messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
# Streaming request
stream = client.chat.completions.create(
model="chat",
messages=[{"role": "user", "content": "Tell me a story"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end='', flush=True)
JavaScript with OpenAI SDK
import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'https://Rox-Turbo-API.hf.space',
apiKey: 'not-needed'
});
// Standard request
const response = await client.chat.completions.create({
model: 'chat',
messages: [{ role: 'user', content: 'Hello' }]
});
console.log(response.choices[0].message.content);
// Streaming request
const stream = await client.chat.completions.create({
model: 'chat',
messages: [{ role: 'user', content: 'Tell me a story' }],
stream: true
});
for await (const chunk of stream) {
if (chunk.choices[0]?.delta?.content) {
process.stdout.write(chunk.choices[0].delta.content);
}
}
Additional Resources
- API Reference - Complete API documentation
- Code Examples - Ready-to-use code snippets
- Model Guide - Detailed model information
Built by Mohammad Faiz