
Developer Guide

Complete guide for integrating Rox AI into your applications.

Base URL: https://Rox-Turbo-API.hf.space

Table of Contents

  1. Quick Start
  2. Authentication
  3. Making Requests
  4. Streaming Responses
  5. Model Selection
  6. Parameters
  7. Conversation Management
  8. Error Handling
  9. Best Practices
  10. Code Examples
  11. OpenAI SDK Compatibility

Quick Start

Send your first request in under 30 seconds.

cURL

curl -X POST https://Rox-Turbo-API.hf.space/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'

Python

import requests

response = requests.post(
    'https://Rox-Turbo-API.hf.space/chat',
    json={'messages': [{'role': 'user', 'content': 'Hello'}]}
)
print(response.json()['content'])

JavaScript

const response = await fetch('https://Rox-Turbo-API.hf.space/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'Hello' }]
  })
});
const data = await response.json();
console.log(data.content);

Authentication

No API key required. All endpoints are publicly accessible.


Making Requests

Request Format

All endpoints accept POST requests with a JSON body.

Required Fields:

  • messages: Array of message objects

Optional Fields:

  • temperature: Float (0.0 - 2.0, default: 0.7)
  • top_p: Float (0.0 - 1.0, default: 0.95)
  • max_tokens: Integer (1 - 32768, default: 8192)
  • stream: Boolean (default: false)

Message Object

{
  "role": "user" | "assistant" | "system",
  "content": "message text"
}

Complete Request Example

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is AI?"}
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 8192,
  "stream": false
}

Response Format

Standard Response:

{
  "content": "AI stands for Artificial Intelligence..."
}

Streaming Response:

data: {"content": "AI"}
data: {"content": " stands"}
data: {"content": " for"}
data: [DONE]

Streaming Responses

Streaming provides real-time token-by-token responses for better user experience.

When to Use Streaming

  • Long-form content generation
  • Interactive chat applications
  • Real-time feedback requirements
  • Improved perceived performance

Python Implementation

import requests
import json

def stream_chat(message, model='chat'):
    response = requests.post(
        f'https://Rox-Turbo-API.hf.space/{model}',
        json={
            'messages': [{'role': 'user', 'content': message}],
            'stream': True
        },
        stream=True
    )
    
    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data: '):
                data = line[6:]
                if data == '[DONE]':
                    break
                try:
                    parsed = json.loads(data)
                    if 'content' in parsed:
                        print(parsed['content'], end='', flush=True)
                        yield parsed['content']
                except json.JSONDecodeError:
                    pass

# Usage
for token in stream_chat('Tell me a story'):
    pass  # Tokens printed in real-time

JavaScript Implementation

async function streamChat(message, model = 'chat') {
  const response = await fetch(`https://Rox-Turbo-API.hf.space/${model}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      messages: [{ role: 'user', content: message }],
      stream: true
    })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let fullContent = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value, { stream: true });
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6).trim();
        if (data === '[DONE]') break;

        try {
          const parsed = JSON.parse(data);
          if (parsed.content) {
            fullContent += parsed.content;
            console.log(parsed.content); // Process each token
          }
        } catch (e) {}
      }
    }
  }

  return fullContent;
}

// Usage
await streamChat('Tell me a story');

Node.js Implementation

const https = require('https');

function streamChat(message, model = 'chat') {
  const data = JSON.stringify({
    messages: [{ role: 'user', content: message }],
    stream: true
  });

  const options = {
    hostname: 'Rox-Turbo-API.hf.space',
    path: `/${model}`,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(data)
    }
  };

  const req = https.request(options, (res) => {
    res.on('data', (chunk) => {
      const lines = chunk.toString().split('\n');
      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6).trim();
          if (data === '[DONE]') return;
          
          try {
            const parsed = JSON.parse(data);
            if (parsed.content) {
              process.stdout.write(parsed.content);
            }
          } catch (e) {}
        }
      }
    });
  });

  req.write(data);
  req.end();
}

// Usage
streamChat('Tell me a story');

Model Selection

Choose the right model for your use case.

Available Models

Model           Endpoint   Best For              Speed   Quality
Rox Core        /chat      General conversation  Medium  High
Rox 2.1 Turbo   /turbo     Quick responses       Fast    Good
Rox 3.5 Coder   /coder     Code generation       Medium  High
Rox 4.5 Turbo   /turbo45   Fast reasoning        Fast    High
Rox 5 Ultra     /ultra     Complex tasks         Slow    Highest
Rox 6 Dyno      /dyno      Long context          Medium  High
Rox 7 Coder     /coder7    Advanced coding       Medium  Highest
Rox Vision Max  /vision    Visual tasks          Medium  High

Model Selection Guide

def select_model(task_type):
    models = {
        'chat': 'chat',           # General conversation
        'quick': 'turbo',         # Fast responses
        'code': 'coder',          # Code generation
        'reasoning': 'turbo45',   # Complex reasoning
        'complex': 'ultra',       # Highest quality
        'long': 'dyno',           # Long documents
        'advanced_code': 'coder7',# Advanced coding
        'vision': 'vision'        # Visual tasks
    }
    return models.get(task_type, 'chat')

# Usage
model = select_model('code')
response = ask_rox('Write a function', model=model)

Parameters

temperature

Controls randomness in responses.

Range: 0.0 to 2.0
Default: 0.7

  • 0.0 - 0.3: Deterministic, focused (math, facts, code)
  • 0.4 - 0.8: Balanced (general conversation)
  • 0.9 - 2.0: Creative, varied (stories, brainstorming)

# Factual response
response = requests.post(url, json={
    'messages': [{'role': 'user', 'content': 'What is 2+2?'}],
    'temperature': 0.2
})

# Creative response
response = requests.post(url, json={
    'messages': [{'role': 'user', 'content': 'Write a poem'}],
    'temperature': 1.5
})

top_p

Controls diversity via nucleus sampling.

Range: 0.0 to 1.0
Default: 0.95

  • 0.1 - 0.5: Narrow, focused
  • 0.6 - 0.9: Balanced
  • 0.9 - 1.0: Diverse

response = requests.post(url, json={
    'messages': [{'role': 'user', 'content': 'Tell me about AI'}],
    'top_p': 0.9
})

max_tokens

Maximum tokens in response.

Range: 1 to 32768
Default: 8192

Token estimation: ~1 token = 0.75 words

# Short response
response = requests.post(url, json={
    'messages': [{'role': 'user', 'content': 'Brief summary'}],
    'max_tokens': 100
})

# Long response
response = requests.post(url, json={
    'messages': [{'role': 'user', 'content': 'Detailed explanation'}],
    'max_tokens': 4096
})
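The ~0.75-words-per-token rule of thumb above can be turned into a rough budget check. This is an approximation only; actual tokenization varies by model, and both function names here are illustrative.

```python
def estimate_tokens(text):
    """Rough token count: ~0.75 words per token."""
    return int(len(text.split()) / 0.75)

def fits_budget(prompt, expected_reply_words, max_tokens=8192):
    """Check whether prompt plus expected reply fit within max_tokens."""
    return estimate_tokens(prompt) + int(expected_reply_words / 0.75) < max_tokens
```

Use it to pick a sensible `max_tokens` before sending a long prompt.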

stream

Enable streaming responses.

Type: Boolean
Default: false

response = requests.post(url, json={
    'messages': [{'role': 'user', 'content': 'Hello'}],
    'stream': True
}, stream=True)

Conversation Management

Single Turn

def ask_once(question):
    response = requests.post(
        'https://Rox-Turbo-API.hf.space/chat',
        json={'messages': [{'role': 'user', 'content': question}]}
    )
    return response.json()['content']

Multi-Turn Conversation

class Conversation:
    def __init__(self, model='chat', system_prompt=None):
        self.model = model
        self.messages = []
        if system_prompt:
            self.messages.append({'role': 'system', 'content': system_prompt})
    
    def ask(self, message):
        self.messages.append({'role': 'user', 'content': message})
        
        response = requests.post(
            f'https://Rox-Turbo-API.hf.space/{self.model}',
            json={'messages': self.messages}
        )
        
        reply = response.json()['content']
        self.messages.append({'role': 'assistant', 'content': reply})
        return reply
    
    def clear(self):
        system_msg = [m for m in self.messages if m['role'] == 'system']
        self.messages = system_msg

# Usage
conv = Conversation(system_prompt='You are a helpful assistant')
print(conv.ask('Hello'))
print(conv.ask('What is AI?'))
print(conv.ask('Tell me more'))

JavaScript Conversation Manager

class Conversation {
  constructor(model = 'chat', systemPrompt = null) {
    this.model = model;
    this.messages = [];
    if (systemPrompt) {
      this.messages.push({ role: 'system', content: systemPrompt });
    }
  }
  
  async ask(message) {
    this.messages.push({ role: 'user', content: message });
    
    const response = await fetch(`https://Rox-Turbo-API.hf.space/${this.model}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages: this.messages })
    });
    
    const data = await response.json();
    const reply = data.content;
    
    this.messages.push({ role: 'assistant', content: reply });
    return reply;
  }
  
  clear() {
    const systemMsg = this.messages.filter(m => m.role === 'system');
    this.messages = systemMsg;
  }
}

// Usage
const conv = new Conversation('chat', 'You are a helpful assistant');
console.log(await conv.ask('Hello'));
console.log(await conv.ask('What is AI?'));

System Prompts

System prompts define the assistant's behavior.

def ask_with_personality(message, personality):
    system_prompts = {
        'professional': 'You are a professional business consultant.',
        'casual': 'You are a friendly, casual assistant.',
        'technical': 'You are a technical expert. Be precise and detailed.',
        'creative': 'You are a creative writer. Be imaginative and expressive.'
    }
    
    messages = [
        {'role': 'system', 'content': system_prompts.get(personality, '')},
        {'role': 'user', 'content': message}
    ]
    
    response = requests.post(
        'https://Rox-Turbo-API.hf.space/chat',
        json={'messages': messages}
    )
    return response.json()['content']

# Usage
answer = ask_with_personality('Explain AI', 'technical')

Error Handling

Basic Error Handling

def safe_request(message, model='chat'):
    try:
        response = requests.post(
            f'https://Rox-Turbo-API.hf.space/{model}',
            json={'messages': [{'role': 'user', 'content': message}]},
            timeout=30
        )
        response.raise_for_status()
        return response.json()['content']
    except requests.exceptions.Timeout:
        return "Request timed out. Please try again."
    except requests.exceptions.HTTPError as e:
        return f"HTTP error: {e.response.status_code}"
    except requests.exceptions.RequestException as e:
        return f"Request failed: {str(e)}"
    except KeyError:
        return "Invalid response format"

Advanced Error Handling with Retry

import time

def request_with_retry(message, model='chat', max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                f'https://Rox-Turbo-API.hf.space/{model}',
                json={'messages': [{'role': 'user', 'content': message}]},
                timeout=30
            )
            response.raise_for_status()
            return response.json()['content']
        except requests.exceptions.RequestException as e:
            if attempt == max_retries - 1:
                raise
            wait_time = 2 ** attempt  # Exponential backoff
            time.sleep(wait_time)

JavaScript Error Handling

async function safeRequest(message, model = 'chat') {
  try {
    const response = await fetch(`https://Rox-Turbo-API.hf.space/${model}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        messages: [{ role: 'user', content: message }]
      })
    });

    if (!response.ok) {
      throw new Error(`HTTP ${response.status}: ${response.statusText}`);
    }

    const data = await response.json();
    return data.content;
  } catch (error) {
    console.error('Request failed:', error);
    throw error;
  }
}

Best Practices

1. Use Appropriate Models

Choose models based on your needs:

  • Use turbo for simple, fast responses
  • Use coder for code-related tasks
  • Use ultra for complex reasoning
  • Use dyno for long documents

2. Optimize Parameters

# For factual questions
params = {'temperature': 0.2, 'max_tokens': 500}

# For creative tasks
params = {'temperature': 1.2, 'max_tokens': 2000}

# For code generation
params = {'temperature': 0.3, 'max_tokens': 4096}

3. Manage Context Length

def trim_conversation(messages, max_messages=10):
    """Keep only recent messages to manage context"""
    system_msgs = [m for m in messages if m['role'] == 'system']
    other_msgs = [m for m in messages if m['role'] != 'system']
    return system_msgs + other_msgs[-max_messages:]

4. Implement Caching

from functools import lru_cache

@lru_cache(maxsize=100)
def ask_with_cache(message, model='chat'):
    """Return a cached response for repeated identical prompts."""
    response = requests.post(
        f'https://Rox-Turbo-API.hf.space/{model}',
        json={'messages': [{'role': 'user', 'content': message}]}
    )
    return response.json()['content']

5. Rate Limiting

import time
from collections import deque

class RateLimiter:
    def __init__(self, max_requests=10, time_window=60):
        self.max_requests = max_requests
        self.time_window = time_window
        self.requests = deque()
    
    def wait_if_needed(self):
        now = time.time()
        
        # Remove old requests
        while self.requests and now - self.requests[0] > self.time_window:
            self.requests.popleft()
        
        # Wait if at limit
        if len(self.requests) >= self.max_requests:
            sleep_time = self.time_window - (now - self.requests[0])
            if sleep_time > 0:
                time.sleep(sleep_time)
        
        self.requests.append(now)

limiter = RateLimiter(10, 60)

def rate_limited_request(message):
    limiter.wait_if_needed()
    return ask_rox(message)

6. Streaming for Long Responses

Use streaming for responses over 500 tokens to improve user experience.
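One way to apply this rule is to let the expected response length pick the transport. The 500-token threshold is the heuristic above; `build_payload` is an illustrative helper, not part of the API.

```python
def build_payload(message, max_tokens=8192, stream_threshold=500):
    """Build a request body, enabling streaming for long responses."""
    return {
        'messages': [{'role': 'user', 'content': message}],
        'max_tokens': max_tokens,
        'stream': max_tokens > stream_threshold,
    }

# Short answers use a standard response; long ones stream.
short = build_payload('Brief summary', max_tokens=100)
long = build_payload('Write a detailed essay')
```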

7. Error Recovery

def robust_request(message, fallback_models=('chat', 'turbo', 'coder')):
    """Try each model in order, raising only if the last one fails."""
    for model in fallback_models:
        try:
            return request_with_retry(message, model)
        except Exception:
            if model == fallback_models[-1]:
                raise

Code Examples

Complete Chatbot (Python)

import requests
import json

class RoxChatbot:
    def __init__(self, model='chat', system_prompt=None):
        self.model = model
        self.base_url = 'https://Rox-Turbo-API.hf.space'
        self.conversation = []
        
        if system_prompt:
            self.conversation.append({
                'role': 'system',
                'content': system_prompt
            })
    
    def chat(self, message, stream=False):
        self.conversation.append({'role': 'user', 'content': message})
        
        if stream:
            return self._stream_chat()
        else:
            return self._standard_chat()
    
    def _standard_chat(self):
        response = requests.post(
            f'{self.base_url}/{self.model}',
            json={'messages': self.conversation}
        )
        
        reply = response.json()['content']
        self.conversation.append({'role': 'assistant', 'content': reply})
        return reply
    
    def _stream_chat(self):
        response = requests.post(
            f'{self.base_url}/{self.model}',
            json={'messages': self.conversation, 'stream': True},
            stream=True
        )
        
        full_content = ''
        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith('data: '):
                    data = line[6:]
                    if data == '[DONE]':
                        break
                    try:
                        parsed = json.loads(data)
                        if 'content' in parsed:
                            full_content += parsed['content']
                            print(parsed['content'], end='', flush=True)
                    except json.JSONDecodeError:
                        pass
        
        print()  # New line after streaming
        self.conversation.append({'role': 'assistant', 'content': full_content})
        return full_content
    
    def clear(self):
        system_msgs = [m for m in self.conversation if m['role'] == 'system']
        self.conversation = system_msgs

# Usage
bot = RoxChatbot(system_prompt='You are a helpful assistant')
print(bot.chat('Hello'))
print(bot.chat('What is AI?'))
bot.chat('Tell me a story', stream=True)

Complete Chatbot (JavaScript)

class RoxChatbot {
  constructor(model = 'chat', systemPrompt = null) {
    this.model = model;
    this.baseUrl = 'https://Rox-Turbo-API.hf.space';
    this.conversation = [];
    
    if (systemPrompt) {
      this.conversation.push({ role: 'system', content: systemPrompt });
    }
  }
  
  async chat(message, stream = false) {
    this.conversation.push({ role: 'user', content: message });
    
    if (stream) {
      return await this._streamChat();
    } else {
      return await this._standardChat();
    }
  }
  
  async _standardChat() {
    const response = await fetch(`${this.baseUrl}/${this.model}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages: this.conversation })
    });
    
    const data = await response.json();
    const reply = data.content;
    
    this.conversation.push({ role: 'assistant', content: reply });
    return reply;
  }
  
  async _streamChat() {
    const response = await fetch(`${this.baseUrl}/${this.model}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        messages: this.conversation,
        stream: true
      })
    });
    
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let fullContent = '';
    
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      
      const chunk = decoder.decode(value, { stream: true });
      const lines = chunk.split('\n');
      
      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6).trim();
          if (data === '[DONE]') break;
          
          try {
            const parsed = JSON.parse(data);
            if (parsed.content) {
              fullContent += parsed.content;
              process.stdout.write(parsed.content);
            }
          } catch (e) {}
        }
      }
    }
    
    console.log();
    this.conversation.push({ role: 'assistant', content: fullContent });
    return fullContent;
  }
  
  clear() {
    const systemMsgs = this.conversation.filter(m => m.role === 'system');
    this.conversation = systemMsgs;
  }
}

// Usage
const bot = new RoxChatbot('chat', 'You are a helpful assistant');
console.log(await bot.chat('Hello'));
console.log(await bot.chat('What is AI?'));
await bot.chat('Tell me a story', true);

OpenAI SDK Compatibility

Rox AI is compatible with the OpenAI SDK.

Python with OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="https://Rox-Turbo-API.hf.space",
    api_key="not-needed"  # No API key required
)

# Standard request
response = client.chat.completions.create(
    model="chat",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

# Streaming request
stream = client.chat.completions.create(
    model="chat",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)

JavaScript with OpenAI SDK

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://Rox-Turbo-API.hf.space',
  apiKey: 'not-needed'
});

// Standard request
const response = await client.chat.completions.create({
  model: 'chat',
  messages: [{ role: 'user', content: 'Hello' }]
});
console.log(response.choices[0].message.content);

// Streaming request
const stream = await client.chat.completions.create({
  model: 'chat',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true
});

for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}


Built by Mohammad Faiz