# API REFERENCE - Lightweight AI Backend
Quick reference for integrating the API endpoints into your frontend projects.
## πŸ”— Base URL
```
https://your-username-lightweight-ai-backend.hf.space
```
---
## πŸ“‘ Available Endpoints
All endpoints are reached with an HTTP POST to `/api/predict`; the operation performed depends on the parameters passed in the `data` array.
### 1. Generate Chat
**Purpose:** General conversational AI responses
**Endpoint:** `POST /api/predict`
**Request:**
```json
{
  "data": [
    "Your question or prompt here",
    150,
    0.7
  ]
}
```
**Parameters:**
| Index | Name | Type | Range | Default | Description |
|-------|------|------|-------|---------|-------------|
| 0 | prompt | string | N/A | N/A | The user's question or message |
| 1 | max_tokens | int | 50-200 | 150 | Maximum length of response |
| 2 | temperature | float | 0.1-1.0 | 0.7 | Randomness (0=deterministic, 1=creative) |
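The ranges above can be enforced client-side before a request is sent. A minimal sketch (the helper name and clamping behavior are our own convention, not part of the API):

```python
def build_chat_payload(prompt, max_tokens=150, temperature=0.7):
    """Build the {"data": [...]} payload, clamping values to the documented ranges."""
    max_tokens = max(50, min(200, int(max_tokens)))        # documented range: 50-200
    temperature = max(0.1, min(1.0, float(temperature)))   # documented range: 0.1-1.0
    return {"data": [prompt, max_tokens, temperature]}
```

Passing the result as the `json=` argument to `requests.post` keeps out-of-range values from ever reaching the server.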
**Response:**
```json
{
  "data": [
    "Your question or prompt here response from the model..."
  ]
}
```
**Examples:**
**Python:**
```python
import requests
response = requests.post(
    "https://your-space-url/api/predict",
    json={"data": ["What is Python?", 150, 0.7]}
)
result = response.json()["data"][0]
print(result)
```
**JavaScript:**
```javascript
const response = await fetch('https://your-space-url/api/predict', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({data: ["What is AI?", 150, 0.7]})
});
const result = await response.json();
console.log(result.data[0]);
```
**cURL:**
```bash
curl -X POST https://your-space-url/api/predict \
  -H "Content-Type: application/json" \
  -d '{"data": ["Hello!", 150, 0.7]}'
```
---
### 2. Generate Code
**Purpose:** Generate code based on descriptions
**Endpoint:** `POST /api/predict`
**Request:**
```json
{
  "data": [
    "Write a Python function to reverse a string",
    256,
    0.3
  ]
}
```
**Parameters:**
| Index | Name | Type | Range | Default | Description |
|-------|------|------|-------|---------|-------------|
| 0 | prompt | string | N/A | N/A | Description of the code to generate |
| 1 | max_tokens | int | 100-300 | 256 | Maximum code length |
| 2 | temperature | float | 0.1-1.0 | 0.3 | Lower = more deterministic code |
**Response:**
```json
{
  "data": [
    "def reverse_string(s):\n    return s[::-1]\n\n# Usage\nprint(reverse_string('hello'))..."
  ]
}
```
**Example:**
**Python:**
```python
import requests

response = requests.post(
    "https://your-space-url/api/predict",
    json={"data": ["Create a function that calculates factorial", 256, 0.3]}
)
code = response.json()["data"][0]
print(code)
```
---
### 3. Summarize Text
**Purpose:** Generate summaries of long text
**Endpoint:** `POST /api/predict`
**Request:**
```json
{
  "data": [
    "Long text to summarize goes here... at least 50 characters.",
    100
  ]
}
```
**Parameters:**
| Index | Name | Type | Range | Default | Description |
|-------|------|------|-------|---------|-------------|
| 0 | text | string | 50+ chars | N/A | Text to summarize |
| 1 | max_length | int | 20-150 | 100 | Maximum summary length |
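Because this endpoint requires at least 50 characters of input, it can help to validate before calling. A hedged sketch (helper name and constant are ours):

```python
MIN_TEXT_LEN = 50  # documented minimum input length for summarization

def build_summarize_payload(text, max_length=100):
    """Validate input length and build the payload; raises ValueError if too short."""
    if len(text) < MIN_TEXT_LEN:
        raise ValueError(f"text must be at least {MIN_TEXT_LEN} characters, got {len(text)}")
    max_length = max(20, min(150, int(max_length)))  # documented range: 20-150
    return {"data": [text, max_length]}
```

Failing fast locally avoids burning a queued request on input the server would reject anyway.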
**Response:**
```json
{
  "data": [
    "Summary of the provided text..."
  ]
}
```
**Example:**
**Python:**
```python
import requests

long_text = """
Machine learning is a subset of artificial intelligence (AI) that focuses
on enabling systems to learn from and make decisions based on data...
"""

response = requests.post(
    "https://your-space-url/api/predict",
    json={"data": [long_text, 100]}
)
summary = response.json()["data"][0]
print(summary)
```
---
### 4. Generate Image
**Purpose:** Generate images from text descriptions
**Endpoint:** `POST /api/predict`
**Request:**
```json
{
  "data": [
    "A sunset over mountains",
    256,
    256
  ]
}
```
**Parameters:**
| Index | Name | Type | Range | Default | Description |
|-------|------|------|-------|---------|-------------|
| 0 | prompt | string | N/A | N/A | Image description |
| 1 | width | int | 128-256 | 256 | Image width in pixels |
| 2 | height | int | 128-256 | 256 | Image height in pixels |
**Response:**
Image returned as binary data (PNG format)
**Example:**
**Python:**
```python
import requests
from io import BytesIO

from PIL import Image

response = requests.post(
    "https://your-space-url/api/predict",
    json={"data": ["A red sunset", 256, 256]}
)

# Save the PNG bytes from the response
with open('generated_image.png', 'wb') as f:
    f.write(response.content)

# Or load them as a PIL Image
img = Image.open(BytesIO(response.content))
img.show()
```
**JavaScript (for frontend):**
```javascript
const response = await fetch('https://your-space-url/api/predict', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({data: ["A blue ocean", 256, 256]})
});
// Get image blob
const blob = await response.blob();
const url = URL.createObjectURL(blob);
// Display in image element
document.getElementById('image').src = url;
```
---
## πŸ”„ Response Codes
| Code | Meaning | Solution |
|------|---------|----------|
| 200 | Success | Response contains generated output |
| 400 | Bad Request | Check parameters (wrong JSON format) |
| 503 | Service Unavailable | Space is starting/restarting (wait 1-2 min) |
| 504 | Timeout | Request took too long (try shorter max_tokens) |
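The table above maps directly onto simple client logic: 200 succeeds, 503 and 504 are transient and worth retrying, and anything else points to a problem with the request itself. A sketch (the function and return labels are our own convention):

```python
RETRYABLE = {503, 504}  # transient: Space restarting or request timed out

def classify_status(code):
    """Map an HTTP status code from this API to a client action."""
    if code == 200:
        return "ok"
    if code in RETRYABLE:
        return "retry"
    return "fail"  # e.g. 400: fix the request payload, don't retry
```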
---
## ⏱️ Performance Tips
### Reduce Latency
1. **Use lower max_tokens:**
```python
# Fast: 50-100 tokens
max_tokens = 75   # ~2-3 seconds

# Medium: 100-200 tokens
max_tokens = 150  # ~4-6 seconds

# Slow: 200-300 tokens
max_tokens = 250  # ~8-12 seconds
```
2. **Warm up the model:**
- First request loads the model (5-10 seconds)
- Subsequent requests are faster
- Consider sending a "warm-up" request on app startup
3. **Batch similar requests:**
- Queue requests intelligently
- Don't send all at once
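The warm-up tip can be sketched as a startup hook. This is our own convention, not part of the API; it assumes the chat endpoint's `[prompt, max_tokens, temperature]` signature and uses a tiny prompt with the smallest allowed `max_tokens`:

```python
import requests

WARMUP_PROMPT = "Hi"  # tiny prompt, just enough to trigger model loading

def warmup(base_url, timeout=120):
    """Send one small request on startup so the first real request is fast.

    Returns True if the Space answered with 200, False otherwise.
    """
    try:
        r = requests.post(
            f"{base_url}/api/predict",
            json={"data": [WARMUP_PROMPT, 50, 0.1]},  # 50 = smallest allowed max_tokens
            timeout=timeout,
        )
        return r.status_code == 200
    except requests.RequestException:
        return False
```

Call `warmup(...)` once when your app starts; a `False` return means the Space is still loading or unreachable, not that user requests will necessarily fail.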
### Error Handling
```python
import requests
import time
def call_api_with_retry(url, data, max_retries=3):
    """Call API with retry logic"""
    for attempt in range(max_retries):
        try:
            response = requests.post(
                url,
                json={"data": data},
                timeout=60
            )
            if response.status_code == 200:
                return response.json()["data"][0]
            elif response.status_code == 503:
                # Service restarting, wait and retry
                time.sleep(5)
                continue
            else:
                return f"Error: {response.status_code}"
        except requests.exceptions.Timeout:
            if attempt < max_retries - 1:
                print("Timeout, retrying...")
                time.sleep(2)
            else:
                return "Error: Request timeout"
    return "Error: Max retries exceeded"

# Usage
result = call_api_with_retry(
    "https://your-space-url/api/predict",
    ["Your prompt", 150, 0.7]
)
print(result)
```
---
## πŸ’‘ Integration Examples
### React Frontend
```jsx
import React, { useState } from 'react';

export default function ChatApp() {
  const [input, setInput] = useState('');
  const [response, setResponse] = useState('');
  const [loading, setLoading] = useState(false);

  const handleSubmit = async (e) => {
    e.preventDefault();
    setLoading(true);
    try {
      const result = await fetch(
        'https://your-space-url/api/predict',
        {
          method: 'POST',
          headers: {'Content-Type': 'application/json'},
          body: JSON.stringify({data: [input, 150, 0.7]})
        }
      );
      const data = await result.json();
      setResponse(data.data[0]);
    } catch (error) {
      setResponse('Error: ' + error.message);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask me anything..."
        />
        <button type="submit" disabled={loading}>
          {loading ? 'Generating...' : 'Send'}
        </button>
      </form>
      {response && <div>{response}</div>}
    </div>
  );
}
```
### Vue.js
```vue
<template>
  <div>
    <input v-model="prompt" placeholder="Ask a question..." />
    <button @click="generateResponse" :disabled="loading">
      {{ loading ? 'Generating...' : 'Send' }}
    </button>
    <p v-if="response">{{ response }}</p>
  </div>
</template>

<script>
export default {
  data() {
    return {
      prompt: '',
      response: '',
      loading: false
    };
  },
  methods: {
    async generateResponse() {
      this.loading = true;
      try {
        const res = await fetch(
          'https://your-space-url/api/predict',
          {
            method: 'POST',
            headers: {'Content-Type': 'application/json'},
            body: JSON.stringify({data: [this.prompt, 150, 0.7]})
          }
        );
        const data = await res.json();
        this.response = data.data[0];
      } catch (error) {
        this.response = 'Error: ' + error.message;
      } finally {
        this.loading = false;
      }
    }
  }
};
</script>
```
### Node.js Backend
```javascript
const express = require('express');
const axios = require('axios');

const app = express();
app.use(express.json());

app.post('/chat', async (req, res) => {
  const { prompt } = req.body;
  try {
    const response = await axios.post(
      'https://your-space-url/api/predict',
      {
        data: [prompt, 150, 0.7]
      }
    );
    res.json({ response: response.data.data[0] });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.listen(3000, () => console.log('Server running on :3000'));
```
---
## πŸ” Important Notes
### Rate Limiting
- Free tier: ~2 requests per second
- Space sleeps after 48h inactivity (wakes on request)
- No hard quota, but be respectful
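A minimal client-side limiter matching the ~2 requests/second guideline. The `Throttle` class is a sketch of our own, not part of the API:

```python
import time

class Throttle:
    """Space calls at least `min_interval` seconds apart.

    min_interval=0.5 matches the ~2 requests/second free-tier guideline.
    """
    def __init__(self, min_interval=0.5):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = time.monotonic()
        delay = self._last + self.min_interval - now
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()
```

Call `throttle.wait()` immediately before each `requests.post(...)` so bursts of user activity are smoothed out instead of hitting the queue all at once.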
### Data Privacy
- All requests processed on Space server
- No data sent to external APIs
- Check Hugging Face privacy policy
### Bandwidth
- Requests are queued and processed sequentially
- Typical response: < 2MB
- No file uploads supported
---
## πŸ“ž Troubleshooting API Calls
### 503 Service Unavailable
```
Cause: Space restarting or models loading
Solution: Wait 30-60 seconds and retry
```
### 504 Gateway Timeout
```
Cause: Request took >60 seconds
Solution: Reduce max_tokens or try simpler prompt
```
### Empty Response
```
Cause: Model failed silently
Solution: Check Space logs, try different prompt
```
### Wrong Response Format
```
Cause: Endpoint called incorrectly
Solution: Ensure {"data": [arg1, arg2, ...]} structure
```
---
## 🎯 Production Checklist
- [ ] Replace `your-space-url` with actual URL
- [ ] Add error handling for API failures
- [ ] Implement request timeout (60s)
- [ ] Add retry logic (exponential backoff)
- [ ] Monitor API response times
- [ ] Cache responses if possible
- [ ] Set up alerting for 503/504 errors
- [ ] Test under expected load
- [ ] Document API usage in your project
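For the exponential-backoff checklist item, one common pattern is "full jitter": pick a random delay between zero and an exponentially growing cap. A sketch (function name and constants are ours):

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff with full jitter.

    Returns a random delay in [0, min(cap, base * 2**attempt)] seconds,
    so retries spread out instead of all clients hammering the Space at once.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

Combining this with the retry loop from the Error Handling section (replacing its fixed `time.sleep(5)`) covers both checklist items at once.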
---
**API Reference v1.0**
**Last Updated: 2024**