

API REFERENCE - Lightweight AI Backend

Quick reference for integrating the API endpoints into your frontend projects.

🔗 Base URL

https://your-username-lightweight-ai-backend.hf.space

📑 Available Endpoints

All endpoints are called with an HTTP POST request to /api/predict; they differ only in the arguments passed in the data array.

1. Generate Chat

Purpose: General conversational AI responses

Endpoint: POST /api/predict

Request:

{
  "data": [
    "Your question or prompt here",
    150,
    0.7
  ]
}

Parameters:

| Index | Name | Type | Range | Default | Description |
|-------|------|------|-------|---------|-------------|
| 0 | prompt | string | N/A | N/A | The user's question or message |
| 1 | max_tokens | int | 50-200 | 150 | Maximum length of the response |
| 2 | temperature | float | 0.1-1.0 | 0.7 | Randomness (0 = deterministic, 1 = creative) |

Response:

{
  "data": [
    "Generated response from the model..."
  ]
}

Examples:

Python:

import requests

response = requests.post(
    "https://your-space-url/api/predict",
    json={"data": ["What is Python?", 150, 0.7]}
)
result = response.json()["data"][0]
print(result)

JavaScript:

const response = await fetch('https://your-space-url/api/predict', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({data: ["What is AI?", 150, 0.7]})
});
const result = await response.json();
console.log(result.data[0]);

cURL:

curl -X POST https://your-space-url/api/predict \
  -H "Content-Type: application/json" \
  -d '{"data": ["Hello!", 150, 0.7]}'

2. Generate Code

Purpose: Generate code based on descriptions

Endpoint: POST /api/predict

Request:

{
  "data": [
    "Write a Python function to reverse a string",
    256,
    0.3
  ]
}

Parameters:

| Index | Name | Type | Range | Default | Description |
|-------|------|------|-------|---------|-------------|
| 0 | prompt | string | N/A | N/A | Description of the code to generate |
| 1 | max_tokens | int | 100-300 | 256 | Maximum code length |
| 2 | temperature | float | 0.1-1.0 | 0.3 | Lower = more deterministic code |

Response:

{
  "data": [
    "def reverse_string(s):\n    return s[::-1]\n\n# Usage\nprint(reverse_string('hello'))..."
  ]
}

Example:

Python:

import requests

response = requests.post(
    "https://your-space-url/api/predict",
    json={"data": ["Create a function that calculates factorial", 256, 0.3]}
)
code = response.json()["data"][0]
print(code)

3. Summarize Text

Purpose: Generate summaries of long text

Endpoint: POST /api/predict

Request:

{
  "data": [
    "Long text to summarize goes here... at least 50 characters.",
    100
  ]
}

Parameters:

| Index | Name | Type | Range | Default | Description |
|-------|------|------|-------|---------|-------------|
| 0 | text | string | 50+ chars | N/A | Text to summarize |
| 1 | max_length | int | 20-150 | 100 | Maximum summary length |

Response:

{
  "data": [
    "Summary of the provided text..."
  ]
}

Example:

Python:

import requests

long_text = """
Machine learning is a subset of artificial intelligence (AI) that focuses 
on enabling systems to learn from and make decisions based on data...
"""

response = requests.post(
    "https://your-space-url/api/predict",
    json={"data": [long_text, 100]}
)
summary = response.json()["data"][0]
print(summary)

4. Generate Image

Purpose: Generate images from text descriptions

Endpoint: POST /api/predict

Request:

{
  "data": [
    "A sunset over mountains",
    256,
    256
  ]
}

Parameters:

| Index | Name | Type | Range | Default | Description |
|-------|------|------|-------|---------|-------------|
| 0 | prompt | string | N/A | N/A | Image description |
| 1 | width | int | 128-256 | 256 | Image width in pixels |
| 2 | height | int | 128-256 | 256 | Image height in pixels |

Response: Image returned as binary data (PNG format)

Example:

Python:

import requests
from PIL import Image
from io import BytesIO

response = requests.post(
    "https://your-space-url/api/predict",
    json={"data": ["A red sunset", 256, 256]}
)

# Save image from response
with open('generated_image.png', 'wb') as f:
    f.write(response.content)

# Or load as PIL Image
img = Image.open(BytesIO(response.content))
img.show()

JavaScript (for frontend):

const response = await fetch('https://your-space-url/api/predict', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({data: ["A blue ocean", 256, 256]})
});

// Get image blob
const blob = await response.blob();
const url = URL.createObjectURL(blob);

// Display in image element
document.getElementById('image').src = url;

🔄 Response Codes

| Code | Meaning | Solution |
|------|---------|----------|
| 200 | Success | Response contains generated output |
| 400 | Bad Request | Check parameters (wrong JSON format) |
| 503 | Service Unavailable | Space is starting/restarting (wait 1-2 min) |
| 504 | Timeout | Request took too long (try a shorter max_tokens) |

⏱️ Performance Tips

Reduce Latency

  1. Use lower max_tokens:

    # Fast: 50-100 tokens
    max_tokens = 75  # ~2-3 seconds
    
    # Medium: 100-200 tokens
    max_tokens = 150  # ~4-6 seconds
    
    # Slow: 200-300 tokens
    max_tokens = 250  # ~8-12 seconds
    
  2. Warm up the model:

    • First request loads the model (5-10 seconds)
    • Subsequent requests are faster
    • Consider sending a "warm-up" request on app startup
  3. Batch similar requests:

    • Queue requests intelligently
    • Don't send all at once
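The warm-up step above can be sketched as a small helper. Note that `warm_up` is a hypothetical convenience function, not part of the API; the `"ping"` prompt and the generous timeout are arbitrary choices to cover the 5-10 second model load on the first request:

```python
import requests

def warm_up(base_url: str, timeout: int = 90) -> bool:
    """Send a tiny throwaway request so the model loads before real traffic.

    base_url is your Space URL. The first call may take 5-10 seconds while
    the model loads, hence the generous timeout.
    """
    try:
        resp = requests.post(
            f"{base_url}/api/predict",
            json={"data": ["ping", 50, 0.1]},  # minimal prompt, few tokens
            timeout=timeout,
        )
        return resp.status_code == 200
    except requests.exceptions.RequestException:
        return False
```

Call this once on app startup; if it returns False, the Space may still be waking up and a retry after 30-60 seconds is reasonable.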

Error Handling

import requests
import time

def call_api_with_retry(url, data, max_retries=3):
    """Call API with retry logic"""
    for attempt in range(max_retries):
        try:
            response = requests.post(
                url,
                json={"data": data},
                timeout=60
            )
            if response.status_code == 200:
                return response.json()["data"][0]
            elif response.status_code == 503:
                # Service restarting, wait and retry
                time.sleep(5)
                continue
            else:
                return f"Error: {response.status_code}"
        except requests.exceptions.Timeout:
            if attempt < max_retries - 1:
                print("Timeout, retrying...")
                time.sleep(2)
            else:
                return "Error: Request timeout"
    
    return "Error: Max retries exceeded"

# Usage
result = call_api_with_retry(
    "https://your-space-url/api/predict",
    ["Your prompt", 150, 0.7]
)
print(result)

💡 Integration Examples

React Frontend

import React, { useState } from 'react';

export default function ChatApp() {
  const [input, setInput] = useState('');
  const [response, setResponse] = useState('');
  const [loading, setLoading] = useState(false);

  const handleSubmit = async (e) => {
    e.preventDefault();
    setLoading(true);

    try {
      const result = await fetch(
        'https://your-space-url/api/predict',
        {
          method: 'POST',
          headers: {'Content-Type': 'application/json'},
          body: JSON.stringify({data: [input, 150, 0.7]})
        }
      );
      
      const data = await result.json();
      setResponse(data.data[0]);
    } catch (error) {
      setResponse('Error: ' + error.message);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask me anything..."
        />
        <button type="submit" disabled={loading}>
          {loading ? 'Generating...' : 'Send'}
        </button>
      </form>
      {response && <div>{response}</div>}
    </div>
  );
}

Vue.js

<template>
  <div>
    <input v-model="prompt" placeholder="Ask a question..." />
    <button @click="generateResponse" :disabled="loading">
      {{ loading ? 'Generating...' : 'Send' }}
    </button>
    <p v-if="response">{{ response }}</p>
  </div>
</template>

<script>
export default {
  data() {
    return {
      prompt: '',
      response: '',
      loading: false
    };
  },
  methods: {
    async generateResponse() {
      this.loading = true;
      try {
        const res = await fetch(
          'https://your-space-url/api/predict',
          {
            method: 'POST',
            headers: {'Content-Type': 'application/json'},
            body: JSON.stringify({data: [this.prompt, 150, 0.7]})
          }
        );
        const data = await res.json();
        this.response = data.data[0];
      } catch (error) {
        this.response = 'Error: ' + error.message;
      } finally {
        this.loading = false;
      }
    }
  }
};
</script>

Node.js Backend

const express = require('express');
const axios = require('axios');

const app = express();
app.use(express.json());

app.post('/chat', async (req, res) => {
  const { prompt } = req.body;

  try {
    const response = await axios.post(
      'https://your-space-url/api/predict',
      {
        data: [prompt, 150, 0.7]
      }
    );

    res.json({ response: response.data.data[0] });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.listen(3000, () => console.log('Server running on :3000'));

🔐 Important Notes

Rate Limiting

  • Free tier: ~2 requests per second
  • Space sleeps after 48h inactivity (wakes on request)
  • No hard quota, but be respectful
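To stay under the ~2 requests/second guideline, a minimal client-side throttle can enforce a gap between calls. This `Throttle` class is an illustrative sketch, not something the API provides:

```python
import time

class Throttle:
    """Client-side throttle: enforce a minimum gap between requests.

    min_interval=0.5 keeps traffic at or under the ~2 requests/second
    the free tier tolerates (a guideline, not a hard limit).
    """
    def __init__(self, min_interval: float = 0.5):
        self.min_interval = min_interval
        self._last = 0.0  # monotonic timestamp of the previous request

    def wait(self) -> None:
        """Sleep just long enough to respect min_interval, then record the time."""
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()
```

Call `throttle.wait()` immediately before each `requests.post`; with `min_interval=0.5` you never exceed two requests per second.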

Data Privacy

  • All requests processed on Space server
  • No data sent to external APIs
  • Check Hugging Face privacy policy

Bandwidth

  • Requests are queued and processed sequentially
  • Typical response: < 2MB
  • No file uploads supported

📞 Troubleshooting API Calls

503 Service Unavailable

Cause: Space restarting or models loading
Solution: Wait 30-60 seconds and retry

504 Gateway Timeout

Cause: Request took >60 seconds
Solution: Reduce max_tokens or try simpler prompt

Empty Response

Cause: Model failed silently
Solution: Check Space logs, try different prompt

Wrong Response Format

Cause: Endpoint called incorrectly
Solution: Ensure {"data": [arg1, arg2, ...]} structure
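One way to make the envelope hard to get wrong is a tiny builder; `build_payload` is an illustrative helper, not something the API provides:

```python
def build_payload(*args) -> dict:
    """Wrap positional arguments in the {"data": [...]} envelope the API expects.

    Every endpoint takes the same shape; only the contents of "data" differ.
    """
    if not args:
        raise ValueError("at least one argument (the prompt or text) is required")
    return {"data": list(args)}
```

Pass the result as `json=build_payload(prompt, 150, 0.7)` in `requests.post` so the structure is always correct.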

🎯 Production Checklist

  • Replace your-space-url with actual URL
  • Add error handling for API failures
  • Implement request timeout (60s)
  • Add retry logic (exponential backoff)
  • Monitor API response times
  • Cache responses if possible
  • Set up alerting for 503/504 errors
  • Test under expected load
  • Document API usage in your project
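The retry item in the checklist calls for exponential backoff; `backoff_delay` below is an illustrative sketch (doubling delays with random jitter), not a prescribed implementation:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay in seconds before retry `attempt` (0-based).

    Doubles the base delay each attempt (1s, 2s, 4s, ...), caps it, and adds
    up to 25% random jitter so many clients don't retry in lockstep.
    """
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, delay * 0.25)
```

Wire it into a retry loop by sleeping `backoff_delay(attempt)` between failed attempts instead of a fixed interval.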

API Reference v1.0 | Last Updated: 2024