---
title: Streaming LLM API
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---

# Hugging Face Space Streaming LLM Inference API

A lightweight Hugging Face Space API server for real-time token streaming with **Qwen2.5-0.5B-Instruct**.

## Features

- FastAPI server with SSE streaming endpoint
- One-time model/tokenizer loading during startup
- Configurable generation parameters (`max_tokens`, `temperature`, `top_p`)
- Efficient inference with `torch.no_grad()` and `device_map="auto"`
- Request validation and clear error responses

## Model

- **Primary model:** `Qwen/Qwen2.5-0.5B-Instruct`
- Automatically downloaded from Hugging Face at startup

## File Structure

- `app.py`
- `requirements.txt`
- `README.md`
- `Dockerfile`

## Requirements

```txt
transformers
accelerate
torch
fastapi
uvicorn
pydantic
```

## Run Locally

```bash
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860
```

## API

### `POST /generate_stream`

Request JSON:

```json
{
  "prompt": "user prompt text",
  "max_tokens": 512,
  "temperature": 0.7,
  "top_p": 0.9
}
```

- `prompt` is required and must not be empty.
- `max_tokens`, `temperature`, and `top_p` are optional.

Response:

- Content type: `text/event-stream`
- Streams generated text chunks incrementally as SSE events.

## Example cURL

```bash
curl -N -X POST "https://your-space-name.hf.space/generate_stream" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Explain artificial intelligence"}'
```

## Backend Integration Flow

1. Backend sends the prompt to the Hugging Face Space.
2. The Space generates and streams tokens.
3. Backend relays the streamed tokens to the client in real time.

## Hugging Face Space Setup

- Space SDK: **Docker**
- Ensure the app starts with `uvicorn app:app --host 0.0.0.0 --port 7860`
- Expose port `7860`

## Notes

- The first startup may take longer due to the model download.
- Keep model loading in the startup lifecycle so the model is initialized only once.
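## Example Dockerfile Sketch

Since the file structure lists a `Dockerfile` but does not show it, here is a minimal sketch consistent with the setup section (base image, cache path, and layer ordering are assumptions; Docker Spaces run the container as a non-root user, so model caches should point at a writable directory):

```dockerfile
# Hypothetical Dockerfile sketch for this Docker Space.
FROM python:3.11-slim

WORKDIR /app

# Keep Hugging Face caches in a directory writable by the non-root Space user.
ENV HF_HOME=/tmp/huggingface

# Install dependencies first so code changes don't invalidate this layer.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 7860
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
```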
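## Example `app.py` Sketch

A minimal sketch of how the pieces above could fit together in `app.py` — one-time model loading in the startup lifecycle, request validation with Pydantic, and SSE streaming via `TextIteratorStreamer`. This is illustrative, not the Space's actual implementation; the helper names and default values mirror the API section but are assumptions.

```python
# Hypothetical app.py sketch for the streaming endpoint described above.
from contextlib import asynccontextmanager
from threading import Thread

import torch
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

MODEL_ID = "Qwen/Qwen2.5-0.5B-Instruct"
state = {}  # holds the model/tokenizer loaded once at startup


@asynccontextmanager
async def lifespan(app: FastAPI):
    # One-time load: downloads from Hugging Face on first start.
    state["tokenizer"] = AutoTokenizer.from_pretrained(MODEL_ID)
    state["model"] = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    yield
    state.clear()


app = FastAPI(lifespan=lifespan)


class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 512
    temperature: float = 0.7
    top_p: float = 0.9


@app.post("/generate_stream")
def generate_stream(req: GenerateRequest):
    if not req.prompt.strip():
        raise HTTPException(status_code=422, detail="prompt must not be empty")

    tokenizer, model = state["tokenizer"], state["model"]
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": req.prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    streamer = TextIteratorStreamer(
        tokenizer, skip_prompt=True, skip_special_tokens=True
    )
    gen_kwargs = dict(
        input_ids=input_ids,
        max_new_tokens=req.max_tokens,
        temperature=req.temperature,
        top_p=req.top_p,
        do_sample=True,
        streamer=streamer,
    )

    def run():
        # generate() blocks, so run it in a thread and read from the streamer.
        with torch.no_grad():
            model.generate(**gen_kwargs)

    Thread(target=run, daemon=True).start()

    def sse():
        for chunk in streamer:
            yield f"data: {chunk}\n\n"

    return StreamingResponse(sse(), media_type="text/event-stream")
```

Running `generate()` in a background thread lets the SSE generator yield each chunk as the streamer produces it, instead of waiting for the full completion.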
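## Example Backend Client

For the backend integration flow above, the relay side mostly reduces to parsing `data:` lines out of the event stream. A small, hedged sketch (`sse_data` is an illustrative helper, not part of any library):

```python
def sse_data(lines):
    """Yield the payload of each SSE "data:" line, skipping blank keep-alives."""
    for line in lines:
        if line and line.startswith("data:"):
            yield line[len("data:"):].lstrip()


# Hypothetical usage against a live Space (URL is a placeholder):
#
# import requests
# resp = requests.post(
#     "https://your-space-name.hf.space/generate_stream",
#     json={"prompt": "Explain artificial intelligence"},
#     stream=True,
# )
# for chunk in sse_data(resp.iter_lines(decode_unicode=True)):
#     print(chunk, end="", flush=True)
```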