---
title: LLM Error Classifier API
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: 20.10.24
app_file: main.py
pinned: false
license: mit
---

# LLM Error Classifier API

A FastAPI backend that serves the fine-tuned Llama-3.2-3B model for tool-use error classification.

## API Endpoints

- `POST /api/classify` - Classify a tool call
- `GET /api/examples` - Get example inputs
- `GET /health` - Health check
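The exact request schema lives in `main.py` and is not reproduced here. As a minimal sketch (the `tool_call` field name and payload shape are assumptions, not the documented contract), a classify request could be assembled like this:

```python
import json

def build_classify_request(base_url: str, tool_call: dict) -> tuple[str, bytes]:
    """Assemble the URL and JSON body for POST /api/classify.

    The payload shape here is illustrative only -- check main.py for
    the actual request model the endpoint expects.
    """
    url = base_url.rstrip("/") + "/api/classify"
    body = json.dumps({"tool_call": tool_call}).encode("utf-8")
    return url, body

url, body = build_classify_request(
    "https://example.hf.space/",
    {"name": "search_web", "arguments": {"query": "weather"}},
)
print(url)  # https://example.hf.space/api/classify
```

Send `body` with a `Content-Type: application/json` header through any HTTP client once the Space is live.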

## Model

The backend serves `daoqm123/llm-error-classifier` from the Hugging Face Hub.

## Usage

The API automatically downloads the model from the Hugging Face Hub on startup.
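The deployment notes below mention a `MODEL_PATH` environment variable. A plausible startup pattern (an assumption about how `main.py` resolves the model, not a transcription of it) is:

```python
import os

# Resolve the model location at startup: prefer an explicit MODEL_PATH
# override, otherwise fall back to the published Hub repository.
# (Assumption: this mirrors main.py; the env var name comes from the
# deployment notes in this README.)
DEFAULT_MODEL = "daoqm123/llm-error-classifier"
model_path = os.environ.get("MODEL_PATH", DEFAULT_MODEL)
print(model_path)
```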

## Deploying to Hugging Face Spaces

1. **Create a Space**
   - Go to https://huggingface.co/spaces/new and choose `Docker` as the SDK (this repo already contains a Dockerfile).
   - Give the space a name such as `llm-error-classifier-api` and select the desired hardware (CPU is fine unless you need GPU acceleration).
   - After the space is created, copy the Git commands shown in the “Files” tab; you will push the contents of this `api/` folder there.

2. **Authenticate locally**
   ```bash
   pip install -U "huggingface_hub[cli]"
   huggingface-cli login
   ```
   Use a write token from https://huggingface.co/settings/tokens.

3. **Push the backend code**
   ```bash
   cd /work/cssema416/202610/12/llm-frontend-for-quang\ \(1\)/api
   rm -rf .git
   git init
   git remote add origin https://huggingface.co/spaces/<username>/<space-name>
   git add .
   git commit -m "Deploy FastAPI backend"
   git branch -M main
   git push origin main
   ```
   Replace `<username>` and `<space-name>` with your actual values. Hugging Face builds the Docker image automatically; once it is running, the server is reachable at `https://<username>-<space-name>.hf.space`.

4. **Configure runtime behavior (optional)**
   - Set a custom `MODEL_PATH` or other environment variables from the Space's “Settings → Variables and secrets” panel.
   - If you need GPU, request the proper hardware tier in the hardware selector.

5. **Wire up the Vercel frontend**
   - In `frontend/lib/api.ts` the app reads `process.env.NEXT_PUBLIC_API_URL`.
   - On Vercel, set `NEXT_PUBLIC_API_URL=https://<username>-<space-name>.hf.space` (no trailing slash) and redeploy the frontend so requests go directly to the Space backend.

6. **Verify**
   - Open the Space URL to confirm the FastAPI app is live: the bare root returns FastAPI's JSON 404 response unless a root route is defined, and appending `/health` should return the health check.
   - Visit your Vercel deployment and ensure inference requests succeed using the new backend endpoint.
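The verification step can also be scripted. A standard-library sketch (substitute your real Space hostname for the placeholders):

```python
def health_url(base_url: str) -> str:
    """Join the base URL and /health, tolerating a trailing slash
    (the frontend env var itself must not carry one)."""
    return base_url.rstrip("/") + "/health"

print(health_url("https://<username>-<space-name>.hf.space"))

# Once the Space is running (urlopen raises on non-2xx statuses):
# import urllib.request, json
# with urllib.request.urlopen(
#     health_url("https://<username>-<space-name>.hf.space"), timeout=30
# ) as resp:
#     print(json.loads(resp.read().decode("utf-8")))
```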