---
title: LLM Error Classifier API
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: 20.10.24
app_file: main.py
pinned: false
license: mit
---
# LLM Error Classifier API

FastAPI backend serving the fine-tuned Llama-3.2-3B model for tool-use error classification.

## API Endpoints

- `POST /api/classify` - Classify a tool call
- `GET /api/examples` - Get example inputs
- `GET /health` - Health check
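As a quick sketch, a classification request might look like the following. The base URL is a placeholder and the `tool_call` field is an assumption for illustration; check `main.py` for the actual request schema.

```shell
# Hypothetical request: the URL is a placeholder and the "tool_call" field
# name is an assumption; consult main.py for the real schema.
API_URL="https://<username>-<space-name>.hf.space"

curl -s -X POST "$API_URL/api/classify" \
  -H "Content-Type: application/json" \
  -d '{"tool_call": "get_weather(city=\"Paris\")"}'
```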
## Model

Model: `daoqm123/llm-error-classifier`

## Usage

The API will automatically load the model from the Hugging Face Hub on startup.
## Deploying to Hugging Face Spaces

1. **Create a Space**
   - Go to https://huggingface.co/spaces/new and choose `Docker` as the SDK (this repo already contains a Dockerfile).
   - Give the Space a name such as `llm-error-classifier-api` and select the desired hardware (CPU is fine unless you need GPU acceleration).
   - After the Space is created, copy the Git commands shown in the “Files” tab; you will push the contents of this `api/` folder there.
2. **Authenticate locally**
   ```bash
   pip install -U "huggingface_hub[cli]"
   huggingface-cli login
   ```
   Use a write token from https://huggingface.co/settings/tokens.
3. **Push the backend code**
   ```bash
   cd /work/cssema416/202610/12/llm-frontend-for-quang\ \(1\)/api
   rm -rf .git           # start from a fresh history for the Space remote
   git init -b main      # name the initial branch main so the push below works
   git remote add origin https://huggingface.co/spaces/<username>/<space-name>
   git add .
   git commit -m "Deploy FastAPI backend"
   git push --force origin main   # force: the new Space repo already has an initial commit
   ```
   Replace `<username>` and `<space-name>` with your actual values. Hugging Face builds the Docker image automatically; the server becomes available at `https://<username>-<space-name>.hf.space`.
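   Once the build finishes, a quick smoke test from the command line (the URL below is a placeholder for your own Space):

   ```shell
   # Placeholder URL: substitute your actual Space subdomain.
   SPACE_URL="https://<username>-<space-name>.hf.space"

   # /health should return a small JSON status once the server is up.
   curl -s "$SPACE_URL/health"
   ```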
4. **Configure runtime behavior (optional)**
   - Set a custom `MODEL_PATH` or other environment variables from the “Settings → Repository secrets” tab inside the Space.
   - If you need a GPU, request the appropriate tier in the hardware selector.
5. **Wire up the Vercel frontend**
   - In `frontend/lib/api.ts` the app reads `process.env.NEXT_PUBLIC_API_URL`.
   - On Vercel, set `NEXT_PUBLIC_API_URL=https://<username>-<space-name>.hf.space` (no trailing slash) and redeploy the frontend so calls go directly to the Space backend.
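   If you manage environment variables with the Vercel CLI rather than the dashboard, the same step can be sketched as follows (the value you paste is your own Space URL):

   ```shell
   # Adds NEXT_PUBLIC_API_URL to the production environment; the CLI prompts
   # for the value. Paste the Space URL with no trailing slash.
   vercel env add NEXT_PUBLIC_API_URL production

   # Redeploy so the new value is compiled into the Next.js build.
   vercel --prod
   ```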
6. **Verify**
   - Open the Space URL with a `/health` suffix to confirm the FastAPI app is live (the bare root path returns FastAPI's default 404 JSON unless a root route is defined).
   - Visit your Vercel deployment and confirm that inference requests succeed against the new backend endpoint.