---
title: LLM Error Classifier API
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: 20.10.24
app_file: main.py
pinned: false
license: mit
---
# LLM Error Classifier API

FastAPI backend serving the fine-tuned Llama-3.2-3B model for tool-use error classification.

## API Endpoints

- `POST /api/classify` - Classify a tool call
- `GET /api/examples` - Get example inputs
- `GET /health` - Health check
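As a quick sketch, a classification request might look like the following. The base URL is a placeholder and the `tool_call` field is an assumption for illustration; check `main.py` for the actual request schema.

```shell
# Hypothetical request: the URL is a placeholder and the "tool_call" field
# name is an assumption; consult main.py for the real schema.
API_URL="https://<username>-<space-name>.hf.space"

curl -s -X POST "$API_URL/api/classify" \
  -H "Content-Type: application/json" \
  -d '{"tool_call": "get_weather(city=\"Paris\")"}'
```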
## Model

Model: `daoqm123/llm-error-classifier`

## Usage

The API will automatically load the model from the Hugging Face Hub on startup.
## Deploying to Hugging Face Spaces

1. **Create a Space**
   - Go to https://huggingface.co/spaces/new and choose `Docker` as the SDK (this repo already contains a Dockerfile).
   - Give the Space a name such as `llm-error-classifier-api` and select the desired hardware (CPU is fine unless you need GPU acceleration).
   - After the Space is created, copy the Git commands shown in the “Files” tab; you will push the contents of this `api/` folder there.
2. **Authenticate locally**
   ```bash
   pip install -U "huggingface_hub[cli]"
   huggingface-cli login
   ```
   Use a write token from https://huggingface.co/settings/tokens.
3. **Push the backend code**
   ```bash
   cd /work/cssema416/202610/12/llm-frontend-for-quang\ \(1\)/api
   rm -rf .git           # start from a fresh history for the Space remote
   git init -b main      # name the initial branch main so the push below works
   git remote add origin https://huggingface.co/spaces/<username>/<space-name>
   git add .
   git commit -m "Deploy FastAPI backend"
   git push --force origin main   # force: the new Space repo already has an initial commit
   ```
   Replace `<username>` and `<space-name>` with your actual values. Hugging Face builds the Docker image automatically; the server becomes available at `https://<username>-<space-name>.hf.space`.
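   Once the build finishes, a quick smoke test from the command line (the URL below is a placeholder for your own Space):

   ```shell
   # Placeholder URL: substitute your actual Space subdomain.
   SPACE_URL="https://<username>-<space-name>.hf.space"

   # /health should return a small JSON status once the server is up.
   curl -s "$SPACE_URL/health"
   ```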
4. **Configure runtime behavior (optional)**
   - Set a custom `MODEL_PATH` or other environment variables from the “Settings → Repository secrets” tab inside the Space.
   - If you need a GPU, request the appropriate tier in the hardware selector.
5. **Wire up the Vercel frontend**
   - In `frontend/lib/api.ts` the app reads `process.env.NEXT_PUBLIC_API_URL`.
   - On Vercel, set `NEXT_PUBLIC_API_URL=https://<username>-<space-name>.hf.space` (no trailing slash) and redeploy the frontend so calls go directly to the Space backend.
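   If you manage environment variables with the Vercel CLI rather than the dashboard, the same step can be sketched as follows (the value you paste is your own Space URL):

   ```shell
   # Adds NEXT_PUBLIC_API_URL to the production environment; the CLI prompts
   # for the value. Paste the Space URL with no trailing slash.
   vercel env add NEXT_PUBLIC_API_URL production

   # Redeploy so the new value is compiled into the Next.js build.
   vercel --prod
   ```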
6. **Verify**
   - Open the Space URL with a `/health` suffix to confirm the FastAPI app is live (the bare root path returns FastAPI's default 404 JSON unless a root route is defined).
   - Visit your Vercel deployment and confirm that inference requests succeed against the new backend endpoint.