---
title: LLM Error Classifier API
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: 20.10.24
app_file: main.py
pinned: false
license: mit
---

LLM Error Classifier API

FastAPI backend serving the fine-tuned Llama-3.2-3B model for tool-use error classification.

API Endpoints

  • POST /api/classify - Classify a tool call
  • GET /api/examples - Get example inputs
  • GET /health - Health check
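
The exact request schema for POST /api/classify lives in main.py; as a sketch, assuming it accepts a JSON body describing the tool call (the field names tool_name and arguments and the helper below are illustrative, not taken from the actual API):

```python
import json

def build_classify_request(base_url: str, tool_name: str, arguments: dict):
    """Build the URL, headers, and JSON body for a hypothetical
    POST /api/classify call. Field names are assumptions; check
    main.py for the real schema."""
    url = base_url.rstrip("/") + "/api/classify"
    headers = {"Content-Type": "application/json"}
    body = json.dumps({"tool_name": tool_name, "arguments": arguments})
    return url, headers, body

# Placeholder Space URL; substitute your own values.
url, headers, body = build_classify_request(
    "https://<username>-<space-name>.hf.space",
    "get_weather",
    {"city": "Boston"},
)
```

The resulting request can be sent with `urllib.request` or `requests` once the Space is live.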

Model

Model: daoqm123/llm-error-classifier

Usage

The API automatically loads the model from the Hugging Face Hub on startup.
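
A minimal sketch of that startup path, assuming the model is loaded with the transformers library and that MODEL_PATH is the environment variable the Space honors (both are assumptions; main.py is authoritative):

```python
import os

# Assumed env var name; falls back to the published model repo.
MODEL_PATH = os.environ.get("MODEL_PATH", "daoqm123/llm-error-classifier")

def load_classifier():
    """Load tokenizer and model once at startup (e.g. from a FastAPI
    lifespan handler). transformers is imported lazily so a missing
    dependency fails with a clear error at startup, not at import time."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
    model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)
    return tokenizer, model
```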

Deploying to Hugging Face Spaces

  1. Create a Space

    • Go to https://huggingface.co/spaces/new and choose Docker as the SDK (this repo already contains a Dockerfile).
    • Give the space a name such as llm-error-classifier-api and select the desired hardware (CPU is fine unless you need GPU acceleration).
    • After the space is created, copy the Git commands shown in the “Files” tab; you will push the contents of this api/ folder there.
  2. Authenticate locally

    pip install -U "huggingface_hub[cli]"
    huggingface-cli login
    

    Use a write token from https://huggingface.co/settings/tokens.

  3. Push the backend code

    cd /work/cssema416/202610/12/llm-frontend-for-quang\ \(1\)/api
    rm -rf .git
    git init
    git branch -M main
    git remote add origin https://huggingface.co/spaces/<username>/<space-name>
    git add .
    git commit -m "Deploy FastAPI backend"
    git push --force origin main
    

    Replace <username> and <space-name> with your actual values. A fresh git init may default to the master branch, so git branch -M main is needed, and --force overwrites the Space's auto-generated initial commit. Hugging Face builds the Docker image automatically; once the build finishes, the server is available at https://<username>-<space-name>.hf.space.

  4. Configure runtime behavior (optional)

    • Set a custom MODEL_PATH or other environment variables from the “Settings → Repository secrets” tab inside the Space.
    • If you need GPU, request the proper hardware tier in the hardware selector.
  5. Wire up the Vercel frontend

    • In frontend/lib/api.ts the app reads process.env.NEXT_PUBLIC_API_URL.
    • On Vercel, set NEXT_PUBLIC_API_URL=https://<username>-<space-name>.hf.space (no trailing slash) and redeploy the frontend so calls go directly to the Space backend.
  6. Verify

    • Open the Space URL to confirm the FastAPI app is live (the root path returns FastAPI's default 404 JSON; append /health to get the health-check response).
    • Visit your Vercel deployment and ensure inference requests succeed using the new backend endpoint.
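
The verification step can be scripted. This small helper builds the /health URL from the same base value stored in NEXT_PUBLIC_API_URL and fetches it with the standard library (the Space URL below is a placeholder):

```python
import json
import urllib.request

def health_url(base_url: str) -> str:
    # The env var is stored without a trailing slash, but strip one
    # defensively before appending the endpoint.
    return base_url.rstrip("/") + "/health"

def check_health(base_url: str) -> dict:
    """GET /health on the Space and return the parsed JSON body."""
    with urllib.request.urlopen(health_url(base_url), timeout=30) as resp:
        return json.loads(resp.read())

# Example: check_health("https://<username>-<space-name>.hf.space")
```

A non-2xx response or a timeout here usually means the Docker build is still running or failed; check the Space's build logs.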