daoqm123 committed on
Commit 877b44a · 1 Parent(s): e7916fb

Deploy FastAPI backend

Files changed (4)
  1. Dockerfile +29 -0
  2. README.md +62 -5
  3. main.py +285 -0
  4. requirements.txt +5 -0
Dockerfile ADDED
@@ -0,0 +1,29 @@
+ FROM python:3.11-slim
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     build-essential \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Set working directory
+ WORKDIR /app
+
+ # Copy requirements
+ COPY requirements.txt .
+
+ # Install Python dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY main.py .
+
+ # Expose port (HuggingFace Spaces uses port 7860)
+ EXPOSE 7860
+
+ # Set environment variables
+ ENV PORT=7860
+ ENV PYTHONUNBUFFERED=1
+
+ # Run the application
+ CMD ["python", "main.py"]
+
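The `ENV PORT=7860` / `EXPOSE 7860` pair matters because `main.py` reads `PORT` back at startup via `os.getenv`. A minimal sketch of that handoff; the `resolve_port` helper is illustrative, not part of the repo:

```python
import os

def resolve_port(default: int = 7860) -> int:
    # Mirror main.py's startup logic: the Dockerfile sets ENV PORT=7860,
    # but any runtime override of PORT wins.
    return int(os.getenv("PORT", default))

# Without PORT set, the Space default applies
os.environ.pop("PORT", None)
print(resolve_port())  # 7860

# With an override, e.g. `docker run -e PORT=8080 ...`
os.environ["PORT"] = "8080"
print(resolve_port())  # 8080
```

This is why the container works unchanged on Spaces (which expects 7860) and locally with a custom port.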
README.md CHANGED
@@ -1,10 +1,67 @@
  ---
- title: Llm Error Classifier Api
- emoji: 👀
- colorFrom: gray
- colorTo: green
  sdk: docker
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: LLM Error Classifier API
+ emoji: 🚀
+ colorFrom: blue
+ colorTo: purple
  sdk: docker
+ sdk_version: 20.10.24
+ app_file: main.py
  pinned: false
+ license: mit
  ---

+ # LLM Error Classifier API
+
+ FastAPI backend serving the fine-tuned Llama-3.2-3B model for tool-use error classification.
+
+ ## API Endpoints
+
+ - `POST /api/classify` - Classify a tool call
+ - `GET /api/examples` - Get example inputs
+ - `GET /health` - Health check
+
+ ## Model
+
+ Model: `daoqm123/llm-error-classifier`
+
+ ## Usage
+
+ The API automatically loads the model from the Hugging Face Hub on startup.
+
+ ## Deploying to Hugging Face Spaces
+
+ 1. **Create a Space**
+    - Go to https://huggingface.co/spaces/new and choose `Docker` as the SDK (this repo already contains a Dockerfile).
+    - Give the Space a name such as `llm-error-classifier-api` and select the desired hardware (CPU is fine unless you need GPU acceleration).
+    - After the Space is created, copy the Git commands shown in the "Files" tab; you will push the contents of this `api/` folder there.
+
+ 2. **Authenticate locally**
+    ```bash
+    pip install -U "huggingface_hub[cli]"
+    huggingface-cli login
+    ```
+    Use a write token from https://huggingface.co/settings/tokens.
+
+ 3. **Push the backend code**
+    ```bash
+    cd /work/cssema416/202610/12/llm-frontend-for-quang\ \(1\)/api
+    rm -rf .git
+    git init
+    git remote add origin https://huggingface.co/spaces/<username>/<space-name>
+    git add .
+    git commit -m "Deploy FastAPI backend"
+    git push origin main
+    ```
+    Replace `<username>` and `<space-name>` with your actual values. Hugging Face builds the Docker image automatically; the server becomes available at `https://<username>-<space-name>.hf.space`.
+
+ 4. **Configure runtime behavior (optional)**
+    - Set a custom `MODEL_PATH` or other environment variables from the "Settings → Repository secrets" tab inside the Space.
+    - If you need GPU, request the appropriate hardware tier in the hardware selector.
+
+ 5. **Wire up the Vercel frontend**
+    - In `frontend/lib/api.ts` the app reads `process.env.NEXT_PUBLIC_API_URL`.
+    - On Vercel, set `NEXT_PUBLIC_API_URL=https://<username>-<space-name>.hf.space` (no trailing slash) and redeploy the frontend so calls go directly to the Space backend.
+
+ 6. **Verify**
+    - Open the Space URL to confirm the FastAPI app is live (the root path returns FastAPI's default 404 JSON; append `/health` for the health check).
+    - Visit your Vercel deployment and ensure inference requests succeed using the new backend endpoint.
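The request body for `POST /api/classify` follows the `ClassificationRequest` model in `main.py` (`query`, `enabled_tools`, `tool_calling`). A minimal client sketch using only the standard library; the `classify` helper and the placeholder Space URL are illustrative:

```python
import json
from urllib import request as urlrequest

# Build a body matching main.py's ClassificationRequest schema
payload = {
    "query": "What's the weather in New York?",
    "enabled_tools": [{
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    }],
    "tool_calling": {"name": "get_weather", "arguments": {"location": "New York"}},
}
body = json.dumps(payload).encode("utf-8")

def classify(base_url: str) -> dict:
    # POST the payload to the deployed Space and decode the JSON response
    req = urlrequest.Request(
        f"{base_url}/api/classify",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urlrequest.urlopen(req) as resp:
        return json.loads(resp.read())

# classify("https://<username>-<space-name>.hf.space")  # call against your own Space
```

The response includes `label`, `confidence`, `all_probabilities`, `processing_time_ms`, and `category_color`, per the `ClassificationResponse` model.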
main.py ADDED
@@ -0,0 +1,285 @@
+ """
+ FastAPI Backend for LLM Tool-Use Error Classifier
+ Serves predictions from the fine-tuned Llama-3.2-3B model
+ """
+
+ from contextlib import asynccontextmanager
+ from fastapi import FastAPI, HTTPException
+ from fastapi.middleware.cors import CORSMiddleware
+ from pydantic import BaseModel
+ from typing import Dict, Any, List
+ import json
+ import os
+ import time
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ # Global model and tokenizer
+ model = None
+ tokenizer = None
+ device = None
+
+ # Pin to a specific GPU on multi-GPU hosts; ignored on CPU-only machines
+ os.environ["CUDA_VISIBLE_DEVICES"] = "7"
+
+
+ @asynccontextmanager
+ async def lifespan(app: FastAPI):
+     """Lifespan context manager for startup and shutdown events"""
+     global model, tokenizer, device
+
+     # Startup
+     print("Loading model...")
+     # Get model path from environment variable; fall back to the HuggingFace Hub repo
+     model_path = os.getenv("MODEL_PATH", "daoqm123/llm-error-classifier")
+     print(f"Model path: {model_path}")
+
+     # Determine device and dtype
+     if torch.cuda.is_available():
+         device = torch.device("cuda")
+         dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
+         print(f"Using GPU with dtype: {dtype}")
+     else:
+         device = torch.device("cpu")
+         dtype = torch.float32
+         print("Using CPU")
+
+     # Load tokenizer and model
+     # Supports both local paths and HuggingFace Hub paths (e.g., "daoqm123/llm-error-classifier")
+     print(f"Loading tokenizer from: {model_path}")
+     tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+     print(f"Loading model from: {model_path}")
+     model = AutoModelForSequenceClassification.from_pretrained(
+         model_path,
+         torch_dtype=dtype,
+         device_map="auto" if torch.cuda.is_available() else None
+     )
+
+     if not torch.cuda.is_available():
+         model = model.to(device)
+
+     model.eval()
+     print("Model loaded successfully!")
+
+     yield  # Application runs here
+
+     # Shutdown: cleanup code can go here if needed
+
+
+ app = FastAPI(title="LLM Error Classifier API", version="1.0.0", lifespan=lifespan)
+
+ # Enable CORS for frontend
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],  # In production, specify exact origins
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ # Label mapping
+ LABEL_MAP = {
+     0: "Correct",
+     1: "No_Tool_Available",
+     2: "Incorrect_Function_Name",
+     3: "Incorrect_Argument_Type",
+     4: "Wrong_Syntax",
+     5: "Wrong_Tool",
+     6: "Incorrect_Argument_Value",
+     7: "Incorrect_Argument_Name"
+ }
+
+ # Color mapping for frontend
+ LABEL_COLORS = {
+     "Correct": "#10B981",
+     "No_Tool_Available": "#F59E0B",
+     "Incorrect_Function_Name": "#EF4444",
+     "Incorrect_Argument_Name": "#EC4899",
+     "Incorrect_Argument_Value": "#8B5CF6",
+     "Incorrect_Argument_Type": "#3B82F6",
+     "Wrong_Tool": "#F97316",
+     "Wrong_Syntax": "#DC2626"
+ }
+
+
+ class ClassificationRequest(BaseModel):
+     """Request body for classification endpoint"""
+     query: str
+     enabled_tools: List[Dict[str, Any]]
+     tool_calling: Dict[str, Any]
+
+
+ class ClassificationResponse(BaseModel):
+     """Response from classification endpoint"""
+     label: str
+     confidence: float
+     all_probabilities: Dict[str, float]
+     processing_time_ms: int
+     category_color: str
+
+
+ @app.get("/health")
+ async def health_check():
+     """Health check endpoint"""
+     return {
+         "status": "ok",
+         "model_loaded": model is not None,
+         "device": str(device) if device else "not initialized"
+     }
+
+
+ @app.post("/api/classify", response_model=ClassificationResponse)
+ async def classify(request: ClassificationRequest):
+     """
+     Classify a tool call as correct or identify the error type
+     """
+     if model is None or tokenizer is None:
+         raise HTTPException(status_code=503, detail="Model not loaded")
+
+     start_time = time.time()
+
+     try:
+         # Format input as a JSON string (same format as training)
+         input_data = {
+             "query": request.query,
+             "enabled_tools": request.enabled_tools,
+             "tool_calling": request.tool_calling
+         }
+         input_text = json.dumps(input_data)
+
+         # Tokenize
+         inputs = tokenizer(
+             input_text,
+             return_tensors="pt",
+             truncation=True,
+             max_length=512,
+             padding=True
+         )
+
+         # Move to device
+         inputs = {k: v.to(device) for k, v in inputs.items()}
+
+         # Get prediction
+         with torch.no_grad():
+             outputs = model(**inputs)
+             logits = outputs.logits
+             probs = torch.softmax(logits, dim=-1)[0]
+             pred_idx = torch.argmax(probs).item()
+             confidence = probs[pred_idx].item()
+
+         # Get all probabilities
+         all_probs = {LABEL_MAP[i]: float(probs[i]) for i in range(len(probs))}
+
+         # Get predicted label
+         predicted_label = LABEL_MAP[pred_idx]
+
+         # Calculate processing time
+         processing_time_ms = int((time.time() - start_time) * 1000)
+
+         return ClassificationResponse(
+             label=predicted_label,
+             confidence=confidence,
+             all_probabilities=all_probs,
+             processing_time_ms=processing_time_ms,
+             category_color=LABEL_COLORS.get(predicted_label, "#6B7280")
+         )
+
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=f"Classification error: {str(e)}")
+
+
+ @app.get("/api/examples")
+ async def get_examples():
+     """Return example inputs for testing"""
+     examples = [
+         {
+             "name": "Correct Example",
+             "description": "A properly formed tool call",
+             "data": {
+                 "query": "What's the weather in New York?",
+                 "enabled_tools": [
+                     {
+                         "name": "get_weather",
+                         "description": "Get current weather for a location",
+                         "parameters": {
+                             "type": "object",
+                             "properties": {
+                                 "location": {"type": "string"},
+                                 "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
+                             },
+                             "required": ["location"]
+                         }
+                     }
+                 ],
+                 "tool_calling": {
+                     "name": "get_weather",
+                     "arguments": {
+                         "location": "New York",
+                         "units": "fahrenheit"
+                     }
+                 }
+             }
+         },
+         {
+             "name": "Wrong Function Name",
+             "description": "Tool call uses incorrect function name",
+             "data": {
+                 "query": "Calculate 25 * 4",
+                 "enabled_tools": [
+                     {
+                         "name": "calculator",
+                         "description": "Perform calculations",
+                         "parameters": {
+                             "type": "object",
+                             "properties": {
+                                 "expression": {"type": "string"}
+                             }
+                         }
+                     }
+                 ],
+                 "tool_calling": {
+                     "name": "calculate",  # Wrong name!
+                     "arguments": {
+                         "expression": "25 * 4"
+                     }
+                 }
+             }
+         },
+         {
+             "name": "Incorrect Argument Type",
+             "description": "Argument has wrong data type",
+             "data": {
+                 "query": "Set a reminder for 3pm",
+                 "enabled_tools": [
+                     {
+                         "name": "set_reminder",
+                         "description": "Create a reminder",
+                         "parameters": {
+                             "type": "object",
+                             "properties": {
+                                 "time": {"type": "string"},
+                                 "message": {"type": "string"}
+                             }
+                         }
+                     }
+                 ],
+                 "tool_calling": {
+                     "name": "set_reminder",
+                     "arguments": {
+                         "time": 1500,  # Should be string!
+                         "message": "Meeting"
+                     }
+                 }
+             }
+         }
+     ]
+
+     return {"examples": examples}
+
+
+ if __name__ == "__main__":
+     import uvicorn
+     # HuggingFace Spaces uses port 7860, but allow override via environment variable
+     port = int(os.getenv("PORT", 7860))
+     # Use 0.0.0.0 to allow external connections
+     uvicorn.run(app, host="0.0.0.0", port=port)
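The endpoint's post-processing (softmax over logits, argmax, `LABEL_MAP` lookup) can be sketched without torch using `math.exp`; the logits below are made up for illustration:

```python
import math

# Same index-to-label mapping as main.py
LABEL_MAP = {0: "Correct", 1: "No_Tool_Available", 2: "Incorrect_Function_Name",
             3: "Incorrect_Argument_Type", 4: "Wrong_Syntax", 5: "Wrong_Tool",
             6: "Incorrect_Argument_Value", 7: "Incorrect_Argument_Name"}

def postprocess(logits):
    # Numerically stable softmax, mirroring torch.softmax(logits, dim=-1)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Argmax + label lookup, mirroring the classify endpoint
    pred_idx = probs.index(max(probs))
    return LABEL_MAP[pred_idx], probs[pred_idx]

# Made-up logits where index 2 dominates
label, confidence = postprocess([0.1, -1.2, 4.5, 0.3, -0.7, 1.1, 0.0, -2.0])
print(label)  # Incorrect_Function_Name
```

The real endpoint does the same computation on the model's logits tensor and additionally returns the full probability map for the frontend chart.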
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ fastapi==0.104.1
+ uvicorn[standard]==0.24.0
+ pydantic==2.5.0
+ torch>=2.0.0
+ transformers>=4.35.0