Spaces:
Paused
Paul committed on
Commit · 75146bf
Parent(s): a29f4ee
Initial commit
Browse files
- CURL_EXAMPLES.md +177 -0
- DEPLOYMENT.md +111 -0
- GOOGLE_DRIVE_SETUP.md +202 -0
- QUICKSTART.md +93 -0
- README copy.md +125 -0
- VERCEL_ISSUES.md +286 -0
- __pycache__/app.cpython-313.pyc +0 -0
- __pycache__/main.cpython-313.pyc +0 -0
- __pycache__/ml_service.cpython-313.pyc +0 -0
- __pycache__/schemas.cpython-313.pyc +0 -0
- api/index.py +15 -0
- main.py +90 -0
- ml_service.py +102 -0
- model/.DS_Store +0 -0
- model/config.json +80 -0
- model/mlb.pkl +0 -0
- model/special_tokens_map.json +7 -0
- model/tokenizer.json +0 -0
- model/tokenizer_config.json +56 -0
- model/vocab.txt +0 -0
- requirements.txt +10 -0
- schemas.py +19 -0
- start.sh +16 -0
- vercel.json +14 -0
CURL_EXAMPLES.md
ADDED
@@ -0,0 +1,177 @@
# CURL Examples for ML Text Classification API

## Local Testing

### 1. Health Check
```bash
curl http://localhost:8000/health
```

Expected Response:
```json
{"status": "healthy"}
```

### 2. Root Endpoint
```bash
curl http://localhost:8000/
```

Expected Response:
```json
{
  "message": "ML Text Classification API",
  "version": "1.0.0",
  "endpoints": {
    "health": "/health",
    "predict": "/predict"
  }
}
```

### 3. Prediction Endpoint (Main)
```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hi ||| Hi anh"}'
```

Expected Response:
```json
{
  "results": [
    {"label": "label_name_1", "score": 0.95},
    {"label": "label_name_2", "score": 0.03},
    ...
  ]
}
```

## Vercel / Production Deployment

After deploying, replace `your-project.vercel.app` with your actual domain:

### 1. Health Check
```bash
curl https://your-project.vercel.app/health
```

### 2. Root Endpoint
```bash
curl https://your-project.vercel.app/
```

### 3. Prediction Endpoint
```bash
curl -X POST "https://your-project.vercel.app/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hi ||| Hi anh"}'
```

## Advanced Examples

### Using jq for Pretty Output
```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hi ||| Hi anh"}' | jq '.'
```

### Sorting Results by Score
```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hi ||| Hi anh"}' | jq '.results | sort_by(.score) | reverse'
```

### Getting Top 3 Predictions
```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hi ||| Hi anh"}' | jq '.results | sort_by(.score) | reverse | .[0:3]'
```
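The same top-3 selection can also be done client-side in Python instead of jq; a minimal sketch using `requests` (the same library the Python section further below uses):

```python
import requests

# Same endpoint and payload as the curl examples above
response = requests.post(
    "http://localhost:8000/predict",
    json={"text": "Hi ||| Hi anh"},
)
results = response.json()["results"]

# Python equivalent of: jq '.results | sort_by(.score) | reverse | .[0:3]'
top3 = sorted(results, key=lambda r: r["score"], reverse=True)[:3]
print(top3)
```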
## Testing with Different Text

### Example 1
```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello ||| Hi there"}'
```

### Example 2
```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "Xin chào ||| Chào bạn"}'
```

## Error Handling

### Empty Text (400 Error)
```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": ""}'
```

Expected Response:
```json
{
  "detail": "Text field is required and cannot be empty"
}
```

### Missing Text Field (422 Error)
```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{}'
```

Expected Response:
```json
{
  "detail": [
    {
      "type": "missing",
      "loc": ["body", "text"],
      "msg": "Field required",
      "input": {}
    }
  ]
}
```

## Python Requests Examples

If you prefer Python instead of curl:

```python
import requests

# Prediction
response = requests.post(
    "http://localhost:8000/predict",
    json={"text": "Hi ||| Hi anh"}
)
print(response.json())
```

## JavaScript/Fetch Examples

```javascript
fetch('http://localhost:8000/predict', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    text: 'Hi ||| Hi anh'
  })
})
.then(response => response.json())
.then(data => console.log(data));
```
DEPLOYMENT.md
ADDED
@@ -0,0 +1,111 @@
# Deployment Guide

## Vercel Deployment

### Quick Deploy

1. Install Vercel CLI:
```bash
npm i -g vercel
```

2. Login to Vercel:
```bash
vercel login
```

3. Deploy:
```bash
vercel
```

### Important Limitations

⚠️ **WARNING:** Your model weights are **256MB**, which exceeds Vercel's free tier limit (50MB).

**Current Stack:**
- FastAPI ✅ (Fully configured)
- PyTorch (~800MB)
- Transformers (~100-300MB)
- Your model: **256MB**
- **Total: ~1.4GB+**

### Solutions

#### Option 1: Use Vercel Pro with Larger Limits
Upgrading to Vercel Pro raises the limits, but you may still run into them.

#### Option 2: Alternative Platforms (Recommended)

**AWS Lambda + Lambda Layers:**
- Better for ML workloads
- Supports larger packages
- Cost-effective

**Google Cloud Run:**
- Docker-based deployment
- Auto-scaling
- Better ML support

**Railway / Render:**
- Docker deployments
- No strict size limits
- Easy setup

**Hugging Face Inference API:**
- Host model separately
- Call via API
- Free tier available

#### Option 3: Hybrid Approach
- Deploy FastAPI to Vercel (small)
- Host model on Hugging Face / AWS SageMaker
- Call model API from FastAPI (see the sketch after this list)
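A minimal sketch of what the hybrid approach could look like, with FastAPI acting as a thin proxy. The `MODEL_API_URL` endpoint is a placeholder for wherever the model ends up hosted, and `requests` is an extra dependency; this is an illustration, not code from this repo:

```python
import os

import requests  # extra dependency: pip install requests
from fastapi import FastAPI, HTTPException

app = FastAPI()

# Placeholder URL for the remotely hosted model service (assumption)
MODEL_API_URL = os.getenv("MODEL_API_URL", "https://example.com/predict")


@app.post("/predict")
def predict_proxy(payload: dict):
    """Forward the request to the remote model service and relay its answer."""
    resp = requests.post(MODEL_API_URL, json=payload, timeout=30)
    if resp.status_code != 200:
        raise HTTPException(status_code=502, detail="Model service error")
    return resp.json()
```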
### Testing Before Deploy

Check your deployment size:
```bash
du -sh model/
```

If >50MB, Vercel will likely fail.

### Current Configuration

✅ **main.py** - FastAPI application (Vercel entry point)
✅ **vercel.json** - Vercel configuration
✅ **requirements.txt** - Python dependencies

## Endpoints After Deployment

Once deployed (regardless of platform):

**Health Check:**
```
GET https://your-app-url/health
```

**Root:**
```
GET https://your-app-url/
```

**Prediction:**
```
POST https://your-app-url/predict
Content-Type: application/json

{
  "text": "Hi ||| Hi anh"
}
```

## Example CURL

```bash
curl -X POST "https://your-app-url/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hi ||| Hi anh"}'
```
GOOGLE_DRIVE_SETUP.md
ADDED
@@ -0,0 +1,202 @@
# Google Drive Model Setup

## Overview
The ML model (`model.safetensors`, 256MB) is stored on Google Drive instead of in the repository. The application will automatically download it on first run.

## Setup Instructions

### Step 1: Upload Model to Google Drive

1. Upload `model.safetensors` to your Google Drive
2. Right-click the file and select "Share"
3. Set permissions to "Anyone with the link"
4. Copy the file ID from the sharing URL

### Step 2: Get Google Drive File ID

From the sharing URL:
```
https://drive.google.com/file/d/FILE_ID_HERE/view?usp=sharing
                                ^^^^^^^^^^^^
                                This is your FILE_ID
```

Or from a direct link:
```
https://drive.google.com/uc?id=FILE_ID_HERE
                               ^^^^^^^^^^^^
                               This is your FILE_ID
```
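If you want to extract the ID programmatically rather than by eye, a small sketch that handles both URL shapes shown above:

```python
import re


def extract_gdrive_id(url: str):
    """Pull the FILE_ID out of either Google Drive URL format above."""
    match = re.search(r"/file/d/([\w-]+)", url) or re.search(r"[?&]id=([\w-]+)", url)
    return match.group(1) if match else None


print(extract_gdrive_id("https://drive.google.com/file/d/FILE_ID_HERE/view?usp=sharing"))
print(extract_gdrive_id("https://drive.google.com/uc?id=FILE_ID_HERE"))
```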
### Step 3: Configure Application

You have **3 options** to provide the Google Drive file ID:

#### Option 1: Environment Variable (Recommended)
Create a `.env` file in the project root:
```bash
GDRIVE_MODEL_ID=YOUR_FILE_ID_HERE
```

#### Option 2: Update ml_service.py
Edit the `get_ml_service()` function:
```python
def get_ml_service() -> MLInferenceService:
    """Get or create the global ML service instance."""
    global _ml_service
    if _ml_service is None:
        _ml_service = MLInferenceService(
            gdrive_file_id="YOUR_FILE_ID_HERE"  # Add your ID here
        )
        _ml_service.load_model()
    return _ml_service
```

#### Option 3: Set Environment Variable in Shell
```bash
export GDRIVE_MODEL_ID=YOUR_FILE_ID_HERE
python main.py
```

### Step 4: Install Dependencies

Make sure `gdown` is installed:
```bash
pip install -r requirements.txt
```

### Step 5: Test

Run the application:
```bash
source venv/bin/activate
python main.py
```

On first run, you should see:
```
Downloading model from Google Drive...
[Progress bar]
Model downloaded successfully to ./model/model.safetensors
```

## Files in Repository

### ❌ Excluded (Too Large for GitHub)
- `model/model.safetensors` (256MB)

### ✅ Included
- `model/config.json`
- `model/mlb.pkl`
- `model/tokenizer.json`
- `model/tokenizer_config.json`
- `model/special_tokens_map.json`
- `model/vocab.txt`

## How It Works

1. On first API request, `load_model()` is called
2. If `model.safetensors` doesn't exist locally:
   - Check if `GDRIVE_MODEL_ID` is set
   - Download from Google Drive using `gdown`
   - Save to `./model/model.safetensors`
3. Load model using transformers
4. Create inference pipeline

**Subsequent runs will use the cached local file** - no re-download needed.

## .gitignore Configuration

Make sure your `.gitignore` includes:
```
model/*.safetensors
# model/*.pkl   # uncomment to exclude mlb.pkl too
```

## Troubleshooting

### "File not found or insufficient permissions"
- Check the file ID is correct
- Verify file sharing is set to "Anyone with the link"
- Try opening the sharing URL in an incognito window

### "Download interrupted"
- The file is large (256MB), so ensure a stable internet connection
- The script will retry on the next run
- You can manually download the file and place it in the `model/` folder

### "gdown not found"
```bash
pip install gdown
```

### Local Testing Without Download
If you have the model file locally, just place it in the `model/` folder:
```bash
cp /path/to/your/model.safetensors ./model/
```

## Deployment Considerations

### Vercel / Serverless
When deploying to serverless platforms, either:
1. Upload the model during build time, OR
2. Set `GDRIVE_MODEL_ID` as an environment variable

Either way, the first request will be slow (download + load); subsequent requests are fast (cached model).

### Docker / VMs
For Docker deployments, you have options:
1. **Build-time download** - Download in Dockerfile
2. **Runtime download** - Use GDRIVE_MODEL_ID env var
3. **Volume mount** - Mount model directory

Example Dockerfile:
```dockerfile
FROM python:3.13

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

# Option 1: Download at build time
RUN gdown YOUR_FILE_ID -O model/model.safetensors

# Option 2: Let runtime download
# ENV GDRIVE_MODEL_ID=YOUR_FILE_ID

CMD ["python", "main.py"]
```

## Security Notes

- ✅ Google Drive ID is not sensitive (public file)
- ✅ File ID can be safely committed to git
- ⚠️ Consider rate limiting your API if sharing publicly
- ⚠️ Monitor your Google Drive quota if you run many deployments

## Alternative Storage Options

If Google Drive doesn't meet your needs:

1. **AWS S3** - Use boto3 to download from S3 (see the sketch after this list)
2. **Hugging Face Hub** - Transformers library supports this natively
3. **HTTP Server** - Host on any HTTP server
4. **IPFS** - Decentralized storage
5. **Dropbox / OneDrive** - Similar to Google Drive approach
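For the S3 option, a minimal boto3 sketch mirroring the `_download_from_gdrive` flow in ml_service.py (the bucket and key names are placeholders, and boto3 is an extra dependency not in requirements.txt):

```python
import os

import boto3  # extra dependency: pip install boto3


def download_model_from_s3(bucket, key, dest="./model/model.safetensors"):
    """Download the model from S3 unless it is already cached locally."""
    if os.path.exists(dest):
        return  # already downloaded, same caching behavior as the gdown flow
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    boto3.client("s3").download_file(bucket, key, dest)


download_model_from_s3("my-model-bucket", "models/model.safetensors")  # placeholder names
```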
## Example Usage

```python
from ml_service import get_ml_service

# Will auto-download from Google Drive if needed
ml_service = get_ml_service()

# Use normally
predictions = ml_service.predict("Hi ||| Hi anh")
print(predictions)
```
QUICKSTART.md
ADDED
@@ -0,0 +1,93 @@
# Quick Start Guide

## ✅ How to Run the Application

### Step 1: Activate Virtual Environment
```bash
source venv/bin/activate
```

### Step 2: Run the Server
```bash
python main.py
```

You should see:
```
INFO:     Started server process
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000
```

**Note:** The model loads on first request, which may take 10-30 seconds.

### Step 3: Test the API

Open a **new terminal** (keep the server running) and test:

**Health Check:**
```bash
curl http://localhost:8000/health
```

**Main Prediction Endpoint:**
```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hi ||| Hi anh"}'
```

**Get API Info:**
```bash
curl http://localhost:8000/
```

## 🌐 Interactive API Documentation

Visit: **http://localhost:8000/docs**

This provides a Swagger UI to test all endpoints interactively.

## 📁 Alternative: Using Uvicorn Directly

Instead of `python main.py`, you can also use:

```bash
source venv/bin/activate
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

The `--reload` flag enables auto-reload on code changes.

## 🛑 Stopping the Server

Press `Ctrl + C` in the terminal running the server.

## 🚀 Deploying to Vercel

See `DEPLOYMENT.md` for detailed instructions.

**Quick deploy:**
```bash
vercel login
vercel
```

## 📝 Key Commands Summary

| Task | Command |
|------|---------|
| Setup | `python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt` |
| Run | `source venv/bin/activate && python main.py` |
| Health | `curl http://localhost:8000/health` |
| Predict | `curl -X POST http://localhost:8000/predict -H "Content-Type: application/json" -d '{"text": "Hi \|\|\| Hi anh"}'` |
| Docs | Open `http://localhost:8000/docs` in browser |

## ⚠️ Important Notes

- **Python 3.13**: The project uses Python 3.13. If you have issues, ensure you're using the correct Python version.
- **First Request**: The model loads on the first `/predict` request. This takes 10-30 seconds.
- **Port 8000**: Make sure port 8000 is free. If not, modify `main.py` to use a different port (see the sketch below).
- **Model Size**: The model is 256MB, which exceeds Vercel's free tier limits. Consider alternative hosting (see `DEPLOYMENT.md`).
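For reference, the port change mentioned in the notes above is a one-line edit to the `__main__` block at the bottom of `main.py`; for example, to serve on 8080 instead:

```python
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8080)  # was port=8000
```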
README copy.md
ADDED
@@ -0,0 +1,125 @@
# ML Text Classification API

A FastAPI-based REST API for multi-label text classification using DistilBERT.

## Quick Start

### Step 1: Setup

1. Create a virtual environment (recommended):
```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

### Step 2: Configure Google Drive Model

The model file (256MB) is stored on Google Drive. Configure it before first run:

**Option 1: Set environment variable**
```bash
export GDRIVE_MODEL_ID=YOUR_FILE_ID_HERE
```

**Option 2: Create `.env` file**
```bash
echo "GDRIVE_MODEL_ID=YOUR_FILE_ID_HERE" > .env
```

See [GOOGLE_DRIVE_SETUP.md](GOOGLE_DRIVE_SETUP.md) for detailed instructions on getting your Google Drive file ID.

3. Run the application:
```bash
python main.py
```

**Note:** If the `pip` command is not found, use `python3 -m pip install -r requirements.txt`.

Or using uvicorn directly:
```bash
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

## API Endpoints

### Root
- **GET** `/` - Get API information

### Health Check
- **GET** `/health` - Check API health status

### Prediction
- **POST** `/predict` - Classify text

Request body:
```json
{
  "text": "Hi ||| Hi anh"
}
```

Response:
```json
{
  "results": [
    {
      "label": "label_name_1",
      "score": 0.85
    },
    {
      "label": "label_name_2",
      "score": 0.65
    }
  ]
}
```

## Testing

### Local Testing

Test the API using curl:
```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hi ||| Hi anh"}'
```

Or use the interactive docs at `http://localhost:8000/docs`

### Vercel Deployment

#### CURL Examples

After deploying to Vercel:

**1. Health Check:**
```bash
curl https://your-project.vercel.app/health
```

**2. Root Endpoint:**
```bash
curl https://your-project.vercel.app/
```

**3. Prediction Endpoint:**
```bash
curl -X POST "https://your-project.vercel.app/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hi ||| Hi anh"}'
```

**⚠️ Important:** Vercel has strict size limitations and **is NOT suitable for ML workloads** with PyTorch/Transformers models. Your deployment (~1.4GB) exceeds Vercel limits.

**Recommended alternatives:**
- **Railway.app** - Best for ML (easy deployment, Docker support)
- **Render.com** - Free tier available
- **Google Cloud Run** - Production-grade serverless
- See [VERCEL_ISSUES.md](VERCEL_ISSUES.md) for a detailed deployment guide
VERCEL_ISSUES.md
ADDED
@@ -0,0 +1,286 @@
# Vercel Deployment Issues & Solutions

## Error: "data is too long"

This error occurs when:
1. Model files are too large for Vercel's limits
2. Response data exceeds size limits
3. Cold start timeout issues

## Vercel Limitations

- **Function size**: 50MB (uncompressed)
- **Response body**: 4.5MB max
- **Cold start timeout**: 10s (free tier), 60s (pro)
- **Total deployment**: 100MB

## Your Current Situation

### Model Size Breakdown:
- PyTorch: ~800MB
- Transformers: ~200-300MB
- Your model: 256MB
- Other dependencies: ~100MB
- **Total: ~1.4GB** ❌

### This Exceeds ALL Vercel Limits!

## Solutions

### ❌ Option 1: Vercel (Not Recommended)
Vercel is **NOT suitable** for your ML workload.

### ✅ Option 2: Alternative Platforms (Recommended)

#### Option 2A: Railway.app
**Best for ML deployments**

```bash
# Install Railway CLI
npm i -g @railway/cli

# Login
railway login

# Deploy
railway up
```

**Why Railway:**
- Docker-based (your app runs in a container)
- No strict size limits
- Better for Python/ML workloads
- Free tier: $5 credit/month
- Auto-deploy from GitHub

#### Option 2B: Render.com
**Similar to Railway**

```bash
# Just push to GitHub
git push origin main

# Connect GitHub repo to Render
# Render auto-detects FastAPI
```

**Why Render:**
- Free tier available
- Docker support
- Easy GitHub integration
- Auto-scaling

#### Option 2C: Google Cloud Run
**Best for production ML**

```bash
# Create Dockerfile
# Build and push
gcloud run deploy lovebird-api \
  --source . \
  --region asia-southeast1 \
  --allow-unauthenticated
```

**Why Cloud Run:**
- Serverless (like Vercel)
- Better ML support
- Auto-scaling to zero
- Pay per use

#### Option 2D: AWS Lambda + EFS/S3
**Enterprise-grade solution**

Pros:
- Model can be stored in S3
- Load into Lambda on cold start
- EFS for persistent storage

Cons:
- More complex setup
- Cold start still slow

### ✅ Option 3: Hybrid Approach

#### 3A: FastAPI on Vercel + Hugging Face Inference API

1. Deploy lightweight FastAPI to Vercel (no model)
2. Upload your model to Hugging Face
3. Call the HF Inference API from FastAPI (see the sketch below)
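A sketch of step 3, calling the Hugging Face Inference API with `requests`. The model repo name and token are placeholders, and the exact response shape depends on the hosted model:

```python
import os

import requests

HF_MODEL = "your-username/your-model"  # placeholder: your uploaded model repo
HF_TOKEN = os.environ["HF_TOKEN"]      # placeholder: an HF access token


def hf_predict(text: str):
    """Call the hosted model instead of loading it locally."""
    resp = requests.post(
        f"https://api-inference.huggingface.co/models/{HF_MODEL}",
        headers={"Authorization": f"Bearer {HF_TOKEN}"},
        json={"inputs": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```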
**Pros:**
- Fast cold starts
- Automatic scaling
- Free tier available
- No model management

**Cons:**
- Requires hosting model elsewhere
- Network latency
- May have costs

#### 3B: Split Architecture

```
FastAPI (Vercel)
    ↓ calls →
ML Service (Railway/Render)
    ↓ hosts →
Your Model
```

**Architecture:**
- Vercel: FastAPI endpoints (lightweight)
- Railway/Render: ML inference service
- Keep code in sync via API calls

## Quick Comparison

| Platform | Suitable? | Cost | Setup | Performance |
|----------|-----------|------|-------|-------------|
| Vercel | ❌ No | Free/Paid | Easy | Too slow |
| Railway | ✅ Yes | $5/mo | Easy | Fast |
| Render | ✅ Yes | Free | Easy | Fast |
| Cloud Run | ✅ Yes | Pay-per-use | Medium | Fast |
| Lambda | ⚠️ Complex | Pay-per-use | Hard | Cold start |
| HF API | ✅ Yes | Free/Paid | Easy | Network latency |

## Recommended Next Steps

### For Testing: Render.com
1. Push code to GitHub
2. Sign up at render.com
3. Create "Web Service"
4. Connect GitHub repo
5. Deploy (free tier works)

### For Production: Railway.app
1. Install Railway CLI
2. `railway login`
3. `railway init`
4. `railway up`
5. Done!

### For Enterprise: Google Cloud Run
1. Create Dockerfile
2. Build container
3. Deploy to Cloud Run
4. Auto-scales based on traffic

## Migration Guide

### From Vercel to Railway

1. **Keep your code**: No changes needed
2. **Add Dockerfile** (optional, Railway auto-detects):
```dockerfile
FROM python:3.13-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

CMD ["python", "main.py"]
```

3. **Deploy**: `railway up`
4. **Set env vars**: Railway dashboard

### From Vercel to Render

1. Push to GitHub
2. Create Render account
3. New → Web Service
4. Connect GitHub repo
5. Set environment variables
6. Deploy

## Environment Variables Setup

On any platform, make sure to set:

```bash
GDRIVE_MODEL_ID=YOUR_FILE_ID_HERE
```

## Model Loading Strategy

### Current (Works on Railway/Render):
```python
# Downloads model on first request
# Caches in memory for subsequent requests
# Cold start: 30-60s (first request)
# Warm: < 1s (subsequent requests)
```

### Optimization Options:

#### 1. Pre-load Model on Startup
```python
# In main.py
@app.on_event("startup")
async def startup_event():
    get_ml_service()  # Load model immediately
```

#### 2. Use Model Caching Layer (Redis/Memcached)
```python
# Store model in Redis between requests
# Reduces cold starts
```

#### 3. Keep Container Warm
```python
# Set up health checks that keep container alive
# Prevents cold starts
```

## Monitoring & Debugging

### Check Deployment Logs
```bash
# Railway
railway logs

# Render
# Dashboard → Logs tab

# Vercel
vercel logs
```

### Check Model Loading
Look for these in logs:
```
Downloading model from Google Drive...
Model downloaded successfully
Device set to use...
```

### Common Issues

#### "Model not found"
- Check GDRIVE_MODEL_ID is set
- Verify file sharing permissions
- Check internet access in container

#### "Timeout on cold start"
- Normal! Cold start takes 30-60s
- Use health checks to keep warm
- Or upgrade tier for faster starts

#### "Out of memory"
- Model too large for tier
- Upgrade to higher tier
- Or use HF Inference API instead

## Summary

**Don't use Vercel for this project.** It's designed for static sites and small serverless functions, not ML workloads.

**Use Railway or Render** for easy deployment with your current code.

**Consider Hugging Face Inference API** for production scale without managing infrastructure.
__pycache__/app.cpython-313.pyc
ADDED
Binary file (331 Bytes).

__pycache__/main.cpython-313.pyc
ADDED
Binary file (2.87 kB).

__pycache__/ml_service.cpython-313.pyc
ADDED
Binary file (4.42 kB).

__pycache__/schemas.cpython-313.pyc
ADDED
Binary file (1.45 kB).
api/index.py
ADDED
@@ -0,0 +1,15 @@
"""
Vercel serverless entry point for FastAPI.
This is a wrapper that exports the FastAPI app.
"""
import sys
from pathlib import Path

# Add parent directory to path to import our modules
sys.path.insert(0, str(Path(__file__).parent.parent))

from main import app

# Export app for Vercel
__all__ = ["app"]
main.py
ADDED
@@ -0,0 +1,90 @@
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from ml_service import get_ml_service
from schemas import PredictionRequest, PredictionResponse, PredictionItem
import asyncio

# Initialize FastAPI app
app = FastAPI(
    title="ML Text Classification API",
    description="API for multi-label text classification using DistilBERT",
    version="1.0.0"
)

# Configure CORS
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get("/")
async def root():
    """Root endpoint."""
    return {
        "message": "ML Text Classification API",
        "version": "1.0.0",
        "endpoints": {
            "health": "/health",
            "predict": "/predict"
        }
    }


@app.get("/health")
async def health_check():
    """Health check endpoint."""
    return {"status": "healthy"}


@app.post("/predict", response_model=PredictionResponse)
async def predict(prediction_request: PredictionRequest):
    """
    Predict labels for the given text.

    Args:
        prediction_request: Request containing the text to classify

    Returns:
        PredictionResponse with classification results
    """
    if not prediction_request.text:
        raise HTTPException(
            status_code=400,
            detail="Text field is required and cannot be empty"
        )

    try:
        # Get ML service and predict in executor to avoid blocking
        ml_service = get_ml_service()

        # Run blocking ML inference in thread pool
        loop = asyncio.get_event_loop()
        predictions = await loop.run_in_executor(
            None,  # Use default executor
            ml_service.predict,
            prediction_request.text
        )

        # Convert to Pydantic models
        results = [
            PredictionItem(label=item['label'], score=item['score'])
            for item in predictions
        ]

        return PredictionResponse(results=results)

    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Prediction error: {str(e)}"
        )


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
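A quick smoke test of the endpoints above, using FastAPI's bundled test client (a sketch; `TestClient` needs the `httpx` package, and the `/predict` call assumes the model files are available locally):

```python
from fastapi.testclient import TestClient

from main import app

client = TestClient(app)

# /health needs no model, so this should always pass
assert client.get("/health").json() == {"status": "healthy"}

# /predict triggers model loading on the first call, so it may take a while
resp = client.post("/predict", json={"text": "Hi ||| Hi anh"})
print(resp.status_code, resp.json())
```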
ml_service.py
ADDED
@@ -0,0 +1,102 @@
import pickle
import os
import gdown
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
from typing import List, Dict, Any, Optional


class MLInferenceService:
    """Service for loading and running ML model inference."""

    def __init__(self, model_dir: str = "./model", gdrive_file_id: Optional[str] = None):
        self.model_dir = model_dir
        self.gdrive_file_id = gdrive_file_id or os.getenv("GDRIVE_MODEL_ID")
        self.model = None
        self.tokenizer = None
        self.clf = None
        self.label_names = []

    def load_model(self):
        """Load the model, tokenizer, and label names."""
        if self.model is not None:
            return

        # If Google Drive ID provided, download model file
        if self.gdrive_file_id:
            self._download_from_gdrive()

        # Load model and tokenizer
        self.model = AutoModelForSequenceClassification.from_pretrained(self.model_dir)
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_dir)

        # Load MultiLabelBinarizer to get label names
        with open(f"{self.model_dir}/mlb.pkl", "rb") as f:
            mlb = pickle.load(f)
        self.label_names = list(mlb.classes_)

        # Create pipeline for inference
        self.clf = pipeline(
            "text-classification",
            model=self.model,
            tokenizer=self.tokenizer,
            return_all_scores=True
        )

    def _download_from_gdrive(self):
        """Download model.safetensors from Google Drive if it does not exist locally."""
        model_path = f"{self.model_dir}/model.safetensors"

        # Skip download if file already exists
        if os.path.exists(model_path):
            return

        # Ensure model directory exists
        os.makedirs(self.model_dir, exist_ok=True)

        # Download from Google Drive
        print("Downloading model from Google Drive...")
        gdrive_url = f"https://drive.google.com/uc?id={self.gdrive_file_id}"
        gdown.download(gdrive_url, model_path, quiet=False)
        print(f"Model downloaded successfully to {model_path}")

    def predict(self, text: str) -> List[Dict[str, Any]]:
        """
        Predict labels for the given text.

        Args:
            text: Input text to classify

        Returns:
            List of dictionaries with 'label' and 'score' keys
        """
        if self.clf is None:
            raise RuntimeError("Model not loaded. Call load_model() first.")

        # Process text: replace ||| with [SEP]
        processed_text = text.replace('|||', '[SEP]')

        # Get predictions
        result = self.clf(processed_text)

        # Map label indices to label names and filter by score >= 0.5
        output = [
            {'label': self.label_names[i], 'score': item['score']}
            for i, item in enumerate(result[0])
            if item['score'] >= 0.5
        ]

        return output


# Global singleton instance
_ml_service = None


def get_ml_service() -> MLInferenceService:
    """Get or create the global ML service instance."""
    global _ml_service
    if _ml_service is None:
        _ml_service = MLInferenceService()
        _ml_service.load_model()
    return _ml_service
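One caveat with the module-level singleton above: two requests arriving at the same time on a threaded server could both see `_ml_service is None` and load the model twice. A lock is one common guard; a sketch of that variant (an assumption on my part, not part of the committed code):

```python
import threading

_ml_service = None
_ml_service_lock = threading.Lock()


def get_ml_service_threadsafe() -> MLInferenceService:
    """Thread-safe variant of get_ml_service() above (hypothetical helper)."""
    global _ml_service
    if _ml_service is None:
        with _ml_service_lock:
            if _ml_service is None:  # re-check: another thread may have loaded it
                service = MLInferenceService()
                service.load_model()
                _ml_service = service
    return _ml_service
```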
model/.DS_Store
ADDED
Binary file (6.15 kB).
model/config.json
ADDED
@@ -0,0 +1,80 @@
{
  "activation": "gelu",
  "architectures": [
    "DistilBertForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "dtype": "float32",
  "hidden_dim": 3072,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5",
    "6": "LABEL_6",
    "7": "LABEL_7",
    "8": "LABEL_8",
    "9": "LABEL_9",
    "10": "LABEL_10",
    "11": "LABEL_11",
    "12": "LABEL_12",
    "13": "LABEL_13",
    "14": "LABEL_14",
    "15": "LABEL_15",
    "16": "LABEL_16",
    "17": "LABEL_17",
    "18": "LABEL_18",
    "19": "LABEL_19",
    "20": "LABEL_20",
    "21": "LABEL_21",
    "22": "LABEL_22",
    "23": "LABEL_23",
    "24": "LABEL_24",
    "25": "LABEL_25"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_10": 10,
    "LABEL_11": 11,
    "LABEL_12": 12,
    "LABEL_13": 13,
    "LABEL_14": 14,
    "LABEL_15": 15,
    "LABEL_16": 16,
    "LABEL_17": 17,
    "LABEL_18": 18,
    "LABEL_19": 19,
    "LABEL_2": 2,
    "LABEL_20": 20,
    "LABEL_21": 21,
    "LABEL_22": 22,
    "LABEL_23": 23,
    "LABEL_24": 24,
    "LABEL_25": 25,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5,
    "LABEL_6": 6,
    "LABEL_7": 7,
    "LABEL_8": 8,
    "LABEL_9": 9
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "problem_type": "multi_label_classification",
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.57.1",
  "vocab_size": 30522
}
model/mlb.pkl
ADDED
Binary file (957 Bytes).
model/special_tokens_map.json
ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
model/tokenizer.json
ADDED
The diff for this file is too large to render.
model/tokenizer_config.json
ADDED
@@ -0,0 +1,56 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "DistilBertTokenizer",
  "unk_token": "[UNK]"
}
model/vocab.txt
ADDED
The diff for this file is too large to render.
requirements.txt
ADDED
@@ -0,0 +1,10 @@
fastapi
uvicorn[standard]
pydantic
transformers
torch
scikit-learn
numpy
safetensors
gdown
schemas.py
ADDED
@@ -0,0 +1,19 @@
from pydantic import BaseModel, Field
from typing import List


class PredictionItem(BaseModel):
    """Schema for a single prediction result."""
    label: str = Field(..., description="Label name")
    score: float = Field(..., description="Prediction score/confidence")


class PredictionRequest(BaseModel):
    """Schema for prediction request."""
    text: str = Field(..., description="Input text to classify", min_length=1)


class PredictionResponse(BaseModel):
    """Schema for prediction response."""
    results: List[PredictionItem] = Field(..., description="List of predictions")
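Because of the `min_length=1` constraint, an empty string is rejected by Pydantic validation before the endpoint body runs; a quick sketch showing the behavior:

```python
from pydantic import ValidationError

from schemas import PredictionRequest

try:
    PredictionRequest(text="")
except ValidationError as exc:
    print(exc)  # validation error: text is shorter than the minimum length
```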
start.sh
ADDED
@@ -0,0 +1,16 @@
#!/bin/bash

# Kill any existing process on port 8000
echo "Checking for existing server on port 8000..."
EXISTING_PID=$(lsof -ti:8000)
if [ ! -z "$EXISTING_PID" ]; then
    echo "Killing existing process $EXISTING_PID"
    kill -9 $EXISTING_PID 2>/dev/null
    sleep 2
fi

# Activate virtual environment and start server
echo "Starting server..."
source venv/bin/activate
python main.py
vercel.json
ADDED
@@ -0,0 +1,14 @@
{
  "builds": [
    {
      "src": "api/index.py",
      "use": "@vercel/python"
    }
  ],
  "routes": [
    {
      "src": "/(.*)",
      "dest": "/api/index.py"
    }
  ]
}