# Product Requirements Document (PRD)
# Project: NCOS_S1 (Large Compliance LLM Pipeline)
## 1. Project Overview
Deploy a large compliance LLM (ACATECH/ncos, Llama-2-70B) on Hugging Face Spaces, with a Next.js frontend (Vercel), Supabase for test cases, and Redis for queueing. The backend is a FastAPI app running in a Docker container for full control (CUDA, dependencies, etc.).
---
## 2. Current State Analysis
- **Backend:**
  - FastAPI app in Hugging Face Space, Dockerized.
  - CUDA and torch set up for GPU inference.
  - Permissions and cache issues resolved.
  - Requirements are mostly correct and reproducible.
- **Frontend:**
  - Next.js app on Vercel (not tightly integrated yet).
- **Test/Queue:**
  - Supabase for test cases.
  - Redis for queueing (not fully integrated).
- **Issues:**
  - Dependency hell (CUDA, torch, flash-attn, numpy, etc.).
  - File permission and cache issues.
  - Model/tokenizer loading errors (corrupt/incompatible files).
  - Manual syncing of requirements and Dockerfile.
  - No robust, end-to-end pipeline from test case → queue → model → result → storage.
  - No clear API contract between frontend, backend, and test/queue system.
  - No health checks, monitoring, or error reporting.
  - No automated deployment or CI/CD for the Space.
  - Monolithic codebase, hard to debug.
---
## 3. Goals
- Modular, robust, and reproducible pipeline for LLM compliance testing.
- Clean separation of backend, frontend, and queue/storage.
- Automated, reliable deployment and monitoring.
- Clear API contract and documentation.
---
## 4. Recommended Architecture
### A. Modular Structure
- **Backend (Hugging Face Space):**
  - FastAPI app, Dockerized, REST API for inference.
  - Handles model loading, inference, health checks.
  - Connects to Redis for job queueing.
  - Optionally connects to Supabase for test/result storage.
- **Frontend (Vercel/Next.js):**
  - Calls backend API for inference.
  - Displays results, test case status, health info.
- **Queue/Storage:**
  - Redis for job queueing (decouples frontend/backend).
  - Supabase for storing test cases/results.
### B. Key Features
- Robust error handling and logging in backend.
- Health check endpoints (`/healthz`, `/readyz`).
- Clear API contract (OpenAPI/Swagger for FastAPI).
- Automated Docker build and deployment (version pinning).
- CI/CD pipeline for backend and frontend.
- Documentation for setup, usage, troubleshooting.
---
## 5. Action Plan
### Step 1: Design the API Contract
- Define endpoints for:
  - `/infer` (POST): Accepts input, returns model output.
  - `/healthz` (GET): Returns service health.
  - `/queue` (POST/GET): For job submission/status (if using Redis).
- Use FastAPI's OpenAPI docs for clarity.
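The contract above can be pinned down as typed request/response models, which FastAPI renders into the OpenAPI docs automatically. The sketch below is illustrative, not final: the field names (`prompt`, `max_new_tokens`, `temperature`) and their bounds are assumptions to be settled during Step 1.

```python
# Sketch of the /infer request/response schemas as Pydantic models.
# Field names and bounds are illustrative placeholders, not the final contract.
from pydantic import BaseModel, Field


class InferRequest(BaseModel):
    prompt: str = Field(..., min_length=1)
    max_new_tokens: int = Field(default=256, ge=1, le=4096)
    temperature: float = Field(default=0.7, ge=0.0, le=2.0)


class InferResponse(BaseModel):
    output: str
    model_name: str
    latency_ms: float


class HealthResponse(BaseModel):
    status: str        # e.g. "ok" or "degraded"
    model_loaded: bool
```

Declaring the contract this way gives input validation for free: an out-of-range `temperature` is rejected with a 422 before it ever reaches the model.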
### Step 2: Clean Backend Implementation
- Start a new repo or clean branch.
- Write a minimal FastAPI app:
  - Loads model/tokenizer (with robust error handling).
  - Exposes `/infer` and `/healthz`.
  - Logs errors and requests.
- Add Redis integration for queueing (optional, but recommended for scale).
- Add Supabase integration for test/result storage (optional, can be added after core works).
### Step 3: Dockerize the Backend
- Use a clean, minimal Dockerfile:
  - Start from `nvidia/cuda:12.1.0-devel-ubuntu22.04`.
  - Install Python, torch, dependencies in correct order.
  - Set up cache and permissions.
- Pin all versions in `requirements.txt`.
- Add a health check in Dockerfile (`HEALTHCHECK`).
### Step 4: Model/Tokenizer Management
- Ensure model/tokenizer files are valid and compatible.
- Test loading locally before pushing to Hugging Face.
- Document the process for updating model files.
### Step 5: Frontend Integration
- Update Next.js frontend to call the new backend API.
- Show job status, results, and health info.
- Add error handling and user feedback.
### Step 6: Queue and Storage Integration
- Set up Redis for job queueing.
- Set up Supabase for test case/result storage.
- Ensure backend can pull jobs from Redis, process, and store results in Supabase.
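The pull-process-store loop can be sketched as below. It is written against plain callables so the same code runs against redis-py (e.g. `pop_job` wrapping `r.blpop("jobs", timeout=5)` and `store_result` doing a Supabase insert) or an in-memory stub in tests; the job payload shape (`id`, `input`) is an assumption, since the schema is not defined in this document:

```python
# Sketch of the backend worker loop: pull a job, run inference, store
# the result. Queue and storage are injected as callables so the loop
# is testable without a live Redis or Supabase; the {"id", "input"}
# payload shape is an illustrative assumption.
import json


def run_worker(pop_job, infer, store_result, max_jobs=None):
    """Process jobs until pop_job returns None (or max_jobs is reached).

    pop_job()      -> JSON string for the next job, or None when idle
    infer(inputs)  -> model output for the job's input payload
    store_result(job_id, result_dict) -> persist (e.g. Supabase upsert)
    """
    done = 0
    while max_jobs is None or done < max_jobs:
        raw = pop_job()
        if raw is None:
            break
        job = json.loads(raw)
        try:
            output = infer(job["input"])
            store_result(job["id"], {"status": "done", "output": output})
        except Exception as exc:  # a bad job must not kill the worker
            store_result(job["id"], {"status": "error", "error": str(exc)})
        done += 1
    return done
```

Catching per-job exceptions and storing an error record keeps one malformed test case from stalling the whole queue, while the result row gives the frontend something concrete to display.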
### Step 7: Monitoring and Health
- Add logging and error reporting (e.g., to stdout, or a logging service).
- Implement `/healthz` and `/readyz` endpoints.
- Optionally, add Prometheus/Grafana metrics.
### Step 8: CI/CD and Documentation
- Add GitHub Actions or similar for automated build/test/deploy.
- Write clear README and API docs.
---
## 6. Success Criteria
- End-to-end pipeline works: test case → queue → model → result → storage.
- Robust error handling and health checks in place.
- Automated, reproducible builds and deployments.
- Clear, up-to-date documentation for all components. |