| # Product Requirements Document (PRD) |
| # Project: NCOS_S1 (Large Compliance LLM Pipeline) |
| |
| ## 1. Project Overview |
| Deploy a large compliance LLM (ACATECH/ncos, Llama-2-70B) on Hugging Face Spaces, with a Next.js frontend (Vercel), Supabase for test cases, and Redis for queueing. The backend is a FastAPI app running in a Docker container for full control (CUDA, dependencies, etc.). |
| |
| --- |
| |
| ## 2. Current State Analysis |
| - **Backend:** |
| - FastAPI app in Hugging Face Space, Dockerized. |
| - CUDA and torch set up for GPU inference. |
| - Permissions and cache issues resolved. |
| - Requirements are mostly correct and reproducible. |
| - **Frontend:** |
| - Next.js app on Vercel (not tightly integrated yet). |
| - **Test/Queue:** |
| - Supabase for test cases. |
| - Redis for queueing (not fully integrated). |
| - **Issues:** |
| - Dependency hell (CUDA, torch, flash-attn, numpy, etc.). |
| - File permission and cache issues. |
| - Model/tokenizer loading errors (corrupt/incompatible files). |
| - Manual syncing of requirements and Dockerfile. |
| - No robust, end-to-end pipeline from test case β queue β model β result β storage. |
| - No clear API contract between frontend, backend, and test/queue system. |
| - No health checks, monitoring, or error reporting. |
| - No automated deployment or CI/CD for the Space. |
| - Monolithic codebase, hard to debug. |
| |
| --- |
| |
| ## 3. Goals |
| - Modular, robust, and reproducible pipeline for LLM compliance testing. |
| - Clean separation of backend, frontend, and queue/storage. |
| - Automated, reliable deployment and monitoring. |
| - Clear API contract and documentation. |
| |
| --- |
| |
| ## 4. Recommended Architecture |
| ### A. Modular Structure |
| - **Backend (Hugging Face Space):** |
| - FastAPI app, Dockerized, REST API for inference. |
| - Handles model loading, inference, health checks. |
| - Connects to Redis for job queueing. |
| - Optionally connects to Supabase for test/result storage. |
| - **Frontend (Vercel/Next.js):** |
| - Calls backend API for inference. |
| - Displays results, test case status, health info. |
| - **Queue/Storage:** |
| - Redis for job queueing (decouples frontend/backend). |
| - Supabase for storing test cases/results. |
| |
| ### B. Key Features |
| - Robust error handling and logging in backend. |
| - Health check endpoints (`/healthz`, `/readyz`). |
| - Clear API contract (OpenAPI/Swagger for FastAPI). |
| - Automated Docker build and deployment (version pinning). |
| - CI/CD pipeline for backend and frontend. |
| - Documentation for setup, usage, troubleshooting. |
| |
| --- |
| |
| ## 5. Action Plan |
| ### Step 1: Design the API Contract |
| - Define endpoints for: |
| - `/infer` (POST): Accepts input, returns model output. |
| - `/healthz` (GET): Returns service health. |
| - `/queue` (POST/GET): For job submission/status (if using Redis). |
| - Use FastAPI's OpenAPI docs for clarity. |
| |
| ### Step 2: Clean Backend Implementation |
| - Start a new repo or clean branch. |
| - Write a minimal FastAPI app: |
| - Loads model/tokenizer (with robust error handling). |
| - Exposes `/infer` and `/healthz`. |
| - Logs errors and requests. |
| - Add Redis integration for queueing (optional, but recommended for scale). |
| - Add Supabase integration for test/result storage (optional, can be added after core works). |
| |
| ### Step 3: Dockerize the Backend |
| - Use a clean, minimal Dockerfile: |
| - Start from `nvidia/cuda:12.1.0-devel-ubuntu22.04`. |
| - Install Python, torch, dependencies in correct order. |
| - Set up cache and permissions. |
| - Pin all versions in `requirements.txt`. |
| - Add a health check in Dockerfile (`HEALTHCHECK`). |
| |
| ### Step 4: Model/Tokenizer Management |
| - Ensure model/tokenizer files are valid and compatible. |
| - Test loading locally before pushing to Hugging Face. |
| - Document the process for updating model files. |
| |
| ### Step 5: Frontend Integration |
| - Update Next.js frontend to call the new backend API. |
| - Show job status, results, and health info. |
| - Add error handling and user feedback. |
| |
| ### Step 6: Queue and Storage Integration |
| - Set up Redis for job queueing. |
| - Set up Supabase for test case/result storage. |
| - Ensure backend can pull jobs from Redis, process, and store results in Supabase. |
| |
| ### Step 7: Monitoring and Health |
| - Add logging and error reporting (e.g., to stdout, or a logging service). |
| - Implement `/healthz` and `/readyz` endpoints. |
| - Optionally, add Prometheus/Grafana metrics. |
| |
| ### Step 8: CI/CD and Documentation |
| - Add GitHub Actions or similar for automated build/test/deploy. |
| - Write clear README and API docs. |
| |
| --- |
| |
| ## 6. Success Criteria |
| - End-to-end pipeline works: test case β queue β model β result β storage. |
| - Robust error handling and health checks in place. |
| - Automated, reproducible builds and deployments. |
| - Clear, up-to-date documentation for all components. |