
Product Requirements Document (PRD)

Project: NCOS_S1 (Large Compliance LLM Pipeline)

1. Project Overview

Deploy a large compliance LLM (ACATECH/ncos, Llama-2-70B) on Hugging Face Spaces, with a Next.js frontend (Vercel), Supabase for test cases, and Redis for queueing. The backend is a FastAPI app running in a Docker container for full control (CUDA, dependencies, etc.).


2. Current State Analysis

  • Backend:
    • FastAPI app in Hugging Face Space, Dockerized.
    • CUDA and torch set up for GPU inference.
    • Permissions and cache issues resolved.
    • Requirements are mostly correct and reproducible.
  • Frontend:
    • Next.js app on Vercel (not tightly integrated yet).
  • Test/Queue:
    • Supabase for test cases.
    • Redis for queueing (not fully integrated).
  • Issues:
    • Dependency hell (CUDA, torch, flash-attn, numpy, etc.).
    • File permission and cache issues.
    • Model/tokenizer loading errors (corrupt/incompatible files).
    • Manual syncing of requirements and Dockerfile.
    • No robust, end-to-end pipeline from test case → queue → model → result → storage.
    • No clear API contract between frontend, backend, and test/queue system.
    • No health checks, monitoring, or error reporting.
    • No automated deployment or CI/CD for the Space.
    • Monolithic codebase, hard to debug.

3. Goals

  • Modular, robust, and reproducible pipeline for LLM compliance testing.
  • Clean separation of backend, frontend, and queue/storage.
  • Automated, reliable deployment and monitoring.
  • Clear API contract and documentation.

4. Recommended Architecture

A. Modular Structure

  • Backend (Hugging Face Space):
    • FastAPI app, Dockerized, REST API for inference.
    • Handles model loading, inference, health checks.
    • Connects to Redis for job queueing.
    • Optionally connects to Supabase for test/result storage.
  • Frontend (Vercel/Next.js):
    • Calls backend API for inference.
    • Displays results, test case status, health info.
  • Queue/Storage:
    • Redis for job queueing (decouples frontend/backend).
    • Supabase for storing test cases/results.

B. Key Features

  • Robust error handling and logging in backend.
  • Health check endpoints (/healthz, /readyz).
  • Clear API contract (OpenAPI/Swagger for FastAPI).
  • Automated Docker build and deployment (version pinning).
  • CI/CD pipeline for backend and frontend.
  • Documentation for setup, usage, troubleshooting.

5. Action Plan

Step 1: Design the API Contract

  • Define endpoints for:
    • /infer (POST): Accepts input, returns model output.
    • /healthz (GET): Returns service health.
    • /readyz (GET): Returns service readiness.
    • /queue (POST/GET): For job submission/status (if using Redis).
  • Use FastAPI's OpenAPI docs for clarity.
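
One way to pin the contract down before writing any endpoint is to express the request and response shapes as typed records. The sketch below uses only the standard library; in the FastAPI app these would most likely become Pydantic models so they appear in the generated OpenAPI docs. All field names, defaults, and limits (prompt, max_new_tokens, 4096, the job states) are assumptions for illustration and are not defined by this PRD.

```python
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Optional


class JobState(str, Enum):
    # Hypothetical job lifecycle for the /queue endpoint.
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass(frozen=True)
class InferRequest:
    prompt: str                # assumed field name
    max_new_tokens: int = 256  # assumed default

    def __post_init__(self):
        # Reject obviously invalid input before it reaches the model.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not 1 <= self.max_new_tokens <= 4096:
            raise ValueError("max_new_tokens must be between 1 and 4096")


@dataclass(frozen=True)
class InferResponse:
    output: str
    model: str


@dataclass(frozen=True)
class JobStatus:
    job_id: str
    state: JobState
    result: Optional[InferResponse] = None
    error: Optional[str] = None
```

With shapes like these, POST /infer would accept an InferRequest and return an InferResponse, and GET /queue would return a JobStatus; asdict(...) turns any of them into a plain dict for serialization.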

Step 2: Clean Backend Implementation

  • Start a new repo or clean branch.
  • Write a minimal FastAPI app:
    • Loads model/tokenizer (with robust error handling).
    • Exposes /infer and /healthz.
    • Logs errors and requests.
  • Add Redis integration for queueing (optional, but recommended for scale).
  • Add Supabase integration for test/result storage (optional, can be added after core works).
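
The "robust error handling" around model loading could look roughly like the following. It is a minimal sketch, not the project's implementation: ModelHolder and its loader argument are hypothetical names, and the loader stands in for whatever call actually loads the model and tokenizer. The idea is that a corrupt or incompatible file produces a logged, inspectable error instead of crashing the app at import time.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("ncos.backend")  # hypothetical logger name


class ModelHolder:
    """Loads the model once and remembers why loading failed, if it did."""

    def __init__(self, loader: Callable[[], Any]):
        self._loader = loader
        self.model: Optional[Any] = None
        self.error: Optional[str] = None

    def load(self) -> bool:
        try:
            self.model = self._loader()
            self.error = None
        except Exception as exc:
            # Corrupt or incompatible model/tokenizer files end up here.
            self.model = None
            self.error = f"{type(exc).__name__}: {exc}"
            logger.exception("model load failed")
        return self.model is not None

    def infer(self, prompt: str) -> str:
        if self.model is None:
            # Surface the stored load error instead of an opaque failure.
            raise RuntimeError(self.error or "model not loaded")
        return self.model(prompt)
```

An /infer handler can then translate the RuntimeError into a 503 response, and a health endpoint can report holder.error directly.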

Step 3: Dockerize the Backend

  • Use a clean, minimal Dockerfile:
    • Start from nvidia/cuda:12.1.0-devel-ubuntu22.04.
    • Install Python, torch, dependencies in correct order.
    • Set up cache and permissions.
    • Pin all versions in requirements.txt.
    • Add a health check in Dockerfile (HEALTHCHECK).
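
The version-pinning rule is easy to enforce mechanically, for example as a check run before each Docker build. The sketch below assumes a plain requirements.txt in which a pinned line has the form name==version; the function name unpinned_requirements is hypothetical, and the package versions in the usage example are illustrative, not recommendations.

```python
import re
from typing import List

# "name==version", optionally with extras such as uvicorn[standard].
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9,._-]+\])?==[^=\s]+$")


def unpinned_requirements(text: str) -> List[str]:
    """Return every requirement line that is not pinned with '=='."""
    bad = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]         # drop trailing comments
        line = line.split(";", 1)[0].strip()  # drop environment markers
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as --extra-index-url
        if not PINNED.match(line):
            bad.append(line)
    return bad


if __name__ == "__main__":
    sample = "torch==2.1.2\nnumpy>=1.24\nfastapi\n"
    print(unpinned_requirements(sample))
```

A non-empty result can fail the build, which removes one source of drift between requirements.txt and the Dockerfile.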

Step 4: Model/Tokenizer Management

  • Ensure model/tokenizer files are valid and compatible.
  • Test loading locally before pushing to Hugging Face.
  • Document the process for updating model files.
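
One possible way to "ensure files are valid" is to keep a checksum manifest next to the model and verify it before loading or pushing. This is a sketch under assumptions: the manifest format (file name mapped to SHA-256 hex digest) and the function names are hypothetical, and a matching checksum only shows a file is intact, not that it is compatible with the installed library versions.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model shards do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_dir(model_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every file matched."""
    problems = []
    for name, expected in sorted(manifest.items()):
        path = model_dir / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(path) != expected:
            problems.append(f"checksum mismatch: {name}")
    return problems
```

Running this locally before a push, and again at container start, catches truncated uploads early instead of at the first inference request.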

Step 5: Frontend Integration

  • Update Next.js frontend to call the new backend API.
  • Show job status, results, and health info.
  • Add error handling and user feedback.

Step 6: Queue and Storage Integration

  • Set up Redis for job queueing.
  • Set up Supabase for test case/result storage.
  • Ensure backend can pull jobs from Redis, process, and store results in Supabase.
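
The pull, process, store loop can be prototyped without any external service. In the sketch below a deque stands in for the Redis queue and a dict stands in for the Supabase results table; the job shape ({"id", "input"}) and the function name drain_queue are assumptions for illustration. The point it demonstrates is that one failing job is recorded and does not stop the worker.

```python
from collections import deque
from typing import Callable, Deque, Dict


def drain_queue(
    queue: Deque[dict],
    run_model: Callable[[str], str],
    results: Dict[str, dict],
) -> int:
    """Process queued jobs until the queue is empty; return how many were handled."""
    handled = 0
    while queue:
        job = queue.popleft()  # with Redis this would be a pop from a list
        try:
            output = run_model(job["input"])
            results[job["id"]] = {"state": "done", "output": output}
        except Exception as exc:
            # Record the failure and keep going; no retry in this sketch.
            results[job["id"]] = {"state": "failed", "error": str(exc)}
        handled += 1
    return handled


if __name__ == "__main__":
    jobs = deque([{"id": "tc-1", "input": "example test case"}])
    store: Dict[str, dict] = {}
    drain_queue(jobs, lambda text: text.upper(), store)
    print(store)
```

Swapping the stand-ins for real Redis and Supabase clients should leave the control flow unchanged, which makes the loop testable on its own.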

Step 7: Monitoring and Health

  • Add logging and error reporting (e.g., to stdout, or a logging service).
  • Implement /healthz and /readyz endpoints.
  • Optionally, add Prometheus/Grafana metrics.
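
This PRD does not define how /healthz and /readyz differ. A common split, assumed here, is that /healthz only reports that the process is alive, while /readyz reports whether dependencies (model loaded, Redis reachable) are usable. The sketch below shows that logic as plain functions returning a status code and a body; the check names and response fields are hypothetical.

```python
from typing import Callable, Dict, Tuple


def healthz() -> Tuple[int, dict]:
    # Liveness: the process is up and can answer; no dependencies are checked.
    return 200, {"status": "ok"}


def readyz(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    # Readiness: every dependency check must pass before traffic is accepted.
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as not ready.
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, {"ready": status == 200, "checks": results}
```

Keeping the two separate means a slow model load shows up as "not ready" rather than as a dead service, which matters if a Docker HEALTHCHECK or an external monitor restarts unhealthy containers.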

Step 8: CI/CD and Documentation

  • Add GitHub Actions or similar for automated build/test/deploy.
  • Write clear README and API docs.

6. Success Criteria

  • End-to-end pipeline works: test case → queue → model → result → storage.
  • Robust error handling and health checks in place.
  • Automated, reproducible builds and deployments.
  • Clear, up-to-date documentation for all components.