Spaces:

Vx2-3y
/

NCOS_S3

Paused

App Files Files Community

NCOS_S3 / scripts /prd.md

Vx2-3y

Initial project structure: FastAPI backend, Dockerfile, requirements, and PRD

3fa9baf 10 months ago

preview code

raw

history blame contribute delete

4.66 kB

Product Requirements Document (PRD)

Project: NCOS_S1 (Large Compliance LLM Pipeline)

1. Project Overview

Deploy a large compliance LLM (ACATECH/ncos, Llama-2-70B) on Hugging Face Spaces, with a Next.js frontend (Vercel), Supabase for test cases, and Redis for queueing. The backend is a FastAPI app running in a Docker container for full control (CUDA, dependencies, etc.).

2. Current State Analysis

Backend:
- FastAPI app in Hugging Face Space, Dockerized.
- CUDA and torch set up for GPU inference.
- Permissions and cache issues resolved.
- Requirements are mostly correct and reproducible.
Frontend:
- Next.js app on Vercel (not tightly integrated yet).
Test/Queue:
- Supabase for test cases.
- Redis for queueing (not fully integrated).
Issues:
- Dependency hell (CUDA, torch, flash-attn, numpy, etc.).
- File permission and cache issues.
- Model/tokenizer loading errors (corrupt/incompatible files).
- Manual syncing of requirements and Dockerfile.
- No robust, end-to-end pipeline from test case → queue → model → result → storage.
- No clear API contract between frontend, backend, and test/queue system.
- No health checks, monitoring, or error reporting.
- No automated deployment or CI/CD for the Space.
- Monolithic codebase, hard to debug.

3. Goals

Modular, robust, and reproducible pipeline for LLM compliance testing.
Clean separation of backend, frontend, and queue/storage.
Automated, reliable deployment and monitoring.
Clear API contract and documentation.

4. Recommended Architecture

A. Modular Structure

Backend (Hugging Face Space):
- FastAPI app, Dockerized, REST API for inference.
- Handles model loading, inference, health checks.
- Connects to Redis for job queueing.
- Optionally connects to Supabase for test/result storage.
Frontend (Vercel/Next.js):
- Calls backend API for inference.
- Displays results, test case status, health info.
Queue/Storage:
- Redis for job queueing (decouples frontend/backend).
- Supabase for storing test cases/results.

B. Key Features

Robust error handling and logging in backend.
Health check endpoints (/healthz, /readyz).
Clear API contract (OpenAPI/Swagger for FastAPI).
Automated Docker build and deployment (version pinning).
CI/CD pipeline for backend and frontend.
Documentation for setup, usage, troubleshooting.

5. Action Plan

Step 1: Design the API Contract

Define endpoints for:
- /infer (POST): Accepts input, returns model output.
- /healthz (GET): Returns service health.
- /queue (POST/GET): For job submission/status (if using Redis).
Use FastAPI's OpenAPI docs for clarity.

Step 2: Clean Backend Implementation

Start a new repo or clean branch.
Write a minimal FastAPI app:
- Loads model/tokenizer (with robust error handling).
- Exposes /infer and /healthz.
- Logs errors and requests.
Add Redis integration for queueing (optional, but recommended for scale).
Add Supabase integration for test/result storage (optional, can be added after core works).

Step 3: Dockerize the Backend

Use a clean, minimal Dockerfile:
- Start from nvidia/cuda:12.1.0-devel-ubuntu22.04.
- Install Python, torch, dependencies in correct order.
- Set up cache and permissions.
- Pin all versions in requirements.txt.
- Add a health check in Dockerfile (HEALTHCHECK).

Step 4: Model/Tokenizer Management

Ensure model/tokenizer files are valid and compatible.
Test loading locally before pushing to Hugging Face.
Document the process for updating model files.

Step 5: Frontend Integration

Update Next.js frontend to call the new backend API.
Show job status, results, and health info.
Add error handling and user feedback.

Step 6: Queue and Storage Integration

Set up Redis for job queueing.
Set up Supabase for test case/result storage.
Ensure backend can pull jobs from Redis, process, and store results in Supabase.

Step 7: Monitoring and Health

Add logging and error reporting (e.g., to stdout, or a logging service).
Implement /healthz and /readyz endpoints.
Optionally, add Prometheus/Grafana metrics.

Step 8: CI/CD and Documentation

Add GitHub Actions or similar for automated build/test/deploy.
Write clear README and API docs.

6. Success Criteria

End-to-end pipeline works: test case → queue → model → result → storage.
Robust error handling and health checks in place.
Automated, reproducible builds and deployments.
Clear, up-to-date documentation for all components.