	# Product Requirements Document (PRD)
	# Project: NCOS_S1 (Large Compliance LLM Pipeline)

	## 1. Project Overview
	Deploy a large compliance LLM (ACATECH/ncos, Llama-2-70B) on Hugging Face Spaces, with a Next.js frontend (Vercel), Supabase for test cases, and Redis for queueing. The backend is a FastAPI app running in a Docker container, which gives full control over the CUDA version and the dependency stack.

	---

	## 2. Current State Analysis
	- Backend:
	  - FastAPI app in Hugging Face Space, Dockerized.
	  - CUDA and torch set up for GPU inference.
	  - Permissions and cache issues resolved.
	  - Requirements are mostly correct and reproducible.
	- Frontend:
	  - Next.js app on Vercel (not tightly integrated yet).
	- Test/Queue:
	  - Supabase for test cases.
	  - Redis for queueing (not fully integrated).
	- Issues:
	  - Dependency hell (CUDA, torch, flash-attn, numpy, etc.).
	  - File permission and cache issues.
	  - Model/tokenizer loading errors (corrupt/incompatible files).
	  - Manual syncing of requirements and Dockerfile.
	  - No robust, end-to-end pipeline from test case → queue → model → result → storage.
	  - No clear API contract between frontend, backend, and test/queue system.
	  - No health checks, monitoring, or error reporting.
	  - No automated deployment or CI/CD for the Space.
	  - Monolithic codebase, hard to debug.

	---

	## 3. Goals
	- Modular, robust, and reproducible pipeline for LLM compliance testing.
	- Clean separation of backend, frontend, and queue/storage.
	- Automated, reliable deployment and monitoring.
	- Clear API contract and documentation.

	---

	## 4. Recommended Architecture
	### A. Modular Structure
	- Backend (Hugging Face Space):
	  - FastAPI app, Dockerized, REST API for inference.
	  - Handles model loading, inference, health checks.
	  - Connects to Redis for job queueing.
	  - Optionally connects to Supabase for test/result storage.
	- Frontend (Vercel/Next.js):
	  - Calls backend API for inference.
	  - Displays results, test case status, health info.
	- Queue/Storage:
	  - Redis for job queueing (decouples frontend/backend).
	  - Supabase for storing test cases/results.

	### B. Key Features
	- Robust error handling and logging in backend.
	- Health check endpoints (`/healthz`, `/readyz`).
	- Clear API contract (OpenAPI/Swagger for FastAPI).
	- Automated Docker build and deployment (version pinning).
	- CI/CD pipeline for backend and frontend.
	- Documentation for setup, usage, troubleshooting.

	---

	## 5. Action Plan
	### Step 1: Design the API Contract
	- Define endpoints for:
	  - `/infer` (POST): Accepts input, returns model output.
	  - `/healthz` (GET): Returns service health.
	  - `/queue` (POST/GET): For job submission/status (if using Redis).
	- Use FastAPI's OpenAPI docs for clarity.
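
One way to pin down the `/infer` contract before writing any route handlers is to define the request and response shapes and the validation rules in plain Python. The sketch below is framework-agnostic and uses only the standard library; the field names (`prompt`, `max_new_tokens`, `output`, `model`) and the 4096-token cap are assumptions for illustration, not part of this PRD. In the FastAPI app these shapes would more likely be Pydantic models so that they appear in the generated OpenAPI docs.

```python
from dataclasses import dataclass, asdict
from typing import Any


@dataclass(frozen=True)
class InferRequest:
    prompt: str                # hypothetical field: text sent to the model
    max_new_tokens: int = 256  # hypothetical field: cap on generated tokens


@dataclass(frozen=True)
class InferResponse:
    output: str  # hypothetical field: generated text
    model: str   # hypothetical field: identifier of the model that answered


def parse_infer_request(payload: dict[str, Any]) -> InferRequest:
    """Validate a decoded JSON body for POST /infer; raise ValueError on bad input."""
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("'prompt' must be a non-empty string")

    max_new_tokens = payload.get("max_new_tokens", 256)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_new_tokens, bool) or not isinstance(max_new_tokens, int):
        raise ValueError("'max_new_tokens' must be an integer")
    if not 1 <= max_new_tokens <= 4096:  # assumed upper bound
        raise ValueError("'max_new_tokens' must be between 1 and 4096")

    return InferRequest(prompt=prompt, max_new_tokens=max_new_tokens)


if __name__ == "__main__":
    request = parse_infer_request({"prompt": "Is this clause compliant?"})
    print(asdict(request))
```

Keeping validation in one function means the same rules can be reused by the HTTP handler and by a queue worker that receives the same payload.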

	### Step 2: Clean Backend Implementation
	- Start a new repo or clean branch.
	- Write a minimal FastAPI app:
	  - Loads model/tokenizer (with robust error handling).
	  - Exposes `/infer` and `/healthz`.
	  - Logs errors and requests.
	- Add Redis integration for queueing (optional, but recommended for scale).
	- Add Supabase integration for test/result storage (optional, can be added after core works).
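
The "robust error handling" around model loading could look roughly like the sketch below: a small holder object that runs a loader callable, records any failure instead of crashing the process, and reports liveness and readiness separately. The class name `ModelHolder`, the logger name, and the response payloads are hypothetical. In the real backend the loader would presumably wrap a call such as `AutoModelForCausalLM.from_pretrained(...)` from `transformers`; here it is injected so the logic can be exercised without a GPU.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("ncos.backend")  # hypothetical logger name


class ModelHolder:
    """Loads a model once and remembers whether loading succeeded."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self.model: Optional[Any] = None
        self.load_error: Optional[str] = None

    def load(self) -> bool:
        """Run the loader; on failure keep the process alive and record the error."""
        try:
            self.model = self._loader()
            self.load_error = None
        except Exception as exc:  # corrupt files, missing CUDA, out of memory, ...
            self.model = None
            self.load_error = f"{type(exc).__name__}: {exc}"
            logger.exception("model load failed")
        return self.model is not None

    def healthz(self) -> dict:
        # Liveness: the process is up, regardless of model state.
        return {"status": "ok"}

    def readyz(self) -> dict:
        # Readiness: only report ready once a model is actually loaded.
        if self.model is not None:
            return {"status": "ready"}
        return {"status": "not_ready", "error": self.load_error}


if __name__ == "__main__":
    def broken_loader() -> Any:
        raise OSError("corrupt safetensors file")

    holder = ModelHolder(broken_loader)
    holder.load()
    print(holder.healthz(), holder.readyz())
```

Separating the two checks lets the Space stay reachable for diagnostics even when the model or tokenizer files are bad.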

	### Step 3: Dockerize the Backend
	- Use a clean, minimal Dockerfile:
	  - Start from `nvidia/cuda:12.1.0-devel-ubuntu22.04`.
	  - Install Python, torch, dependencies in correct order.
	  - Set up cache and permissions.
	- Pin all versions in `requirements.txt`.
	- Add a health check in Dockerfile (`HEALTHCHECK`).
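
To keep "pin all versions" from drifting over time, a small check could run before each Docker build and fail if any requirement is not pinned with `==`. The following is a minimal sketch, not a full requirements parser: the function name is hypothetical, the package versions in the demo are illustrative only, and URL-based or editable installs are simply reported as unpinned.

```python
import re

# Matches "name==version", optionally with extras, e.g. "uvicorn[standard]==0.30.1".
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return the requirement lines that are not pinned to an exact version."""
    offenders = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not _PINNED.match(line):
            offenders.append(line)
    return offenders


if __name__ == "__main__":
    sample = """\
torch==2.3.0
numpy>=1.26  # too loose
fastapi
--extra-index-url https://example.invalid/simple
uvicorn[standard]==0.30.1
"""
    for line in unpinned_requirements(sample):
        print("not pinned:", line)
```

Run against the real `requirements.txt`, a non-empty result could be turned into a non-zero exit code so the build stops early.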

	### Step 4: Model/Tokenizer Management
	- Ensure model/tokenizer files are valid and compatible.
	- Test loading locally before pushing to Hugging Face.
	- Document the process for updating model files.
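
A cheap pre-push check can catch the most common corrupt-file cases before a full model load is attempted. The sketch below assumes a Hugging Face style model directory containing `config.json`, `tokenizer_config.json`, and at least one `.safetensors` or `.bin` weight file; that file set is an assumption about this project's layout, and the check does not replace actually loading the model locally.

```python
import json
from pathlib import Path

REQUIRED_JSON = ("config.json", "tokenizer_config.json")  # assumed minimum set
WEIGHT_SUFFIXES = (".safetensors", ".bin")


def check_model_dir(model_dir: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    root = Path(model_dir)
    problems = []

    # Each required JSON file must exist and parse.
    for name in REQUIRED_JSON:
        path = root / name
        if not path.is_file():
            problems.append(f"missing {name}")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            problems.append(f"unparseable {name}: {exc}")

    # At least one weight file must exist, and none may be empty.
    weights = []
    if root.is_dir():
        weights = sorted(p for p in root.iterdir() if p.suffix in WEIGHT_SUFFIXES)
    if not weights:
        problems.append("no weight files found")
    for path in weights:
        if path.stat().st_size == 0:
            problems.append(f"empty weight file {path.name}")

    return problems


if __name__ == "__main__":
    import sys

    issues = check_model_dir(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("\n".join(issues) if issues else "model directory looks sane")
```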

	### Step 5: Frontend Integration
	- Update Next.js frontend to call the new backend API.
	- Show job status, results, and health info.
	- Add error handling and user feedback.

	### Step 6: Queue and Storage Integration
	- Set up Redis for job queueing.
	- Set up Supabase for test case/result storage.
	- Ensure backend can pull jobs from Redis, process, and store results in Supabase.
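
The pull, process, store loop described above can be sketched independently of the real services. In the example below an in-memory `queue.Queue` stands in for the Redis job queue and a plain dict stands in for the Supabase results table; the job fields (`id`, `prompt`) and the result shape are assumptions. The point of the sketch is the control flow: a failed inference is recorded as a failed result rather than killing the worker.

```python
import json
import queue
from typing import Callable


def process_next_job(
    jobs: "queue.Queue[str]",
    infer: Callable[[str], str],
    results: dict[str, dict],
) -> bool:
    """Process one queued job if available; return False when the queue is empty."""
    try:
        raw = jobs.get_nowait()
    except queue.Empty:
        return False

    job = json.loads(raw)  # jobs are assumed to be JSON strings
    job_id = job["id"]
    try:
        results[job_id] = {"status": "done", "output": infer(job["prompt"])}
    except Exception as exc:  # store the failure so the frontend can show it
        results[job_id] = {"status": "failed", "error": str(exc)}
    return True


if __name__ == "__main__":
    jobs: "queue.Queue[str]" = queue.Queue()
    jobs.put(json.dumps({"id": "case-1", "prompt": "check clause 4.2"}))
    store: dict[str, dict] = {}

    # Stand-in for the real model call.
    while process_next_job(jobs, lambda prompt: prompt.upper(), store):
        pass
    print(store)
```

Swapping the stand-ins for real clients should not change this function's shape, which makes the worker testable without Redis or Supabase running.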

	### Step 7: Monitoring and Health
	- Add logging and error reporting (e.g., to stdout, or a logging service).
	- Implement `/healthz` and `/readyz` endpoints.
	- Optionally, add Prometheus/Grafana metrics.
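
For the "log to stdout" option, structured JSON lines tend to be easier to search in container logs than free-form text. The sketch below uses only the standard `logging` module; the logger name `ncos` and the chosen fields are assumptions, and a real deployment might add timestamps or request IDs.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Include the traceback when logger.exception() was used.
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Send JSON logs to stdout so the container runtime can collect them."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("ncos")  # hypothetical logger name
    logger.handlers = [handler]  # replace handlers so repeated calls do not duplicate output
    logger.setLevel(level)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = configure_logging()
    log.info("service started")
    try:
        raise OSError("tokenizer file is corrupt")
    except OSError:
        log.exception("model load failed")
```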

	### Step 8: CI/CD and Documentation
	- Add GitHub Actions or similar for automated build/test/deploy.
	- Write clear README and API docs.

	---

	## 6. Success Criteria
	- End-to-end pipeline works: test case → queue → model → result → storage.
	- Robust error handling and health checks in place.
	- Automated, reproducible builds and deployments.
	- Clear, up-to-date documentation for all components.