HuggingFace CI/CD Pipeline Documentation

🚀 Overview

This repository implements a comprehensive CI/CD pipeline for deploying the Corporate Policy Assistant to HuggingFace Spaces with automated testing and validation.

🏗️ Architecture

Hybrid AI System

Embeddings: HuggingFace Inference API (intfloat/multilingual-e5-large)
LLM: OpenRouter (microsoft/wizardlm-2-8x22b)
Citation Validation: Real-time hallucination detection
Vector Database: ChromaDB for document storage

CI/CD Components

GitHub Actions: Automated testing and deployment
HuggingFace Spaces: Production environment
Comprehensive Test Suite: 27+ tests covering all components
Code Quality: Black, isort, flake8 validation

📋 Pipeline Workflow

1. Code Quality Checks

# Formatting validation
black --check .
isort --check-only .
flake8 --max-line-length=88
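
To run the same three checks locally before pushing, a small wrapper can report every failing tool in one pass. This is a minimal sketch and not part of the repository: the names `run_command` and `failed_checks` and the injectable `runner` parameter are hypothetical, while the commands are the ones listed above.

```python
import subprocess
from typing import Callable, List, Sequence

# Commands copied from the step above.
QUALITY_COMMANDS = (
    ("black", "--check", "."),
    ("isort", "--check-only", "."),
    ("flake8", "--max-line-length=88"),
)


def run_command(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def failed_checks(
    commands: Sequence[Sequence[str]] = QUALITY_COMMANDS,
    runner: Callable[[Sequence[str]], int] = run_command,
) -> List[str]:
    """Run every check and return the names of the tools that exited non-zero."""
    # Running all commands, rather than stopping at the first failure,
    # reports formatting, import-order, and lint problems in one pass.
    return [cmd[0] for cmd in commands if runner(cmd) != 0]
```

Calling `failed_checks()` with no arguments runs the real tools, so black, isort, and flake8 must be installed. An empty list means all three passed.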

2. Comprehensive Testing

# Run all tests
pytest -v --cov=src --cov-report=xml

# HF-specific tests
pytest tests/test_embedding/test_hf_embedding_service.py -v

# Citation validation tests
pytest -k citation -v

3. Architecture Validation

Service initialization checks
Import validation
End-to-end pipeline testing
Citation fix verification

4. Deployment

Primary: msse-team-3/ai-engineering-project
Backup: sethmcknight/msse-ai-engineering
Health Checks: Automated smoke tests

🔧 Configuration Files

`.github/workflows/hf-ci-cd.yml`

Main CI/CD pipeline with:

Multi-Python version testing (3.10, 3.11)
Comprehensive test suite
Automatic HF deployment
Post-deployment validation

`.hf.yml`

HuggingFace Space configuration:

title: MSSE AI Engineering - Corporate Policy Assistant
sdk: gradio
app_file: app.py
models:
  - intfloat/multilingual-e5-large

`pytest.ini`

Test configuration with coverage and markers:

[pytest]
markers =
    unit: Unit tests
    integration: Integration tests
    hf: HuggingFace specific tests
    citation: Citation validation tests

🧪 Testing Strategy

Unit Tests (Critical)

✅ HF Embedding Service: 12 comprehensive tests
✅ Prompt Templates: Citation fix validation
✅ LLM Components: Response processing
✅ Context Formatting: Fixed document numbering
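
The context-formatting idea can be illustrated by labelling each retrieved chunk with its real source filename, so the LLM never sees a numbered placeholder it could echo back. This is only a sketch under assumptions: the function name `format_context`, the chunk shape (`filename` and `text` keys), and the `[Source: ...]` label are hypothetical, and the project's actual formatter may differ.

```python
from typing import List, Mapping


def format_context(chunks: List[Mapping[str, str]]) -> str:
    """Label every retrieved chunk with its source filename, not a number."""
    sections = []
    for chunk in chunks:
        # Hypothetical chunk shape: {"filename": ..., "text": ...}.
        label = f"[Source: {chunk['filename']}]"
        sections.append(f"{label}\n{chunk['text'].strip()}")
    # Blank lines between sections keep chunk boundaries visible in the prompt.
    return "\n\n".join(sections)
```

A unit test for this kind of formatter can assert that every label in the output is a filename that was passed in and that no `document_N.md` name appears.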

Integration Tests (Non-Critical)

⚠️ API Integration: Real HF/OpenRouter calls
⚠️ End-to-End Pipeline: Complete workflow
⚠️ Service Validation: Production readiness

Coverage Requirements

Minimum: 80% code coverage
Focus Areas: Core business logic
Exclusions: Test files, dev tools
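
The 80% minimum can be checked against the XML report that `--cov-report=xml` produces. The sketch below assumes the Cobertura-style layout written by coverage.py, where the root `coverage` element carries a `line-rate` attribute. The function names are illustrative and not taken from the repository.

```python
import xml.etree.ElementTree as ET

MIN_COVERAGE = 0.80  # the 80% minimum stated above


def line_coverage(xml_text: str) -> float:
    """Return overall line coverage (0.0 to 1.0) from a Cobertura-style report."""
    root = ET.fromstring(xml_text)
    # coverage.py stores the overall ratio in the root element's "line-rate".
    return float(root.attrib["line-rate"])


def meets_minimum(xml_text: str, minimum: float = MIN_COVERAGE) -> bool:
    """Return True when the report's line coverage reaches the minimum."""
    return line_coverage(xml_text) >= minimum
```

In practice the same gate is usually simpler to enforce by adding pytest-cov's `--cov-fail-under=80` option to the pytest invocation. The parser above is mainly useful when a later pipeline step needs the number itself.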

🚦 Pipeline Triggers

Automatic Deployment

Push to main: Full pipeline + production deployment
Push to hf-main-local: HF-specific testing + staging deployment

Pull Request Validation

All PRs: Full test suite without deployment
Pre-commit checks: Code quality validation

Manual Triggers

Emergency Deployment: Manual sync workflow
Test-only Runs: Validation without deployment
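
The automatic and pull-request rules above amount to a small decision table. The sketch below restates them as code for illustration. `PipelinePlan` and `plan_for` are hypothetical names, the event strings follow GitHub Actions naming (`push`, `pull_request`), and manual triggers are left out because their inputs are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelinePlan:
    tests: str   # "full", "hf", or "none"
    deploy: str  # "production", "staging", or "none"


def plan_for(event: str, branch: str = "") -> PipelinePlan:
    """Map a trigger to the behaviour described in the lists above."""
    if event == "pull_request":
        # All PRs: full test suite without deployment.
        return PipelinePlan(tests="full", deploy="none")
    if event == "push" and branch == "main":
        return PipelinePlan(tests="full", deploy="production")
    if event == "push" and branch == "hf-main-local":
        return PipelinePlan(tests="hf", deploy="staging")
    # Nothing is documented for other pushes; assume no pipeline run.
    return PipelinePlan(tests="none", deploy="none")
```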

🔐 Required Secrets

Configure these in GitHub repository settings:

# HuggingFace
HF_TOKEN=hf_xxxxxxxxxx

# OpenRouter (for production testing)
OPENROUTER_API_KEY=sk-or-xxxxxxxxxx

# Existing secrets
RENDER_API_KEY=rnd_xxxxxxxxxx
RENDER_SERVICE_ID=srv-xxxxxxxxxx
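
A missing secret tends to surface late, as an authentication error in the middle of a deployment. A quick check at the start of a job can fail fast instead. The sketch below assumes the workflow exposes each secret to the job as an environment variable of the same name. `missing_secrets` is a hypothetical helper, not an existing script in the repository.

```python
import os
from typing import List, Mapping

# Secret names as listed above.
REQUIRED_SECRETS = (
    "HF_TOKEN",
    "OPENROUTER_API_KEY",
    "RENDER_API_KEY",
    "RENDER_SERVICE_ID",
)


def missing_secrets(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required secret names that are unset or blank in env."""
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]
```

A job could call `missing_secrets()` first and exit with the returned names in the error message, which never prints the secret values themselves.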

📊 Monitoring & Validation

Automated Health Checks

# Production endpoints
https://msse-team-3-ai-engineering-project.hf.space/health
https://sethmcknight-msse-ai-engineering.hf.space/health
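
A smoke test against these endpoints can be as small as the sketch below, which treats HTTP 200 as healthy. It uses only the standard library. `fetch_status` and `check_health` are illustrative names, and the 10-second timeout is an assumption rather than a documented setting.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable

HEALTH_URLS = (
    "https://msse-team-3-ai-engineering-project.hf.space/health",
    "https://sethmcknight-msse-ai-engineering.hf.space/health",
)


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, timeout, refused connection


def check_health(
    urls: Iterable[str] = HEALTH_URLS,
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, bool]:
    """Map each URL to True when it answered with HTTP 200."""
    return {url: fetch(url) == 200 for url in urls}
```

Passing a custom `fetch` keeps the function testable without network access. A sleeping Space may need a retry loop around `check_health` while it wakes up.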

Citation Quality Monitoring

Real-time hallucination detection
Invalid citation logging
Performance metrics tracking

Test Execution

# Run comprehensive test suite
./scripts/hf_test_runner.sh

# Run specific test categories
pytest -m "hf and unit" -v
pytest -m "citation" -v

🎯 Key Features Validated

✅ Citation Hallucination Fix

Problem: LLM generated document_1.md instead of real filenames
Solution: Enhanced prompt engineering + context formatting
Validation: Automated tests verify proper citations
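
The validation step can be pictured as a check that every filename cited in an answer belongs to the set of documents actually retrieved. The sketch below is illustrative only: the `[Source: filename]` citation format, the function name `invalid_citations`, and the example filenames are assumptions, not the project's real implementation.

```python
import re
from typing import Iterable, List

# Assumed citation format: [Source: some_file.md]. The real format may differ.
CITATION_RE = re.compile(r"\[Source:\s*([^\]]+?)\s*\]")
# Generic placeholder names such as document_1.md (the hallucination above).
PLACEHOLDER_RE = re.compile(r"^document_\d+\.md$", re.IGNORECASE)


def invalid_citations(answer: str, known_files: Iterable[str]) -> List[str]:
    """Return cited filenames that are placeholders or not in known_files."""
    known = set(known_files)
    bad = []
    for name in CITATION_RE.findall(answer):
        if PLACEHOLDER_RE.match(name) or name not in known:
            bad.append(name)
    return bad
```

For example, with `known_files=["pto_policy.md"]`, an answer citing both `pto_policy.md` and `document_1.md` yields `["document_1.md"]`, which a caller could log as an invalid citation.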

✅ Hybrid Architecture Support

HF Embeddings: Production-ready API integration
OpenRouter LLM: Reliable response generation
Error Handling: Graceful degradation on failures

✅ Test Infrastructure

Mock Services: CI-friendly testing
Integration Tests: Real API validation
Coverage Reporting: Quality metrics

🚀 Deployment Process

1. Development

# Create feature branch
git checkout -b feature/your-feature

# Make changes and test locally
pytest tests/

# Submit PR
git push origin feature/your-feature

2. CI Validation

Automated testing on PR
Code quality checks
Architecture validation

3. Production Deployment

# Merge to main triggers deployment
git checkout main
git merge feature/your-feature
git push origin main

4. Post-Deployment

Automated health checks
Citation validation monitoring
Performance tracking

🔧 Troubleshooting

Common Issues

Test Failures in CI

# Check test runner output
./scripts/hf_test_runner.sh

# Run specific failing tests
pytest tests/test_embedding/ -v --tb=short

HF Deployment Issues

Verify HF_TOKEN secret is configured
Check HuggingFace Space settings
Review deployment logs in GitHub Actions

Citation Validation Warnings

These warnings are expected behavior: they mean the system caught an LLM citation hallucination
Check that actual policy filenames are being used
Verify prompt template contains citation fix

Debug Commands

# Validate services locally
python scripts/validate_services.py

# Test citation fix
python scripts/test_e2e_pipeline.py

# Run full pipeline
./scripts/hf_test_runner.sh

📈 Performance Metrics

Test Execution Times

Unit Tests: ~30 seconds
Integration Tests: ~2 minutes
Full Pipeline: ~5 minutes

Deployment Times

HuggingFace Build: ~3-5 minutes
Health Check Validation: ~2 minutes
Total Deployment: ~7-10 minutes

🎉 Success Indicators

✅ All Tests Passing

27+ tests across all components
80%+ code coverage
No critical linting errors

✅ Successful Deployment

HuggingFace Spaces responding
Health endpoints returning 200
Citation validation working

✅ Quality Metrics

Real policy filenames in citations
No document_1.md hallucinations
Proper error handling

Last Updated: October 25, 2025
Pipeline Version: 2.0
Maintainer: MSSE Team 3