
HuggingFace CI/CD Pipeline Documentation

🚀 Overview

This repository implements a comprehensive CI/CD pipeline for deploying the Corporate Policy Assistant to HuggingFace Spaces with automated testing and validation.

πŸ—οΈ Architecture

Hybrid AI System

  • Embeddings: HuggingFace Inference API (intfloat/multilingual-e5-large)
  • LLM: OpenRouter (microsoft/wizardlm-2-8x22b)
  • Citation Validation: Real-time hallucination detection
  • Vector Database: ChromaDB for document storage

CI/CD Components

  1. GitHub Actions: Automated testing and deployment
  2. HuggingFace Spaces: Production environment
  3. Comprehensive Test Suite: 27+ tests covering all components
  4. Code Quality: Black, isort, flake8 validation

📋 Pipeline Workflow

1. Code Quality Checks

# Formatting validation
black --check .
isort --check-only .
flake8 --max-line-length=88

2. Comprehensive Testing

# Run all tests
pytest -v --cov=src --cov-report=xml

# HF-specific tests
pytest tests/test_embedding/test_hf_embedding_service.py -v

# Citation validation tests
pytest -k citation -v

3. Architecture Validation

  • Service initialization checks
  • Import validation
  • End-to-end pipeline testing
  • Citation fix verification
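The import-validation step can be approximated with a few lines of Python. This is a minimal sketch rather than the project's actual check: the `check_imports` helper and the module names passed to it are hypothetical placeholders for the packages under `src`.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: error message} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or any error raised at import time
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    # Placeholder module list; replace with the project's real packages.
    failed = check_imports(["json", "no_such_module_for_demo"])
    for name, error in failed.items():
        print(f"FAILED {name}: {error}")
    print("import check:", "ok" if not failed else f"{len(failed)} failure(s)")
```

Catching `Exception` rather than only `ImportError` is deliberate here: a module that imports but crashes during initialization is still a failed service check.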

4. Deployment

  • Primary: msse-team-3/ai-engineering-project
  • Backup: sethmcknight/msse-ai-engineering
  • Health Checks: Automated smoke tests

🔧 Configuration Files

.github/workflows/hf-ci-cd.yml

Main CI/CD pipeline with:

  • Multi-Python version testing (3.10, 3.11)
  • Comprehensive test suite
  • Automatic HF deployment
  • Post-deployment validation

.hf.yml

HuggingFace Space configuration:

title: MSSE AI Engineering - Corporate Policy Assistant
sdk: gradio
app_file: app.py
models:
  - intfloat/multilingual-e5-large

pytest.ini

Test configuration with coverage and markers:

[pytest]
markers =
    unit: Unit tests
    integration: Integration tests
    hf: HuggingFace specific tests
    citation: Citation validation tests

🧪 Testing Strategy

Unit Tests (Critical)

  • ✅ HF Embedding Service: 12 comprehensive tests
  • ✅ Prompt Templates: Citation fix validation
  • ✅ LLM Components: Response processing
  • ✅ Context Formatting: Fixed document numbering

Integration Tests (Non-Critical)

  • ⚠️ API Integration: Real HF/OpenRouter calls
  • ⚠️ End-to-End Pipeline: Complete workflow
  • ⚠️ Service Validation: Production readiness

Coverage Requirements

  • Minimum: 80% code coverage
  • Focus Areas: Core business logic
  • Exclusions: Test files, dev tools
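The 80% minimum can be checked against the `coverage.xml` file that `--cov-report=xml` writes. The sketch below assumes the Cobertura-style layout that coverage.py produces, where the root `coverage` element carries a `line-rate` attribute; the helper names are hypothetical and not part of the repository.

```python
import xml.etree.ElementTree as ET

MIN_COVERAGE = 0.80  # the 80% minimum stated above


def line_coverage(xml_text):
    """Return overall line coverage (0.0 to 1.0) from a Cobertura-style report."""
    root = ET.fromstring(xml_text)
    return float(root.attrib["line-rate"])


def meets_minimum(xml_text, minimum=MIN_COVERAGE):
    """True when the report's line coverage is at or above the minimum."""
    return line_coverage(xml_text) >= minimum


if __name__ == "__main__":
    # Tiny stand-in report; in CI, read the real coverage.xml instead.
    sample = '<coverage line-rate="0.8421" branch-rate="0"></coverage>'
    print(f"coverage: {line_coverage(sample):.1%}, ok: {meets_minimum(sample)}")
```

If pytest-cov is already in use, its `--cov-fail-under=80` option enforces the same threshold without a separate script.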

🚦 Pipeline Triggers

Automatic Deployment

  • Push to main: Full pipeline + production deployment
  • Push to hf-main-local: HF-specific testing + staging deployment

Pull Request Validation

  • All PRs: Full test suite without deployment
  • Pre-commit checks: Code quality validation

Manual Triggers

  • Emergency Deployment: Manual sync workflow
  • Test-only Runs: Validation without deployment
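The automatic and pull-request rules above amount to a small decision table. The following sketch restates them in Python for illustration only; `plan_for` and its return shape are hypothetical, and the real logic lives in `.github/workflows/hf-ci-cd.yml`. Manual triggers are left out because their inputs are not specified here.

```python
def plan_for(event, branch=""):
    """Map a GitHub event name and branch to the stages listed above."""
    if event == "pull_request":
        # All PRs: full test suite, no deployment.
        return {"tests": "full", "deploy": None}
    if event == "push" and branch == "main":
        return {"tests": "full", "deploy": "production"}
    if event == "push" and branch == "hf-main-local":
        return {"tests": "hf", "deploy": "staging"}
    # Any other push does not trigger the pipeline.
    return {"tests": None, "deploy": None}


if __name__ == "__main__":
    for event, branch in [("push", "main"), ("push", "hf-main-local"),
                          ("pull_request", "feature/your-feature")]:
        print(event, branch, "->", plan_for(event, branch))
```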

πŸ” Required Secrets

Configure these in GitHub repository settings:

# HuggingFace
HF_TOKEN=hf_xxxxxxxxxx

# OpenRouter (for production testing)
OPENROUTER_API_KEY=sk-or-xxxxxxxxxx

# Existing secrets
RENDER_API_KEY=rnd_xxxxxxxxxx
RENDER_SERVICE_ID=srv-xxxxxxxxxx
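A missing secret usually surfaces as a confusing failure deep inside a job, so it can help to fail fast. The sketch below is one possible pre-flight check, not something the repository is known to ship; `missing_secrets` is a hypothetical helper, and it assumes the workflow maps each secret to an environment variable of the same name.

```python
import os

# Names taken from the list above.
REQUIRED_SECRETS = (
    "HF_TOKEN",
    "OPENROUTER_API_KEY",
    "RENDER_API_KEY",
    "RENDER_SERVICE_ID",
)


def missing_secrets(env=None, required=REQUIRED_SECRETS):
    """Return the names of required secrets that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        print("Missing secrets:", ", ".join(missing))
    else:
        print("All required secrets are set.")
```

Only the names are reported, never the values, so the output is safe to leave in CI logs.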

📊 Monitoring & Validation

Automated Health Checks

# Production endpoints
https://msse-team-3-ai-engineering-project.hf.space/health
https://sethmcknight-msse-ai-engineering.hf.space/health
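A smoke test against these endpoints could look like the sketch below. It uses only the standard library and treats anything other than HTTP 200 as unhealthy; the function names are hypothetical, and the response body of `/health` is not inspected because its format is not documented here.

```python
import urllib.error
import urllib.request

HEALTH_URLS = [
    "https://msse-team-3-ai-engineering-project.hf.space/health",
    "https://sethmcknight-msse-ai-engineering.hf.space/health",
]


def http_status(url, timeout=10):
    """Return the HTTP status code for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, ...


def unhealthy(urls, fetch=http_status):
    """Return the URLs that did not answer with HTTP 200."""
    return [url for url in urls if fetch(url) != 200]
```

Calling `unhealthy(HEALTH_URLS)` returns an empty list when both Spaces respond with 200. The `fetch` parameter exists so the check itself can be unit-tested without network access.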

Citation Quality Monitoring

  • Real-time hallucination detection
  • Invalid citation logging
  • Performance metrics tracking

Test Execution

# Run comprehensive test suite
./scripts/hf_test_runner.sh

# Run specific test categories
pytest -m "hf and unit" -v
pytest -m "citation" -v

🎯 Key Features Validated

✅ Citation Hallucination Fix

  • Problem: The LLM cited placeholder names such as document_1.md instead of real policy filenames
  • Solution: Enhanced prompt engineering + context formatting
  • Validation: Automated tests verify proper citations
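The idea behind the fix can be illustrated in two parts: label each retrieved chunk with its real filename when building the context, then check that every filename cited in the answer belongs to that set. This is a simplified sketch under assumed data shapes, not the project's implementation; the function names, the chunk dictionary keys, and `remote_work_policy.md` are all hypothetical.

```python
import re

# Matches filenames ending in .md, e.g. "remote_work_policy.md".
CITATION_PATTERN = re.compile(r"[\w.-]+\.md\b")


def format_context(chunks):
    """Label each retrieved chunk with its real source filename."""
    return "\n\n".join(f"[Source: {c['filename']}]\n{c['text']}" for c in chunks)


def invalid_citations(answer, chunks):
    """Return cited .md filenames that are not among the retrieved sources."""
    known = {c["filename"] for c in chunks}
    cited = CITATION_PATTERN.findall(answer)
    return sorted(set(cited) - known)


if __name__ == "__main__":
    chunks = [{"filename": "remote_work_policy.md",
               "text": "Employees may work remotely."}]
    answer = "See document_1.md and remote_work_policy.md."
    print(invalid_citations(answer, chunks))
```

With the sample data, `document_1.md` is flagged because it never appeared in the retrieved sources, while the real filename passes.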

✅ Hybrid Architecture Support

  • HF Embeddings: Production-ready API integration
  • OpenRouter LLM: Reliable response generation
  • Error Handling: Graceful degradation on failures

✅ Test Infrastructure

  • Mock Services: CI-friendly testing
  • Integration Tests: Real API validation
  • Coverage Reporting: Quality metrics
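A CI-friendly mock typically replaces the remote embedding call with something deterministic and offline. The class below is one way to do that, offered as an assumption about what such a mock might look like; `FakeEmbeddingService` and its `embed` method are hypothetical and do not mirror the real service's interface.

```python
import hashlib


class FakeEmbeddingService:
    """Offline stand-in for an embedding API: deterministic vectors, no network."""

    def __init__(self, dimension=8):
        if not 1 <= dimension <= 32:
            raise ValueError("dimension must be between 1 and 32")  # SHA-256 yields 32 bytes
        self.dimension = dimension

    def embed(self, texts):
        """Return one vector per input text, derived from a hash of the text."""
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Map the first `dimension` bytes to floats in [0, 1].
            vectors.append([byte / 255 for byte in digest[: self.dimension]])
        return vectors
```

Because the same text always maps to the same vector, tests that depend on embeddings stay reproducible and need neither `HF_TOKEN` nor network access. The vectors carry no semantic meaning, so similarity-quality checks still belong in the integration tests.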

🚀 Deployment Process

1. Development

# Create feature branch
git checkout -b feature/your-feature

# Make changes and test locally
pytest tests/

# Submit PR
git push origin feature/your-feature

2. CI Validation

  • Automated testing on PR
  • Code quality checks
  • Architecture validation

3. Production Deployment

# Merge to main triggers deployment
git checkout main
git merge feature/your-feature
git push origin main

4. Post-Deployment

  • Automated health checks
  • Citation validation monitoring
  • Performance tracking

🔧 Troubleshooting

Common Issues

Test Failures in CI

# Check test runner output
./scripts/hf_test_runner.sh

# Run specific failing tests
pytest tests/test_embedding/ -v --tb=short

HF Deployment Issues

  • Verify HF_TOKEN secret is configured
  • Check HuggingFace Space settings
  • Review deployment logs in GitHub Actions

Citation Validation Warnings

  • Expected behavior: System catches LLM hallucinations
  • Check that actual policy filenames are being used
  • Verify prompt template contains citation fix

Debug Commands

# Validate services locally
python scripts/validate_services.py

# Test citation fix
python scripts/test_e2e_pipeline.py

# Run full pipeline
./scripts/hf_test_runner.sh

📈 Performance Metrics

Test Execution Times

  • Unit Tests: ~30 seconds
  • Integration Tests: ~2 minutes
  • Full Pipeline: ~5 minutes

Deployment Times

  • HuggingFace Build: ~3-5 minutes
  • Health Check Validation: ~2 minutes
  • Total Deployment: ~7-10 minutes

🎉 Success Indicators

✅ All Tests Passing

  • 27+ tests across all components
  • 80%+ code coverage
  • No critical linting errors

✅ Successful Deployment

  • HuggingFace Spaces responding
  • Health endpoints returning 200
  • Citation validation working

✅ Quality Metrics

  • Real policy filenames in citations
  • No document_1.md hallucinations
  • Proper error handling

Last Updated: October 25, 2025
Pipeline Version: 2.0
Maintainer: MSSE Team 3