# HuggingFace CI/CD Pipeline Documentation

## Overview

This repository implements a comprehensive CI/CD pipeline for deploying the Corporate Policy Assistant to HuggingFace Spaces with automated testing and validation.
## Architecture

### Hybrid AI System

- Embeddings: HuggingFace Inference API (`intfloat/multilingual-e5-large`)
- LLM: OpenRouter (`microsoft/wizardlm-2-8x22b`)
- Citation Validation: Real-time hallucination detection
- Vector Database: ChromaDB for document storage
### CI/CD Components

- GitHub Actions: Automated testing and deployment
- HuggingFace Spaces: Production environment
- Comprehensive Test Suite: 27+ tests covering all components
- Code Quality: Black, isort, flake8 validation
## Pipeline Workflow

### 1. Code Quality Checks

```bash
# Formatting validation
black --check .
isort --check-only .
flake8 --max-line-length=88
```
### 2. Comprehensive Testing

```bash
# Run all tests
pytest -v --cov=src --cov-report=xml

# HF-specific tests
pytest tests/test_embedding/test_hf_embedding_service.py -v

# Citation validation tests
pytest -k citation -v
```
### 3. Architecture Validation

- Service initialization checks
- Import validation
- End-to-end pipeline testing
- Citation fix verification
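The import-validation step can be sketched as a small script; this is a minimal sketch, and the module list passed in below is illustrative, not the project's actual package layout:

```python
import importlib


def validate_imports(module_names):
    """Try to import each named module; return a dict of name -> error message."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            failures[name] = str(exc)
    return failures


if __name__ == "__main__":
    # Illustrative names; the real pipeline would check the project's own packages.
    problems = validate_imports(["json", "pathlib", "no_such_module"])
    for mod, err in problems.items():
        print(f"FAILED: {mod}: {err}")
```

A check like this fails fast in CI before the slower end-to-end tests run.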
### 4. Deployment

- Primary: `msse-team-3/ai-engineering-project`
- Backup: `sethmcknight/msse-ai-engineering`
- Health Checks: Automated smoke tests
## Configuration Files

### `.github/workflows/hf-ci-cd.yml`

Main CI/CD pipeline with:

- Multi-Python version testing (3.10, 3.11)
- Comprehensive test suite
- Automatic HF deployment
- Post-deployment validation
### `.hf.yml`

HuggingFace Space configuration:

```yaml
title: MSSE AI Engineering - Corporate Policy Assistant
sdk: gradio
app_file: app.py
models:
  - intfloat/multilingual-e5-large
```
### `pytest.ini`

Test configuration with coverage settings and markers (note: `pytest.ini` uses the `[pytest]` section; the `[tool.pytest.ini_options]` table is only valid in `pyproject.toml`):

```ini
[pytest]
markers =
    unit: Unit tests
    integration: Integration tests
    hf: HuggingFace specific tests
    citation: Citation validation tests
```
## Testing Strategy

### Unit Tests (Critical)

- HF Embedding Service: 12 comprehensive tests
- Prompt Templates: Citation fix validation
- LLM Components: Response processing
- Context Formatting: Fixed document numbering
### Integration Tests (Non-Critical)

- API Integration: Real HF/OpenRouter calls
- End-to-End Pipeline: Complete workflow
- Service Validation: Production readiness
### Coverage Requirements

- Minimum: 80% code coverage
- Focus Areas: Core business logic
- Exclusions: Test files, dev tools
## Pipeline Triggers

### Automatic Deployment

- Push to `main`: Full pipeline + production deployment
- Push to `hf-main-local`: HF-specific testing + staging deployment
### Pull Request Validation

- All PRs: Full test suite without deployment
- Pre-commit checks: Code quality validation
### Manual Triggers

- Emergency Deployment: Manual sync workflow
- Test-only Runs: Validation without deployment
## Required Secrets

Configure these in the GitHub repository settings:

```bash
# HuggingFace
HF_TOKEN=hf_xxxxxxxxxx

# OpenRouter (for production testing)
OPENROUTER_API_KEY=sk-or-xxxxxxxxxx

# Existing secrets
RENDER_API_KEY=rnd_xxxxxxxxxx
RENDER_SERVICE_ID=srv-xxxxxxxxxx
```
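When running scripts locally, the same secrets are read from the environment; a minimal fail-fast check might look like this (the variable names match the list above, but the helper itself is illustrative, not part of the project):

```python
import os

# Secrets the pipeline expects; names taken from the list above.
REQUIRED_SECRETS = ["HF_TOKEN", "OPENROUTER_API_KEY", "RENDER_API_KEY", "RENDER_SERVICE_ID"]


def missing_secrets(env=os.environ):
    """Return the names of required secrets that are unset or empty."""
    return [name for name in REQUIRED_SECRETS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        raise SystemExit(f"Missing required secrets: {', '.join(missing)}")
    print("All required secrets are set.")
```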
## Monitoring & Validation

### Automated Health Checks

```text
# Production endpoints
https://msse-team-3-ai-engineering-project.hf.space/health
https://sethmcknight-msse-ai-engineering.hf.space/health
```
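An equivalent smoke test can be scripted with the standard library; this is a minimal sketch (the endpoint list mirrors the URLs above, while the timeout value and error handling are assumptions, not the pipeline's actual policy):

```python
import urllib.error
import urllib.request

ENDPOINTS = [
    "https://msse-team-3-ai-engineering-project.hf.space/health",
    "https://sethmcknight-msse-ai-engineering.hf.space/health",
]


def check_health(url, timeout=10):
    """Return True if the endpoint responds with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    for url in ENDPOINTS:
        status = "OK" if check_health(url) else "DOWN"
        print(f"{status}: {url}")
```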
### Citation Quality Monitoring

- Real-time hallucination detection
- Invalid citation logging
- Performance metrics tracking
### Test Execution

```bash
# Run comprehensive test suite
./scripts/hf_test_runner.sh

# Run specific test categories
pytest -m "hf and unit" -v
pytest -m "citation" -v
```
## Key Features Validated

### Citation Hallucination Fix

- Problem: The LLM generated `document_1.md` instead of real filenames
- Solution: Enhanced prompt engineering + context formatting
- Validation: Automated tests verify proper citations
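The validation step can be sketched as a check of cited filenames against the known corpus; this is a minimal sketch (the function and the example filenames are illustrative, not the project's actual API):

```python
import re

# Placeholder pattern the fix guards against, e.g. "document_1.md".
PLACEHOLDER = re.compile(r"\bdocument_\d+\.md\b")


def find_invalid_citations(answer, known_files):
    """Return cited .md filenames that are placeholders or absent from the corpus."""
    cited = set(re.findall(r"\b[\w-]+\.md\b", answer))
    invalid = {c for c in cited if PLACEHOLDER.fullmatch(c) or c not in known_files}
    return sorted(invalid)
```

Any non-empty result would be logged as a hallucinated citation rather than shown to the user.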
### Hybrid Architecture Support

- HF Embeddings: Production-ready API integration
- OpenRouter LLM: Reliable response generation
- Error Handling: Graceful degradation on failures
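Graceful degradation can be sketched as a try-primary-then-fallback wrapper; the shape below is a minimal sketch (the function names and the canned fallback message are illustrative, not the project's actual service interfaces):

```python
import logging

logger = logging.getLogger(__name__)


def generate_with_fallback(prompt, primary, fallback):
    """Call the primary LLM; on any failure, log it and return a degraded answer."""
    try:
        return primary(prompt)
    except Exception as exc:
        logger.warning("Primary LLM failed (%s); using fallback response.", exc)
        return fallback(prompt)


# Stand-in callables for illustration:
def flaky_primary(prompt):
    raise TimeoutError("OpenRouter timed out")


def canned_fallback(prompt):
    return "Sorry, the assistant is temporarily unavailable."
```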
### Test Infrastructure

- Mock Services: CI-friendly testing
- Integration Tests: Real API validation
- Coverage Reporting: Quality metrics
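Mocking the embedding call is what keeps CI runs offline; a minimal sketch with `unittest.mock` (the `EmbeddingClient` class here is illustrative, not the project's actual service):

```python
from unittest.mock import Mock


class EmbeddingClient:
    """Illustrative client; the real service calls the HF Inference API."""

    def __init__(self, api):
        self.api = api

    def embed(self, text):
        return self.api.feature_extraction(text)


def test_embed_uses_api_without_network():
    # A Mock stands in for the HF API, so no network access is needed.
    fake_api = Mock()
    fake_api.feature_extraction.return_value = [0.1, 0.2, 0.3]
    client = EmbeddingClient(fake_api)
    assert client.embed("vacation policy") == [0.1, 0.2, 0.3]
    fake_api.feature_extraction.assert_called_once_with("vacation policy")
```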
## Deployment Process

### 1. Development

```bash
# Create a feature branch
git checkout -b feature/your-feature

# Make changes and test locally
pytest tests/

# Submit a PR
git push origin feature/your-feature
```
### 2. CI Validation

- Automated testing on PR
- Code quality checks
- Architecture validation
### 3. Production Deployment

```bash
# Merging to main triggers deployment
git checkout main
git merge feature/your-feature
git push origin main
```
### 4. Post-Deployment

- Automated health checks
- Citation validation monitoring
- Performance tracking
## Troubleshooting

### Common Issues

#### Test Failures in CI

```bash
# Check test runner output
./scripts/hf_test_runner.sh

# Run specific failing tests
pytest tests/test_embedding/ -v --tb=short
```
#### HF Deployment Issues

- Verify the `HF_TOKEN` secret is configured
- Check the HuggingFace Space settings
- Review deployment logs in GitHub Actions
#### Citation Validation Warnings

- Expected behavior: the system catches LLM hallucinations
- Check that actual policy filenames are being used
- Verify the prompt template contains the citation fix
### Debug Commands

```bash
# Validate services locally
python scripts/validate_services.py

# Test the citation fix
python scripts/test_e2e_pipeline.py

# Run the full pipeline
./scripts/hf_test_runner.sh
```
## Performance Metrics

### Test Execution Times

- Unit Tests: ~30 seconds
- Integration Tests: ~2 minutes
- Full Pipeline: ~5 minutes
### Deployment Times

- HuggingFace Build: ~3-5 minutes
- Health Check Validation: ~2 minutes
- Total Deployment: ~7-10 minutes
## Success Indicators

### All Tests Passing

- 27+ tests across all components
- 80%+ code coverage
- No critical linting errors
### Successful Deployment

- HuggingFace Spaces responding
- Health endpoints returning 200
- Citation validation working
### Quality Metrics

- Real policy filenames in citations
- No `document_1.md` hallucinations
- Proper error handling
---

Last Updated: October 25, 2025 | Pipeline Version: 2.0 | Maintainer: MSSE Team 3