| # Deployment Pipeline | |
| ```mermaid | |
| graph TB | |
| %% Input Sources | |
| subgraph "Inputs" | |
| TRAINED_MODEL[Trained Model<br/>Local directory] | |
| TRAINING_CONFIG[Training Config<br/>JSON/YAML] | |
| TRAINING_RESULTS[Training Results<br/>Metrics & logs] | |
| MODEL_METADATA[Model Metadata<br/>Name, description, etc.] | |
| end | |
| %% Model Publishing | |
| subgraph "Model Publishing" | |
| PUSH_SCRIPT[push_to_huggingface.py<br/>Model Publisher] | |
| subgraph "Publishing Steps" | |
| REPO_CREATION[Repository Creation<br/>HF Hub API] | |
| FILE_UPLOAD[File Upload<br/>Model files to HF] | |
| METADATA_UPLOAD[Metadata Upload<br/>Config & results] | |
| end | |
| end | |
| %% Model Card Generation | |
| subgraph "Model Card Generation" | |
| CARD_SCRIPT[generate_model_card.py<br/>Card Generator] | |
| subgraph "Card Components" | |
| TEMPLATE_LOAD[Template Loading<br/>model_card.md] | |
| VARIABLE_REPLACEMENT[Variable Replacement<br/>Config injection] | |
| CONDITIONAL_PROCESSING[Conditional Sections<br/>Quantized models, etc.] | |
| end | |
| end | |
| %% Demo Space Deployment | |
| subgraph "Demo Space Deployment" | |
| DEPLOY_SCRIPT[deploy_demo_space.py<br/>Space Deployer] | |
| subgraph "Space Setup" | |
| SPACE_CREATION[Space Repository<br/>Create HF Space] | |
| TEMPLATE_COPY[Template Copying<br/>demo_voxtral/ files] | |
| ENV_INJECTION[Environment Setup<br/>Model config injection] | |
| SECRET_SETUP[Secret Configuration<br/>HF_TOKEN, model vars] | |
| end | |
| end | |
| %% Space Building & Testing | |
| subgraph "Space Building" | |
| BUILD_TRIGGER[Build Trigger<br/>Automatic build start] | |
| DEPENDENCY_INSTALL[Dependency Installation<br/>requirements.txt] | |
| MODEL_DOWNLOAD[Model Download<br/>From HF Hub] | |
| APP_INITIALIZATION[App Initialization<br/>Gradio app setup] | |
| end | |
| %% Live Demo | |
| subgraph "Live Demo Space" | |
| GRADIO_INTERFACE[Gradio Interface<br/>Interactive demo] | |
| MODEL_INFERENCE[Model Inference<br/>Real-time ASR] | |
| USER_INTERACTION[User Interaction<br/>Audio upload/playback] | |
| end | |
| %% External Services | |
| subgraph "External Services" | |
| HF_HUB[Hugging Face Hub<br/>Model & Space hosting] | |
| HF_SPACES[HF Spaces Platform<br/>Demo hosting] | |
| end | |
| %% Flow Connections | |
| TRAINED_MODEL --> PUSH_SCRIPT | |
| TRAINING_CONFIG --> PUSH_SCRIPT | |
| TRAINING_RESULTS --> PUSH_SCRIPT | |
| MODEL_METADATA --> PUSH_SCRIPT | |
| PUSH_SCRIPT --> REPO_CREATION | |
| REPO_CREATION --> FILE_UPLOAD | |
| FILE_UPLOAD --> METADATA_UPLOAD | |
| METADATA_UPLOAD --> CARD_SCRIPT | |
| TRAINING_CONFIG --> CARD_SCRIPT | |
| TRAINING_RESULTS --> CARD_SCRIPT | |
| CARD_SCRIPT --> TEMPLATE_LOAD | |
| TEMPLATE_LOAD --> VARIABLE_REPLACEMENT | |
| VARIABLE_REPLACEMENT --> CONDITIONAL_PROCESSING | |
| CONDITIONAL_PROCESSING --> DEPLOY_SCRIPT | |
| METADATA_UPLOAD --> DEPLOY_SCRIPT | |
| DEPLOY_SCRIPT --> SPACE_CREATION | |
| SPACE_CREATION --> TEMPLATE_COPY | |
| TEMPLATE_COPY --> ENV_INJECTION | |
| ENV_INJECTION --> SECRET_SETUP | |
| SECRET_SETUP --> BUILD_TRIGGER | |
| BUILD_TRIGGER --> DEPENDENCY_INSTALL | |
| DEPENDENCY_INSTALL --> MODEL_DOWNLOAD | |
| MODEL_DOWNLOAD --> APP_INITIALIZATION | |
| APP_INITIALIZATION --> GRADIO_INTERFACE | |
| GRADIO_INTERFACE --> MODEL_INFERENCE | |
| MODEL_INFERENCE --> USER_INTERACTION | |
| HF_HUB --> MODEL_DOWNLOAD | |
| HF_SPACES --> GRADIO_INTERFACE | |
| %% Styling | |
| classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px | |
| classDef publishing fill:#e8f5e8,stroke:#388e3c,stroke-width:2px | |
| classDef generation fill:#fff3e0,stroke:#f57c00,stroke-width:2px | |
| classDef deployment fill:#fce4ec,stroke:#c2185b,stroke-width:2px | |
| classDef building fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px | |
| classDef demo fill:#e1f5fe,stroke:#0277bd,stroke-width:2px | |
| classDef external fill:#f5f5f5,stroke:#424242,stroke-width:2px | |
| class TRAINED_MODEL,TRAINING_CONFIG,TRAINING_RESULTS,MODEL_METADATA input | |
| class PUSH_SCRIPT,REPO_CREATION,FILE_UPLOAD,METADATA_UPLOAD publishing | |
| class CARD_SCRIPT,TEMPLATE_LOAD,VARIABLE_REPLACEMENT,CONDITIONAL_PROCESSING generation | |
| class DEPLOY_SCRIPT,SPACE_CREATION,TEMPLATE_COPY,ENV_INJECTION,SECRET_SETUP deployment | |
| class BUILD_TRIGGER,DEPENDENCY_INSTALL,MODEL_DOWNLOAD,APP_INITIALIZATION building | |
| class GRADIO_INTERFACE,MODEL_INFERENCE,USER_INTERACTION demo | |
| class HF_HUB,HF_SPACES external | |
| ``` | |
| ## Deployment Pipeline Overview | |
| This diagram illustrates the complete deployment pipeline that takes a trained Voxtral model and makes it available as an interactive demo on Hugging Face Spaces. | |
| ### Input Sources | |
| #### Trained Model Artifacts | |
| - **Model Files**: `model.safetensors`, `config.json`, `tokenizer.json` | |
| - **Training Config**: Hyperparameters and training setup | |
| - **Training Results**: Metrics, loss curves, evaluation results | |
| - **Model Metadata**: Name, description, base model information | |
| ### Model Publishing Phase | |
| #### push_to_huggingface.py Script | |
| ```python | |
| # Initialize publisher | |
| pusher = HuggingFacePusher( | |
|     model_path=output_dir, | |
|     repo_name=repo_name, | |
|     token=hf_token, | |
| ) | |
| # Push model | |
| success = pusher.push_model(training_config, results) | |
| ``` | |
| #### Publishing Steps | |
| 1. **Repository Creation**: Create HF Hub repository | |
| 2. **File Upload**: Upload all model files | |
| 3. **Metadata Upload**: Upload training config and results | |
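The three publishing steps map naturally onto calls in the `huggingface_hub` client. The sketch below is an illustration, not the contents of `push_to_huggingface.py`: the function name `publish_model` and the file name `training_config.json` are assumptions, and `api` is expected to be a `huggingface_hub.HfApi` instance (or any object exposing the same three methods).

```python
import json
from typing import Any


def publish_model(api: Any, repo_id: str, model_dir: str, training_config: dict) -> None:
    """Create the repo, upload the model files, then upload the metadata.

    `api` is expected to behave like huggingface_hub.HfApi.
    """
    # 1. Repository creation (no error if the repo already exists).
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)

    # 2. File upload: push the whole local model directory.
    api.upload_folder(folder_path=model_dir, repo_id=repo_id, repo_type="model")

    # 3. Metadata upload: the target file name is an assumption of this sketch.
    api.upload_file(
        path_or_fileobj=json.dumps(training_config, indent=2).encode("utf-8"),
        path_in_repo="training_config.json",
        repo_id=repo_id,
        repo_type="model",
    )
```

With the real client this could be called as `publish_model(HfApi(token=hf_token), repo_id, output_dir, training_config)`. Passing the client in as an argument keeps the function easy to exercise with a stub, without network access.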
| ### Model Card Generation | |
| #### generate_model_card.py Script | |
| ```python | |
| # Create generator | |
| generator = ModelCardGenerator() | |
| # Generate card | |
| variables = { | |
|     "model_name": model_name, | |
|     "repo_name": repo_id, | |
|     "base_model": base_model, | |
|     # ... other variables | |
| } | |
| content = generator.generate_model_card(variables) | |
| ``` | |
| #### Card Processing | |
| 1. **Template Loading**: Load from `templates/model_card.md` | |
| 2. **Variable Replacement**: Inject actual values | |
| 3. **Conditional Processing**: Handle optional sections | |
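To make the card-processing steps concrete, here is a minimal sketch of variable replacement and conditional sections. The placeholder syntax (`{{name}}` and `{{#if name}}...{{/if}}`) and the function name `render_card` are assumptions for illustration; the actual template in `templates/model_card.md` may use a different syntax. Nested conditionals are not handled.

```python
import re

# Hypothetical placeholder syntax, chosen for this sketch only.
_IF_BLOCK = re.compile(r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", re.DOTALL)
_VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def render_card(template: str, variables: dict) -> str:
    """Resolve conditional sections first, then substitute variables."""

    def keep_or_drop(match: re.Match) -> str:
        name, body = match.group(1), match.group(2)
        # Keep the section body only when the variable is truthy.
        return body if variables.get(name) else ""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Unknown names are left in place so missing values stay visible.
        return str(variables.get(name, match.group(0)))

    without_conditionals = _IF_BLOCK.sub(keep_or_drop, template)
    return _VARIABLE.sub(substitute, without_conditionals)


template = (
    "# {{model_name}}\n"
    "Base: {{base_model}}\n"
    "{{#if quantized}}Quantized variant available.\n{{/if}}"
)
card = render_card(
    template,
    {"model_name": "my-voxtral", "base_model": "base/model", "quantized": False},
)
print(card)
```

Resolving conditionals before substitution means a dropped section never has its variables looked up, so optional values can be omitted from `variables` entirely.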
| ### Demo Space Deployment | |
| #### deploy_demo_space.py Script | |
| ```python | |
| # Initialize deployer | |
| deployer = DemoSpaceDeployer( | |
|     hf_token=token, | |
|     hf_username=username, | |
|     model_id=model_id, | |
|     demo_type="voxtral", | |
| ) | |
| # Deploy space | |
| success = deployer.deploy() | |
| ``` | |
| #### Space Setup Process | |
| 1. **Space Creation**: Create HF Space repository | |
| 2. **Template Copying**: Copy demo template files | |
| 3. **Environment Injection**: Set model-specific variables | |
| 4. **Secret Configuration**: Configure HF_TOKEN and model variables | |
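The four setup steps above can be sketched against the `huggingface_hub` client as follows. This is an assumption about how such a deployer might be structured, not the code of `deploy_demo_space.py`; `setup_space` is a hypothetical name, and `api` is expected to behave like `huggingface_hub.HfApi`.

```python
from typing import Any


def setup_space(api: Any, space_id: str, template_dir: str, model_id: str, hf_token: str) -> None:
    """Create a Gradio Space, copy the template, and configure it."""
    # 1. Space creation: a Space is a repo of type "space" with an SDK.
    api.create_repo(repo_id=space_id, repo_type="space", space_sdk="gradio", exist_ok=True)

    # 2. Template copying: upload the demo template directory as-is.
    api.upload_folder(folder_path=template_dir, repo_id=space_id, repo_type="space")

    # 3. Environment injection: non-sensitive values go in public variables.
    api.add_space_variable(repo_id=space_id, key="HF_MODEL_ID", value=model_id)

    # 4. Secret configuration: the token is stored as a secret, not a variable.
    api.add_space_secret(repo_id=space_id, key="HF_TOKEN", value=hf_token)
```

Keeping the model ID in a variable and the token in a secret matters because Space variables are publicly visible, while secrets are not exposed to visitors.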
| ### Space Building Process | |
| #### Automatic Build Trigger | |
| - **Dependency Installation**: `pip install -r requirements.txt` | |
| - **Model Download**: Download model from HF Hub | |
| - **App Initialization**: Setup Gradio application | |
| #### Demo Template Structure | |
| ``` | |
| templates/spaces/demo_voxtral/ | |
| ├── app.py # Main Gradio application | |
| ├── requirements.txt # Python dependencies | |
| └── README.md # Space documentation | |
| ``` | |
| ### Live Demo Features | |
| #### Gradio Interface | |
| - **Audio Upload**: File upload or recording | |
| - **Real-time Inference**: Live ASR transcription | |
| - **Interactive Controls**: Model parameters, settings | |
| #### Model Inference Pipeline | |
| - **Audio Processing**: Convert to model inputs | |
| - **Transcription Generation**: Run ASR inference | |
| - **Result Display**: Show transcription with confidence | |
| ### Configuration Management | |
| #### Environment Variables | |
| ```python | |
| import os | |
| # Set in Space secrets/environment | |
| os.environ['HF_MODEL_ID'] = model_id | |
| os.environ['MODEL_NAME'] = model_name | |
| os.environ['HF_TOKEN'] = token # For model access | |
| ``` | |
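On the consuming side, the demo app has to read these values back at startup. The sketch below shows one way to do that with early validation; the `DemoConfig` class and the fallback for `MODEL_NAME` are assumptions for illustration, not part of the documented template.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DemoConfig:
    """Settings the demo app reads from its environment (hypothetical helper)."""

    model_id: str
    model_name: str
    token: Optional[str]

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "DemoConfig":
        model_id = env.get("HF_MODEL_ID")
        if not model_id:
            # Fail at startup rather than at the first inference request.
            raise RuntimeError("HF_MODEL_ID is not set; configure it in the Space settings")
        return cls(
            model_id=model_id,
            # Assumed fallback: derive a display name from the repo ID.
            model_name=env.get("MODEL_NAME", model_id.split("/")[-1]),
            # The token is optional; public models load without it.
            token=env.get("HF_TOKEN"),
        )
```

Accepting any mapping instead of reading `os.environ` directly lets the configuration logic be checked with a plain dictionary.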
| #### Demo-Specific Settings | |
| - **Model Configuration**: Base model, subfolder, quantization | |
| - **UI Branding**: Custom titles, descriptions, links | |
| - **Example Prompts**: Pre-configured demo examples | |
| ### Error Handling & Monitoring | |
| #### Build Process Monitoring | |
| - **Build Logs**: Real-time build status | |
| - **Error Detection**: Failed dependency installation | |
| - **Retry Logic**: Automatic rebuild on failure | |
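Build monitoring with a bounded retry could look like the following sketch. It polls the Space's runtime stage and restarts the Space a limited number of times when a failure stage is reported. The function name, the `"TIMEOUT"` sentinel, and the retry policy are assumptions; `api` is expected to behave like `huggingface_hub.HfApi`, whose `get_space_runtime` result exposes a `stage` field.

```python
import time
from typing import Any, Callable

# Stages treated as failures in this sketch.
FAILED_STAGES = {"BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR"}


def wait_for_space(
    api: Any,
    space_id: str,
    max_restarts: int = 1,
    poll_seconds: float = 10.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the Space is running, restarting it on failure.

    Returns the final stage, or "TIMEOUT" (a sentinel of this sketch,
    not a Hub stage) when `max_polls` is exhausted.
    """
    restarts = 0
    for _ in range(max_polls):
        stage = api.get_space_runtime(repo_id=space_id).stage
        if stage == "RUNNING":
            return stage
        if stage in FAILED_STAGES:
            if restarts >= max_restarts:
                return stage  # give up and report the failure stage
            api.restart_space(repo_id=space_id)
            restarts += 1
        sleep(poll_seconds)
    return "TIMEOUT"
```

Capping the number of restarts avoids an endless rebuild loop when the failure is deterministic, for example a dependency that cannot be installed.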
| #### Runtime Monitoring | |
| - **Space Health**: Uptime and responsiveness | |
| - **Model Loading**: Successful model initialization | |
| - **Inference Errors**: Runtime error handling | |
| ### Security Considerations | |
| #### Token Management | |
| - **Read-Only Tokens**: Use read-only tokens for demo spaces | |
| - **Secret Storage**: Secure storage of HF_TOKEN | |
| - **Access Control**: Proper repository permissions | |
| #### Resource Management | |
| - **Memory Limits**: Space hardware constraints | |
| - **Timeout Handling**: Inference timeout protection | |
| - **Rate Limiting**: Prevent abuse | |
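As one possible approach to the rate-limiting point, a demo could keep a small in-memory sliding-window limiter in front of the inference function. The class below is a hypothetical sketch, not something the template is known to ship, and it only protects a single process.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `max_calls` per client within `window_seconds`."""

    def __init__(
        self,
        max_calls: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock
        self._calls: Dict[str, Deque[float]] = {}

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        calls = self._calls.setdefault(client_id, deque())
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self._window:
            calls.popleft()
        if len(calls) >= self._max_calls:
            return False
        calls.append(now)
        return True
```

An inference handler would call `limiter.allow(client_id)` before running the model and return a "try again later" message when it yields `False`.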
| ### Integration Points | |
| #### With Training Scripts | |
| - **Training Config**: Used for model card generation | |
| - **Training Results**: Included in model metadata | |
| - **Model Path**: Direct path to trained model files | |
| #### With Interface (interface.py) | |
| - **Parameter Passing**: Deployment settings from UI | |
| - **Progress Updates**: Deployment progress to user | |
| - **Result Links**: Direct links to deployed spaces | |
| ### Deployment Workflows | |
| #### Full Pipeline (Recommended) | |
| 1. Train model → Generate model card → Push to Hub → Deploy demo | |
| 2. All steps automated through single interface action | |
| 3. Comprehensive error handling and rollback | |
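The idea of running every step from a single action, with rollback on failure, can be sketched as a small orchestrator. Everything here is illustrative: `run_pipeline`, the step tuple layout, and the log format are assumptions, and the real interface may handle failures differently.

```python
from typing import Callable, List, Optional, Tuple

# (step name, action, optional undo action)
Step = Tuple[str, Callable[[], None], Optional[Callable[[], None]]]


def run_pipeline(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed: List[Tuple[str, Optional[Callable[[], None]]]] = []
    log: List[str] = []
    for name, action, undo in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            # Roll back only the steps that finished, newest first.
            for done_name, done_undo in reversed(completed):
                if done_undo is not None:
                    done_undo()
                    log.append(f"rolled back: {done_name}")
            return False, log
        completed.append((name, undo))
        log.append(f"ok: {name}")
    return True, log
```

A step without a meaningful undo (training, for instance) passes `None`, while a step such as repository creation could pass a function that deletes the repository it created.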
| #### Manual Deployment | |
| 1. Use individual scripts for granular control | |
| 2. Custom configuration and branding | |
| 3. Debugging and troubleshooting capabilities | |
| #### CI/CD Integration | |
| - **Automated Triggers**: GitHub Actions integration | |
| - **Version Control**: Model versioning and releases | |
| - **Testing**: Automated demo testing | |
| ### Performance Optimization | |
| #### Space Hardware Selection | |
| - **CPU Basic**: Free tier, sufficient for small models | |
| - **GPU Options**: For larger models requiring acceleration | |
| - **Memory Scaling**: Based on model size requirements | |
| #### Model Optimization | |
| - **Quantization**: 4-bit quantization for smaller footprint | |
| - **Model Sharding**: Split large models across memory | |
| - **Caching**: Model caching for faster cold starts | |
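To see why quantization affects hardware selection, a rough estimate of weight memory helps. The helper below is a back-of-the-envelope sketch under stated assumptions: it counts weights only, ignoring activations, caches, and quantization overhead, and the 3-billion-parameter figure is a hypothetical example rather than a claim about any specific model.

```python
def estimate_weight_memory_gb(num_parameters: int, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return num_parameters * bits_per_weight / 8 / 1e9


# Hypothetical 3-billion-parameter model.
params = 3_000_000_000
print(estimate_weight_memory_gb(params, 16))  # 16-bit weights
print(estimate_weight_memory_gb(params, 4))   # 4-bit quantized weights
```

Under these assumptions the 4-bit variant needs a quarter of the weight memory of the 16-bit one, which can decide whether a model fits on a given Space hardware tier.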
| ### Monitoring & Analytics | |
| #### Space Analytics | |
| - **Usage Metrics**: Daily active users, session duration | |
| - **Performance Metrics**: Inference latency, error rates | |
| - **User Feedback**: Demo effectiveness and issues | |
| #### Model Analytics | |
| - **Download Stats**: Model popularity and usage | |
| - **Citation Tracking**: Academic and research usage | |
| - **Community Feedback**: GitHub issues and discussions | |
| See also: | |
| - [Architecture Overview](architecture.md) | |
| - [Training Pipeline](training-pipeline.md) | |
| - [Data Flow](data-flow.md) | |