# Push to Hugging Face Script Guide
## Overview
The `push_to_huggingface.py` script has been enhanced to integrate with **HF Datasets** for experiment tracking. It handles complete model deployment and stores experiment data persistently.
## Key Improvements
### **1. HF Datasets Integration**
- **Dataset Repository Support**: Configurable dataset repository for experiment storage
- **Environment Variables**: Automatic detection of `HF_TOKEN` and `TRACKIO_DATASET_REPO`
- **Enhanced Logging**: Logs push actions to both Trackio and HF Datasets
- **Model Card Integration**: Includes dataset repository information in model cards
### **2. Enhanced Configuration**
- **Flexible Token Input**: Multiple ways to provide the HF token
- **Dataset Repository Tracking**: Links models to their experiment datasets
- **Environment Variable Support**: Falls back to environment variables
- **Command Line Arguments**: New arguments for HF Datasets integration
### **3. Improved Model Cards**
- **Dataset Repository Info**: Shows which dataset contains the experiment data
- **Experiment Tracking Section**: Explains how to access training data
- **Enhanced Documentation**: Better model cards with experiment links
## Usage Examples
### **Basic Usage**
```bash
# Push model with default settings
python push_to_huggingface.py /path/to/model username/repo-name
```
### **With HF Datasets Integration**
```bash
# Push model with custom dataset repository
python push_to_huggingface.py /path/to/model username/repo-name \
    --dataset-repo username/experiments
```
### **With Custom Token**
```bash
# Push model with custom HF token
python push_to_huggingface.py /path/to/model username/repo-name \
    --hf-token your_token_here
```
### **Complete Example**
```bash
# Push model with all options
python push_to_huggingface.py /path/to/model username/repo-name \
    --dataset-repo username/experiments \
    --hf-token your_token_here \
    --private \
    --experiment-name "smollm3_finetune_v2"
```
## Command Line Arguments
| Argument | Required | Default | Description |
|----------|----------|---------|-------------|
| `model_path` | Yes | None | Path to trained model directory |
| `repo_name` | Yes | None | HF repository name (`username/repo-name`) |
| `--token` | No | `HF_TOKEN` env | Hugging Face token |
| `--hf-token` | No | `HF_TOKEN` env | HF token (alternative to `--token`) |
| `--private` | No | False | Make repository private |
| `--trackio-url` | No | None | Trackio Space URL for logging |
| `--experiment-name` | No | None | Experiment name for Trackio |
| `--dataset-repo` | No | `TRACKIO_DATASET_REPO` env | HF Dataset repository |
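
To make the table concrete, here is a minimal sketch of an `argparse` parser that accepts the same arguments. The function name `build_parser` is hypothetical and the script's real parser may be organized differently. The environment-variable defaults are left out of this sketch, so unset options simply come back as `None`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the argument table (not the script's actual code)."""
    parser = argparse.ArgumentParser(
        description="Push a trained model to the Hugging Face Hub"
    )
    # Positional, required arguments
    parser.add_argument("model_path", help="Path to trained model directory")
    parser.add_argument("repo_name", help="HF repository name (username/repo-name)")
    # Optional arguments; environment fallbacks are not handled in this sketch
    parser.add_argument("--token", default=None, help="Hugging Face token")
    parser.add_argument("--hf-token", default=None, help="HF token (alternative to --token)")
    parser.add_argument("--private", action="store_true", help="Make repository private")
    parser.add_argument("--trackio-url", default=None, help="Trackio Space URL for logging")
    parser.add_argument("--experiment-name", default=None, help="Experiment name for Trackio")
    parser.add_argument("--dataset-repo", default=None, help="HF Dataset repository")
    return parser


# Example: parse an explicit argument list instead of sys.argv
args = build_parser().parse_args(
    ["/path/to/model", "username/repo-name", "--dataset-repo", "username/experiments", "--private"]
)
```

Note that `argparse` converts dashes to underscores, so `--dataset-repo` is read back as `args.dataset_repo`.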
## Configuration Methods
### **Method 1: Command Line Arguments**
```bash
python push_to_huggingface.py model_path repo_name \
    --dataset-repo username/experiments \
    --hf-token your_token_here
```
### **Method 2: Environment Variables**
```bash
export HF_TOKEN=your_token_here
export TRACKIO_DATASET_REPO=username/experiments
python push_to_huggingface.py model_path repo_name
```
### **Method 3: Hybrid Approach**
```bash
# Set defaults via environment variables
export HF_TOKEN=your_token_here
export TRACKIO_DATASET_REPO=username/experiments
# Override specific values via command line
python push_to_huggingface.py model_path repo_name \
    --dataset-repo username/specific-experiments
```
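
The three methods share one precedence rule: a value given on the command line wins, otherwise the environment variable is used. The sketch below illustrates that rule. The helper `resolve_setting` is a hypothetical name, and the script's own fallback logic may be written differently.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    cli_value: Optional[str],
    env_name: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the CLI value if given, otherwise the environment variable, otherwise None."""
    if cli_value:
        return cli_value
    # Treat an empty environment variable the same as an unset one
    return env.get(env_name) or None


# Hybrid approach: the environment supplies a default, the command line overrides it
example_env = {"HF_TOKEN": "your_token_here", "TRACKIO_DATASET_REPO": "username/experiments"}
dataset_repo = resolve_setting("username/specific-experiments", "TRACKIO_DATASET_REPO", example_env)
token = resolve_setting(None, "HF_TOKEN", example_env)
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.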
## What Gets Pushed
### **Model Files**
- **Model Weights**: `pytorch_model.bin`
- **Configuration**: `config.json`
- **Tokenizer**: `tokenizer.json`, `tokenizer_config.json`
- **All Other Files**: Any additional files in the model directory
### **Documentation**
- **Model Card**: Comprehensive `README.md` with model information
- **Training Configuration**: JSON configuration used for training
- **Training Results**: JSON results and metrics
- **Training Logs**: Text logs from the training process
### **Experiment Data**
- **Dataset Repository**: Links to the HF Dataset containing experiment data
- **Training Metrics**: All training metrics stored in the dataset
- **Configuration**: Training configuration stored in the dataset
- **Artifacts**: Training artifacts and logs
## Enhanced Model Cards
The improved script creates enhanced model cards that include:
### **Model Information**
- Base model and architecture
- Training date and model size
- **Dataset repository** for experiment data
### **Training Configuration**
- Complete training parameters
- Hardware information
- Training duration and steps
### **Experiment Tracking**
- Links to the HF Dataset repository
- Instructions for accessing experiment data
- Training metrics and results
### **Usage Examples**
- Code examples for loading and using the model
- Generation examples
- Performance information
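
As a rough illustration of how a card could link a model to its experiment dataset, the sketch below builds a small Markdown card. The function `build_model_card`, its parameters, and the exact card layout are assumptions made for this example. The script's real template is likely more detailed.

```python
from datetime import date
from typing import Optional


def build_model_card(
    repo_name: str,
    base_model: str,
    dataset_repo: Optional[str],
    trained_on: Optional[date] = None,
) -> str:
    """Build a minimal Markdown model card (hypothetical layout, not the script's template)."""
    trained_on = trained_on or date.today()
    lines = [
        f"# {repo_name}",
        "",
        "## Model Information",
        f"- Base model: {base_model}",
        f"- Training date: {trained_on.isoformat()}",
    ]
    # Only add the experiment section when a dataset repository is configured
    if dataset_repo:
        lines += [
            f"- Dataset repository: {dataset_repo}",
            "",
            "## Experiment Tracking",
            f"Experiment data: https://huggingface.co/datasets/{dataset_repo}",
        ]
    return "\n".join(lines) + "\n"


card = build_model_card(
    "username/smollm3-french", "SmolLM3", "username/experiments", date(2024, 1, 1)
)
```

When no dataset repository is given, the card omits the experiment section rather than printing an empty link.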
## Logging Integration
### **Trackio Logging**
- **Push Actions**: Logs model push events
- **Model Information**: Repository name, size, configuration
- **Training Data**: Links to the experiment dataset
### **HF Datasets Logging**
- **Experiment Summary**: Final training summary
- **Push Metadata**: Model repository and push date
- **Configuration**: Complete training configuration
### **Dual Storage**
- **Trackio**: Real-time monitoring and visualization
- **HF Datasets**: Persistent experiment storage
- **Synchronized**: Both systems are updated together
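
One way to picture the push metadata is as a small JSON-serializable record that could be sent to both Trackio and the dataset repository. The field names and the helper `build_push_record` below are purely illustrative assumptions. The schema the script actually writes is not documented here.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_push_record(
    repo_name: str,
    dataset_repo: Optional[str],
    experiment_name: Optional[str],
    pushed_at: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Assemble a push-event record (hypothetical schema, for illustration only)."""
    pushed_at = pushed_at or datetime.now(timezone.utc)
    return {
        "event": "model_push",
        "model_repo": repo_name,
        "dataset_repo": dataset_repo,
        "experiment_name": experiment_name,
        "pushed_at": pushed_at.isoformat(),
    }


record = build_push_record(
    "username/smollm3-french",
    "username/experiments",
    "smollm3_french_v2",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
payload = json.dumps(record)  # the same payload could be logged to both systems
```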
## Troubleshooting
### **Issue: "Missing required files"**
**Solutions**:
1. Check that the model directory contains the required files
2. Ensure the model was saved correctly during training
3. Verify file permissions
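
Before pushing, the model directory can be checked by hand with a few lines of Python. This is a sketch only: the `REQUIRED_FILES` tuple is an assumption taken from the "Model Files" list in this guide, and the script may require a different set of files.

```python
from pathlib import Path
from typing import List, Sequence

# Assumed list, based on the "Model Files" section; the real script may check other names
REQUIRED_FILES = (
    "config.json",
    "pytorch_model.bin",
    "tokenizer.json",
    "tokenizer_config.json",
)


def find_missing_files(model_dir: str, required: Sequence[str] = REQUIRED_FILES) -> List[str]:
    """Return the required file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]
```

An empty list from `find_missing_files("outputs/model")` means every file in the assumed list is present.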
### **Issue: "Failed to create repository"**
**Solutions**:
1. Check that the HF token has write permissions
2. Verify the repository name format: `username/repo-name`
3. Check whether a repository with that name already exists (`--private` only controls visibility)
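
The name format can be checked locally before any network call. The pattern below is a simplified approximation of the `username/repo-name` shape. The Hub applies additional naming rules that this sketch does not cover, and `is_valid_repo_name` is a hypothetical helper.

```python
import re

# Simplified approximation of "username/repo-name"; the Hub enforces further rules
REPO_NAME_PATTERN = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*"
)


def is_valid_repo_name(repo_name: str) -> bool:
    """Return True if repo_name has exactly one slash separating two non-empty parts."""
    return REPO_NAME_PATTERN.fullmatch(repo_name) is not None
```

For example, `is_valid_repo_name("username/repo-name")` is true, while a bare `repo-name` without the namespace is rejected.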
### **Issue: "Failed to upload files"**
**Solutions**:
1. Check network connectivity
2. Verify that the HF token is valid
3. Ensure the repository was created successfully
### **Issue: "Dataset repository not found"**
**Solutions**:
1. Check that the dataset repository exists
2. Verify that the HF token has read access
3. Use `--dataset-repo` to specify the correct repository
## Workflow Integration
### **Complete Training Workflow**
1. **Train Model**: Use training scripts with monitoring
2. **Monitor Progress**: View metrics in the Trackio interface
3. **Push Model**: Use the improved push script
4. **Access Data**: View experiments in the HF Dataset repository
### **Example Workflow**
```bash
# 1. Train model with monitoring
python train.py config/train_smollm3_openhermes_fr.py \
    --experiment_name "smollm3_french_v2"
# 2. Push model to HF Hub
python push_to_huggingface.py outputs/model username/smollm3-french \
    --dataset-repo username/experiments \
    --experiment-name "smollm3_french_v2"
# 3. View results
# - Model: https://huggingface.co/username/smollm3-french
# - Experiments: https://huggingface.co/datasets/username/experiments
# - Trackio: Your Trackio Space interface
```
## Benefits
### **For Model Deployment**
- **Complete Documentation**: Enhanced model cards with experiment links
- **Persistent Storage**: Experiment data stored in HF Datasets
- **Easy Access**: Direct links to training data and metrics
- **Reproducibility**: Complete training configuration included
### **For Experiment Management**
- **Centralized Storage**: All experiments in one HF Dataset repository
- **Version Control**: Model versions linked to experiment data
- **Collaboration**: Share experiments and models easily
- **Searchability**: Easy to find specific experiments
### **For Development**
- **Flexible Configuration**: Multiple ways to set parameters
- **Backward Compatible**: Works with existing setups
- **Error Handling**: Clear error messages and troubleshooting
- **Integration**: Works with the existing monitoring system
## Testing Results
All push script tests passed:
- **HuggingFacePusher Initialization**: Works with new parameters
- **Model Card Creation**: Includes HF Datasets integration
- **Logging Integration**: Logs to both Trackio and HF Datasets
- **Argument Parsing**: Handles new command line arguments
- **Environment Variables**: Proper fallback handling
## Migration Guide
### **From Old Script**
```bash
# Old way
python push_to_huggingface.py model_path repo_name --token your_token
# New way (same functionality)
python push_to_huggingface.py model_path repo_name --hf-token your_token
# New way with HF Datasets
python push_to_huggingface.py model_path repo_name \
    --hf-token your_token \
    --dataset-repo username/experiments
```
### **Environment Variables**
```bash
# Set environment variables for automatic detection
export HF_TOKEN=your_token_here
export TRACKIO_DATASET_REPO=username/experiments
# Then use simple command
python push_to_huggingface.py model_path repo_name
```

---
**Your push script is now fully integrated with HF Datasets for complete experiment tracking and model deployment!**