# Hugging Face Upload Guide

This guide walks you through uploading the Turnlet BERT Multilingual EOU (end-of-utterance) model to the Hugging Face Hub.

## Package Contents

This folder contains everything needed for a complete Hugging Face model repository:

### Model Files

- **`model.safetensors`** (517 MB) - PyTorch model weights in safetensors format
- **`bert_model_optimized.onnx`** (517 MB) - Optimized ONNX model (FP32)
- **`bert_model_optimized_dynamic_int8.onnx`** (132 MB) - Dynamically quantized ONNX model (INT8, recommended)

### Tokenizer Files

- **`tokenizer.json`** - Fast tokenizer
- **`tokenizer_config.json`** - Tokenizer configuration
- **`vocab.txt`** - Vocabulary file
- **`special_tokens_map.json`** - Special tokens mapping

### Configuration Files

- **`config.json`** - Model architecture configuration
- **`metrics.yaml`** - Training and validation metrics

### Documentation

- **`README.md`** - Comprehensive model card and documentation
- **`model_card.json`** - Machine-readable model metadata
- **`requirements.txt`** - Python dependencies
- **`.gitattributes`** - Git LFS configuration for large files
- **`UPLOAD_GUIDE.md`** - This file

### Code Examples

- **`inference_example.py`** - Interactive demo and usage examples
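
Before uploading, it can help to confirm that the folder really contains every file listed above. The following is a minimal sketch using only the Python standard library. The `EXPECTED_FILES` list is copied from the lists above, and `check_package` is a hypothetical helper name, not part of any library.

```python
from pathlib import Path
import sys

# File names copied from the "Package Contents" lists in this guide.
EXPECTED_FILES = [
    "model.safetensors",
    "bert_model_optimized.onnx",
    "bert_model_optimized_dynamic_int8.onnx",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.txt",
    "special_tokens_map.json",
    "config.json",
    "metrics.yaml",
    "README.md",
    "model_card.json",
    "requirements.txt",
    ".gitattributes",
    "inference_example.py",
    "UPLOAD_GUIDE.md",
]


def check_package(folder):
    """Return (missing_files, sizes_in_mb) for the expected files in `folder`."""
    root = Path(folder)
    missing = []
    sizes_mb = {}
    for name in EXPECTED_FILES:
        path = root / name
        if path.is_file():
            # Decimal megabytes (1 MB = 1,000,000 bytes).
            sizes_mb[name] = path.stat().st_size / 1_000_000
        else:
            missing.append(name)
    return missing, sizes_mb


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    missing, sizes_mb = check_package(folder)
    # Print the largest files first.
    for name, mb in sorted(sizes_mb.items(), key=lambda item: item[1], reverse=True):
        print(f"{mb:10.1f} MB  {name}")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print(f"All {len(EXPECTED_FILES)} expected files are present.")
```

Pass the model folder as the first argument, for example `python check_package.py /home/ubuntu/hf_upload/turnlet-bert-multilingual-eou` (the script name is arbitrary). Sizes are reported in decimal megabytes, so they may differ slightly from tools that report MiB.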

## Upload Steps

### Option 1: Using the Hugging Face CLI and Git (Recommended)

```bash
# Install the Hugging Face Hub client (provides huggingface-cli)
pip install -U huggingface_hub

# Log in with a token that has write access; answer "Y" when asked whether
# to add the token as a git credential, so that `git push` can authenticate
huggingface-cli login

# Create the repository under the logged-in account
huggingface-cli repo create turnlet-bert-multilingual-eou --type model

# Enable Git LFS, then clone the new repository into a separate working directory
# (the Hub creates the repository with an initial commit, so clone it instead of running `git init`)
# Replace YOUR_USERNAME with your Hugging Face username
git lfs install
git clone https://huggingface.co/YOUR_USERNAME/turnlet-bert-multilingual-eou turnlet-repo
cd turnlet-repo

# Copy the prepared files, including the hidden .gitattributes
cp -r /home/ubuntu/hf_upload/turnlet-bert-multilingual-eou/. .

# Make sure the large files go through LFS (.gitattributes should already cover them)
git lfs track "*.onnx"
git lfs track "*.safetensors"

# Stage, commit, and push to Hugging Face
git add .
git commit -m "Initial commit: Turnlet BERT Multilingual EOU model with ONNX variants"
git push
```

### Option 2: Using the Python API

```python
from huggingface_hub import HfApi, create_repo, login

# Log in (prompts for an access token with write permission)
login()

# Create the repository; exist_ok=True makes re-runs safe
repo_id = "YOUR_USERNAME/turnlet-bert-multilingual-eou"
create_repo(repo_id, repo_type="model", exist_ok=True)

# Upload the whole folder in a single commit; no local git or git-lfs is needed
api = HfApi()
api.upload_folder(
    folder_path="/home/ubuntu/hf_upload/turnlet-bert-multilingual-eou",
    repo_id=repo_id,
    repo_type="model",
)
print(f"Model uploaded to: https://huggingface.co/{repo_id}")
```

### Option 3: Manual Upload via Web Interface

1. Go to https://huggingface.co/new.
2. Create a new model repository named `turnlet-bert-multilingual-eou`.
3. Upload the files:
   - Upload the smaller files (tokenizer, configuration, documentation) directly through the web interface.
   - Upload the large weight files (`.onnx`, `.safetensors`) with Git LFS (Option 1) or the Python API (Option 2).
4. The uploaded `README.md` becomes the model card. If you create the card in the web editor instead, paste the `README.md` content there.

## Important Notes

### Git LFS Required

When uploading with git (Option 1), the model weight files must be stored with Git LFS (Large File Storage), because the Hub rejects large files pushed without it. The Python API (Option 2) handles large files automatically.

- Install Git LFS and enable it once with `git lfs install`
- The `.gitattributes` file is already configured
- Tracked patterns: `*.onnx`, `*.safetensors`
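
To double-check that `.gitattributes` covers both weight formats before committing, a small sketch like the following can parse it. The helper names `lfs_patterns` and `missing_lfs_patterns` are hypothetical. The sketch only looks for the literal patterns `*.onnx` and `*.safetensors` carrying `filter=lfs`; it does not evaluate gitattributes matching rules.

```python
from pathlib import Path

# Patterns this guide expects Git LFS to track.
REQUIRED_LFS_PATTERNS = ["*.onnx", "*.safetensors"]


def lfs_patterns(gitattributes_text):
    """Return the set of patterns that carry `filter=lfs` in .gitattributes text."""
    patterns = set()
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attributes = line.split()
        if "filter=lfs" in attributes:
            patterns.add(pattern)
    return patterns


def missing_lfs_patterns(gitattributes_path=".gitattributes"):
    """List the required patterns that the given .gitattributes file does not route to LFS."""
    text = Path(gitattributes_path).read_text(encoding="utf-8")
    tracked = lfs_patterns(text)
    return [p for p in REQUIRED_LFS_PATTERNS if p not in tracked]
```

After `git add`, running `git lfs ls-files` should list the `.onnx` and `.safetensors` files. If they do not appear there, they are being staged as regular git objects.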

### File Sizes

- Total repository size: ~1.2 GB (517 + 517 + 132 MB of weights, plus small text files)
- Largest files: ONNX FP32 (517 MB) and PyTorch safetensors (517 MB)
- Recommended for deployment: INT8 ONNX (132 MB)
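
### Model Naming

Consider one of these repository names:

- `YOUR_USERNAME/turnlet-bert-multilingual-eou`
- `YOUR_ORG/turnlet-eou-detection-multilingual`
- `YOUR_USERNAME/distilbert-eou-en-hi-es`

If you choose a name other than `turnlet-bert-multilingual-eou`, use it consistently in the commands in this guide and in `README.md`.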

### Tags to Add

Add these tags to the `tags:` list in the YAML metadata block at the top of `README.md` (you can also edit them later from the model card page):

- `end-of-utterance`
- `eou-detection`
- `multilingual`
- `distilbert`
- `onnx`
- `quantized`
- `conversational-ai`
- `dialogue`
- `turn-taking`
- `text-classification`
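
Because tags live in the YAML metadata block of `README.md`, the simplest route is to edit that block before uploading. If the repository is already live, a sketch along these lines could merge the tags into the existing card using `huggingface_hub`'s `ModelCard` class. `merge_tags` and `add_tags` are hypothetical helper names. The call at the end is commented out because it needs network access and a write token.

```python
# Tags copied from the list above.
EOU_TAGS = [
    "end-of-utterance", "eou-detection", "multilingual", "distilbert", "onnx",
    "quantized", "conversational-ai", "dialogue", "turn-taking", "text-classification",
]


def merge_tags(existing, new):
    """Combine two tag lists, keeping the original order and dropping duplicates."""
    merged = []
    for tag in list(existing or []) + list(new):
        if tag not in merged:
            merged.append(tag)
    return merged


def add_tags(repo_id, tags=EOU_TAGS):
    """Merge `tags` into the README.md metadata of `repo_id` and push the updated card."""
    # Imported here so merge_tags can be used without huggingface_hub installed.
    from huggingface_hub import ModelCard

    card = ModelCard.load(repo_id)
    card.data.tags = merge_tags(card.data.tags, tags)
    card.push_to_hub(repo_id, commit_message="Add model tags")


# add_tags("YOUR_USERNAME/turnlet-bert-multilingual-eou")
```

Merging instead of overwriting keeps any tags already present in the card.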

## Testing After Upload

After uploading, check that the model loads from the Hub and produces a prediction:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "YOUR_USERNAME/turnlet-bert-multilingual-eou"
# Test loading from the Hub (eval() disables dropout)
model = AutoModelForSequenceClassification.from_pretrained(repo_id).eval()
tokenizer = AutoTokenizer.from_pretrained(repo_id)
# Quick test: run one utterance without tracking gradients
inputs = tokenizer("Thanks for your help!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1)[0]
label = model.config.id2label[int(probs.argmax())]
print(f"Model loaded and working! Prediction: {label}, probabilities: {probs.tolist()}")
```
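
To also test the ONNX download, a minimal sketch with `onnxruntime` might look like the following. It assumes that the first output of `bert_model_optimized_dynamic_int8.onnx` holds the classification logits and that the graph's input names match the tokenizer's output keys (for example `input_ids` and `attention_mask`). `build_feed` and `run_onnx_check` are hypothetical helper names.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def build_feed(input_names, encoded):
    """Keep only the tokenizer outputs that the ONNX graph declares as inputs."""
    return {name: encoded[name] for name in input_names if name in encoded}


def run_onnx_check(repo_id, text="Thanks for your help!"):
    """Download the INT8 ONNX model from `repo_id` and return class probabilities for `text`."""
    # Imported here so softmax/build_feed work without these packages installed.
    import onnxruntime as ort
    from huggingface_hub import hf_hub_download
    from transformers import AutoTokenizer

    model_path = hf_hub_download(repo_id, "bert_model_optimized_dynamic_int8.onnx")
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    tokenizer = AutoTokenizer.from_pretrained(repo_id)

    encoded = tokenizer(text, return_tensors="np")  # int64 NumPy arrays
    input_names = [inp.name for inp in session.get_inputs()]
    logits = session.run(None, build_feed(input_names, encoded))[0]
    return softmax(logits)


# print(run_onnx_check("YOUR_USERNAME/turnlet-bert-multilingual-eou"))
```

Comparing the returned probabilities with the PyTorch result above is a quick consistency check. Small differences are expected from INT8 quantization.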

## Post-Upload Checklist

After a successful upload:

- [ ] Verify all files are uploaded
- [ ] Test model loading via `transformers`
- [ ] Test the ONNX model download
- [ ] Update README with the correct username/repo paths
- [ ] Add license information
- [ ] Add model tags and metadata
- [ ] Test the interactive script (`inference_example.py`)
- [ ] Share on social media/communities
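
For the first checklist item, a sketch like the following compares the local folder with the Hub's file listing from `HfApi.list_repo_files`. `local_files`, `missing_on_hub`, and `verify_upload` are hypothetical helper names. Any local `.git` directory is skipped because it holds git metadata, not repository files.

```python
from pathlib import Path


def local_files(folder):
    """Relative POSIX paths of all files under `folder`, excluding any .git directory."""
    root = Path(folder)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(root).parts
    )


def missing_on_hub(local, remote):
    """Files present locally but absent from the Hub listing."""
    return sorted(set(local) - set(remote))


def verify_upload(repo_id, folder):
    """Return local files that are not yet in the Hub repository `repo_id`."""
    # Imported here so the helpers above work without huggingface_hub installed.
    from huggingface_hub import HfApi

    remote = HfApi().list_repo_files(repo_id)
    return missing_on_hub(local_files(folder), remote)


# print(verify_upload("YOUR_USERNAME/turnlet-bert-multilingual-eou",
#                     "/home/ubuntu/hf_upload/turnlet-bert-multilingual-eou"))
```

An empty list means every local file is also present in the repository.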

## Useful Links

- Hugging Face Hub documentation: https://huggingface.co/docs/hub
- Git LFS: https://git-lfs.github.com/
- Model Cards guide: https://huggingface.co/docs/hub/model-cards
- ONNX models: https://huggingface.co/docs/hub/onnx

## Tips

1. **Use descriptive commit messages** when updating the model.
2. **Version your models** by creating Git tags (v1.0, v2.0, etc.).
3. **Monitor downloads** via your Hugging Face dashboard.
4. **Respond to community questions** in the repository's Community tab.
5. **Update metrics** as you improve the model.
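
Version tags can be created with the Python API as well as with `git tag`. The sketch below assumes a simple `vN.0` naming scheme matching the examples in tip 2. `next_version_tag` and `tag_current_version` are hypothetical helpers built on `HfApi.list_repo_refs` and `HfApi.create_tag`.

```python
import re


def next_version_tag(existing_tags):
    """Return the next major tag (v1.0, v2.0, ...) given the repo's existing tag names."""
    majors = [
        int(match.group(1))
        for tag in existing_tags
        if (match := re.fullmatch(r"v(\d+)\.0", tag))  # ignore tags outside the vN.0 scheme
    ]
    return f"v{max(majors, default=0) + 1}.0"


def tag_current_version(repo_id, message="Release"):
    """Tag the current main branch of `repo_id` with the next vN.0 tag and return it."""
    # Imported here so next_version_tag works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()
    existing = [ref.name for ref in api.list_repo_refs(repo_id).tags]
    tag = next_version_tag(existing)
    api.create_tag(repo_id, tag=tag, tag_message=message)
    return tag


# tag_current_version("YOUR_USERNAME/turnlet-bert-multilingual-eou")
```

A tagged version can then be pinned when loading, for example `AutoModelForSequenceClassification.from_pretrained(repo_id, revision="v1.0")`.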

## Troubleshooting

### Git LFS Bandwidth Issues

If you hit LFS bandwidth limits:

- Upload the smaller INT8 ONNX variant (132 MB) first, then the larger FP32 files
- Upload during off-peak hours
- Consider Hugging Face Pro for more bandwidth
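
One way to upload the smaller model variant first is to split the upload into several commits. This sketch uses the hypothetical helpers `upload_order` and `staged_upload`. It first uploads everything except the weight files with `upload_folder(..., ignore_patterns=...)`, then adds each `.onnx`/`.safetensors` file with `upload_file`, smallest first. As a result, an interrupted run still leaves the INT8 model and its tokenizer usable.

```python
from pathlib import Path

WEIGHT_PATTERNS = ("*.onnx", "*.safetensors")


def upload_order(folder, patterns=WEIGHT_PATTERNS):
    """Names of the top-level weight files in `folder`, sorted smallest first."""
    root = Path(folder)
    files = {p for pattern in patterns for p in root.glob(pattern) if p.is_file()}
    return [p.name for p in sorted(files, key=lambda p: p.stat().st_size)]


def staged_upload(repo_id, folder):
    """Upload small files in one commit, then each weight file in its own commit."""
    # Imported here so upload_order works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()
    # First commit: tokenizer, configs, and docs (everything except the weights).
    api.upload_folder(folder_path=folder, repo_id=repo_id,
                      ignore_patterns=list(WEIGHT_PATTERNS))
    # Then one commit per weight file, smallest (INT8 ONNX) first.
    for name in upload_order(folder):
        api.upload_file(path_or_fileobj=str(Path(folder) / name),
                        path_in_repo=name, repo_id=repo_id,
                        commit_message=f"Add {name}")


# staged_upload("YOUR_USERNAME/turnlet-bert-multilingual-eou",
#               "/home/ubuntu/hf_upload/turnlet-bert-multilingual-eou")
```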
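
### Authentication Issues

```bash
# Re-login with an explicit token and also store it for git pushes (Option 1)
huggingface-cli login --token YOUR_TOKEN --add-to-git-credential

# Or set the token as an environment variable
# (HF_TOKEN is the current name; older huggingface_hub versions read HUGGING_FACE_HUB_TOKEN)
export HF_TOKEN=YOUR_TOKEN
```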
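
To confirm which credentials are actually in effect, `HfApi().whoami()` reports the account behind the current token. The sketch below also lists which token environment variables are set. `token_env_vars` and `check_login` are hypothetical helper names.

```python
import os


def token_env_vars(environ=os.environ):
    """Names of the Hugging Face token environment variables that are set and non-empty."""
    return [name for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN") if environ.get(name)]


def check_login():
    """Print which token variables are set and the account the active token belongs to."""
    # Imported here so token_env_vars works without huggingface_hub installed.
    from huggingface_hub import HfApi

    print("Token variables set:", token_env_vars() or "none (using stored login, if any)")
    user = HfApi().whoami()  # raises an error if no valid token is available
    print("Authenticated as:", user["name"])
    return user["name"]


# check_login()
```

If the reported account is not the one that owns the repository, log in again with a token from the correct account.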
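
### Large File Upload Timeout

If git aborts slow or large pushes, relax its HTTP limits for this repository:

```bash
# Larger HTTP post buffer (500 MiB) and no abort on slow transfers
git config http.postBuffer 524288000
git config http.lowSpeedLimit 0
git config http.lowSpeedTime 999999
```

If pushes still time out, the Python API (Option 2) uploads without using git.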

## Ready to Upload!

Your model is fully prepared and ready for upload to Hugging Face!