# Hugging Face Upload Guide

This guide will help you upload the Turnlet BERT Multilingual EOU model to Hugging Face.

## 📦 Package Contents

This folder contains everything needed for a complete Hugging Face model repository:

### Model Files

- **`model.safetensors`** (517 MB) - PyTorch model weights in safetensors format
- **`bert_model_optimized.onnx`** (517 MB) - Optimized ONNX model (FP32)
- **`bert_model_optimized_dynamic_int8.onnx`** (132 MB) - ⭐ Quantized ONNX model (INT8, recommended)

### Tokenizer Files

- **`tokenizer.json`** - Fast tokenizer
- **`tokenizer_config.json`** - Tokenizer configuration
- **`vocab.txt`** - Vocabulary file
- **`special_tokens_map.json`** - Special tokens mapping

### Configuration Files

- **`config.json`** - Model architecture configuration
- **`metrics.yaml`** - Training and validation metrics

### Documentation

- **`README.md`** - Comprehensive model card and documentation
- **`model_card.json`** - Machine-readable model metadata
- **`requirements.txt`** - Python dependencies
- **`.gitattributes`** - Git LFS configuration for large files

### Code Examples

- **`inference_example.py`** - Interactive demo and usage examples
- **`UPLOAD_GUIDE.md`** - This file

## 🚀 Upload Steps

### Option 1: Using Hugging Face CLI (Recommended)

```bash
# Install the Hugging Face Hub library (provides the huggingface-cli tool)
pip install huggingface_hub

# Login to Hugging Face
huggingface-cli login

# Navigate to the model folder
cd /home/ubuntu/hf_upload/turnlet-bert-multilingual-eou

# Create repository (replace YOUR_USERNAME with your HF username)
huggingface-cli repo create turnlet-bert-multilingual-eou --type model

# Initialize git and git-lfs
git init
git branch -M main  # ensure the default branch is named main before pushing
git lfs install
git lfs track "*.onnx"
git lfs track "*.safetensors"

# Add all files
git add .
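# (Optional) Before committing, verify that the large files are actually
# tracked by LFS — files added before `git lfs track` ran would be stored
# as regular git objects and blow past Hub size limits.
git lfs track
git status --short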
# Commit
git commit -m "Initial commit: Turnlet BERT Multilingual EOU model with ONNX variants"

# Add remote (replace YOUR_USERNAME)
git remote add origin https://huggingface.co/YOUR_USERNAME/turnlet-bert-multilingual-eou

# Push to Hugging Face
git push -u origin main
```

### Option 2: Using Python API

```python
from huggingface_hub import HfApi, create_repo, login

# Login (you'll be prompted for a token)
login()

# Initialize API
api = HfApi()

# Create repository
repo_id = "YOUR_USERNAME/turnlet-bert-multilingual-eou"
create_repo(repo_id, repo_type="model", exist_ok=True)

# Upload folder
api.upload_folder(
    folder_path="/home/ubuntu/hf_upload/turnlet-bert-multilingual-eou",
    repo_id=repo_id,
    repo_type="model",
)

print(f"✅ Model uploaded to: https://huggingface.co/{repo_id}")
```

### Option 3: Manual Upload via Web Interface

1. Go to https://huggingface.co/new
2. Create a new model repository: `turnlet-bert-multilingual-eou`
3. Use the web interface to upload files:
   - Upload large files (`.onnx`, `.safetensors`) via Git LFS
   - Upload smaller files directly via the web interface
4. Copy the README.md content to the model card

## ⚠️ Important Notes

### Git LFS Required

The model files are large and require Git LFS (Large File Storage):

- Make sure Git LFS is installed: `git lfs install`
- The `.gitattributes` file is already configured
- Files tracked: `*.onnx`, `*.safetensors`

### File Sizes

- Total repository size: ~1.2 GB
- Largest files: ONNX FP32 (517 MB) and PyTorch (517 MB)
- Recommended for deployment: INT8 ONNX (132 MB)

### Model Naming

Consider these naming conventions:

- `YOUR_USERNAME/turnlet-bert-multilingual-eou`
- `YOUR_ORG/turnlet-eou-detection-multilingual`
- `YOUR_USERNAME/distilbert-eou-en-hi-es`

### Tags to Add

When creating the repository, add these tags:

- `end-of-utterance`
- `eou-detection`
- `multilingual`
- `distilbert`
- `onnx`
- `quantized`
- `conversational-ai`
- `dialogue`
- `turn-taking`
- `text-classification`

## 🧪 Testing After Upload

After uploading, test the model:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Test loading
model = AutoModelForSequenceClassification.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")

# Quick test
text = "Thanks for your help!"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
print(f"✅ Model loaded and working! Logits: {outputs.logits}")
```

## 📝 Post-Upload Checklist

After successful upload:

- [ ] Verify all files are uploaded
- [ ] Test model loading via transformers
- [ ] Test ONNX model download
- [ ] Update README with correct username/repo paths
- [ ] Add license information
- [ ] Add model tags and metadata
- [ ] Test interactive script
- [ ] Share on social media/communities

## 🔗 Useful Links

- Hugging Face Hub Documentation: https://huggingface.co/docs/hub
- Git LFS: https://git-lfs.github.com/
- Model Cards Guide: https://huggingface.co/docs/hub/model-cards
- ONNX Models: https://huggingface.co/docs/hub/onnx

## 💡 Tips

1. **Use descriptive commit messages** when updating the model
2. **Version your models** by creating tags (v1.0, v2.0, etc.)
3. **Monitor downloads** via your Hugging Face dashboard
4. **Respond to community questions** in the community tab
5. **Update metrics** as you improve the model

## 🆘 Troubleshooting

### Git LFS Bandwidth Issues

If you hit LFS bandwidth limits:

- Upload the smaller INT8 variant first
- Upload during off-peak hours
- Consider Hugging Face Pro for more bandwidth

### Authentication Issues

```bash
# Re-login
huggingface-cli login --token YOUR_TOKEN

# Or set the token as an environment variable
export HUGGING_FACE_HUB_TOKEN=YOUR_TOKEN
```

### Large File Upload Timeout

```bash
# Increase the git HTTP buffer and disable low-speed timeouts
git config http.postBuffer 524288000
git config http.lowSpeedLimit 0
git config http.lowSpeedTime 999999
```

## ✅ Ready to Upload!

Your model is fully prepared and ready for upload to Hugging Face! 🎉
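As a follow-up to the quick test in "Testing After Upload": the raw logits usually need to be converted into an end-of-utterance probability before use. Below is a minimal, self-contained sketch of that post-processing step. It assumes a two-class classification head where index 1 is the "end of utterance" class and 0.5 is a reasonable decision threshold — both are assumptions; check `id2label` in `config.json` for the actual mapping and tune the threshold on your own data.

```python
import numpy as np

def eou_probability(logits, eou_index=1):
    """Convert raw classifier logits to an end-of-utterance probability.

    Assumes a two-class head; eou_index=1 is an assumption — check
    id2label in config.json for the real index of the EOU class.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax over the class axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    return probs[..., eou_index]

def is_end_of_utterance(logits, threshold=0.5, eou_index=1):
    """Apply a decision threshold to the EOU probability."""
    return eou_probability(logits, eou_index) >= threshold

# Example with made-up logits — in practice pass outputs.logits
# (converted via .detach().numpy()) from the model above.
print(eou_probability([[0.2, 2.3]]))       # high EOU probability
print(is_end_of_utterance([[1.5, -0.7]]))  # low probability -> [False]
```

The threshold trades off cutting speakers short (too low) against sluggish turn-taking (too high), so it is worth calibrating per language and per deployment.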