# Hugging Face Upload Guide
This guide will help you upload the Turnlet BERT Multilingual EOU model to Hugging Face.
## 📦 Package Contents
This folder contains everything needed for a complete Hugging Face model repository:
### Model Files
- **`model.safetensors`** (517 MB) - PyTorch model weights in safetensors format
- **`bert_model_optimized.onnx`** (517 MB) - Optimized ONNX model (FP32)
- **`bert_model_optimized_dynamic_int8.onnx`** (132 MB) - ⭐ Quantized ONNX model (INT8, recommended)
### Tokenizer Files
- **`tokenizer.json`** - Fast tokenizer
- **`tokenizer_config.json`** - Tokenizer configuration
- **`vocab.txt`** - Vocabulary file
- **`special_tokens_map.json`** - Special tokens mapping
### Configuration Files
- **`config.json`** - Model architecture configuration
- **`metrics.yaml`** - Training and validation metrics
### Documentation
- **`README.md`** - Comprehensive model card and documentation
- **`model_card.json`** - Machine-readable model metadata
- **`requirements.txt`** - Python dependencies
- **`.gitattributes`** - Git LFS configuration for large files
### Code Examples
- **`inference_example.py`** - Interactive demo and usage examples
- **`UPLOAD_GUIDE.md`** - This file
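Before uploading, it can help to confirm that every file in the listing above is actually present. The following is a minimal sketch, not part of the package: the helper name `find_missing_files` and the `EXPECTED_FILES` constant are made up for illustration, and the file names are copied from the listing.

```python
from pathlib import Path

# File names copied from the package listing above.
EXPECTED_FILES = [
    "model.safetensors",
    "bert_model_optimized.onnx",
    "bert_model_optimized_dynamic_int8.onnx",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.txt",
    "special_tokens_map.json",
    "config.json",
    "metrics.yaml",
    "README.md",
    "model_card.json",
    "requirements.txt",
    ".gitattributes",
    "inference_example.py",
    "UPLOAD_GUIDE.md",
]


def find_missing_files(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `find_missing_files("/home/ubuntu/hf_upload/turnlet-bert-multilingual-eou")` should return an empty list when the folder is complete.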
## Upload Steps
### Option 1: Using Hugging Face CLI (Recommended)
```bash
# Install Hugging Face CLI
pip install huggingface-hub
# Login to Hugging Face
huggingface-cli login
# Navigate to the model folder
cd /home/ubuntu/hf_upload/turnlet-bert-multilingual-eou
# Create repository under the account you logged in with
huggingface-cli repo create turnlet-bert-multilingual-eou --type model
# Initialize git and git-lfs
git init
git lfs install
git lfs track "*.onnx"
git lfs track "*.safetensors"
# Add all files
git add .
# Commit
git commit -m "Initial commit: Turnlet BERT Multilingual EOU model with ONNX variants"
# Make sure the local branch is named main, which is the branch pushed below
git branch -M main
# Add remote (replace YOUR_USERNAME)
git remote add origin https://huggingface.co/YOUR_USERNAME/turnlet-bert-multilingual-eou
# Push to Hugging Face
git push -u origin main
```
### Option 2: Using Python API
```python
from huggingface_hub import HfApi, create_repo, login
# Login (you'll be prompted for a token)
login()
# Initialize API
api = HfApi()
# Create repository (replace YOUR_USERNAME)
repo_id = "YOUR_USERNAME/turnlet-bert-multilingual-eou"
create_repo(repo_id, repo_type="model", exist_ok=True)
# Upload folder
api.upload_folder(
    folder_path="/home/ubuntu/hf_upload/turnlet-bert-multilingual-eou",
    repo_id=repo_id,
    repo_type="model",
)
print(f"✅ Model uploaded to: https://huggingface.co/{repo_id}")
```
### Option 3: Manual Upload via Web Interface
1. Go to https://huggingface.co/new
2. Create a new model repository: `turnlet-bert-multilingual-eou`
3. Use the web interface to upload files:
   - Upload large files (`.onnx`, `.safetensors`) via Git LFS
   - Upload smaller files directly via web interface
4. Copy the README.md content to the model card
## ⚠️ Important Notes
### Git LFS Required
The model files are large and require Git LFS (Large File Storage):
- Make sure Git LFS is installed, then enable it for your user account with `git lfs install`
- The `.gitattributes` file is already configured
- Files tracked: `*.onnx`, `*.safetensors`
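To double-check that `.gitattributes` really routes both patterns through LFS, a short script can parse it. This is a sketch under the assumption that the file uses the standard `pattern filter=lfs diff=lfs merge=lfs -text` lines that `git lfs track` writes; the function names are illustrative only.

```python
def lfs_tracked_patterns(gitattributes_text):
    """Return the patterns that a .gitattributes file routes through Git LFS."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        # A tracked pattern looks like: *.onnx filter=lfs diff=lfs merge=lfs -text
        if "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def untracked_required_patterns(gitattributes_text,
                                required=("*.onnx", "*.safetensors")):
    """Return the required patterns that are not tracked by LFS."""
    tracked = set(lfs_tracked_patterns(gitattributes_text))
    return [pattern for pattern in required if pattern not in tracked]
```

Passing the contents of `.gitattributes` to `untracked_required_patterns` should give an empty list before you commit the model files.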
### File Sizes
- Total repository size: ~1.2 GB
- Largest files: ONNX FP32 (517 MB) and PyTorch (517 MB)
- Recommended for deployment: INT8 ONNX (132 MB)
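The ~1.2 GB total follows from the three model files alone (517 + 517 + 132 MB); the remaining files are comparatively small. A minimal sketch for measuring the real folder, with a hypothetical helper name `folder_size_report`:

```python
from pathlib import Path


def folder_size_report(folder):
    """Return (total_bytes, [(relative_name, bytes), ...]) sorted largest first."""
    root = Path(folder)
    sizes = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        # Skip directories and anything inside the .git directory.
        if path.is_file() and ".git" not in relative.parts:
            sizes.append((relative.as_posix(), path.stat().st_size))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sum(size for _, size in sizes), sizes


# Sanity check on the figures quoted above (model files only, in MB).
listed_mb = 517 + 517 + 132  # 1166 MB, roughly 1.2 GB
```

The first entries of the returned list should be the two 517 MB files, followed by the INT8 ONNX model.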
### Model Naming
Consider these naming conventions:
- `YOUR_USERNAME/turnlet-bert-multilingual-eou`
- `YOUR_ORG/turnlet-eou-detection-multilingual`
- `YOUR_USERNAME/distilbert-eou-en-hi-es`
### Tags to Add
Add these tags to the model card metadata (the YAML block at the top of `README.md`):
- `end-of-utterance`
- `eou-detection`
- `multilingual`
- `distilbert`
- `onnx`
- `quantized`
- `conversational-ai`
- `dialogue`
- `turn-taking`
- `text-classification`
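The Hub reads tags from a YAML block delimited by `---` at the top of `README.md`. As a rough sketch, the tag list above could be rendered into such a block like this; `render_front_matter` is a hypothetical helper, and a real model card will usually carry further fields (for example a license) that this guide does not specify.

```python
# Tags copied from the list above.
TAGS = [
    "end-of-utterance",
    "eou-detection",
    "multilingual",
    "distilbert",
    "onnx",
    "quantized",
    "conversational-ai",
    "dialogue",
    "turn-taking",
    "text-classification",
]


def render_front_matter(tags):
    """Render a YAML front-matter block containing only a `tags:` list."""
    lines = ["---", "tags:"]
    lines += [f"- {tag}" for tag in tags]
    lines.append("---")
    return "\n".join(lines) + "\n"
```

The returned string can be pasted above the existing README content, merging it with any front matter that is already there.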
## 🧪 Testing After Upload
After uploading, test the model:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Test loading
model = AutoModelForSequenceClassification.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")
# Quick test
text = "Thanks for your help!"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
print(f"✅ Model loaded and working! Logits: {outputs.logits}")
```
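The logits printed above are raw scores. To read them as probabilities you can apply a softmax, sketched here in plain Python. Which index corresponds to "end of utterance" is not stated in this guide, so check `id2label` in `config.json` rather than assuming an order.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]
```

With the test snippet above, `softmax(outputs.logits[0].tolist())` gives one probability per class for the input text.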
## Post-Upload Checklist
After successful upload:
- [ ] Verify all files are uploaded
- [ ] Test model loading via transformers
- [ ] Test ONNX model download
- [ ] Update README with correct username/repo paths
- [ ] Add license information
- [ ] Add model tags and metadata
- [ ] Test interactive script
- [ ] Share on social media/communities
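The first checklist item can be partly automated by comparing local file names with what the Hub reports. The sketch below uses `HfApi.list_repo_files` from `huggingface_hub`; the two helper names are illustrative, and the import is kept inside the function so the comparison helper works without the library installed.

```python
def missing_on_hub(local_files, remote_files):
    """Return the local file names that do not appear in the remote listing."""
    return sorted(set(local_files) - set(remote_files))


def list_remote_files(repo_id):
    """Fetch the file listing of a model repository from the Hub."""
    # Imported lazily; requires `pip install huggingface-hub` and network access.
    from huggingface_hub import HfApi

    return HfApi().list_repo_files(repo_id, repo_type="model")
```

For example, `missing_on_hub(local_names, list_remote_files("YOUR_USERNAME/turnlet-bert-multilingual-eou"))` should return an empty list once the upload is complete.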
## Useful Links
- Hugging Face Hub Documentation: https://huggingface.co/docs/hub
- Git LFS: https://git-lfs.github.com/
- Model Cards Guide: https://huggingface.co/docs/hub/model-cards
- ONNX Models: https://huggingface.co/docs/hub/onnx
## 💡 Tips
1. **Use descriptive commit messages** when updating the model
2. **Version your models** by creating tags (v1.0, v2.0, etc.)
3. **Monitor downloads** via your Hugging Face dashboard
4. **Respond to community questions** in the community tab
5. **Update metrics** as you improve the model
## Troubleshooting
### Git LFS Bandwidth Issues
If you hit LFS bandwidth limits:
- Upload the smaller model variant (INT8 ONNX, 132 MB) first
- Upload during off-peak hours
- Consider Hugging Face Pro for more bandwidth
### Authentication Issues
```bash
# Re-login
huggingface-cli login --token YOUR_TOKEN
# Or set the token as an environment variable
# (HF_TOKEN is the current name; older huggingface_hub releases read HUGGING_FACE_HUB_TOKEN)
export HF_TOKEN=YOUR_TOKEN
```
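When authentication fails from a script, a first step is to see which token variable, if any, is set. This sketch checks `HF_TOKEN` and the older `HUGGING_FACE_HUB_TOKEN`; `find_hub_token` and `current_username` are illustrative names, and `current_username` relies on `HfApi.whoami()`, which raises if the stored token is invalid.

```python
import os


def find_hub_token(env=None):
    """Return (variable_name, token) for the first token variable that is set."""
    env = os.environ if env is None else env
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = env.get(name)
        if value:
            return name, value
    return None


def current_username():
    """Ask the Hub which account the active token belongs to."""
    # Imported lazily; requires huggingface_hub and network access.
    from huggingface_hub import HfApi

    return HfApi().whoami()["name"]
```

If `find_hub_token()` returns `None` and you have not run `huggingface-cli login`, the upload calls have no credentials to use.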
### Large File Upload Timeout
```bash
# Raise the HTTP post buffer to 500 MB and relax Git's low-speed abort
git config http.postBuffer 524288000
git config http.lowSpeedLimit 0
git config http.lowSpeedTime 999999
```
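The settings above apply to `git push`. If you upload through the Python API instead, one option is to retry a failed call with exponential backoff. This is a generic sketch, not a `huggingface_hub` feature: `retry_with_backoff` and its parameters are assumptions chosen for illustration.

```python
import time


def retry_with_backoff(action, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call `action()` until it succeeds or `attempts` calls have failed.

    The wait doubles after each failure: base_delay, 2 * base_delay, ...
    The last failure is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

With the objects from Option 2, a call might look like `retry_with_backoff(lambda: api.upload_folder(folder_path=..., repo_id=repo_id, repo_type="model"))`. Catching every exception is deliberately broad here; in practice you may want to retry only on network errors.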
## ✅ Ready to Upload!
Your model is fully prepared and ready for upload to Hugging Face!