# Hugging Face Upload Guide
This guide walks you through uploading the Turnlet BERT Multilingual EOU model to the Hugging Face Hub.
## πŸ“¦ Package Contents
This folder contains everything needed for a complete Hugging Face model repository:
### Model Files
- **`model.safetensors`** (517 MB) - PyTorch model weights in safetensors format
- **`bert_model_optimized.onnx`** (517 MB) - Optimized ONNX model (FP32)
- **`bert_model_optimized_dynamic_int8.onnx`** (132 MB) - ⭐ Quantized ONNX model (INT8, recommended)
### Tokenizer Files
- **`tokenizer.json`** - Fast tokenizer
- **`tokenizer_config.json`** - Tokenizer configuration
- **`vocab.txt`** - Vocabulary file
- **`special_tokens_map.json`** - Special tokens mapping
### Configuration Files
- **`config.json`** - Model architecture configuration
- **`metrics.yaml`** - Training and validation metrics
### Documentation
- **`README.md`** - Comprehensive model card and documentation
- **`model_card.json`** - Machine-readable model metadata
- **`requirements.txt`** - Python dependencies
- **`.gitattributes`** - Git LFS configuration for large files
### Code Examples
- **`inference_example.py`** - Interactive demo and usage examples
- **`UPLOAD_GUIDE.md`** - This file
## πŸš€ Upload Steps
### Option 1: Using Hugging Face CLI (Recommended)
```bash
# Install Hugging Face CLI
pip install huggingface-hub
# Login to Hugging Face
huggingface-cli login
# Navigate to the model folder
cd /home/ubuntu/hf_upload/turnlet-bert-multilingual-eou
# Create the repository (it will be created under your logged-in account)
huggingface-cli repo create turnlet-bert-multilingual-eou --type model
# Initialize git and git-lfs
git init
git lfs install
git lfs track "*.onnx"
git lfs track "*.safetensors"
# Add all files
git add .
# Commit
git commit -m "Initial commit: Turnlet BERT Multilingual EOU model with ONNX variants"
# Add remote (replace YOUR_USERNAME)
git remote add origin https://huggingface.co/YOUR_USERNAME/turnlet-bert-multilingual-eou
# Ensure the branch is named main, then push to Hugging Face
git branch -M main
git push -u origin main
```
### Option 2: Using Python API
```python
from huggingface_hub import HfApi, create_repo, login

# Login (you'll be prompted for a token)
login()

# Initialize API
api = HfApi()
# Create repository
repo_id = "YOUR_USERNAME/turnlet-bert-multilingual-eou"
create_repo(repo_id, repo_type="model", exist_ok=True)
# Upload folder
api.upload_folder(
    folder_path="/home/ubuntu/hf_upload/turnlet-bert-multilingual-eou",
    repo_id=repo_id,
    repo_type="model",
)
print(f"βœ… Model uploaded to: https://huggingface.co/{repo_id}")
```
### Option 3: Manual Upload via Web Interface
1. Go to https://huggingface.co/new
2. Create a new model repository: `turnlet-bert-multilingual-eou`
3. Use the web interface to upload files:
- Upload large files (`.onnx`, `.safetensors`) via Git LFS
- Upload smaller files directly via web interface
4. Copy the README.md content to the model card
## ⚠️ Important Notes
### Git LFS Required
The model files are large and require Git LFS (Large File Storage):
- Make sure Git LFS is installed: `git lfs install`
- The `.gitattributes` file is already configured
- Files tracked: `*.onnx`, `*.safetensors`
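For reference, the relevant `.gitattributes` entries look like this (a sketch of the standard Git LFS tracking lines; the bundled file may contain additional patterns):

```
*.onnx filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
```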
### File Sizes
- Total repository size: ~1.2 GB
- Largest files: ONNX FP32 (517 MB) and PyTorch (517 MB)
- Recommended for deployment: INT8 ONNX (132 MB)
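Before uploading, it can help to sanity-check which files will go through Git LFS. The sketch below (a standalone helper, not part of the bundled scripts) lists every file in the folder and flags those above the commonly documented 10 MB LFS threshold:

```python
from pathlib import Path

# Threshold above which the Hugging Face Hub expects Git LFS tracking
# (10 MB is the commonly documented limit; adjust if the docs change).
LFS_THRESHOLD = 10 * 1024 * 1024

def audit_folder(folder):
    """Return (filename, size_in_bytes, needs_lfs) for every file in the folder."""
    report = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            report.append((path.name, size, size > LFS_THRESHOLD))
    return report

def human_size(num_bytes):
    """Format a byte count as a readable string."""
    for unit in ("B", "KB", "MB", "GB"):
        if num_bytes < 1024:
            return f"{num_bytes:.0f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.0f} TB"

if __name__ == "__main__":
    for name, size, needs_lfs in audit_folder("."):
        flag = "LFS" if needs_lfs else "   "
        print(f"{flag}  {human_size(size):>8}  {name}")
```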
### Model Naming
Consider these naming conventions:
- `YOUR_USERNAME/turnlet-bert-multilingual-eou`
- `YOUR_ORG/turnlet-eou-detection-multilingual`
- `YOUR_USERNAME/distilbert-eou-en-hi-es`
### Tags to Add
When creating the repository, add these tags:
- `end-of-utterance`
- `eou-detection`
- `multilingual`
- `distilbert`
- `onnx`
- `quantized`
- `conversational-ai`
- `dialogue`
- `turn-taking`
- `text-classification`
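Tags are easiest to set in the YAML front matter at the top of `README.md`; the Hub reads them from there. A sketch (the `license` value is a placeholder, and the language codes assume the en/hi/es coverage mentioned above — adjust both to match your model):

```yaml
---
license: apache-2.0
language:
  - en
  - hi
  - es
pipeline_tag: text-classification
tags:
  - end-of-utterance
  - eou-detection
  - multilingual
  - distilbert
  - onnx
  - quantized
  - conversational-ai
  - dialogue
  - turn-taking
---
```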
## πŸ§ͺ Testing After Upload
After uploading, test the model:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Test loading
model = AutoModelForSequenceClassification.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")
# Quick test
text = "Thanks for your help!"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
print(f"βœ… Model loaded and working! Logits: {outputs.logits}")
```
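To turn the raw logits into an end-of-utterance probability, apply a softmax. A minimal sketch, assuming label index 1 corresponds to "end of utterance" (check `id2label` in `config.json` for the actual mapping):

```python
import math

def eou_probability(logits, eou_index=1):
    """Softmax over a list of logits; return the probability of the EOU class.

    Subtracts the max logit first for numerical stability.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[eou_index] / sum(exps)

# Example with made-up logits: a strongly positive EOU logit
# yields a probability close to 1.
print(eou_probability([-2.1, 3.4]))
```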
## πŸ“ Post-Upload Checklist
After successful upload:
- [ ] Verify all files are uploaded
- [ ] Test model loading via transformers
- [ ] Test ONNX model download
- [ ] Update README with correct username/repo paths
- [ ] Add license information
- [ ] Add model tags and metadata
- [ ] Test interactive script
- [ ] Share on social media/communities
## πŸ”— Useful Links
- Hugging Face Hub Documentation: https://huggingface.co/docs/hub
- Git LFS: https://git-lfs.github.com/
- Model Cards Guide: https://huggingface.co/docs/hub/model-cards
- ONNX Models: https://huggingface.co/docs/hub/onnx
## πŸ’‘ Tips
1. **Use descriptive commit messages** when updating the model
2. **Version your models** by creating tags (v1.0, v2.0, etc.)
3. **Monitor downloads** via your Hugging Face dashboard
4. **Respond to community questions** in the community tab
5. **Update metrics** as you improve the model
## πŸ†˜ Troubleshooting
### Git LFS Bandwidth Issues
If you hit LFS bandwidth limits:
- Upload the smaller INT8 variant first
- Upload during off-peak hours
- Consider Hugging Face Pro for higher bandwidth limits
### Authentication Issues
```bash
# Re-login
huggingface-cli login --token YOUR_TOKEN
# Or set token as environment variable
export HUGGING_FACE_HUB_TOKEN=YOUR_TOKEN
```
### Large File Upload Timeout
```bash
# Increase Git's HTTP post buffer and disable the low-speed timeouts
git config http.postBuffer 524288000
git config http.lowSpeedLimit 0
git config http.lowSpeedTime 999999
```
## βœ… Ready to Upload!
Your model is fully prepared and ready for upload to Hugging Face! πŸŽ‰