
Hugging Face Upload Guide

This guide will help you upload the Turnlet BERT Multilingual EOU model to Hugging Face.

📦 Package Contents

This folder contains everything needed for a complete Hugging Face model repository:

Model Files

  • model.safetensors (517 MB) - PyTorch model weights in safetensors format
  • bert_model_optimized.onnx (517 MB) - Optimized ONNX model (FP32)
  • bert_model_optimized_dynamic_int8.onnx (132 MB) - ⭐ Quantized ONNX model (INT8, recommended)

Tokenizer Files

  • tokenizer.json - Fast tokenizer
  • tokenizer_config.json - Tokenizer configuration
  • vocab.txt - Vocabulary file
  • special_tokens_map.json - Special tokens mapping

Configuration Files

  • config.json - Model architecture configuration
  • metrics.yaml - Training and validation metrics

Documentation

  • README.md - Comprehensive model card and documentation
  • model_card.json - Machine-readable model metadata
  • requirements.txt - Python dependencies
  • .gitattributes - Git LFS configuration for large files

Code Examples

  • inference_example.py - Interactive demo and usage examples
  • UPLOAD_GUIDE.md - This file

🚀 Upload Steps

Option 1: Using Hugging Face CLI (Recommended)

# Install Hugging Face CLI
pip install huggingface-hub

# Login to Hugging Face
huggingface-cli login

# Navigate to the model folder
cd /home/ubuntu/hf_upload/turnlet-bert-multilingual-eou

# Create the repository (it is created under your own account)
huggingface-cli repo create turnlet-bert-multilingual-eou --type model

# Initialize git and git-lfs
git init
git lfs install
git lfs track "*.onnx"
git lfs track "*.safetensors"

# Add all files
git add .

# Commit
git commit -m "Initial commit: Turnlet BERT Multilingual EOU model with ONNX variants"

# Add remote (replace YOUR_USERNAME)
git remote add origin https://huggingface.co/YOUR_USERNAME/turnlet-bert-multilingual-eou

# Push to Hugging Face (if the new repo was created with an initial commit,
# first run: git pull origin main --allow-unrelated-histories)
git branch -M main
git push -u origin main
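After the push completes, a quick hedged check that the expected files actually reached the Hub can save a broken model card later. This is a sketch using `huggingface_hub`'s `HfApi.list_repo_files`; replace `YOUR_USERNAME` before running (the network call is skipped while the placeholder is in place):

```python
# Hypothetical post-push check that the expected files reached the Hub.
EXPECTED = {
    "model.safetensors",
    "bert_model_optimized.onnx",
    "bert_model_optimized_dynamic_int8.onnx",
    "tokenizer.json",
    "config.json",
    "README.md",
}

def missing_files(expected, uploaded):
    """Return the expected files absent from the repo listing."""
    return set(expected) - set(uploaded)

repo_id = "YOUR_USERNAME/turnlet-bert-multilingual-eou"
if "YOUR_USERNAME" not in repo_id:  # skipped until the placeholder is filled in
    from huggingface_hub import HfApi
    uploaded = HfApi().list_repo_files(repo_id)
    print("missing:", missing_files(EXPECTED, uploaded) or "none")
```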

Option 2: Using Python API

from huggingface_hub import HfApi, create_repo, login

# Login (you'll be prompted for a token)
login()

# Initialize the API client
api = HfApi()

# Create repository
repo_id = "YOUR_USERNAME/turnlet-bert-multilingual-eou"
create_repo(repo_id, repo_type="model", exist_ok=True)

# Upload folder
api.upload_folder(
    folder_path="/home/ubuntu/hf_upload/turnlet-bert-multilingual-eou",
    repo_id=repo_id,
    repo_type="model",
)

print(f"✅ Model uploaded to: https://huggingface.co/{repo_id}")

Option 3: Manual Upload via Web Interface

  1. Go to https://huggingface.co/new
  2. Create a new model repository: turnlet-bert-multilingual-eou
  3. Use the web interface to upload files:
    • Upload large files (.onnx, .safetensors) via Git LFS
    • Upload smaller files directly via web interface
  4. Copy the README.md content to the model card

⚠️ Important Notes

Git LFS Required

The model files are large and require Git LFS (Large File Storage):

  • Make sure Git LFS is installed and initialized: git lfs install
  • The .gitattributes file is already configured
  • Files tracked: *.onnx, *.safetensors
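For reference, the LFS tracking entries in the bundled .gitattributes follow the standard git-lfs format:

```
*.onnx filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
```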

File Sizes

  • Total repository size: ~1.2 GB
  • Largest files: ONNX FP32 (517 MB) and PyTorch (517 MB)
  • Recommended for deployment: INT8 ONNX (132 MB)
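Since the INT8 variant is the recommended deployment artifact, here is a minimal, hedged sketch of smoke-testing it locally with onnxruntime before uploading. It assumes the export exposes the usual `input_ids` / `attention_mask` inputs (typical for DistilBERT ONNX exports); inspect `session.get_inputs()` if yours differ. The inference part is guarded so the script is a no-op when the model file isn't present:

```python
# Hypothetical local smoke test for the INT8 ONNX model.
# ASSUMPTION: the export uses "input_ids" and "attention_mask" input names;
# check session.get_inputs() if your export differs.
import math
from pathlib import Path

def softmax(logits):
    """Convert raw logits to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

model_path = Path("bert_model_optimized_dynamic_int8.onnx")
if model_path.exists():
    import onnxruntime as ort
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(".")  # local tokenizer files
    session = ort.InferenceSession(str(model_path))
    enc = tokenizer("Thanks for your help!", return_tensors="np")
    logits = session.run(None, {
        "input_ids": enc["input_ids"],
        "attention_mask": enc["attention_mask"],
    })[0][0]
    print("probabilities:", softmax(logits.tolist()))
```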

Model Naming

Consider these naming conventions:

  • YOUR_USERNAME/turnlet-bert-multilingual-eou
  • YOUR_ORG/turnlet-eou-detection-multilingual
  • YOUR_USERNAME/distilbert-eou-en-hi-es

Tags to Add

When creating the repository, add these tags:

  • end-of-utterance
  • eou-detection
  • multilingual
  • distilbert
  • onnx
  • quantized
  • conversational-ai
  • dialogue
  • turn-taking
  • text-classification
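Tags can also be baked into the README.md front matter so they are applied automatically when the model card renders. A sketch of the metadata header, where the license is a placeholder to replace and the language codes follow the en/hi/es naming suggested above:

```yaml
---
license: apache-2.0   # placeholder - set your actual license
language:
  - en
  - hi
  - es
pipeline_tag: text-classification
tags:
  - end-of-utterance
  - eou-detection
  - multilingual
  - distilbert
  - onnx
  - quantized
  - conversational-ai
  - dialogue
  - turn-taking
---
```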

🧪 Testing After Upload

After uploading, test the model:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Test loading
model = AutoModelForSequenceClassification.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")

# Quick test
text = "Thanks for your help!"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
print(f"✅ Model loaded and working! Logits: {outputs.logits}")
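To turn the printed logits into something actionable, convert them to an end-of-utterance probability. This sketch assumes a binary head where index 1 is the EOU class; that mapping is an assumption, so verify it against `id2label` in config.json before relying on it:

```python
# Interpret the logits from the quick test above.
# ASSUMPTION: index 1 is the "end of utterance" class; confirm via
# id2label in config.json.
import math

def eou_probability(logits):
    # Binary softmax, written as a sigmoid of the logit difference.
    return 1.0 / (1.0 + math.exp(logits[0] - logits[1]))

print(f"P(EOU) = {eou_probability([-1.2, 2.3]):.3f}")  # → P(EOU) = 0.971
```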

πŸ“ Post-Upload Checklist

After successful upload:

  • Verify all files are uploaded
  • Test model loading via transformers
  • Test ONNX model download
  • Update README with correct username/repo paths
  • Add license information
  • Add model tags and metadata
  • Test interactive script
  • Share on social media/communities

💡 Tips

  1. Use descriptive commit messages when updating the model
  2. Version your models by creating tags (v1.0, v2.0, etc.)
  3. Monitor downloads via your Hugging Face dashboard
  4. Respond to community questions in the community tab
  5. Update metrics as you improve the model

🆘 Troubleshooting

Git LFS Bandwidth Issues

If you hit LFS bandwidth limits:

  • Upload the smaller (INT8) model variant first
  • Upload during off-peak hours
  • Consider Hugging Face Pro for more bandwidth

Authentication Issues

# Re-login with an explicit token
huggingface-cli login --token YOUR_TOKEN

# Or set the token as an environment variable (HF_TOKEN is the current
# name; HUGGING_FACE_HUB_TOKEN still works on older versions)
export HF_TOKEN=YOUR_TOKEN

Large File Upload Timeout

# Raise the HTTP post buffer for large pushes
git config http.postBuffer 524288000

# Disable git's low-speed abort so slow uploads aren't cut off
git config http.lowSpeedLimit 0
git config http.lowSpeedTime 999999

✅ Ready to Upload!

Your model is fully prepared and ready for upload to Hugging Face! 🎉