
Hugging Face Upload Guide

This guide will help you upload the Turnlet BERT Multilingual EOU model to Hugging Face.

📦 Package Contents

This folder contains everything needed for a complete Hugging Face model repository:

Model Files

  • model.safetensors (517 MB) - PyTorch model weights in safetensors format
  • bert_model_optimized.onnx (517 MB) - Optimized ONNX model (FP32)
  • bert_model_optimized_dynamic_int8.onnx (132 MB) - ⭐ Quantized ONNX model (INT8, recommended)

Tokenizer Files

  • tokenizer.json - Fast tokenizer
  • tokenizer_config.json - Tokenizer configuration
  • vocab.txt - Vocabulary file
  • special_tokens_map.json - Special tokens mapping

Configuration Files

  • config.json - Model architecture configuration
  • metrics.yaml - Training and validation metrics

Documentation

  • README.md - Comprehensive model card and documentation
  • model_card.json - Machine-readable model metadata
  • requirements.txt - Python dependencies
  • .gitattributes - Git LFS configuration for large files

Code Examples

  • inference_example.py - Interactive demo and usage examples
  • UPLOAD_GUIDE.md - This file

🚀 Upload Steps

Option 1: Using Hugging Face CLI (Recommended)

# Install Hugging Face CLI
pip install huggingface-hub

# Login to Hugging Face
huggingface-cli login

# Navigate to the model folder
cd /home/ubuntu/hf_upload/turnlet-bert-multilingual-eou

# Create the repository (it is created under your own account)
huggingface-cli repo create turnlet-bert-multilingual-eou --type model

# Initialize git and git-lfs
git init
git lfs install
git lfs track "*.onnx"
git lfs track "*.safetensors"

# Add all files
git add .

# Commit
git commit -m "Initial commit: Turnlet BERT Multilingual EOU model with ONNX variants"

# Add remote (replace YOUR_USERNAME)
git remote add origin https://huggingface.co/YOUR_USERNAME/turnlet-bert-multilingual-eou

# Push to Hugging Face (if the new repo was created with an initial commit,
# first run: git pull origin main --allow-unrelated-histories)
git branch -M main
git push -u origin main
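After the push completes, a quick hedged check that the expected files actually reached the Hub can save a broken model card later. This is a sketch using `huggingface_hub`'s `HfApi.list_repo_files`; replace `YOUR_USERNAME` before running (the network call is skipped while the placeholder is in place):

```python
# Hypothetical post-push check that the expected files reached the Hub.
EXPECTED = {
    "model.safetensors",
    "bert_model_optimized.onnx",
    "bert_model_optimized_dynamic_int8.onnx",
    "tokenizer.json",
    "config.json",
    "README.md",
}

def missing_files(expected, uploaded):
    """Return the expected files absent from the repo listing."""
    return set(expected) - set(uploaded)

repo_id = "YOUR_USERNAME/turnlet-bert-multilingual-eou"
if "YOUR_USERNAME" not in repo_id:  # skipped until the placeholder is filled in
    from huggingface_hub import HfApi
    uploaded = HfApi().list_repo_files(repo_id)
    print("missing:", missing_files(EXPECTED, uploaded) or "none")
```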

Option 2: Using Python API

from huggingface_hub import HfApi, create_repo, login

# Login (you'll be prompted for a token)
login()

# Initialize the API client
api = HfApi()

# Create repository
repo_id = "YOUR_USERNAME/turnlet-bert-multilingual-eou"
create_repo(repo_id, repo_type="model", exist_ok=True)

# Upload folder
api.upload_folder(
    folder_path="/home/ubuntu/hf_upload/turnlet-bert-multilingual-eou",
    repo_id=repo_id,
    repo_type="model",
)

print(f"✅ Model uploaded to: https://huggingface.co/{repo_id}")

Option 3: Manual Upload via Web Interface

  1. Go to https://huggingface.co/new
  2. Create a new model repository: turnlet-bert-multilingual-eou
  3. Use the web interface to upload files:
    • Upload large files (.onnx, .safetensors) via Git LFS
    • Upload smaller files directly via web interface
  4. Copy the README.md content to the model card

⚠️ Important Notes

Git LFS Required

The model files are large and require Git LFS (Large File Storage):

  • Make sure Git LFS is installed and initialized: git lfs install
  • The .gitattributes file is already configured
  • Files tracked: *.onnx, *.safetensors
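For reference, the LFS tracking entries in the bundled .gitattributes follow the standard git-lfs format:

```
*.onnx filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
```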

File Sizes

  • Total repository size: ~1.2 GB
  • Largest files: ONNX FP32 (517 MB) and PyTorch (517 MB)
  • Recommended for deployment: INT8 ONNX (132 MB)
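Since the INT8 variant is the recommended deployment artifact, here is a minimal, hedged sketch of smoke-testing it locally with onnxruntime before uploading. It assumes the export exposes the usual `input_ids` / `attention_mask` inputs (typical for DistilBERT ONNX exports); inspect `session.get_inputs()` if yours differ. The inference part is guarded so the script is a no-op when the model file isn't present:

```python
# Hypothetical local smoke test for the INT8 ONNX model.
# ASSUMPTION: the export uses "input_ids" and "attention_mask" input names;
# check session.get_inputs() if your export differs.
import math
from pathlib import Path

def softmax(logits):
    """Convert raw logits to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

model_path = Path("bert_model_optimized_dynamic_int8.onnx")
if model_path.exists():
    import onnxruntime as ort
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(".")  # local tokenizer files
    session = ort.InferenceSession(str(model_path))
    enc = tokenizer("Thanks for your help!", return_tensors="np")
    logits = session.run(None, {
        "input_ids": enc["input_ids"],
        "attention_mask": enc["attention_mask"],
    })[0][0]
    print("probabilities:", softmax(logits.tolist()))
```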

Model Naming

Consider these naming conventions:

  • YOUR_USERNAME/turnlet-bert-multilingual-eou
  • YOUR_ORG/turnlet-eou-detection-multilingual
  • YOUR_USERNAME/distilbert-eou-en-hi-es

Tags to Add

When creating the repository, add these tags:

  • end-of-utterance
  • eou-detection
  • multilingual
  • distilbert
  • onnx
  • quantized
  • conversational-ai
  • dialogue
  • turn-taking
  • text-classification
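Tags can also be baked into the README.md front matter so they are applied automatically when the model card renders. A sketch of the metadata header, where the license is a placeholder to replace and the language codes follow the en/hi/es naming suggested above:

```yaml
---
license: apache-2.0   # placeholder - set your actual license
language:
  - en
  - hi
  - es
pipeline_tag: text-classification
tags:
  - end-of-utterance
  - eou-detection
  - multilingual
  - distilbert
  - onnx
  - quantized
  - conversational-ai
  - dialogue
  - turn-taking
---
```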

🧪 Testing After Upload

After uploading, test the model:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Test loading
model = AutoModelForSequenceClassification.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")

# Quick test
text = "Thanks for your help!"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
print(f"✅ Model loaded and working! Logits: {outputs.logits}")
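To turn the printed logits into something actionable, convert them to an end-of-utterance probability. This sketch assumes a binary head where index 1 is the EOU class; that mapping is an assumption, so verify it against `id2label` in config.json before relying on it:

```python
# Interpret the logits from the quick test above.
# ASSUMPTION: index 1 is the "end of utterance" class; confirm via
# id2label in config.json.
import math

def eou_probability(logits):
    # Binary softmax, written as a sigmoid of the logit difference.
    return 1.0 / (1.0 + math.exp(logits[0] - logits[1]))

print(f"P(EOU) = {eou_probability([-1.2, 2.3]):.3f}")  # → P(EOU) = 0.971
```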

πŸ“ Post-Upload Checklist

After successful upload:

  • Verify all files are uploaded
  • Test model loading via transformers
  • Test ONNX model download
  • Update README with correct username/repo paths
  • Add license information
  • Add model tags and metadata
  • Test interactive script
  • Share on social media/communities

💡 Tips

  1. Use descriptive commit messages when updating the model
  2. Version your models by creating tags (v1.0, v2.0, etc.)
  3. Monitor downloads via your Hugging Face dashboard
  4. Respond to community questions in the community tab
  5. Update metrics as you improve the model

🆘 Troubleshooting

Git LFS Bandwidth Issues

If you hit LFS bandwidth limits:

  • Upload the smaller (INT8) model variant first
  • Upload during off-peak hours
  • Consider Hugging Face Pro for more bandwidth

Authentication Issues

# Re-login with an explicit token
huggingface-cli login --token YOUR_TOKEN

# Or set the token as an environment variable (HF_TOKEN is the current
# name; HUGGING_FACE_HUB_TOKEN still works on older versions)
export HF_TOKEN=YOUR_TOKEN

Large File Upload Timeout

# Raise the HTTP post buffer for large pushes
git config http.postBuffer 524288000

# Disable git's low-speed abort so slow uploads aren't cut off
git config http.lowSpeedLimit 0
git config http.lowSpeedTime 999999

✅ Ready to Upload!

Your model is fully prepared and ready for upload to Hugging Face! 🎉