# Hugging Face Upload Guide

This guide will help you upload the Turnlet BERT Multilingual EOU model to Hugging Face.

## 📦 Package Contents

This folder contains everything needed for a complete Hugging Face model repository:

### Model Files
- **`model.safetensors`** (517 MB) - PyTorch model weights in safetensors format
- **`bert_model_optimized.onnx`** (517 MB) - Optimized ONNX model (FP32)
- **`bert_model_optimized_dynamic_int8.onnx`** (132 MB) - ⭐ Quantized ONNX model (INT8, recommended)

### Tokenizer Files
- **`tokenizer.json`** - Fast tokenizer
- **`tokenizer_config.json`** - Tokenizer configuration
- **`vocab.txt`** - Vocabulary file
- **`special_tokens_map.json`** - Special tokens mapping

### Configuration Files
- **`config.json`** - Model architecture configuration
- **`metrics.yaml`** - Training and validation metrics

### Documentation
- **`README.md`** - Comprehensive model card and documentation
- **`model_card.json`** - Machine-readable model metadata
- **`requirements.txt`** - Python dependencies
- **`.gitattributes`** - Git LFS configuration for large files

### Code Examples
- **`inference_example.py`** - Interactive demo and usage examples
- **`UPLOAD_GUIDE.md`** - This file
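
Before uploading, it can help to confirm that the folder really contains every file listed above. The following is a minimal sketch using only the standard library; the helper name `missing_files` is hypothetical, and the file names are copied from the list in this section.

```python
from pathlib import Path

# File names copied from the "Package Contents" list above.
EXPECTED_FILES = [
    "model.safetensors",
    "bert_model_optimized.onnx",
    "bert_model_optimized_dynamic_int8.onnx",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.txt",
    "special_tokens_map.json",
    "config.json",
    "metrics.yaml",
    "README.md",
    "model_card.json",
    "requirements.txt",
    ".gitattributes",
    "inference_example.py",
    "UPLOAD_GUIDE.md",
]


def missing_files(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_files("/home/ubuntu/hf_upload/turnlet-bert-multilingual-eou")` should return an empty list when the package is complete.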

## 🚀 Upload Steps

### Option 1: Using Hugging Face CLI (Recommended)

```bash
# Install Hugging Face CLI
pip install huggingface-hub

# Login to Hugging Face
huggingface-cli login

# Navigate to the model folder
cd /home/ubuntu/hf_upload/turnlet-bert-multilingual-eou

# Create the repository under the account you logged in with
huggingface-cli repo create turnlet-bert-multilingual-eou --type model

# Initialize git and git-lfs
git init
git lfs install
git lfs track "*.onnx"
git lfs track "*.safetensors"

# Add all files
git add .

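# Commit, then make sure the local branch is named main
# (a fresh `git init` may default to master, depending on your Git configuration)
git commit -m "Initial commit: Turnlet BERT Multilingual EOU model with ONNX variants"
git branch -M main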

# Add remote (replace YOUR_USERNAME)
git remote add origin https://huggingface.co/YOUR_USERNAME/turnlet-bert-multilingual-eou

# Push to Hugging Face
git push -u origin main
```

### Option 2: Using Python API

```python
from huggingface_hub import HfApi, create_repo, login

# Log in first (you'll be prompted for a token)
login()

# Initialize API
api = HfApi()

# Create repository
repo_id = "YOUR_USERNAME/turnlet-bert-multilingual-eou"
create_repo(repo_id, repo_type="model", exist_ok=True)

# Upload folder
api.upload_folder(
    folder_path="/home/ubuntu/hf_upload/turnlet-bert-multilingual-eou",
    repo_id=repo_id,
    repo_type="model",
)

print(f"✅ Model uploaded to: https://huggingface.co/{repo_id}")
```

### Option 3: Manual Upload via Web Interface

1. Go to https://huggingface.co/new
2. Create a new model repository: `turnlet-bert-multilingual-eou`
3. Use the web interface to upload files:
   - Upload large files (`.onnx`, `.safetensors`) via Git LFS
   - Upload smaller files directly via web interface
4. Copy the README.md content to the model card

## ⚠️ Important Notes

### Git LFS Required
The model files are large and require Git LFS (Large File Storage):
- Make sure Git LFS is installed: `git lfs install`
- The `.gitattributes` file is already configured
- Files tracked: `*.onnx`, `*.safetensors`
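
To double-check that `.gitattributes` really tracks both patterns, a small parser can scan it for `filter=lfs` rules. This is a sketch under the assumption that the file uses the standard lines written by `git lfs track`; the function name `untracked_lfs_patterns` is hypothetical.

```python
def untracked_lfs_patterns(gitattributes_text, patterns=("*.onnx", "*.safetensors")):
    """Return the patterns that have no `filter=lfs` rule in the given text."""
    tracked = set()
    for line in gitattributes_text.splitlines():
        parts = line.split()
        # A Git LFS rule looks like: *.onnx filter=lfs diff=lfs merge=lfs -text
        if len(parts) >= 2 and "filter=lfs" in parts[1:]:
            tracked.add(parts[0])
    return [p for p in patterns if p not in tracked]
```

Pass it the contents of `.gitattributes` (for example via `Path(".gitattributes").read_text()`); an empty list means both patterns are tracked.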

### File Sizes
- Total repository size: ~1.2 GB
- Largest files: ONNX FP32 (517 MB) and PyTorch (517 MB)
- Recommended for deployment: INT8 ONNX (132 MB)
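
The three model files add up to 517 + 517 + 132 = 1,166 MB, which is consistent with the ~1.2 GB total. To see which files in the folder are actually large, a sketch like the one below lists everything above a size threshold. The 10 MB default is an assumption chosen for illustration, and `files_over` is a hypothetical helper name.

```python
from pathlib import Path


def files_over(folder, threshold_bytes=10 * 1024 * 1024):
    """Return (relative_path, size_in_bytes) for files above the threshold, largest first."""
    root = Path(folder)
    found = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Skip the contents of the .git directory; .gitattributes is still included.
        if not path.is_file() or ".git" in rel.parts:
            continue
        size = path.stat().st_size
        if size > threshold_bytes:
            found.append((rel.as_posix(), size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

For this package, the output would be expected to contain the two ONNX files and `model.safetensors`; any other large file that shows up may need its own `git lfs track` pattern.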

### Model Naming
Consider these naming conventions:
- `YOUR_USERNAME/turnlet-bert-multilingual-eou`
- `YOUR_ORG/turnlet-eou-detection-multilingual`
- `YOUR_USERNAME/distilbert-eou-en-hi-es`

### Tags to Add
When creating the repository, add these tags:
- `end-of-utterance`
- `eou-detection`
- `multilingual`
- `distilbert`
- `onnx`
- `quantized`
- `conversational-ai`
- `dialogue`
- `turn-taking`
- `text-classification`
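
On the Hub, tags are typically declared in the YAML metadata block at the top of `README.md`. The sketch below renders the tag list above as such a block; `front_matter` is a hypothetical helper, and it only produces the `tags` key, not a complete model card header.

```python
# Tags copied from the list above.
TAGS = [
    "end-of-utterance",
    "eou-detection",
    "multilingual",
    "distilbert",
    "onnx",
    "quantized",
    "conversational-ai",
    "dialogue",
    "turn-taking",
    "text-classification",
]


def front_matter(tags):
    """Render a YAML front-matter block containing only a `tags` list."""
    lines = ["---", "tags:"]
    lines += [f"- {tag}" for tag in tags]
    lines.append("---")
    return "\n".join(lines) + "\n"
```

If `README.md` already starts with a metadata block, merge the `tags` entries into it rather than prepending a second block.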

## 🧪 Testing After Upload

After uploading, test the model:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Test loading
model = AutoModelForSequenceClassification.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")

# Quick test
text = "Thanks for your help!"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
print(f"✅ Model loaded and working! Logits: {outputs.logits}")
```
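
Raw logits are hard to read at a glance. The sketch below converts a list of logits into probabilities with a numerically stable softmax, using only the standard library. Which index corresponds to "end of utterance" is not stated in this guide, so check `model.config.id2label` before interpreting the numbers.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With the test above, this could be called as `softmax(outputs.logits[0].tolist())`.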

## πŸ“ Post-Upload Checklist

After successful upload:

- [ ] Verify all files are uploaded
- [ ] Test model loading via transformers
- [ ] Test ONNX model download
- [ ] Update README with correct username/repo paths
- [ ] Add license information
- [ ] Add model tags and metadata
- [ ] Test interactive script
- [ ] Share on social media/communities
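
The first checklist item can be partly automated by comparing the local folder against the file list of the uploaded repository. This is a sketch with hypothetical helper names (`local_files`, `missing_on_hub`); it assumes you obtain the remote list separately.

```python
from pathlib import Path


def local_files(folder):
    """Return sorted relative paths of all files in `folder`, skipping the .git directory."""
    root = Path(folder)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(root).parts
    )


def missing_on_hub(local, remote):
    """Return the local paths that do not appear in the remote file list."""
    remote_set = set(remote)
    return [name for name in local if name not in remote_set]
```

Assuming `huggingface_hub` is installed, the remote list can be fetched with `HfApi().list_repo_files(repo_id)` and passed in as `remote`.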

## 🔗 Useful Links

- Hugging Face Hub Documentation: https://huggingface.co/docs/hub
- Git LFS: https://git-lfs.github.com/
- Model Cards Guide: https://huggingface.co/docs/hub/model-cards
- ONNX Models: https://huggingface.co/docs/hub/onnx

## 💡 Tips

1. **Use descriptive commit messages** when updating the model
2. **Version your models** by creating tags (v1.0, v2.0, etc.)
3. **Monitor downloads** via your Hugging Face dashboard
4. **Respond to community questions** in the community tab
5. **Update metrics** as you improve the model

## 🆘 Troubleshooting

### Git LFS Bandwidth Issues
If you hit LFS bandwidth limits:
- Upload the smaller model variant (INT8 ONNX, 132 MB) first
- Upload during off-peak hours
- Consider Hugging Face Pro for more bandwidth

### Authentication Issues
```bash
# Re-login
huggingface-cli login --token YOUR_TOKEN

# Or set the token as an environment variable
# (recent huggingface_hub releases read HF_TOKEN; HUGGING_FACE_HUB_TOKEN is the older name)
export HF_TOKEN=YOUR_TOKEN
```
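
When authentication fails inside a script, it is useful to see which environment variable, if any, is supplying the token. The sketch below checks the two variable names commonly used for this. The lookup order is an assumption made for illustration and `resolve_token` is a hypothetical helper; the library's own resolution logic may differ.

```python
import os


def resolve_token(env=None):
    """Return (variable_name, token) for the first non-empty token variable, else (None, None)."""
    env = os.environ if env is None else env
    # Assumed order: the newer name first, then the older one.
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = env.get(name, "").strip()
        if value:
            return name, value
    return None, None
```

If this returns `(None, None)`, no token is set in the environment and `huggingface-cli login` is the simpler route.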

### Large File Upload Timeout
```bash
# Raise the HTTP post buffer to ~500 MB
git config http.postBuffer 524288000
# Disable the low-speed abort so slow uploads are not cut off
git config http.lowSpeedLimit 0
git config http.lowSpeedTime 999999
```

## ✅ Ready to Upload!

Your model is fully prepared and ready for upload to Hugging Face! 🎉