Upload README.md with huggingface_hub
README.md CHANGED
@@ -53,10 +53,18 @@ LFM2-2.6B-Transcript is optimized for processing and summarizing meeting transcr
 
 ```
 onnx/
-├── model.onnx # FP32
-├── …
-├── …
-…
+├── model.onnx            # FP32 model graph
+├── model.onnx_data*      # FP32 weights
+├── model_fp16.onnx       # FP16 model graph
+├── model_fp16.onnx_data* # FP16 weights
+├── model_q4.onnx         # Q4 model graph (recommended)
+├── model_q4.onnx_data    # Q4 weights
+├── model_q8.onnx         # Q8 model graph
+└── model_q8.onnx_data    # Q8 weights
+
+* Large models (>2GB) split weights across multiple files:
+  model.onnx_data, model.onnx_data_1, model.onnx_data_2, etc.
+  All data files must be in the same directory as the .onnx file.
 ```
 
 ## Python
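The split-weights note added above suggests a quick sanity check before loading: confirm that every `_data` shard actually sits next to its `.onnx` graph file. Below is a minimal sketch, assuming the q4 files were downloaded into a local `onnx/` directory; the `list_weight_shards` helper is illustrative and not part of this README.

```python
from pathlib import Path

def list_weight_shards(onnx_path: str) -> list[Path]:
    """List the external-data shards next to an .onnx graph file.

    onnxruntime resolves model.onnx_data, model.onnx_data_1, ...
    relative to the .onnx file, so every shard must live in the
    same directory as the graph.
    """
    graph = Path(onnx_path)
    return sorted(graph.parent.glob(graph.name + "_data*"))

# Example: inspect the shards downloaded alongside onnx/model_q4.onnx
shards = list_weight_shards("onnx/model_q4.onnx")
for shard in shards:
    print(shard.name, f"{shard.stat().st_size / 1e9:.2f} GB")
if not shards:
    print("no external data found; weights may be embedded in the graph")
```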
@@ -80,7 +88,12 @@ from transformers import AutoTokenizer
 # Download model (Q4 recommended)
 model_id = "LiquidAI/LFM2-2.6B-Transcript-ONNX"
 model_path = hf_hub_download(model_id, "onnx/model_q4.onnx")
-
+
+# Download all data files (handles multiple splits for large models)
+from huggingface_hub import list_repo_files
+for f in list_repo_files(model_id):
+    if f.startswith("onnx/model_q4.onnx_data"):
+        hf_hub_download(model_id, f)
 
 # Load model and tokenizer
 session = ort.InferenceSession(model_path)
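As an alternative to the per-file loop added in this hunk, `huggingface_hub.snapshot_download` with an `allow_patterns` filter fetches the graph and all of its `_data` shards in one call, mirroring the repo layout so the shards land beside the graph file. A sketch of that variant, not part of this commit:

```python
from pathlib import Path

import onnxruntime as ort
from huggingface_hub import snapshot_download

# Download onnx/model_q4.onnx plus every model_q4.onnx_data* shard.
# snapshot_download preserves the repo's directory structure locally,
# which keeps the shards next to the graph, as onnxruntime requires.
local_dir = snapshot_download(
    "LiquidAI/LFM2-2.6B-Transcript-ONNX",
    allow_patterns=["onnx/model_q4.onnx*"],
)
session = ort.InferenceSession(str(Path(local_dir) / "onnx" / "model_q4.onnx"))
```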
@@ -148,6 +161,14 @@ print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
 npm install @huggingface/transformers
 ```
 
+### Enable WebGPU
+
+WebGPU is required for browser inference. To enable:
+
+1. **Chrome/Edge**: Navigate to `chrome://flags/#enable-unsafe-webgpu`, enable, and restart
+2. **Verify**: Check `chrome://gpu` for "WebGPU" status
+3. **Test**: Run `navigator.gpu.requestAdapter()` in DevTools console
+
 ### Inference
 
 ```javascript
@@ -183,7 +204,6 @@ console.log(tokenizer.decode(output[0], { skip_special_tokens: true }));
 
 ### WebGPU Notes
 
-- Enable WebGPU: `chrome://flags/#enable-unsafe-webgpu`
 - Supported: Q4, FP16 (Q8 not supported on WebGPU)
 
 ## License