Upload README.md with huggingface_hub
README.md CHANGED
@@ -53,10 +53,18 @@ LFM2-2.6B-Transcript is optimized for processing and summarizing meeting transcr
 
 ```
 onnx/
-├── model.onnx # FP32
-├── …
-├── …
-…
+├── model.onnx            # FP32 model graph
+├── model.onnx_data*      # FP32 weights
+├── model_fp16.onnx       # FP16 model graph
+├── model_fp16.onnx_data* # FP16 weights
+├── model_q4.onnx         # Q4 model graph (recommended)
+├── model_q4.onnx_data    # Q4 weights
+├── model_q8.onnx         # Q8 model graph
+└── model_q8.onnx_data    # Q8 weights
+
+* Large models (>2GB) split weights across multiple files:
+  model.onnx_data, model.onnx_data_1, model.onnx_data_2, etc.
+  All data files must be in the same directory as the .onnx file.
 ```
 
 ## Python
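The split-weights note added above suggests a quick sanity check before loading: confirm that every `_data` shard actually sits next to its `.onnx` graph file. Below is a minimal sketch, assuming the q4 files were downloaded into a local `onnx/` directory; the `list_weight_shards` helper is illustrative and not part of this README.

```python
from pathlib import Path

def list_weight_shards(onnx_path: str) -> list[Path]:
    """List the external-data shards next to an .onnx graph file.

    onnxruntime resolves model.onnx_data, model.onnx_data_1, ...
    relative to the .onnx file, so every shard must live in the
    same directory as the graph.
    """
    graph = Path(onnx_path)
    return sorted(graph.parent.glob(graph.name + "_data*"))

# Example: inspect the shards downloaded alongside onnx/model_q4.onnx
shards = list_weight_shards("onnx/model_q4.onnx")
for shard in shards:
    print(shard.name, f"{shard.stat().st_size / 1e9:.2f} GB")
if not shards:
    print("no external data found; weights may be embedded in the graph")
```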
@@ -80,7 +88,12 @@ from transformers import AutoTokenizer
 # Download model (Q4 recommended)
 model_id = "LiquidAI/LFM2-2.6B-Transcript-ONNX"
 model_path = hf_hub_download(model_id, "onnx/model_q4.onnx")
-
+
+# Download all data files (handles multiple splits for large models)
+from huggingface_hub import list_repo_files
+for f in list_repo_files(model_id):
+    if f.startswith("onnx/model_q4.onnx_data"):
+        hf_hub_download(model_id, f)
 
 # Load model and tokenizer
 session = ort.InferenceSession(model_path)
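As an alternative to the per-file loop added in this hunk, `huggingface_hub.snapshot_download` with an `allow_patterns` filter fetches the graph and all of its `_data` shards in one call, mirroring the repo layout so the shards land beside the graph file. A sketch of that variant, not part of this commit:

```python
from pathlib import Path

import onnxruntime as ort
from huggingface_hub import snapshot_download

# Download onnx/model_q4.onnx plus every model_q4.onnx_data* shard.
# snapshot_download preserves the repo's directory structure locally,
# which keeps the shards next to the graph, as onnxruntime requires.
local_dir = snapshot_download(
    "LiquidAI/LFM2-2.6B-Transcript-ONNX",
    allow_patterns=["onnx/model_q4.onnx*"],
)
session = ort.InferenceSession(str(Path(local_dir) / "onnx" / "model_q4.onnx"))
```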
@@ -148,6 +161,14 @@ print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
 npm install @huggingface/transformers
 ```
 
+### Enable WebGPU
+
+WebGPU is required for browser inference. To enable:
+
+1. **Chrome/Edge**: Navigate to `chrome://flags/#enable-unsafe-webgpu`, enable, and restart
+2. **Verify**: Check `chrome://gpu` for "WebGPU" status
+3. **Test**: Run `navigator.gpu.requestAdapter()` in DevTools console
+
 ### Inference
 
 ```javascript
@@ -183,7 +204,6 @@ console.log(tokenizer.decode(output[0], { skip_special_tokens: true }));
 
 ### WebGPU Notes
 
-- Enable WebGPU: `chrome://flags/#enable-unsafe-webgpu`
 - Supported: Q4, FP16 (Q8 not supported on WebGPU)
 
 ## License