ykhrustalev committed
Commit dbe7f5c · verified · 1 parent: 19718dc

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +26 -6
README.md CHANGED
@@ -53,10 +53,18 @@ LFM2-2.6B-Transcript is optimized for processing and summarizing meeting transcr
 
 ```
 onnx/
-├── model.onnx             # FP32
-├── model_fp16.onnx        # FP16
-├── model_q4.onnx          # Q4 (recommended)
-└── model_q8.onnx          # Q8
+├── model.onnx             # FP32 model graph
+├── model.onnx_data*       # FP32 weights
+├── model_fp16.onnx        # FP16 model graph
+├── model_fp16.onnx_data*  # FP16 weights
+├── model_q4.onnx          # Q4 model graph (recommended)
+├── model_q4.onnx_data     # Q4 weights
+├── model_q8.onnx          # Q8 model graph
+└── model_q8.onnx_data     # Q8 weights
+
+* Large models (>2GB) split weights across multiple files:
+  model.onnx_data, model.onnx_data_1, model.onnx_data_2, etc.
+  All data files must be in the same directory as the .onnx file.
 ```
 
 ## Python
@@ -80,7 +88,12 @@ from transformers import AutoTokenizer
 # Download model (Q4 recommended)
 model_id = "LiquidAI/LFM2-2.6B-Transcript-ONNX"
 model_path = hf_hub_download(model_id, "onnx/model_q4.onnx")
-data_path = hf_hub_download(model_id, "onnx/model_q4.onnx_data")
+
+# Download all data files (handles multiple splits for large models)
+from huggingface_hub import list_repo_files
+for f in list_repo_files(model_id):
+    if f.startswith("onnx/model_q4.onnx_data"):
+        hf_hub_download(model_id, f)
 
 # Load model and tokenizer
 session = ort.InferenceSession(model_path)
@@ -148,6 +161,14 @@ print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
 npm install @huggingface/transformers
 ```
 
+### Enable WebGPU
+
+WebGPU is required for browser inference. To enable:
+
+1. **Chrome/Edge**: Navigate to `chrome://flags/#enable-unsafe-webgpu`, enable, and restart
+2. **Verify**: Check `chrome://gpu` for "WebGPU" status
+3. **Test**: Run `navigator.gpu.requestAdapter()` in DevTools console
+
 ### Inference
 
 ```javascript
@@ -183,7 +204,6 @@ console.log(tokenizer.decode(output[0], { skip_special_tokens: true }));
 
 ### WebGPU Notes
 
-- Enable WebGPU: `chrome://flags/#enable-unsafe-webgpu`
 - Supported: Q4, FP16 (Q8 not supported on WebGPU)
 
 ## License
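The download loop this commit adds filters the repo listing by a filename prefix, so every weight shard (`model_q4.onnx_data`, `model_q4.onnx_data_1`, …) is fetched alongside the graph. The selection logic can be sketched and checked offline with a plain function (`select_data_files` is a hypothetical helper name; no network or `huggingface_hub` calls involved):

```python
def select_data_files(repo_files, prefix="onnx/model_q4.onnx_data"):
    """Pick the external-weight shards that accompany a given .onnx graph.

    Large exports (>2GB) split weights into model_q4.onnx_data,
    model_q4.onnx_data_1, ..., and every matching file must be fetched
    into the same directory as the graph.
    """
    return sorted(f for f in repo_files if f.startswith(prefix))


# Example repo listing shaped like the layout in the README:
listing = [
    "onnx/model_q4.onnx",
    "onnx/model_q4.onnx_data",
    "onnx/model_q4.onnx_data_1",
    "onnx/model_q8.onnx",
    "onnx/model_q8.onnx_data",
]
print(select_data_files(listing))
# ['onnx/model_q4.onnx_data', 'onnx/model_q4.onnx_data_1']
```

Note that the prefix match deliberately excludes the `.onnx` graph itself, which is downloaded separately in the README snippet.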
 
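The note added to the file tree says all `*.onnx_data` shards must sit in the same directory as the `.onnx` graph, since the graph references its external weights by relative path. A minimal sketch of a sanity check for that layout, using only `pathlib` (`find_shards` is a hypothetical helper, not part of any library):

```python
from pathlib import Path


def find_shards(onnx_path):
    """List the external-data shards colocated with a .onnx graph.

    Mirrors the layout the README requires: model_q4.onnx_data,
    model_q4.onnx_data_1, ... next to model_q4.onnx.
    """
    p = Path(onnx_path)
    return sorted(s.name for s in p.parent.glob(p.name + "_data*"))


# Demo against a throwaway directory mimicking onnx/:
import tempfile

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    for name in ["model_q4.onnx", "model_q4.onnx_data", "model_q4.onnx_data_1"]:
        (root / name).touch()
    print(find_shards(root / "model_q4.onnx"))
    # ['model_q4.onnx_data', 'model_q4.onnx_data_1']
```

An empty result before calling `ort.InferenceSession` would indicate the shards were downloaded somewhere other than next to the graph.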