Update README.md
README.md
CHANGED
@@ -169,10 +169,14 @@ Exporting to ExecuTorch requires you clone and install [ExecuTorch](https://gith
 
 
 ## Convert quantized checkpoint to ExecuTorch's format
+
+ExecuTorch expects the checkpoint keys to have certain names in order to export. The following script converts the quantized checkpoint from Hugging Face to the one ExecuTorch expects.
 ```
 python -m executorch.examples.models.phi_4_mini.convert_weights phi4-mini-8dq4w.bin phi4-mini-8dq4w-converted.bin
 ```
 
+Once the checkpoint is converted, we can export to ExecuTorch's PTE format.
+
 ## Export to an ExecuTorch *.pte with XNNPACK
 ```
 PARAMS="executorch/examples/models/phi_4_mini/config.json"
@@ -188,7 +192,7 @@ python -m executorch.examples.models.llama.export_llama \
 ```
 
 ## Running in a mobile app
-The
+The PTE file can be run with ExecuTorch. See the [instructions](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) for doing this in iOS.
 On iPhone 15 Pro, the model runs at 17.3 tokens/sec and uses 3206 Mb of memory.
 
 
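The added README text says the conversion step exists because ExecuTorch expects the checkpoint keys to have certain names. As a rough illustration of that idea only, and not the actual `convert_weights` implementation, a key-renaming pass over a checkpoint dictionary might look like the sketch below. The `rename_keys` helper and every entry in `EXAMPLE_KEY_MAP` are hypothetical names chosen for illustration; the real mapping lives in the ExecuTorch script referenced in the diff.

```python
# Hypothetical prefix mapping, for illustration only.
# These are NOT the real Phi-4-mini key names used by ExecuTorch.
EXAMPLE_KEY_MAP = {
    "model.embed_tokens.": "tok_embeddings.",
    "model.norm.": "norm.",
}


def rename_keys(state_dict, key_map):
    """Return a new dict where each key's matching prefix is replaced.

    Keys that match no prefix in key_map are copied through unchanged.
    """
    converted = {}
    for old_key, value in state_dict.items():
        new_key = old_key
        for src_prefix, dst_prefix in key_map.items():
            if old_key.startswith(src_prefix):
                new_key = dst_prefix + old_key[len(src_prefix):]
                break
        if new_key in converted:
            # Two source keys collapsing onto one target would silently drop weights.
            raise ValueError(f"duplicate key after renaming: {new_key}")
        converted[new_key] = value
    return converted


# Usage with placeholder values standing in for tensors.
example = {"model.embed_tokens.weight": 1, "lm_head.weight": 2}
renamed = rename_keys(example, EXAMPLE_KEY_MAP)
```

In a real conversion the values would be tensors loaded from the quantized checkpoint, and the result would be saved to the converted file that the export command consumes.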