Phi4-mini is quantized by the PyTorch team using an algorithm in torchao called [PARQ](https://github.com/pytorch/ao/tree/main/torchao/prototype/parq). The quantized model uses 3-bit weights in its linear layers, 4-bit embeddings, and 8-bit dynamic activations, making it suitable for mobile deployment with [ExecuTorch](https://github.com/pytorch/executorch).
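To make the weight format concrete, here is a minimal illustrative sketch of symmetric round-to-nearest fake-quantization to a 3-bit grid, the bit width used for this model's linear layers. This is *not* the PARQ algorithm (PARQ learns quantized weights during training via regularized/proximal updates) and the `group_size` below is an arbitrary choice for illustration; it only shows what a 3-bit per-group weight grid looks like.

```python
import numpy as np

def fake_quantize(w, bits=3, group_size=32):
    """Round w to a symmetric `bits`-bit grid per group of `group_size` values."""
    qmax = 2 ** (bits - 1) - 1                  # 3 bits -> integers in [-4, 3]
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)    # guard all-zero groups
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax)
    return (q * scale).reshape(w.shape)         # dequantized ("fake-quantized") weights

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)
wq = fake_quantize(w, bits=3)
# Every group of 32 weights now holds at most 2**3 = 8 distinct values.
assert all(len(np.unique(g)) <= 8 for g in wq.reshape(-1, 32))
```

The same idea extends to the 4-bit embeddings (a 16-value grid), while the 8-bit activation quantization is *dynamic*: scales are computed from the activation tensor at runtime rather than fixed ahead of time.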
We provide the quantized `.pte` file for direct use in ExecuTorch. (The provided `.pte` file is exported with a `max_context_length` of 1024; to change this, re-export the quantized model following the instructions in [Exporting to ExecuTorch](#exporting-to-executorch).)
# Running in a Mobile App
The `.pte` file can be run with ExecuTorch on a mobile phone. See the [instructions](https://docs.pytorch.org/executorch/0.7/llm/llama-demo-ios.html) for doing this on iOS. On an iPhone 15 Pro, the model runs at 22 tokens/second and uses 1827 MB of memory.
# Quantization Recipe