lvj committed
Commit 70f6c69 · verified · 1 Parent(s): 1aac800

Update README.md

Files changed (1):
  1. README.md +2 -2
README.md CHANGED
@@ -19,11 +19,11 @@ pipeline_tag: text-generation
 
 Phi4-mini is quantized by the PyTorch team using an algorithm in torchao called [PARQ](https://github.com/pytorch/ao/tree/main/torchao/prototype/parq). The model has 3-bit weight linears, 4-bit embeddings, and 8-bit dynamic activations. It is suitable for mobile deployment with [ExecuTorch](https://github.com/pytorch/executorch).
 
-We provide the quantized pte for direct use in ExecuTorch. (The provided pte file is exported with a max_context_length of 1024. If you wish to change this, re-export the quantized model following the instructions in [Exporting to ExecuTorch](#exporting-to-executorch).)
+We provide the quantized pte for direct use in ExecuTorch. (The provided pte file is exported with a `max_context_length` of 1024. If you wish to change this, re-export the quantized model following the instructions in [Exporting to ExecuTorch](#exporting-to-executorch).)
 
 # Running in a Mobile App
 
-The pte file can be run with ExecuTorch on a mobile phone. See the [instructions](https://docs.pytorch.org/executorch/0.7/llm/llama-demo-ios.html) for doing this in iOS. On iPhone 15 Pro, the model runs at 22 tokens/sec and uses 1827 Mb of memory.
+The pte file can be run with ExecuTorch on a mobile phone. See the [instructions](https://docs.pytorch.org/executorch/0.7/llm/llama-demo-ios.html) for doing this in iOS. On iPhone 15 Pro, the model runs at 22 tokens/second and uses 1827 Mb of memory.
 
 # Quantization Recipe
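For context on the `max_context_length` note in the diff: re-exporting to a different context length goes through ExecuTorch's `export_llama` entry point, per the README's [Exporting to ExecuTorch](#exporting-to-executorch) section. A minimal sketch follows; the checkpoint and params paths are placeholders, and the exact flag set is an assumption to be checked against that section of the README:

```bash
# Hypothetical re-export of the PARQ-quantized checkpoint with a larger
# context window. Paths below are placeholders, not files from this repo;
# the authoritative command lives in the README's Exporting to ExecuTorch
# section.
python -m executorch.examples.models.llama.export_llama \
  --checkpoint phi4-mini-parq-checkpoint.pth \
  --params params.json \
  --max_context_length 2048 \
  --output_name phi4-mini.pte
```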