lvj committed
Commit 70f6c69 · verified · 1 Parent(s): 1aac800

Update README.md

Files changed (1):
  1. README.md +2 -2
README.md CHANGED
@@ -19,11 +19,11 @@ pipeline_tag: text-generation
 
 Phi4-mini is quantized by the PyTorch team using an algorithm in torchao called [PARQ](https://github.com/pytorch/ao/tree/main/torchao/prototype/parq). The model has 3-bit weight linears, 4-bit embeddings, and 8-bit dynamic activations. It is suitable for mobile deployment with [ExecuTorch](https://github.com/pytorch/executorch).
 
-We provide the quantized pte for direct use in ExecuTorch. (The provided pte file is exported with a max_context_length of 1024. If you wish to change this, re-export the quantized model following the instructions in [Exporting to ExecuTorch](#exporting-to-executorch).)
+We provide the quantized pte for direct use in ExecuTorch. (The provided pte file is exported with a `max_context_length` of 1024. If you wish to change this, re-export the quantized model following the instructions in [Exporting to ExecuTorch](#exporting-to-executorch).)
 
 # Running in a Mobile App
 
-The pte file can be run with ExecuTorch on a mobile phone. See the [instructions](https://docs.pytorch.org/executorch/0.7/llm/llama-demo-ios.html) for doing this in iOS. On iPhone 15 Pro, the model runs at 22 tokens/sec and uses 1827 Mb of memory.
+The pte file can be run with ExecuTorch on a mobile phone. See the [instructions](https://docs.pytorch.org/executorch/0.7/llm/llama-demo-ios.html) for doing this in iOS. On iPhone 15 Pro, the model runs at 22 tokens/second and uses 1827 Mb of memory.
 
 # Quantization Recipe
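For context on the `max_context_length` note in the diff: re-exporting to a different context length goes through ExecuTorch's `export_llama` entry point, per the README's [Exporting to ExecuTorch](#exporting-to-executorch) section. A minimal sketch follows; the checkpoint and params paths are placeholders, and the exact flag set is an assumption to be checked against that section of the README:

```bash
# Hypothetical re-export of the PARQ-quantized checkpoint with a larger
# context window. Paths below are placeholders, not files from this repo;
# the authoritative command lives in the README's Exporting to ExecuTorch
# section.
python -m executorch.examples.models.llama.export_llama \
  --checkpoint phi4-mini-parq-checkpoint.pth \
  --params params.json \
  --max_context_length 2048 \
  --output_name phi4-mini.pte
```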