Create README.md
Browse files
README.md
ADDED
|
@@ -0,0 +1,18 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Bringing SOTA quantization to mobile LLM deployment: A practical Executorch integration guide
|
| 2 |
+
|
| 3 |
+
Article: <\TODO>
|
| 4 |
+
|
| 5 |
+
## Usage
|
| 6 |
+
|
| 7 |
+
- Download and install the `.apk` file on your Android phone.
|
| 8 |
+
- Download the `.pte` and `.model` files and put them into the `/data/local/tmp/llama` folder on your Android phone.
|
| 9 |
+
- Running the app you will see the option to load the `.pte` and `.model` files. After loading them, you'll be able to chat with the model.
|
| 10 |
+
|
| 11 |
+
## Requirements
|
| 12 |
+
|
| 13 |
+
This app was tested on `Samsung S24 Ultra` running `Android 14`.
|
| 14 |
+
|
| 15 |
+
## Limitations
|
| 16 |
+
|
| 17 |
+
- Although the app looks like chat, generation requests are independent.
|
| 18 |
+
- Llama-3 chat template is hard-coded into the app.
|