---
library_name: llima
license: mit
tags:
- llm
- generative_ai
- embedded
- sima
pipeline_tag: text-generation
base_model: microsoft/Phi-3.5-mini-instruct
---
# Phi-3.5-mini-instruct: Optimized for SiMa.ai Modalix
## Overview
This repository contains the **Phi-3.5-mini-instruct** model, optimized and compiled for the **SiMa.ai Modalix** platform.
- **Model Architecture:** Phi-3.5 Mini (3.8B parameters)
- **Quantization:** Hybrid
  - **Prompt Processing:** A16W8 (16-bit activations, 8-bit weights)
  - **Token Generation:** A16W4 (16-bit activations, 4-bit weights)
- **Maximum Context Length:** 2048 tokens
- **Source Model:** [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)
## Performance
The following performance metrics were measured with an input sequence length of 128 tokens.
| Model | Precision | Device | Response Rate (tokens/sec) | Time to First Token (sec) |
|---|---|---|---|---|
| Phi-3.5-mini-instruct | A16W8/A16W4 | Modalix | 16.5 | 0.15 |
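As a rough back-of-the-envelope estimate (assuming the measured rates hold for longer outputs, which is not guaranteed), end-to-end response latency is approximately the time to first token plus the output length divided by the response rate:

```python
# Rough latency estimate from the table above (illustrative only).
ttft_s = 0.15    # time to first token, seconds
rate_tps = 16.5  # tokens per second during generation

def latency(output_tokens: int) -> float:
    """Approximate end-to-end seconds for a response of the given length."""
    return ttft_s + output_tokens / rate_tps

print(f"{latency(256):.1f} s")  # ~15.7 s for a 256-token response
```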
## Prerequisites
To run this model, you need:
1. A **SiMa.ai Modalix** device.
2. The **SiMa.ai CLI** (`sima-cli`), [installed](https://docs.sima.ai/pages/sima_cli/main.html#installation) on your Modalix device.
3. The **Hugging Face CLI** (`hf`), for downloading the model.
## Installation & Deployment
Follow these steps to deploy the model to your Modalix device.
### 1. Install LLiMa Demo Application
> **Note:** This is a **one-time setup**. If you have already installed the LLiMa demo application (e.g., for another model), you can skip this step and continue with the model download.
On your Modalix device, install the LLiMa demo application using the `sima-cli`:
```bash
# Create a directory for LLiMa
cd /media/nvme
mkdir llima
cd llima
# Install the LLiMa runtime code
sima-cli install -v 2.0.0 samples/llima -t select
```
> **Note:** To only download the LLiMa runtime code, select **🚫 Skip** when prompted.
### 2. Download the Model
Download the compiled model assets from this repository directly to your device.
```bash
# Download the model to a local directory
cd /media/nvme/llima
hf download simaai/Phi-3.5-mini-instruct-a16w4 --local-dir Phi-3.5-mini-instruct-a16w4
```
Alternatively, you can download the compiled model on a host machine and copy it to the Modalix device:
```bash
hf download simaai/Phi-3.5-mini-instruct-a16w4 --local-dir Phi-3.5-mini-instruct-a16w4
scp -r Phi-3.5-mini-instruct-a16w4 sima@<modalix-ip>:/media/nvme/llima/
```
*Replace \<modalix-ip\> with the IP address of your Modalix device.*
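If you prefer to script the download from Python on the host, the `huggingface_hub` package (the same package that provides the `hf` CLI) exposes the download as `snapshot_download`; a minimal sketch:

```python
# Download the compiled model assets via the huggingface_hub Python API.
# Equivalent to the `hf download` command above; run on the host, then
# copy the directory to the device with scp as shown.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="simaai/Phi-3.5-mini-instruct-a16w4",
    local_dir="Phi-3.5-mini-instruct-a16w4",
)
```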
**Expected Directory Structure:**
```text
/media/nvme/llima/
├── simaai-genai-demo/ # The demo app
└── Phi-3.5-mini-instruct-a16w4/ # Your downloaded model
```
## Usage
### Run the Application
Navigate to the demo directory and start the application:
```bash
cd /media/nvme/llima/simaai-genai-demo
./run.sh
```
The script will detect the installed model(s) and prompt you to select one.
Once the application is running, open a browser and navigate to:
```text
https://<modalix-ip>:5000/
```
*Replace \<modalix-ip\> with the IP address of your Modalix device.*
### API Usage
To use the OpenAI-compatible API, run the model in API mode:
```bash
cd /media/nvme/llima/simaai-genai-demo
./run.sh --httponly --api-only
```
You can interact with it using `curl` or Python.
**Example: Chat Completion**
```bash
curl -N -k -X POST "https://<modalix-ip>:5000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "user", "content": "Why is the sky blue?" }
],
"stream": true
}'
```
*Replace \<modalix-ip\> with the IP address of your Modalix device.*
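The same endpoint can also be called from Python. The sketch below uses the `openai` client package pointed at the device; it assumes the server accepts any API key, and it disables certificate verification via a custom `httpx` client because the demo serves a self-signed certificate (the reason for `curl -k` above). The model name is a placeholder and may be ignored by the demo server:

```python
# Minimal streaming chat client for the OpenAI-compatible endpoint.
# Endpoint behavior is assumed to match the curl example above.
import httpx
from openai import OpenAI

client = OpenAI(
    base_url="https://<modalix-ip>:5000/v1",
    api_key="unused",                        # assumed: no key required
    http_client=httpx.Client(verify=False),  # self-signed cert, like `curl -k`
)

stream = client.chat.completions.create(
    model="Phi-3.5-mini-instruct-a16w4",     # placeholder model name
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```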
## Limitations
- **Quantization**: This model is quantized (A16W8/A16W4) for optimal performance on embedded devices. While this maintains high accuracy, minor deviations from the full-precision model may occur; the sketch below illustrates why.
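For intuition only, here is a generic symmetric weight-quantization round trip in 8 and 4 bits. This is not SiMa.ai's actual quantization pipeline, just an illustration of where the small deviations come from:

```python
# Illustrative symmetric per-tensor weight quantization (NOT the actual
# SiMa.ai pipeline): map float weights to n-bit integers and back, then
# measure the round-trip error introduced by rounding.
import numpy as np

def quantize_roundtrip(w: np.ndarray, bits: int) -> np.ndarray:
    qmax = 2 ** (bits - 1) - 1        # 127 for 8-bit, 7 for 4-bit
    scale = np.abs(w).max() / qmax    # per-tensor symmetric scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                  # dequantized weights

rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)
for bits in (8, 4):
    err = np.abs(w - quantize_roundtrip(w, bits)).max()
    print(f"W{bits}: max round-trip error {err:.4f}")
```

Lower bit widths coarsen the quantization grid (7 positive levels at 4 bits versus 127 at 8 bits), so the W4 token-generation path trades slightly larger, though still small, deviations for speed.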
## Troubleshooting
- **`sima-cli` not found**: Ensure that `sima-cli` is installed on your Modalix device.
- **Model fails to run**: Verify that the model directory sits directly inside `/media/nvme/llima/` and is not nested (e.g., `/media/nvme/llima/Phi-3.5-mini-instruct-a16w4/Phi-3.5-mini-instruct-a16w4`); the sketch below automates this check.
- **Permission denied**: Ensure you have read/write permissions for the `/media/nvme` directory.
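A small sanity check for the layout issues above (paths assumed from the directory structure in step 2) can be run on the device:

```python
# Sanity-check the expected layout under /media/nvme/llima (paths assumed
# from the "Expected Directory Structure" section above).
from pathlib import Path

root = Path("/media/nvme/llima")
model = root / "Phi-3.5-mini-instruct-a16w4"

assert (root / "simaai-genai-demo").is_dir(), "demo app missing; rerun step 1"
assert model.is_dir(), "model directory missing; rerun step 2"
assert not (model / model.name).exists(), "model nested one level too deep"
print("Layout looks good.")
```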
## Resources
- [SiMa.ai Documentation](https://docs.sima.ai)
- [SiMa.ai Hugging Face Organization](https://huggingface.co/simaai) |