---
library_name: llima
license: mit
tags:
- llm
- generative_ai
- embedded
- sima
pipeline_tag: text-generation
base_model: microsoft/Phi-3.5-mini-instruct
---

# Phi-3.5-mini-instruct: Optimized for SiMa.ai Modalix

## Overview

This repository contains the **Phi-3.5-mini-instruct** model, optimized and compiled for the **SiMa.ai Modalix** platform.

- **Model Architecture:** Phi-3.5 Mini (3.8B parameters)
- **Quantization:** Hybrid
  - **Prompt Processing:** A16W8 (16-bit activations, 8-bit weights)
  - **Token Generation:** A16W4 (16-bit activations, 4-bit weights)
- **Maximum context length:** 2048
- **Source Model:** [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)

## Performance

The following performance metrics were measured with an input sequence length of 128 tokens.

| Model | Precision | Device | Response Rate (tokens/sec) | Time To First Token (sec) |
|---|---|---|---|---|
| Phi-3.5-mini-instruct | A16W8/A16W4 | Modalix | 16.5 | 0.15 |
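
As a rough worked example using these figures, a 256-token response to a 128-token prompt would take about 0.15 s + 256 / 16.5 ≈ 15.7 s end to end; actual throughput may vary with prompt length and system load.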


## Prerequisites

To run this model, you need:

1.  **SiMa.ai Modalix Device**
2.  **SiMa.ai CLI**: [Installed](https://docs.sima.ai/pages/sima_cli/main.html#installation) on your Modalix device.
3.  **Hugging Face CLI**: For downloading the model.

## Installation & Deployment

Follow these steps to deploy the model to your Modalix device.

### 1. Install LLiMa Demo Application
> **Note:** This is a **one-time setup**. If you have already installed the LLiMa demo application (e.g., for another model), you can skip this step and continue with the model download.

On your Modalix device, install the LLiMa demo application using the `sima-cli`:

```bash
# Create a directory for LLiMa
cd /media/nvme
mkdir llima
cd llima

# Install the LLiMa runtime code
sima-cli install -v 2.0.0 samples/llima -t select
```
> **Note:** To download only the LLiMa runtime code, select **🚫 Skip** when prompted.

### 2. Download the Model

Download the compiled model assets from this repository directly to your device.

```bash
# Download the model to a local directory
cd /media/nvme/llima
hf download simaai/Phi-3.5-mini-instruct-a16w4 --local-dir Phi-3.5-mini-instruct-a16w4
```

Alternatively, you can download the compiled model to a host machine and copy it to the Modalix device:

```bash
hf download simaai/Phi-3.5-mini-instruct-a16w4 --local-dir Phi-3.5-mini-instruct-a16w4
scp -r Phi-3.5-mini-instruct-a16w4 sima@<modalix-ip>:/media/nvme/llima/
```
*Replace \<modalix-ip\> with the IP address of your Modalix device.*

**Expected Directory Structure:**

```text
/media/nvme/llima/
├── simaai-genai-demo/             # The demo app
└── Phi-3.5-mini-instruct-a16w4/   # Your downloaded model
```

## Usage

### Run the Application

Navigate to the demo directory and start the application:

```bash
cd /media/nvme/llima/simaai-genai-demo
./run.sh
```

The script will detect the installed model(s) and prompt you to select one.

Once the application is running, open a browser and navigate to:
```text
https://<modalix-ip>:5000/
```
*Replace \<modalix-ip\> with the IP address of your Modalix device.*

### API Usage

To use the OpenAI-compatible API, run the model in API mode:
```bash
cd /media/nvme/llima/simaai-genai-demo
./run.sh --httponly --api-only
```

You can interact with the API using `curl` or Python; examples of both follow.

**Example: Chat Completion (curl)**

```bash
curl -N -k -X POST "https://<modalix-ip>:5000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "Why is the sky blue?" }
    ],
    "stream": true
  }'
```
*Replace \<modalix-ip\> with the IP address of your Modalix device.*
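
**Example: Chat Completion (Python)**

Below is a minimal Python sketch for the same endpoint. It assumes the server streams responses in the standard OpenAI server-sent-events format (`data: {...}` chunks ending with `data: [DONE]`) and serves a self-signed certificate (hence `verify=False`, mirroring curl's `-k`); adjust the URL scheme to `http://` if your deployment does not use TLS.

```python
import json
import requests

MODALIX_IP = "<modalix-ip>"  # replace with the IP address of your Modalix device
URL = f"https://{MODALIX_IP}:5000/v1/chat/completions"

payload = {
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "stream": True,
}

# verify=False mirrors curl's -k flag for the demo's self-signed certificate
with requests.post(URL, json=payload, stream=True, verify=False, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # Streaming chunks arrive as "data: {...}" lines; "data: [DONE]" ends the stream
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {}).get("content", "")
        print(delta, end="", flush=True)
print()
```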

## Limitations

- **Quantization**: This model is quantized (A16W8/A16W4) for efficient execution on embedded devices. Accuracy remains close to the original, but minor deviations from the full-precision model's outputs may occur.


## Troubleshooting

- **`sima-cli` not found**: Ensure that `sima-cli` is installed on your Modalix device (see Prerequisites above).
- **Model can't be run**: Verify that the model directory is directly inside `/media/nvme/llima/` and not nested (e.g., `/media/nvme/llima/Phi-3.5-mini-instruct-a16w4/Phi-3.5-mini-instruct-a16w4`).
- **Permission Denied**: Ensure you have read/write permissions for the `/media/nvme` directory.

## Resources

- [SiMa.ai Documentation](https://docs.sima.ai)
- [SiMa.ai Hugging Face Organization](https://huggingface.co/simaai)