florianvoss committed · Commit 0ee9af7 · verified · 1 Parent(s): 3e9d434

Upload README.md with huggingface_hub

---
library_name: llima
license: apache-2.0
tags:
- llm
- generative_ai
- embedded
- sima
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-Instruct-v0.3
---

# Mistral-7B-Instruct-v0.3: Optimized for SiMa.ai Modalix

## Overview

This repository contains the **Mistral-7B-Instruct-v0.3** model, optimized and compiled for the **SiMa.ai Modalix** platform.

- **Model Architecture:** Mistral-7B-Instruct-v0.3 (7B parameters)
- **Quantization:** Hybrid
  - **Prompt Processing:** A16W8 (16-bit activations, 8-bit weights)
  - **Token Generation:** A16W4 (16-bit activations, 4-bit weights)
- **Maximum Context Length:** 2048 tokens
- **Source Model:** [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)

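To give an intuition for the W4 half of the hybrid scheme, here is a toy sketch of symmetric 4-bit weight quantization. This is our own illustration of the general technique, not SiMa.ai's actual quantizer; the function names and the per-tensor scale are illustrative assumptions.

```python
def quantize_w4(weights):
    """Map floats to signed 4-bit integers in [-7, 7] with one shared scale.

    Toy per-tensor symmetric quantization; real toolchains typically use
    per-channel scales and calibration.
    """
    scale = max(abs(w) for w in weights) / 7.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Reconstruct approximate float weights from the 4-bit codes."""
    return [c * scale for c in codes]

weights = [0.7, -1.4, 0.35, 0.0]
codes, scale = quantize_w4(weights)
approx = dequantize(codes, scale)
```

Each weight is stored as a 4-bit integer plus a shared scale, so reconstruction error is bounded by half a quantization step; activations stay at 16 bits, which is why accuracy loss is small.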
## Performance

The following performance metrics were measured with an input sequence length of 128 tokens.

| Model | Precision | Device | Response Rate (tokens/sec) | Time to First Token (sec) |
|:---:|:---:|:---:|:---:|:---:|
| Mistral-7B-Instruct-v0.3 | A16W8/A16W4 | Modalix | 10.8 | 0.35 |

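The two numbers in the table together determine end-to-end latency. A rough back-of-the-envelope estimate (our own formula, assuming the measured rate holds for the whole generation):

```python
# Figures from the performance table above (Modalix, 128-token prompt).
TTFT_SEC = 0.35   # time to first token, seconds
RATE_TPS = 10.8   # response rate, tokens per second

def estimated_latency(n_tokens):
    """Approximate seconds until the n-th generated token arrives."""
    return TTFT_SEC + (n_tokens - 1) / RATE_TPS
```

For example, a 256-token response would take roughly 24 seconds under these assumptions.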
## Prerequisites

To run this model, you need:

1. **SiMa.ai Modalix device**
2. **SiMa.ai CLI**: [installed](https://docs.sima.ai/pages/sima_cli/main.html#installation) on your Modalix device.
3. **Hugging Face CLI**: for downloading the model.

## Installation & Deployment

Follow these steps to deploy the model to your Modalix device.

### 1. Install the LLiMa Demo Application

> **Note:** This is a **one-time setup**. If you have already installed the LLiMa demo application (e.g. for another model), you can skip this step and continue with the model download.

On your Modalix device, install the LLiMa demo application using `sima-cli`:

```bash
# Create a directory for LLiMa
cd /media/nvme
mkdir llima
cd llima

# Install the LLiMa runtime code
sima-cli install -v 2.0.0 samples/llima -t select
```

> **Note:** To download only the LLiMa runtime code, select **🚫 Skip** when prompted.

### 2. Download the Model

Download the compiled model assets from this repository directly to your device:

```bash
# Download the model to a local directory
cd /media/nvme/llima
hf download mistralai/Mistral-7B-Instruct-v0.3 --local-dir Mistral-7B-Instruct-v0.3-a16w4
```

Alternatively, you can download the compiled model to a host machine and copy it to the Modalix device:

```bash
hf download mistralai/Mistral-7B-Instruct-v0.3 --local-dir Mistral-7B-Instruct-v0.3-a16w4
scp -r Mistral-7B-Instruct-v0.3-a16w4 sima@<modalix-ip>:/media/nvme/llima/
```

*Replace `<modalix-ip>` with the IP address of your Modalix device.*

**Expected Directory Structure:**

```text
/media/nvme/llima/
├── simaai-genai-demo/                # The demo app
└── Mistral-7B-Instruct-v0.3-a16w4/   # Your downloaded model
```

## Usage

### Run the Application

Navigate to the demo directory and start the application:

```bash
cd /media/nvme/llima/simaai-genai-demo
./run.sh
```

The script will detect the installed model(s) and prompt you to select one.

Once the application is running, open a browser and navigate to:

```text
https://<modalix-ip>:5000/
```

*Replace `<modalix-ip>` with the IP address of your Modalix device.*

### API Usage

To use the OpenAI-compatible API, run the model in API mode:

```bash
cd /media/nvme/llima/simaai-genai-demo
./run.sh --httponly --api-only
```

You can interact with it using `curl` or Python.

**Example: Chat Completion**

```bash
curl -N -k -X POST "https://<modalix-ip>:5000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "Why is the sky blue?" }
    ],
    "stream": true
  }'
```

*Replace `<modalix-ip>` with the IP address of your Modalix device.*

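With `"stream": true`, an OpenAI-compatible endpoint returns server-sent events, one `data:` line per chunk. A minimal Python sketch for collecting the streamed text (assuming the standard OpenAI streaming chunk format; field names may vary by server):

```python
import json

def extract_stream_text(sse_lines):
    """Collect assistant text from OpenAI-style streaming chunks.

    Each chunk is a line like:
      data: {"choices":[{"delta":{"content":"..."}}]}
    The stream ends with:  data: [DONE]
    """
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```

You would feed this the response body lines from the `curl` call above (or from an HTTP client library's streaming iterator).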
## Limitations

- **Quantization**: This model is quantized (A16W4/A16W8) for optimal performance on embedded devices. While this maintains high accuracy, minor deviations from the full-precision model may occur.

## Troubleshooting

- **`sima-cli` not found**: Ensure that `sima-cli` is installed on your Modalix device.
- **Model can't be run**: Verify that the model directory sits directly inside `/media/nvme/llima/` and is not nested (e.g., `/media/nvme/llima/Mistral-7B-Instruct-v0.3-a16w4/Mistral-7B-Instruct-v0.3-a16w4`).
- **Permission denied**: Ensure you have read/write permissions for the `/media/nvme` directory.

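The nested-directory mistake above is easy to check for. A small helper (our own sketch, not part of the LLiMa tooling; the paths are examples):

```python
import os

def check_model_dir(base, name):
    """Return 'ok', 'nested', or 'missing' for a model directory layout."""
    outer = os.path.join(base, name)
    if os.path.isdir(os.path.join(outer, name)):
        return "nested"   # unpacked one level too deep
    if os.path.isdir(outer):
        return "ok"
    return "missing"
```

For example, `check_model_dir("/media/nvme/llima", "Mistral-7B-Instruct-v0.3-a16w4")` should report `"ok"` on a correctly deployed device.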
## Resources

- [SiMa.ai Documentation](https://docs.sima.ai)
- [SiMa.ai Hugging Face Organization](https://huggingface.co/simaai)