florianvoss committed · Commit 0ee9af7 · verified · 1 Parent(s): 3e9d434

Upload README.md with huggingface_hub

---
library_name: llima
license: apache-2.0
tags:
- llm
- generative_ai
- embedded
- sima
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-Instruct-v0.3
---

# Mistral-7B-Instruct-v0.3: Optimized for SiMa.ai Modalix

## Overview

This repository contains the **Mistral-7B-Instruct-v0.3** model, optimized and compiled for the **SiMa.ai Modalix** platform.

- **Model Architecture:** Mistral-7B-Instruct-v0.3 (7B parameters)
- **Quantization:** Hybrid
  - **Prompt Processing:** A16W8 (16-bit activations, 8-bit weights)
  - **Token Generation:** A16W4 (16-bit activations, 4-bit weights)
- **Maximum Context Length:** 2048 tokens
- **Source Model:** [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)

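To give an intuition for the W4 half of the hybrid scheme, here is a toy sketch of symmetric 4-bit weight quantization. This is our own illustration of the general technique, not SiMa.ai's actual quantizer; the function names and the per-tensor scale are illustrative assumptions.

```python
def quantize_w4(weights):
    """Map floats to signed 4-bit integers in [-7, 7] with one shared scale.

    Toy per-tensor symmetric quantization; real toolchains typically use
    per-channel scales and calibration.
    """
    scale = max(abs(w) for w in weights) / 7.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Reconstruct approximate float weights from the 4-bit codes."""
    return [c * scale for c in codes]

weights = [0.7, -1.4, 0.35, 0.0]
codes, scale = quantize_w4(weights)
approx = dequantize(codes, scale)
```

Each weight is stored as a 4-bit integer plus a shared scale, so reconstruction error is bounded by half a quantization step; activations stay at 16 bits, which is why accuracy loss is small.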
## Performance

The following performance metrics were measured with an input sequence length of 128 tokens.

| Model | Precision | Device | Response Rate (tokens/sec) | Time to First Token (sec) |
|:---:|:---:|:---:|:---:|:---:|
| Mistral-7B-Instruct-v0.3 | A16W8/A16W4 | Modalix | 10.8 | 0.35 |

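The two numbers in the table together determine end-to-end latency. A rough back-of-the-envelope estimate (our own formula, assuming the measured rate holds for the whole generation):

```python
# Figures from the performance table above (Modalix, 128-token prompt).
TTFT_SEC = 0.35   # time to first token, seconds
RATE_TPS = 10.8   # response rate, tokens per second

def estimated_latency(n_tokens):
    """Approximate seconds until the n-th generated token arrives."""
    return TTFT_SEC + (n_tokens - 1) / RATE_TPS
```

For example, a 256-token response would take roughly 24 seconds under these assumptions.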
## Prerequisites

To run this model, you need:

1. **SiMa.ai Modalix device**
2. **SiMa.ai CLI**: [installed](https://docs.sima.ai/pages/sima_cli/main.html#installation) on your Modalix device.
3. **Hugging Face CLI**: for downloading the model.

## Installation & Deployment

Follow these steps to deploy the model to your Modalix device.

### 1. Install the LLiMa Demo Application

> **Note:** This is a **one-time setup**. If you have already installed the LLiMa demo application (e.g. for another model), you can skip this step and continue with the model download.

On your Modalix device, install the LLiMa demo application using `sima-cli`:

```bash
# Create a directory for LLiMa
cd /media/nvme
mkdir llima
cd llima

# Install the LLiMa runtime code
sima-cli install -v 2.0.0 samples/llima -t select
```

> **Note:** To download only the LLiMa runtime code, select **🚫 Skip** when prompted.

### 2. Download the Model

Download the compiled model assets from this repository directly to your device:

```bash
# Download the model to a local directory
cd /media/nvme/llima
hf download mistralai/Mistral-7B-Instruct-v0.3 --local-dir Mistral-7B-Instruct-v0.3-a16w4
```

Alternatively, you can download the compiled model to a host machine and copy it to the Modalix device:

```bash
hf download mistralai/Mistral-7B-Instruct-v0.3 --local-dir Mistral-7B-Instruct-v0.3-a16w4
scp -r Mistral-7B-Instruct-v0.3-a16w4 sima@<modalix-ip>:/media/nvme/llima/
```

*Replace `<modalix-ip>` with the IP address of your Modalix device.*

**Expected Directory Structure:**

```text
/media/nvme/llima/
├── simaai-genai-demo/                # The demo app
└── Mistral-7B-Instruct-v0.3-a16w4/   # Your downloaded model
```

## Usage

### Run the Application

Navigate to the demo directory and start the application:

```bash
cd /media/nvme/llima/simaai-genai-demo
./run.sh
```

The script will detect the installed model(s) and prompt you to select one.

Once the application is running, open a browser and navigate to:

```text
https://<modalix-ip>:5000/
```

*Replace `<modalix-ip>` with the IP address of your Modalix device.*

### API Usage

To use the OpenAI-compatible API, run the model in API mode:

```bash
cd /media/nvme/llima/simaai-genai-demo
./run.sh --httponly --api-only
```

You can interact with it using `curl` or Python.

**Example: Chat Completion**

```bash
curl -N -k -X POST "https://<modalix-ip>:5000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "Why is the sky blue?" }
    ],
    "stream": true
  }'
```

*Replace `<modalix-ip>` with the IP address of your Modalix device.*

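With `"stream": true`, an OpenAI-compatible endpoint returns server-sent events, one `data:` line per chunk. A minimal Python sketch for collecting the streamed text (assuming the standard OpenAI streaming chunk format; field names may vary by server):

```python
import json

def extract_stream_text(sse_lines):
    """Collect assistant text from OpenAI-style streaming chunks.

    Each chunk is a line like:
      data: {"choices":[{"delta":{"content":"..."}}]}
    The stream ends with:  data: [DONE]
    """
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```

You would feed this the response body lines from the `curl` call above (or from an HTTP client library's streaming iterator).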
## Limitations

- **Quantization**: This model is quantized (A16W4/A16W8) for optimal performance on embedded devices. While this maintains high accuracy, minor deviations from the full-precision model may occur.

## Troubleshooting

- **`sima-cli` not found**: Ensure that `sima-cli` is installed on your Modalix device.
- **Model can't be run**: Verify that the model directory sits directly inside `/media/nvme/llima/` and is not nested (e.g., `/media/nvme/llima/Mistral-7B-Instruct-v0.3-a16w4/Mistral-7B-Instruct-v0.3-a16w4`).
- **Permission denied**: Ensure you have read/write permissions for the `/media/nvme` directory.

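The nested-directory mistake above is easy to check for. A small helper (our own sketch, not part of the LLiMa tooling; the paths are examples):

```python
import os

def check_model_dir(base, name):
    """Return 'ok', 'nested', or 'missing' for a model directory layout."""
    outer = os.path.join(base, name)
    if os.path.isdir(os.path.join(outer, name)):
        return "nested"   # unpacked one level too deep
    if os.path.isdir(outer):
        return "ok"
    return "missing"
```

For example, `check_model_dir("/media/nvme/llima", "Mistral-7B-Instruct-v0.3-a16w4")` should report `"ok"` on a correctly deployed device.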
## Resources

- [SiMa.ai Documentation](https://docs.sima.ai)
- [SiMa.ai Hugging Face Organization](https://huggingface.co/simaai)