---
library_name: llima
license: mit
tags:
- llm
- generative_ai
- embedded
- sima
pipeline_tag: text-generation
base_model: microsoft/Phi-3.5-mini-instruct
---

# Phi-3.5-mini-instruct: Optimized for SiMa.ai Modalix

## Overview

This repository contains the **Phi-3.5-mini-instruct** model, optimized and compiled for the **SiMa.ai Modalix** platform.

- **Model Architecture:** Phi-3.5 Mini (3.8B parameters)
- **Quantization:** Hybrid
  - **Prompt Processing:** A16W8 (16-bit activations, 8-bit weights)
  - **Token Generation:** A16W4 (16-bit activations, 4-bit weights)
- **Maximum Context Length:** 2048 tokens
- **Source Model:** [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)

## Performance

The following performance metrics were measured with an input sequence length of 128 tokens.

| Model | Precision | Device | Response Rate (tokens/sec) | Time to First Token (sec) |
|---|---|---|---|---|
| Phi-3.5-mini-instruct | A16W8/A16W4 | Modalix | 16.5 | 0.15 |

## Prerequisites

To run this model, you need:

1. **SiMa.ai Modalix device**
2. **SiMa.ai CLI**: [installed](https://docs.sima.ai/pages/sima_cli/main.html#installation) on your Modalix device
3. **Hugging Face CLI**: for downloading the model

## Installation & Deployment

Follow these steps to deploy the model to your Modalix device.

### 1. Install the LLiMa Demo Application

> **Note:** This is a **one-time setup**. If you have already installed the LLiMa demo application (e.g., for another model), skip this step and continue with the model download.

On your Modalix device, install the LLiMa demo application using `sima-cli`:

```bash
# Create a directory for LLiMa
cd /media/nvme
mkdir llima
cd llima

# Install the LLiMa runtime code
sima-cli install -v 2.0.0 samples/llima -t select
```

> **Note:** To download only the LLiMa runtime code, select **🚫 Skip** when prompted.

### 2. Download the Model

Download the compiled model assets from this repository directly to your device.
```bash
# Download the model to a local directory
cd /media/nvme/llima
hf download simaai/Phi-3.5-mini-instruct-a16w4 --local-dir Phi-3.5-mini-instruct-a16w4
```

Alternatively, you can download the compiled model on a host machine and copy it to the Modalix device:

```bash
hf download simaai/Phi-3.5-mini-instruct-a16w4 --local-dir Phi-3.5-mini-instruct-a16w4
scp -r Phi-3.5-mini-instruct-a16w4 sima@<DEVICE_IP>:/media/nvme/llima/
```

*Replace `<DEVICE_IP>` with the IP address of your Modalix device.*

**Expected Directory Structure:**

```text
/media/nvme/llima/
├── simaai-genai-demo/              # The demo app
└── Phi-3.5-mini-instruct-a16w4/    # Your downloaded model
```

## Usage

### Run the Application

Navigate to the demo directory and start the application:

```bash
cd /media/nvme/llima/simaai-genai-demo
./run.sh
```

The script detects the installed model(s) and prompts you to select one.

Once the application is running, open a browser and navigate to:

```text
https://<DEVICE_IP>:5000/
```

*Replace `<DEVICE_IP>` with the IP address of your Modalix device.*

### API Usage

To use the OpenAI-compatible API, run the model in API mode:

```bash
cd /media/nvme/llima/simaai-genai-demo
./run.sh --httponly --api-only
```

You can then interact with it using `curl` or Python.

**Example: Chat Completion**

```bash
curl -N -k -X POST "https://<DEVICE_IP>:5000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "Why is the sky blue?" }
    ],
    "stream": true
  }'
```

*Replace `<DEVICE_IP>` with the IP address of your Modalix device.*

## Limitations

- **Quantization**: This model is quantized (A16W8/A16W4) for optimal performance on embedded devices. While this maintains high accuracy, minor deviations from the full-precision model may occur.

## Troubleshooting

- **`sima-cli` not found**: Ensure that `sima-cli` is installed on your Modalix device.
- **Model can't be run**: Verify that the model directory sits directly inside `/media/nvme/llima/` and is not nested (e.g., `/media/nvme/llima/Phi-3.5-mini-instruct-a16w4/Phi-3.5-mini-instruct-a16w4`).
- **Permission denied**: Ensure you have read/write permissions for the `/media/nvme` directory.

## Resources

- [SiMa.ai Documentation](https://docs.sima.ai)
- [SiMa.ai Hugging Face Organization](https://huggingface.co/simaai)
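## Example: Streaming the API from Python

As a companion to the `curl` example in the API Usage section, the sketch below streams a chat completion from the device using only the Python standard library. It assumes the server follows the common OpenAI-compatible server-sent-events format (`data: {...}` lines terminated by `data: [DONE]`); certificate checks are disabled to mirror curl's `-k` flag, since the demo server presents a self-signed certificate.

```python
import json
import ssl
import urllib.request


def parse_sse_chunk(line: str):
    """Extract the content delta from one `data: {...}` SSE line.

    Returns None for keep-alives, the `[DONE]` sentinel, and chunks
    that carry no text content.
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    try:
        chunk = json.loads(payload)
        return chunk["choices"][0]["delta"].get("content")
    except (json.JSONDecodeError, KeyError, IndexError):
        return None


def stream_chat(prompt: str, host: str) -> str:
    """POST a streaming chat completion and print tokens as they arrive."""
    url = f"https://{host}:5000/v1/chat/completions"
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    # Skip certificate verification, mirroring curl's -k flag.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    pieces = []
    with urllib.request.urlopen(req, context=ctx) as resp:
        for raw in resp:  # the response body is iterable line by line
            text = parse_sse_chunk(raw.decode("utf-8", errors="replace"))
            if text is not None:
                pieces.append(text)
                print(text, end="", flush=True)
    return "".join(pieces)
```

Usage: call `stream_chat("Why is the sky blue?", "<DEVICE_IP>")` with `<DEVICE_IP>` replaced by the IP address of your Modalix device (the server must be running in API mode).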