gemma4-E4B-it: Optimized for SiMa.ai Modalix

Overview

This repository contains the gemma4-E4B-it model, optimized and compiled for the SiMa.ai Modalix platform.

Note: This model currently requires the beta/develop LLiMa runtime (sima-cli install -v 2.1.0 tools/llima -t develop).

  • Model Architecture: Gemma 4 E4B-it
  • Quantization: Hybrid
    • Vision / Per-layer / Grouped Language: INT8
    • Remaining Language Stages: INT4
  • Maximum context length: 2048 tokens
  • Input Resolution: 480x480 (Fixed)
  • Source Model: google/gemma-4-E4B-it

Performance

The following performance metrics were measured with a short text prompt and one image input.

| Model | Precision | Device | Response Rate (tokens/sec) | Time To First Token (sec) |
|---|---|---|---|---|
| gemma4-E4B-it | INT8/INT4 hybrid | Modalix | 20 | 0.20 |
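
As a rough illustration of what these two figures mean for end-to-end latency, the sketch below adds the time to first token to the generation time at the measured rate. It assumes the rate stays constant for the whole response, which the measurement above does not guarantee, and `estimate_latency_s` is a hypothetical helper name, not part of any SiMa.ai tool.

```python
TTFT_S = 0.20          # time to first token, from the performance figures above
TOKENS_PER_SEC = 20.0  # response rate, from the performance figures above


def estimate_latency_s(output_tokens: int,
                       ttft_s: float = TTFT_S,
                       rate: float = TOKENS_PER_SEC) -> float:
    """Estimate seconds until the last token arrives, assuming a constant rate."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_s + output_tokens / rate


# A 100-token answer: 0.20 s + 100 / 20 s
print(f"{estimate_latency_s(100):.1f} s")
```

Under these assumptions a 100-token reply takes about 5.2 seconds. Longer prompts or different images may change both figures.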

Prerequisites

To run this model, you need:

  1. SiMa.ai Modalix Device
  2. SiMa.ai CLI: Installed on your Modalix device.
  3. Hugging Face CLI: For downloading the model.

Installation & Deployment

Follow these steps to deploy the model to your Modalix device.

1. Install LLiMa Demo Application

Note: This is a one-time setup. If you have already installed the LLiMa demo application (e.g. for another model), you can skip this step and continue with model download.

On your Modalix device, install the LLiMa demo application using the sima-cli:

# Create a directory for LLiMa
cd /media/nvme
mkdir llima
cd llima
# Install the LLiMa runtime code
sima-cli install -v 2.1.0 tools/llima -t develop

Note: This uses the current beta/develop LLiMa install.

2. Download the Model

Download the compiled model assets from this repository directly to your device.

# Download the model to a local directory
llima pull gemma4-E4B-it

Alternatively, you can download the compiled model to a host machine and copy it to the Modalix device:

hf download simaai/gemma4-E4B-it --local-dir gemma4-E4B-it
scp -r gemma4-E4B-it sima@<modalix-ip>:/media/nvme/llima/models/

Replace <modalix-ip> with the IP address of your Modalix device.

Expected Directory Structure:

/media/nvme/llima/
├── run.sh
└── models/
    └── gemma4-E4B-it/   # The compiled model

Usage

Run the Application

Navigate to the demo directory and start the application:

cd /media/nvme/llima/
./run.sh

The script will detect the installed model(s) and prompt you to select one.

Once the application is running, open a browser and navigate to:

https://<modalix-ip>:5000/

Replace <modalix-ip> with the IP address of your Modalix device.

API Usage

To use the OpenAI-compatible API, run the model in API mode:

llima run gemma4-E4B-it --mode web

You can interact with it using curl or Python.

Example: Chat Completion

# Note: Replace <YOUR_BASE64_STRING_HERE> with an actual base64 encoded image string.
curl -N -k -X POST "https://<modalix-ip>:9998/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sima-vlm",
    "stream": true,
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,<YOUR_BASE64_STRING_HERE>"
            }
          },
          {
            "type": "text",
            "text": "Describe the image in two sentences."
          }
        ]
      }
    ]
  }'

Replace <modalix-ip> with the IP address of your Modalix device.
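
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl example: it base64-encodes a JPEG, posts it to port 9998, and prints the streamed text. It assumes the server streams OpenAI-style `data: {...}` lines ending with `data: [DONE]`, and that the device uses a self-signed certificate (hence the equivalent of curl's `-k`). The function names are hypothetical.

```python
import base64
import json
import ssl
import urllib.request


def build_payload(image_bytes: bytes, prompt: str, stream: bool = True) -> dict:
    """Build the same request body as the curl example, with the image inlined."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "sima-vlm",
        "stream": stream,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                    {"type": "text", "text": prompt},
                ],
            },
        ],
    }


def parse_sse_line(line: str):
    """Return the text delta from one streamed 'data: ...' line, or None."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if data == "[DONE]":  # assumed OpenAI-style end-of-stream marker
        return None
    choices = json.loads(data).get("choices") or []
    if not choices:
        return None
    return (choices[0].get("delta") or {}).get("content")


def stream_chat(host: str, image_path: str, prompt: str) -> None:
    """POST the request and print text as it arrives."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        f"https://{host}:9998/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Equivalent of curl -k: skip certificate checks (assumes a self-signed cert).
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        for raw in response:
            text = parse_sse_line(raw.decode("utf-8"))
            if text:
                print(text, end="", flush=True)
    print()
```

To try it, call `stream_chat("<modalix-ip>", "photo.jpg", "Describe the image in two sentences.")` with the device address and a local JPEG file of your choice.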

Limitations

  • Quantization: This model uses a hybrid INT8/INT4 compilation strategy for optimal performance on embedded devices. While this maintains high accuracy, minor deviations from the full-precision model may occur.
  • Fixed Resolution: The input resolution is fixed at 480x480 at compile time to maximize throughput and efficiency on the SiMa.ai MLA.
  • Single-image support: This build is intended for text plus one image input.
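
This card does not say whether the runtime rescales images to 480x480 itself. If you choose to resize on the client, one common option is letterboxing: scale the image to fit inside the square while keeping its aspect ratio, then pad the remainder. The sketch below only computes the geometry; `letterbox_geometry` is a hypothetical helper, and the actual resize would be done with an image library of your choice.

```python
TARGET = 480  # fixed input resolution stated above


def letterbox_geometry(width: int, height: int, target: int = TARGET):
    """Return (new_w, new_h, pad_left, pad_top) for fitting an image
    into a target x target square while preserving its aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = min(target / width, target / height)
    new_w = min(target, max(1, round(width * scale)))
    new_h = min(target, max(1, round(height * scale)))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


# A 1920x1080 frame scales to 480x270 with 105 px of padding above it.
print(letterbox_geometry(1920, 1080))
```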

Troubleshooting

  • sima-cli not found: Ensure that sima-cli is installed on your Modalix device.
  • Model can't be run: Verify that the model directory sits directly inside /media/nvme/llima/models/ and is not nested one level deeper (e.g., /media/nvme/llima/models/gemma4-E4B-it/gemma4-E4B-it).
  • Permission Denied: Ensure you have read/write permissions for the /media/nvme directory.
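
The nested-directory case above is easy to check by script. The sketch below is a minimal helper, run on the device, that reports whether the model directory is missing, nested one level too deep, or in place. `check_model_dir` is a hypothetical name, and the check looks only at directory layout, not at the files inside.

```python
from pathlib import Path


def check_model_dir(models_root: str, model_name: str) -> str:
    """Classify the model directory layout as 'missing', 'nested', or 'ok'."""
    model_dir = Path(models_root) / model_name
    if not model_dir.is_dir():
        return "missing"
    # A second directory with the same name inside signals an extra nesting level.
    if (model_dir / model_name).is_dir():
        return "nested"
    return "ok"


print(check_model_dir("/media/nvme/llima/models", "gemma4-E4B-it"))
```

A result of `nested` means the inner directory should be moved up one level so that it sits directly under `models/`.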

Resources
