intel_google-gemma-3-4b-it-int4

This repository contains the Gemma-3-4b-it model optimized for Intel hardware using OpenVINO™ and quantized to INT4 precision.

It is designed for high-performance inference on edge devices powered by Intel Core and Xeon processors and Intel Arc graphics.

Model Details

  • Developed by: Advantech-EIOT / Google
  • Architecture: Gemma-3
  • Task: Text Generation (Chat/Instruction)
  • Precision: INT4 (Weight Compression)
  • Optimization: OpenVINO™ Toolkit

Deployment with OpenVINO Model Server (OVMS)

OpenVINO Model Server (OVMS) provides a high-performance, scalable solution for serving this model via OpenAI-compatible APIs.

1. Prerequisite: Verify Model Files

Before launching the server, ensure your local directory contains the following OpenVINO IR files:

  • openvino_model.xml (Model topology)
  • openvino_model.bin (Model weights)
  • tokenizer_config.json and related tokenizer files
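The check above can be scripted before launch. A minimal sketch (the directory name `ov_model_dir` and the helper `missing_files` are illustrative, not part of this repository):

```python
from pathlib import Path

# Core OpenVINO IR files the server expects; additional tokenizer
# files may sit alongside these depending on the export.
REQUIRED = ["openvino_model.xml", "openvino_model.bin", "tokenizer_config.json"]

def missing_files(model_dir):
    """Return the names of required files absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED if not (root / name).exists()]

# Example: abort early with a clear message instead of a server error.
missing = missing_files("ov_model_dir")
if missing:
    print("Missing model files:", ", ".join(missing))
```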

2. Launch with Docker

Use the latest OVMS container to serve the model.

docker run -d --rm -p 8000:8000 \
    -v $(pwd)/ov_model_dir:/workspace/model:ro \
    openvino/model_server:latest \
    --rest_port 8000 \
    --model_path /workspace/model \
    --model_name gemma3-4b-it \
    --plugin_config '{"PERFORMANCE_HINT": "LATENCY"}'
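Once the container is up, you can confirm the server is ready before sending requests (assuming the default port mapping above; OVMS exposes a KServe-style health endpoint on its REST port):

curl http://localhost:8000/v2/health/ready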

3. Usage via OpenAI API

Once the server is running, you can interact with it using the standard OpenAI client:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="gemma3-4b-it",
    messages=[{"role": "user", "content": "How to optimize AI at the Edge?"}]
)

print(response.choices[0].message.content)
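If you prefer not to use the OpenAI client, the same endpoint accepts a plain HTTP POST with a standard chat-completions payload. A sketch of the request body (field values such as `max_tokens` and `temperature` are illustrative defaults, not requirements of this model):

```python
import json

# Payload for POST http://localhost:8000/v1/chat/completions;
# "model" must match the --model_name passed to OVMS at launch.
payload = {
    "model": "gemma3-4b-it",
    "messages": [
        {"role": "user", "content": "How to optimize AI at the Edge?"}
    ],
    "max_tokens": 256,    # cap the generated length
    "temperature": 0.7,   # sampling temperature
}

body = json.dumps(payload)
```

The `body` string can then be sent with any HTTP client (for example `requests.post(..., data=body, headers={"Content-Type": "application/json"})`).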

Hardware Compatibility

This 4-billion-parameter INT4 model is highly optimized for:

  • Intel Core Ultra (CPU/iGPU): Ideal for local AI PC deployments.
  • Advantech Edge AI Platforms: Such as those used in industrial or IoT environments.
  • Intel Xeon Scalable Processors: Efficient for high-throughput inference.
  • Intel Arc Discrete Graphics: Accelerated LLM performance.
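To run on an Intel GPU rather than the CPU, OVMS accepts a target device flag and the container needs access to the GPU device node. A hedged variant of the launch command above (the `latest-gpu` image tag and `/dev/dri` passthrough are the usual Linux setup for Intel iGPU/Arc, but verify against your OVMS version):

docker run -d --rm -p 8000:8000 \
    --device /dev/dri \
    -v $(pwd)/ov_model_dir:/workspace/model:ro \
    openvino/model_server:latest-gpu \
    --rest_port 8000 \
    --model_path /workspace/model \
    --model_name gemma3-4b-it \
    --target_device GPU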

Limitations and Disclaimer

Like all large language models, Gemma-3 may hallucinate or produce inaccurate or biased output.

Users should validate outputs before relying on them in critical applications.

Please refer to the Google Gemma License for usage restrictions.
