intel_google-gemma-3-4b-it-int4
- Model creator: google
- Original model: google-gemma-3-4b-it
This repository contains the Gemma-3-4b-it model optimized for Intel hardware using OpenVINO™ and quantized to INT4 precision.
It is designed for high-performance inference on Intel hardware, including Intel Core and Xeon processors and Intel Arc GPUs.
Model Details
- Developed by: Advantech-EIOT / Google
- Architecture: Gemma-3
- Task: Text Generation (Chat/Instruction)
- Precision: INT4 (Weight Compression)
- Optimization: OpenVINO™ Toolkit
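As a rough illustration of what INT4 weight compression buys, the sketch below estimates the storage needed for the weights alone. The parameter count of 4e9 is an assumption read off the nominal "4B" in the model name, not an exact figure, and the estimate ignores quantization scales, zero points, and any layers kept at higher precision, so real files will be somewhat larger.

```python
def weight_footprint_gib(num_params, bits_per_weight):
    """Approximate storage for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: roughly 4e9 parameters, taken from the "4B" in the model name.
NOMINAL_PARAMS = 4e9

int4_gib = weight_footprint_gib(NOMINAL_PARAMS, 4)
fp16_gib = weight_footprint_gib(NOMINAL_PARAMS, 16)
print(f"INT4: ~{int4_gib:.2f} GiB, FP16: ~{fp16_gib:.2f} GiB")
```

Under these assumptions the INT4 weights take about a quarter of the space of a 16-bit copy, which is the main reason the model fits on memory-constrained edge hardware.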
Deployment with OpenVINO Model Server (OVMS)
OpenVINO Model Server (OVMS) provides a high-performance, scalable solution for serving this model via OpenAI-compatible APIs.
1. Prerequisite: Verify Model Files
Before launching the server, ensure your local directory contains the following OpenVINO IR files:
- openvino_model.xml (Model topology)
- openvino_model.bin (Model weights)
- tokenizer_config.json and related tokenizer files
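The file check can be scripted before starting the container. This is a minimal sketch using only the three file names given above; the helper name `missing_model_files` and the `ov_model_dir` directory in the usage note are illustrative assumptions, and the "related tokenizer files" are not checked because the source does not name them.

```python
from pathlib import Path

# File names taken from the prerequisite list; other tokenizer files are not checked.
REQUIRED_FILES = ("openvino_model.xml", "openvino_model.bin", "tokenizer_config.json")


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, `missing_model_files("ov_model_dir")` returns an empty list when all three files are present, and the names of the absent ones otherwise.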
2. Launch with Docker
Use the latest OVMS container to serve the model.
docker run -d --rm -p 8000:8000 \
  -v $(pwd)/ov_model_dir:/workspace/model:ro \
  openvino/model_server:latest \
  --rest_port 8000 \
  --model_path /workspace/model \
  --model_name gemma3-4b-it \
  --plugin_config '{"PERFORMANCE_HINT": "LATENCY"}'
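Because the container starts in detached mode, the model may still be loading when the command returns. The sketch below polls a health URL until it answers with HTTP 200. It assumes the server exposes the KServe-style readiness endpoint `/v2/health/ready` on the REST port; the helper names `http_ok` and `wait_until_ready` are illustrative, not part of OVMS.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(url, attempts=30, delay=1.0, probe=http_ok):
    """Poll url until probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A call such as `wait_until_ready("http://localhost:8000/v2/health/ready")` would then block for up to about 30 seconds and report whether the server became reachable; the URL is an assumption to adjust for your deployment.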
3. Usage via OpenAI API
Once the server is running, you can interact with it using the standard OpenAI client:
from openai import OpenAI

# OVMS exposes its OpenAI-compatible endpoints under /v3, not /v1.
client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")
response = client.chat.completions.create(
    model="gemma3-4b-it",
    messages=[{"role": "user", "content": "How to optimize AI at the Edge?"}],
)
print(response.choices[0].message.content)
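Where installing the `openai` package is not an option, the same request can be sent with the Python standard library. This is a sketch under the assumption that the server accepts the standard chat completions JSON body and returns the usual `choices[0].message.content` shape; the helper names are illustrative, and `base_url` should be the same base URL passed to the client above.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build a POST request for the chat completions endpoint under base_url."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body):
    """Pull the assistant text out of a chat completions response body."""
    return json.loads(body)["choices"][0]["message"]["content"]


def ask(base_url, model, prompt, timeout=60.0):
    """Send one user prompt and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(resp.read())
```

Keeping request construction and response parsing in separate functions makes both easy to check without a running server; only `ask` touches the network.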
Hardware Compatibility
This 4B INT4 model targets the following hardware:
- Intel Core Ultra (CPU/iGPU): Ideal for local AI PC deployments.
- Advantech Edge AI Platforms: Such as those used in industrial or IoT environments.
- Intel Xeon Scalable Processors: Efficient for high-throughput inference.
- Intel Arc Discrete Graphics: Accelerated LLM performance.
Limitations and Disclaimer
Gemma-3 may hallucinate, that is, produce fluent but factually incorrect output.
Validate its outputs before relying on them in critical applications.
Please refer to the Google Gemma License for usage restrictions.