# Gemma 3 270M-it (MLC-LLM Compiled)
This repository contains Gemma 3 270M-it compiled for on-device deployment using MLC-LLM.
The model has been optimized and compiled for WebGPU (browser/Tauri) and Android (Vulkan) targets.
## Model Details
| Attribute | Details |
|---|---|
| Base Model | google/gemma-3-270m-it |
| Quantization | q4f16_1 (4-bit group quantization with float16 scales) |
| Context Length | 8192 tokens |
| Model Size | ~270 MB (Quantized Parameters) |
| Supported Backends | WebGPU (WASM), Android (Vulkan via TAR) |
| Task | Text Generation / Chat |
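The quantization and context settings above are also recorded in the shipped `mlc-chat-config.json`, which the runtime reads at load time. An illustrative, partial excerpt — the field names follow MLC-LLM's config conventions and the values simply mirror the table; consult the actual file in this repo for the full contents:

```json
{
  "quantization": "q4f16_1",
  "context_window_size": 8192
}
```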
## Use Cases
This lightweight model (270M parameters) is ideal for constrained environments where low latency and memory efficiency are critical.
- On-Device Assistants: Integrate into mobile or desktop apps (Tauri, Electron) without relying on cloud APIs.
- Journaling & Note-Taking: Private, local AI for summarizing thoughts or retrieving insights from personal data.
- Web-Based AI: Run LLMs directly in the user's browser using WebGPU.
- Education: A small, approachable model for learning about LLMs and local deployment.
## Limitations
- Text-Only: The multimodal capabilities of the original Gemma 3 have been stripped to ensure compatibility with standard MLC text generation pipelines.
- Size: As a 270M parameter model, reasoning capabilities are limited compared to larger (2B, 7B) models. It is best suited for simpler tasks, summarization, and chat.
- Context: While it supports 8k context, performance on extremely long prompts may degrade on low-memory devices.
## How to Use
### 1. WebLLM (Browser / Tauri)
You can run this model in the browser using WebLLM.
Requirements:
- A browser with WebGPU support (Chrome, Edge, etc.).
- The `gemma-3-270m-it-webgpu.wasm` file and the `mlc-chat-config.json` from this repo. The example below assumes both are served from a local web server on port 8000.
Example Code (TypeScript/JS):

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const appConfig = {
  model_list: [
    {
      model: "http://localhost:8000/gemma-3-270m-it-mlc", // Path to model config & weights
      model_id: "gemma-3-270m-it",
      model_lib: "http://localhost:8000/gemma-3-270m-it-webgpu.wasm", // Path to compiled WASM
    },
  ],
};

const engine = await CreateMLCEngine("gemma-3-270m-it", { appConfig });

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Explain quantum physics in one sentence." }],
});
console.log(reply.choices[0].message);
```
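The same `create` call also supports OpenAI-style streaming (`stream: true`), yielding chunks that carry incremental `delta.content` pieces. A minimal sketch of accumulating those deltas into the full reply — the chunk shape below mirrors web-llm's OpenAI-compatible API but is declared locally so the snippet stands alone:

```typescript
// Minimal local mirror of the OpenAI-style streaming chunk shape.
// (web-llm's real types live in @mlc-ai/web-llm; this is a sketch.)
interface StreamChunk {
  choices: { delta: { content?: string } }[];
}

// Accumulate streamed deltas into the full reply text.
async function collectStream(chunks: AsyncIterable<StreamChunk>): Promise<string> {
  let text = "";
  for await (const chunk of chunks) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}
```

With a live engine this would be used as `const chunks = await engine.chat.completions.create({ stream: true, messages });` followed by `await collectStream(chunks)`.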
### 2. MLC-LLM CLI
If you have `mlc_llm` installed locally:

```shell
mlc_llm chat ./dist/gemma-3-270m-it-mlc --model-lib ./dist/libs/gemma-3-270m-it-webgpu.wasm
```
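Beyond interactive chat, MLC-LLM can expose the model over an OpenAI-compatible HTTP endpoint via its `serve` subcommand. A sketch assuming a recent `mlc_llm` build that can JIT-compile a native model library for the local device (the WebGPU `.wasm` library is browser-only and is not used here):

```shell
# Expose an OpenAI-compatible HTTP API on localhost;
# the native model library is JIT-compiled for the local GPU.
mlc_llm serve ./dist/gemma-3-270m-it-mlc
```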
### 3. Android Integration
The release includes `gemma-3-270m-android.tar`, a static library package that runs on Android via Vulkan.
- Follow the MLC Android Guide.
- Use `gemma-3-270m-android.tar` as your model library payload.
## Architecture and Training
Please refer to the original Gemma 3 Technical Report for details on the architecture, training data, and alignment process.
## Citation
```bibtex
@article{gemma_2024,
  title={Gemma: Open Models Based on Gemini Research and Technology},
  url={https://goo.gle/GemmaReport},
  DOI={10.48550/arXiv.2403.08295},
  publisher={arXiv},
  year={2024},
}
```
## License
This model is subject to the Gemma Terms of Use. By using this model, you agree to the terms outlined in the original repository.