Gemma 3 270M-it (MLC-LLM Compiled)

This repository contains Gemma 3 270M-it compiled for on-device deployment using MLC-LLM.

The model has been optimized and compiled for WebGPU (browser/Tauri) and Android (Vulkan) targets.

Model Details

| Attribute | Details |
|---|---|
| Base Model | google/gemma-3-270m-it |
| Quantization | q4f16_1 (4-bit group quantization with float16 scales) |
| Context Length | 8192 tokens |
| Model Size | ~270 MB (quantized parameters) |
| Supported Backends | WebGPU (WASM), Android (Vulkan via TAR) |
| Task | Text Generation / Chat |

Use Cases

This lightweight model (270M parameters) is ideal for constrained environments where low latency and memory efficiency are critical.

  • On-Device Assistants: Integrate into mobile or desktop apps (Tauri, Electron) without relying on cloud APIs.
  • Journaling & Note-Taking: Private, local AI for summarizing thoughts or retrieving insights from personal data.
  • Web-Based AI: Run LLMs directly in the user's browser using WebGPU.
  • Education: A small, approachable model for learning about LLMs and local deployment.

Limitations

  • Text-Only: The multimodal capabilities of the original Gemma 3 have been stripped to ensure compatibility with standard MLC text generation pipelines.
  • Size: As a 270M parameter model, reasoning capabilities are limited compared to larger (2B, 7B) models. It is best suited for simpler tasks, summarization, and chat.
  • Context: While it supports 8k context, performance on extremely long prompts may degrade on low-memory devices.

How to Use

1. WebLLM (Browser / Tauri)

You can run this model in the browser using WebLLM.

Requirements:

  • A browser with WebGPU support (Chrome, Edge, etc.).
  • The gemma-3-270m-it-webgpu.wasm file and the mlc-chat-config.json from this repo.
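The browser has to fetch both artifacts over HTTP. A minimal local setup, assuming the layout below (the directory names match this repo's release artifacts; the port is arbitrary and only needs to match the URLs in your appConfig):

```shell
# Hypothetical layout in the served directory:
#   gemma-3-270m-it-mlc/           weights + mlc-chat-config.json + tokenizer files
#   gemma-3-270m-it-webgpu.wasm    compiled model library
#
# Any static file server works; Python's built-in one is enough for local testing:
python3 -m http.server 8000 --bind 127.0.0.1
```

For production, the same files can be hosted on any static CDN, as long as the URLs in the appConfig point at them.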

Example Code (TypeScript/JS):

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const appConfig = {
  model_list: [
    {
      model: "http://localhost:8000/gemma-3-270m-it-mlc",             // path to model config & weights
      model_id: "gemma-3-270m-it",
      model_lib: "http://localhost:8000/gemma-3-270m-it-webgpu.wasm", // path to compiled WASM
    },
  ],
};

const engine = await CreateMLCEngine("gemma-3-270m-it", { appConfig });
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Explain quantum physics in one sentence." }],
});
console.log(reply.choices[0].message);
```

2. MLC-LLM CLI

If you have mlc_llm installed locally:

```shell
mlc_llm chat ./dist/gemma-3-270m-it-mlc --model-lib ./dist/libs/gemma-3-270m-it-webgpu.wasm
```
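Beyond interactive chat, `mlc_llm serve` exposes an OpenAI-compatible REST endpoint. A sketch, assuming a local `mlc_llm` install and the same weight directory as above (the port is arbitrary, and flags may differ across mlc_llm versions):

```shell
# Start an OpenAI-compatible server from the weight directory; mlc_llm can
# JIT-compile a model library for the local device, so the WebGPU wasm is
# not needed here:
mlc_llm serve ./dist/gemma-3-270m-it-mlc --port 8080

# From another shell, query the standard chat completions route:
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in five words."}]}'
```

Because the endpoint speaks the OpenAI wire format, existing OpenAI client libraries can be pointed at it by overriding their base URL.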

3. Android Integration

The release includes gemma-3-270m-android.tar, a compiled model library (static archive) that runs on Android through the Vulkan backend.

  1. Follow the MLC Android Guide.
  2. Use the gemma-3-270m-android.tar as your model library payload.
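The MLC Android packaging step is driven by a `mlc-package-config.json` file. A hypothetical entry for this model is sketched below; the field names follow the MLC Android guide, but verify them against the current docs, and `estimated_vram_bytes` here is a rough guess, not a measured value:

```json
{
  "device": "android",
  "model_list": [
    {
      "model": "HF://vikramlingam/gemma-3-270m-it-mlc-webgpu",
      "model_id": "gemma-3-270m-it",
      "estimated_vram_bytes": 500000000,
      "bundle_weight": true
    }
  ]
}
```

With `bundle_weight` enabled the quantized weights ship inside the APK, which keeps the app fully offline at the cost of a larger install size.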

Architecture and Training

Please refer to the original Gemma 3 Technical Report for details on the architecture, training data, and alignment process.

Citation

```bibtex
@article{gemma_2024,
  title={Gemma: Open Models Based on Gemini Research and Technology},
  url={https://goo.gle/GemmaReport},
  DOI={10.48550/arXiv.2403.08295},
  publisher={arXiv},
  year={2024},
}
```

License

This model is subject to the Gemma Terms of Use. By using this model, you agree to the terms outlined in the original repository.
