How to use over-show/gemma-4-e2b-it-text-only-4bit with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm
# Generate text with mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("over-show/gemma-4-e2b-it-text-only-4bit")
prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)
text = generate(model, tokenizer, prompt=prompt, verbose=True)

How to use over-show/gemma-4-e2b-it-text-only-4bit with MLX LM:
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "over-show/gemma-4-e2b-it-text-only-4bit"
# Install MLX LM
uv tool install mlx-lm
# Start the server (port set explicitly so it matches the curl call below)
mlx_lm.server --model "over-show/gemma-4-e2b-it-text-only-4bit" --port 8000
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "over-show/gemma-4-e2b-it-text-only-4bit",
"messages": [
{"role": "user", "content": "Hello"}
]
}'

Text-only repack of mlx-community/gemma-4-e2b-it-4bit for Overshow local inference via MLX Swift.
This artefact keeps the Gemma 4 language model tensors and tokenizer files, strips the language_model. tensor prefix expected by the pinned MLX Swift text loader, and drops unused audio and vision tower tensors.
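The repack described above amounts to a filter-and-rename pass over tensor names. The following is a minimal sketch of that idea, not the actual scripts/repack-gemma4-text-only.py. It assumes every language model tensor in the source checkpoint is named with the `language_model.` prefix and that everything else (audio and vision towers included) can be dropped. The function name `repack_text_only` and the example tensor names are hypothetical.

```python
from typing import Dict, TypeVar

T = TypeVar("T")

# Assumption: the prefix named in the description above.
TEXT_PREFIX = "language_model."


def repack_text_only(tensors: Dict[str, T], prefix: str = TEXT_PREFIX) -> Dict[str, T]:
    """Keep only tensors under `prefix` and strip that prefix from their names."""
    repacked: Dict[str, T] = {}
    for name, tensor in tensors.items():
        if not name.startswith(prefix):
            # Anything outside the language model (for example audio or
            # vision tower tensors) is dropped.
            continue
        repacked[name[len(prefix):]] = tensor
    return repacked


# Hypothetical tensor names, for illustration only; strings stand in for arrays.
example = {
    "language_model.model.embed_tokens.weight": "t0",
    "language_model.model.layers.0.self_attn.q_proj.weight": "t1",
    "vision_tower.encoder.weight": "t2",
    "audio_tower.encoder.weight": "t3",
}
print(sorted(repack_text_only(example)))
```

The function works on any name-to-tensor mapping, so the same pass could be applied to a dictionary of loaded arrays before saving them back to model.safetensors.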
- Source model: mlx-community/gemma-4-e2b-it-4bit
- Source revision: 99d9a53ff828d365a8ecae538e45f80a08d612cd
- Repack script: scripts/repack-gemma4-text-only.py in over-show/app

Required files:
- config.json
- model.safetensors
- model.safetensors.index.json
- tokenizer.json
- tokenizer_config.json
- generation_config.json

Local validation passed before publishing:
- scripts/validate-mlx-helper.py: 5/5 commands passed
- swift test --filter Gemma4LoadSmokeTest: 2/2 tests passed

Quantization: 4-bit
Base model
mlx-community/gemma-4-e2b-it-4bit
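
The required-files list above can be checked against a local copy of the model before loading it. This is a rough sketch, not the project's own validation script: `missing_files` and `prefixed_tensors` are hypothetical helper names, and the second helper assumes model.safetensors.index.json follows the usual layout with a `weight_map` object keyed by tensor name.

```python
import json
from pathlib import Path
from typing import List

# File names taken from the "Required files" list above.
REQUIRED_FILES = [
    "config.json",
    "model.safetensors",
    "model.safetensors.index.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "generation_config.json",
]


def missing_files(model_dir: str) -> List[str]:
    """Return the required files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def prefixed_tensors(index_path: str, prefix: str = "language_model.") -> List[str]:
    """Return tensor names in the index that still carry the given prefix."""
    with open(index_path, encoding="utf-8") as f:
        index = json.load(f)
    # Assumption: tensor names are the keys of the "weight_map" object.
    return [name for name in index.get("weight_map", {}) if name.startswith(prefix)]
```

For a correctly repacked directory, `missing_files(model_dir)` should return an empty list, and `prefixed_tensors` on its index file should also be empty if the prefix was stripped as described.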