	---
	library_name: mlx
	license: apache-2.0
	license_link: https://ai.google.dev/gemma/docs/gemma_4_license
	pipeline_tag: text-generation
	base_model: google/gemma-4-E4B-it
	tags:
	- mlx
	- gemma
	- gemma4
	- bfloat16
	- bf16
	- unquantized
	- apple-silicon
	---

	# Gemma-4-E4B-it MLX BF16

	Unquantized bfloat16 MLX conversion of [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it) for Apple Silicon inference with [`mlx-lm`](https://github.com/ml-explore/mlx-lm).

	This repo is the plain 16-bit reference variant: no 8-bit, 4-bit, RotorQuant, TurboQuant, AWQ, GPTQ, or GGUF quantization is applied.

	## Provenance

	| Field | Value |
	|---|---|
	| Source model | [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it) |
	| Format | MLX safetensors |
	| Weight dtype | `bfloat16` |
	| Tensor check | 665 tensors, all `mlx.core.bfloat16` |
	| Local conversion tool | `mlx-lm` |
	| License | Apache 2.0 / Gemma license terms from upstream |

	Conversion command:

	```bash
	mlx_lm.convert \
	    --hf-path google/gemma-4-E4B-it \
	    --mlx-path gemma-4-e4b-it-MLX-bf16 \
	    --dtype bfloat16
	```
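
The tensor check in the Provenance table can be reproduced without loading any weights, because each `.safetensors` file starts with a JSON header that lists every tensor's dtype. The sketch below is one possible way to do that with the Python standard library only; the helper names `read_safetensors_header` and `count_dtypes` are hypothetical and are not part of `mlx-lm`.

```python
import json
import struct
from collections import Counter
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of one .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def count_dtypes(model_dir):
    """Count tensors per dtype string across every shard in model_dir."""
    counts = Counter()
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        for info in read_safetensors_header(shard).values():
            counts[info["dtype"]] += 1
    return counts
```

Calling `count_dtypes("gemma-4-e4b-it-MLX-bf16")` on the converted directory returns a `Counter` keyed by safetensors dtype strings. bfloat16 tensors appear as `"BF16"`, so a bundle matching the table above should report every tensor under that single key.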

	## Why BF16?

	Gemma-4 is distributed natively in bfloat16. Keeping BF16 preserves the upstream numerical format while avoiding the quality/runtime tradeoffs of weight quantization.
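
For readers unfamiliar with the format: bfloat16 keeps the sign bit and all 8 exponent bits of float32 but only the top 7 mantissa bits, so it covers the same range as float32 at lower precision. The following is a minimal illustration of that relationship in plain Python. The helper names are hypothetical, it handles finite values only, and it is not how MLX implements the conversion.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Round a finite float to bfloat16 and return its 16-bit pattern.

    Uses round-to-nearest-even on the 16 low bits of the float32 encoding.
    NaN inputs are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Adding 0x7FFF plus the lowest kept bit implements ties-to-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float (exact)."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


# Large magnitudes survive the round trip; fine detail in the mantissa does not.
for value in (1.0, 3.14159265, 1e38):
    roundtrip = bf16_bits_to_float(float_to_bf16_bits(value))
    print(f"{value!r} -> {roundtrip!r}")
```

Because the upstream weights are already stored in this format, converting them to BF16 MLX tensors involves no rounding step of this kind; the sketch only shows what the format can and cannot represent.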

	## Use with MLX

	```bash
	pip install mlx-lm
	```

	```python
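	from mlx_lm import load, generate

	# Fetches the weights from the Hugging Face Hub on first use.
	model, tokenizer = load("majentik/gemma-4-e4b-it-MLX-bf16")

	messages = [{"role": "user", "content": "Explain Singapore's MRT system in one paragraph."}]
	# Apply the model's chat template; add_generation_prompt=True opens the assistant turn.
	prompt = tokenizer.apply_chat_template(
	    messages,
	    add_generation_prompt=True,
	    return_dict=False,
	)

	# verbose=True prints the text and throughput stats during generation;
	# the full string is also returned.
	response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
	print(response)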
	```

	## Relationship to quantized variants

	Use this repo when you want the unquantized BF16 reference decoder. For smaller/faster variants, use the existing quantized MLX repos under `majentik`, such as:

	- [`majentik/gemma-4-E4B-RotorQuant-MLX-8bit`](https://huggingface.co/majentik/gemma-4-E4B-RotorQuant-MLX-8bit)
	- [`majentik/gemma-4-e4b-it-mlx-4bit`](https://huggingface.co/majentik/gemma-4-e4b-it-mlx-4bit)
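
As a rough guide to why the quantized repos are smaller, weight storage scales with bits per weight. The sketch below does that arithmetic for a hypothetical parameter count; `weight_gib` and `N_PARAMS` are illustrative names, and the estimate ignores quantization scales, non-weight tensors, and runtime memory such as the KV cache.

```python
def weight_gib(n_params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB: parameters x bits / 8, ignoring overhead."""
    return n_params * bits_per_weight / 8 / 2**30


# Hypothetical parameter count, chosen only to make the arithmetic concrete.
# It is not the parameter count of this model.
N_PARAMS = 1_000_000_000

for label, bits in [("bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_gib(N_PARAMS, bits):.2f} GiB per billion parameters")
```

Under these assumptions, BF16 needs about twice the weight storage of an 8-bit variant and four times that of a 4-bit variant, in exchange for avoiding quantization error.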

	## Notes

	- This is a format conversion of the upstream Gemma-4 E4B instruct model, not a fine-tune.
	- The weights remain unquantized BF16.
	- For licensing and acceptable use, follow the upstream Gemma terms linked above.