Gemma-2B-APS-IT — 4-bit MLX Quantized

This is a 4-bit quantized version of google/gemma-2b-aps-it, converted for use with Apple's MLX framework.

Model Description

The base model, Gemma-2B-APS-IT, is a 2-billion-parameter language model fine-tuned by Google for Abstractive Proposition Segmentation (APS). Given a text passage, the model segments the content into individual facts, statements, and ideas, and restates each one as a full sentence with minimal changes to the original text.

Use Cases

Atomic claim extraction for fact-checking pipelines
Grounding and retrieval
Evaluation of generation tasks (e.g., summarisation)

Example

Input:

Sarah Stage, 30, welcomed James Hunter into the world on Tuesday. The baby boy weighed eight pounds seven ounces and was 22 inches long.

Output:

- Sarah Stage welcomed James Hunter into the world.
- Sarah Stage welcomed James Hunter on Tuesday.
- Sarah Stage is 30 years old.
- James Hunter weighed eight pounds seven ounces.
- James Hunter was 22 inches long.
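
For downstream use, the bulleted output above usually needs to become a list of strings. The following is a minimal sketch, assuming the generated text arrives as one proposition per line prefixed with "- ", as in the example. The helper name parse_propositions is hypothetical, and the raw decoded text from the model may use other delimiters, in which case the prefix check needs adjusting.

```python
def parse_propositions(output: str) -> list[str]:
    """Split bulleted model output into a list of proposition strings."""
    propositions = []
    for line in output.splitlines():
        line = line.strip()
        # Assumption: each proposition is on its own line and starts with "- ".
        if line.startswith("- "):
            propositions.append(line[2:].strip())
    return propositions


example_output = """\
- Sarah Stage welcomed James Hunter into the world.
- Sarah Stage welcomed James Hunter on Tuesday.
- Sarah Stage is 30 years old.
- James Hunter weighed eight pounds seven ounces.
- James Hunter was 22 inches long.
"""

propositions = parse_propositions(example_output)
for i, proposition in enumerate(propositions, start=1):
    print(i, proposition)
```

Each returned string can then be passed individually to a fact-checking or retrieval step.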

Quantization Details

Parameter	Value
Method	Affine quantization
Bits	4
Group size	64
Original dtype	bfloat16
Framework	MLX
Quantized model size	~1.3 GB

Modifications from original: The original model weights were quantized from bfloat16 to 4-bit precision using MLX's quantization utilities. No other modifications were made to the model architecture or tokenizer.
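
To make "affine quantization" with 4 bits and a group size of 64 more concrete, here is a small pure-Python sketch of the general scheme: each group of 64 weights is mapped to integer codes in the range 0 to 15 and stored with one scale and one bias. It illustrates the idea only and is not MLX's implementation; the function names are hypothetical and the random values stand in for real weights.

```python
import random


def quantize_group(weights, bits=4):
    """Affine-quantize one group of floats to unsigned integer codes."""
    levels = (1 << bits) - 1                      # 15 distinct steps for 4 bits
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0       # fall back to 1.0 for a constant group
    codes = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min                    # w_min plays the role of the bias


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]


random.seed(0)
group = [random.uniform(-1.0, 1.0) for _ in range(64)]  # one group of 64 stand-in weights

codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))

print(f"scale={scale:.4f} bias={bias:.4f} max_error={max_error:.4f}")
```

Because every weight is rounded to the nearest step, the reconstruction error per weight is bounded by half the scale of its group.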

How to Use

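from mlx_lm import load, generate

# Load the quantized weights and tokenizer.
# Replace the placeholder below with this model's repository ID or a local path.
model, tokenizer = load("your-username/gemma-2b-aps-4bit")

# Wrap the passage in the chat template and return it as a string.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Your text passage here."}],
    tokenize=False,
    add_generation_prompt=True,
)

# Generate the propositions; raise max_tokens for longer passages.
output = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(output)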

Model Architecture

Parameter	Value
Architecture	GemmaForCausalLM
Hidden size	2048
Intermediate size	16384
Num attention heads	8
Num key-value heads	1
Num hidden layers	18
Head dim	256
Max position embeddings	8192
Vocab size	256,000
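
The values in the table above are enough for a rough parameter count and size estimate. The sketch below assumes the usual Gemma layer layout (query, key, value, and output projections, a gated MLP with three matrices, two normalization vectors per layer, and an output layer tied to the embedding). It also assumes that every matrix, including the embedding, is quantized to 4 bits with a 16-bit scale and a 16-bit bias per group of 64 weights. These are assumptions for illustration, not figures read from the repository.

```python
# Values taken from the architecture table.
hidden, intermediate, layers = 2048, 16384, 18
heads, kv_heads, head_dim, vocab = 8, 1, 256, 256_000

attention = (
    hidden * heads * head_dim             # query projection
    + 2 * hidden * kv_heads * head_dim    # key and value projections
    + heads * head_dim * hidden           # output projection
)
mlp = 3 * hidden * intermediate           # gate, up, and down projections (assumed layout)
embedding = vocab * hidden                # assumed tied with the output layer

matrix_params = layers * (attention + mlp) + embedding
norm_params = layers * 2 * hidden + hidden    # two norms per layer plus a final norm
total_params = matrix_params + norm_params

# Assumed storage: 4 bits per weight plus a 16-bit scale and bias per group of 64.
bits_per_weight = 4 + 2 * 16 / 64
size_bytes = matrix_params * bits_per_weight / 8 + norm_params * 2  # norms kept in 16-bit
size_gib = size_bytes / 2**30

print(f"{total_params:,} parameters, about {size_gib:.2f} GiB when quantized")
```

Under these assumptions the estimate comes to roughly 2.5 billion parameters and about 1.3 GiB, which is in line with the quantized model size listed under Quantization Details.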

Files

model.safetensors — Quantized model weights
config.json — Model configuration
tokenizer.json / tokenizer_config.json — Tokenizer files
chat_template.jinja — Chat template
generation_config.json — Generation configuration
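
After downloading the files above, the quantization settings can be checked without loading the weights. This sketch assumes that config.json carries a "quantization" entry with "group_size" and "bits" keys, which is the convention MLX conversion tools commonly follow; the key names and the helper read_quantization are assumptions, and the demo writes a stand-in config.json rather than reading this repository's file.

```python
import json
import tempfile
from pathlib import Path


def read_quantization(model_dir):
    """Return the "quantization" entry of config.json, or None if it is absent."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return config.get("quantization")


# Demo with a stand-in config.json; the keys are assumed, not copied from this repository.
with tempfile.TemporaryDirectory() as tmp:
    sample = {"model_type": "gemma", "quantization": {"group_size": 64, "bits": 4}}
    (Path(tmp) / "config.json").write_text(json.dumps(sample))
    settings = read_quantization(tmp)

print(settings)
```

For a real check, pass the local directory that holds the downloaded files to read_quantization.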

License

This model is a derivative of Google's Gemma and is distributed under the Gemma Terms of Use.

By using this model, you agree to the Gemma Terms of Use and the Gemma Prohibited Use Policy.

Gemma is provided under and subject to the Gemma Terms of Use found at ai.google.dev/gemma/terms

Citation

If you use this model, please cite the original Gemma model:

@article{gemma_2024,
    title={Gemma},
    url={https://ai.google.dev/gemma},
    publisher={Google DeepMind},
    year={2024}
}
