Gemma-2B-APS-IT β€” 4-bit MLX Quantized

This is a 4-bit quantized version of google/gemma-2b-aps-it, converted for use with Apple's MLX framework.

Model Description

The base model, Gemma-2B-APS-IT, is a 2-billion parameter language model fine-tuned by Google for Abstractive Proposition Segmentation (APS). Given a text passage, the model segments the content into individual facts, statements, and ideas, restating them as full sentences with minimal changes to the original text.

Use Cases

  • Atomic claim extraction for fact-checking pipelines
  • Grounding and retrieval
  • Evaluation of generation tasks (e.g., summarisation)

Example

Input:

Sarah Stage, 30, welcomed James Hunter into the world on Tuesday. The baby boy weighed eight pounds seven ounces and was 22 inches long.

Output:

- Sarah Stage welcomed James Hunter into the world.
- Sarah Stage welcomed James Hunter on Tuesday.
- Sarah Stage is 30 years old.
- James Hunter weighed eight pounds seven ounces.
- James Hunter was 22 inches long.

Quantization Details

| Parameter | Value |
|---|---|
| Method | Affine quantization |
| Bits | 4 |
| Group size | 64 |
| Original dtype | bfloat16 |
| Framework | MLX |
| Quantized model size | ~1.3 GB |

Modifications from original: The original model weights were quantized from bfloat16 to 4-bit precision using MLX's quantization utilities. No other modifications were made to the model architecture or tokenizer.
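The conversion can be reproduced with mlx-lm's `convert` utility. A sketch, not a verbatim log of how this checkpoint was produced — flag names follow recent mlx-lm releases and may differ in older versions, and the output path is arbitrary:

```shell
# Quantize google/gemma-2b-aps-it to 4-bit with group size 64,
# matching the settings in the table above
mlx_lm.convert \
    --hf-path google/gemma-2b-aps-it \
    --mlx-path gemma-2b-aps-it-4bit \
    -q --q-bits 4 --q-group-size 64
```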

How to Use

```python
from mlx_lm import load, generate

# Load the quantized weights and tokenizer from the Hub
model, tokenizer = load("JiggityNun/gemma-2b-aps-it-4bit")

# Wrap the passage to segment in the model's chat template
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Your text passage here."}],
    tokenize=False,
    add_generation_prompt=True,
)

output = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(output)
```
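As the example above shows, the model emits one proposition per line as a dash-prefixed list, so the raw generation is easy to turn into a Python list of claims. A minimal sketch — the parsing logic is an assumption about that output format, not part of mlx-lm:

```python
def parse_propositions(raw: str) -> list[str]:
    """Split dash-prefixed APS output into individual claims."""
    claims = []
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("- "):  # one proposition per bullet
            claims.append(line[2:].strip())
    return claims

raw = """- Sarah Stage welcomed James Hunter into the world.
- Sarah Stage is 30 years old."""
print(parse_propositions(raw))
```

Lines that do not start with `- ` (e.g. stray preamble text) are simply dropped, which keeps downstream fact-checking pipelines robust to minor formatting drift.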

Model Architecture

| Parameter | Value |
|---|---|
| Architecture | GemmaForCausalLM |
| Hidden size | 2048 |
| Intermediate size | 16384 |
| Num attention heads | 8 |
| Num key-value heads | 1 |
| Num hidden layers | 18 |
| Head dim | 256 |
| Max position embeddings | 8192 |
| Vocab size | 256,000 |
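As a sanity check, the configuration above roughly reproduces the model's parameter count (the "2B" in the name is nominal; the actual count is closer to 2.5B). This is a back-of-the-envelope estimate that ignores the small RMSNorm weights and assumes Gemma's tied input/output embeddings:

```python
hidden, inter, layers = 2048, 16384, 18
heads, kv_heads, head_dim = 8, 1, 256
vocab = 256_000

embed = vocab * hidden                    # tied input/output embedding
attn = hidden * heads * head_dim * 2      # q_proj + o_proj
attn += hidden * kv_heads * head_dim * 2  # k_proj + v_proj (1 KV head)
mlp = hidden * inter * 3                  # gate, up, down projections
total = embed + layers * (attn + mlp)
print(f"{total / 1e9:.2f}B parameters")   # ~2.51B
```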

Files

  • model.safetensors β€” Quantized model weights
  • config.json β€” Model configuration
  • tokenizer.json / tokenizer_config.json β€” Tokenizer files
  • chat_template.jinja β€” Chat template
  • generation_config.json β€” Generation configuration

License

This model is a derivative of Google's Gemma and is provided under and subject to the Gemma Terms of Use found at ai.google.dev/gemma/terms.

By using this model, you agree to the Gemma Terms of Use and the Gemma Prohibited Use Policy.

Citation

If you use this model, please cite the original Gemma model:

@article{gemma_2024,
    title={Gemma},
    url={https://ai.google.dev/gemma},
    publisher={Google DeepMind},
    year={2024}
}