# Gemma-2B-APS-IT (4-bit MLX Quantized)
This is a 4-bit quantized version of google/gemma-2b-aps-it, converted for use with Apple's MLX framework.
## Model Description
The base model, Gemma-2B-APS-IT, is a 2-billion parameter language model fine-tuned by Google for Abstractive Proposition Segmentation (APS). Given a text passage, the model segments the content into individual facts, statements, and ideas, restating them as full sentences with minimal changes to the original text.
## Use Cases
- Atomic claim extraction for fact-checking pipelines
- Grounding and retrieval
- Evaluation of generation tasks (e.g., summarisation)
## Example
**Input:**
Sarah Stage, 30, welcomed James Hunter into the world on Tuesday. The baby boy weighed eight pounds seven ounces and was 22 inches long.
**Output:**
- Sarah Stage welcomed James Hunter into the world.
- Sarah Stage welcomed James Hunter on Tuesday.
- Sarah Stage is 30 years old.
- James Hunter weighed eight pounds seven ounces.
- James Hunter was 22 inches long.
## Quantization Details
| Parameter | Value |
|---|---|
| Method | Affine quantization |
| Bits | 4 |
| Group size | 64 |
| Original dtype | bfloat16 |
| Framework | MLX |
| Quantized model size | ~1.3 GB |
**Modifications from original:** The original model weights were quantized from bfloat16 to 4-bit precision using MLX's quantization utilities. No other modifications were made to the model architecture or tokenizer.
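To illustrate what the table above means in practice, here is a minimal pure-Python sketch of group-wise affine quantization: each group of 64 weights is mapped to 4-bit integers with its own scale and zero-point, which is the general idea behind MLX's scheme (this is a simplified illustration, not MLX's actual packed implementation).

```python
# Sketch of 4-bit affine quantization with group size 64: every group of
# 64 weights stores its own scale and zero-point alongside the 4-bit values.
def quantize_group(weights, bits=4):
    qmax = (1 << bits) - 1                    # 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0           # avoid zero scale for constant groups
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize_group(q, scale, zero_point):
    return [v * scale + zero_point for v in q]

group = [0.01 * i - 0.3 for i in range(64)]   # one toy group of 64 weights
q, scale, zero = quantize_group(group)
recon = dequantize_group(q, scale, zero)
max_err = max(abs(a - b) for a, b in zip(group, recon))
print(f"max reconstruction error: {max_err:.4f}")  # bounded by scale / 2
```

The per-group scale and zero-point are why smaller group sizes give better accuracy at the cost of slightly larger files.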
## How to Use
```python
from mlx_lm import load, generate

# Load the quantized weights and tokenizer from the Hub
model, tokenizer = load("your-username/gemma-2b-aps-4bit")

# Build the prompt using the model's chat template
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Your text passage here."}],
    tokenize=False,
    add_generation_prompt=True,
)

output = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(output)
```
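The model emits one proposition per bulleted line, as in the example above. A small helper like the following (a hypothetical post-processing function, not part of the model or mlx_lm) turns that text into a Python list of claims:

```python
def extract_claims(output: str) -> list[str]:
    """Split bulleted model output into individual claim strings."""
    claims = []
    for line in output.splitlines():
        line = line.strip()
        # Accept both "- " and "* " bullet styles
        if line.startswith(("- ", "* ")):
            claims.append(line[2:].strip())
    return claims

sample = """- Sarah Stage welcomed James Hunter into the world.
- Sarah Stage is 30 years old."""
print(extract_claims(sample))
```

A list of atomic claims like this is the natural input for downstream fact-checking or grounding pipelines.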
## Model Architecture
| Parameter | Value |
|---|---|
| Architecture | GemmaForCausalLM |
| Hidden size | 2048 |
| Intermediate size | 16384 |
| Num attention heads | 8 |
| Num key-value heads | 1 |
| Num hidden layers | 18 |
| Head dim | 256 |
| Max position embeddings | 8192 |
| Vocab size | 256,000 |
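The figures in the table can be used for a back-of-the-envelope parameter count and size estimate. The sketch below is an approximation (it ignores the small RMSNorm parameters and assumes tied input/output embeddings, as in Gemma):

```python
# Approximate weight count from the architecture table above.
hidden, inter, layers = 2048, 16384, 18
heads, kv_heads, head_dim, vocab = 8, 1, 256, 256_000

embed = vocab * hidden                       # token embeddings (tied with output head)
attn = hidden * (heads * head_dim)           # query projection
attn += 2 * hidden * (kv_heads * head_dim)   # key and value projections (1 KV head)
attn += (heads * head_dim) * hidden          # output projection
mlp = 3 * hidden * inter                     # gate, up, and down projections
total = embed + layers * (attn + mlp)
print(f"{total / 1e9:.2f}B parameters")      # ~2.51B

# At ~4.5 effective bits per weight (4-bit values plus a 16-bit scale
# and 16-bit bias per group of 64), the weights come to roughly:
print(f"{total * 4.5 / 8 / 1e9:.2f} GB")     # ~1.41 GB
```

This lines up with the ~1.3 GB quantized size listed above; the large embedding table is why "2B" Gemma models carry closer to 2.5B total parameters.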
## Files
- `model.safetensors`: Quantized model weights
- `config.json`: Model configuration
- `tokenizer.json` / `tokenizer_config.json`: Tokenizer files
- `chat_template.jinja`: Chat template
- `generation_config.json`: Generation configuration
## License
This model is a derivative of Google's Gemma and is provided under and subject to the Gemma Terms of Use, found at ai.google.dev/gemma/terms. By using this model, you agree to the Gemma Terms of Use and the Gemma Prohibited Use Policy.
## Citation
If you use this model, please cite the original Gemma model:
```bibtex
@article{gemma_2024,
  title={Gemma},
  url={https://ai.google.dev/gemma},
  publisher={Google DeepMind},
  year={2024}
}
```