Gemma-2B-APS-IT (4-bit MLX Quantized)
This is a 4-bit quantized version of google/gemma-2b-aps-it, converted for use with Apple's MLX framework.
Model Description
The base model, Gemma-2B-APS-IT, is a 2-billion-parameter language model fine-tuned by Google for Abstractive Proposition Segmentation (APS). Given a text passage, the model segments the content into individual facts, statements, and ideas, restating each one as a full sentence with minimal changes to the original text.
Use Cases
- Atomic claim extraction for fact-checking pipelines
- Grounding and retrieval
- Evaluation of generation tasks (e.g., summarisation)
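For the evaluation use case, a common pattern is to split both a reference text and a generated text into propositions, then measure how many generated propositions the reference supports. The sketch below assumes you already have both proposition lists (for example, from running this model on each text). `proposition_precision` and `naive_supported` are hypothetical helpers, and the exact-match check is only a placeholder for a real entailment or NLI model.

```python
from typing import Callable, Sequence


def _normalize(text: str) -> str:
    # Lowercase, trim whitespace, and drop a trailing period.
    return text.strip().rstrip(".").lower()


def naive_supported(claim: str, reference: Sequence[str]) -> bool:
    # Placeholder check: a claim counts as supported only if an identical
    # proposition (after normalization) appears in the reference list.
    # A real pipeline would call an entailment model here instead.
    target = _normalize(claim)
    return any(_normalize(r) == target for r in reference)


def proposition_precision(
    generated: Sequence[str],
    reference: Sequence[str],
    is_supported: Callable[[str, Sequence[str]], bool] = naive_supported,
) -> float:
    # Fraction of generated propositions supported by the reference.
    if not generated:
        return 0.0
    hits = sum(is_supported(claim, reference) for claim in generated)
    return hits / len(generated)


reference = [
    "Sarah Stage welcomed James Hunter into the world.",
    "Sarah Stage is 30 years old.",
]
generated = ["Sarah Stage is 30 years old.", "Sarah Stage is 31 years old."]
print(proposition_precision(generated, reference))  # 0.5
```

Passing a different `is_supported` function lets you swap in a stronger checker without changing the scoring loop.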
Example
Input:
Sarah Stage, 30, welcomed James Hunter into the world on Tuesday. The baby boy weighed eight pounds seven ounces and was 22 inches long.
Output:
- Sarah Stage welcomed James Hunter into the world.
- Sarah Stage welcomed James Hunter on Tuesday.
- Sarah Stage is 30 years old.
- James Hunter weighed eight pounds seven ounces.
- James Hunter was 22 inches long.
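Downstream code usually needs the propositions as a list rather than as one block of text. The sketch below assumes the generated text is a bulleted list like the output above; the raw model output may use a different format, so inspect a few generations before relying on it. `parse_propositions` is a hypothetical helper.

```python
def parse_propositions(output: str) -> list[str]:
    """Split bulleted model output into a list of proposition strings."""
    propositions = []
    for line in output.splitlines():
        line = line.strip()
        # Keep only bullet lines ("- ..." or "* "); skip blanks and other text.
        if line.startswith(("- ", "* ")):
            text = line[2:].strip()
            if text:
                propositions.append(text)
    return propositions


example_output = """
- Sarah Stage welcomed James Hunter into the world.
- Sarah Stage welcomed James Hunter on Tuesday.
- Sarah Stage is 30 years old.
- James Hunter weighed eight pounds seven ounces.
- James Hunter was 22 inches long.
"""
print(len(parse_propositions(example_output)))  # 5
```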
Quantization Details
| Parameter | Value |
|---|---|
| Method | Affine quantization |
| Bits | 4 |
| Group size | 64 |
| Original dtype | bfloat16 |
| Framework | MLX |
| Quantized model size | ~1.3 GB |
Modifications from original: The original model weights were quantized from bfloat16 to 4-bit precision using MLX's quantization utilities. No other modifications were made to the model architecture or tokenizer.
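In the table above, affine quantization with 4 bits and a group size of 64 means that each run of 64 consecutive weights shares one scale and one bias, and every weight is stored as an integer in the range 0 to 15. The NumPy sketch below illustrates that idea. It is not MLX's actual kernel: MLX provides its own `mx.quantize` and `mx.dequantize`, whose exact scale selection and rounding may differ. `affine_quantize` and `affine_dequantize` are hypothetical names.

```python
import numpy as np


def affine_quantize(w, group_size=64, bits=4):
    """Quantize a 2-D float array in groups of `group_size` consecutive columns."""
    rows, cols = w.shape
    if cols % group_size != 0:
        raise ValueError("number of columns must be a multiple of group_size")
    groups = w.reshape(rows, cols // group_size, group_size)
    levels = 2**bits - 1  # 15 integer steps for 4-bit storage
    w_min = groups.min(axis=-1, keepdims=True)
    w_max = groups.max(axis=-1, keepdims=True)
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)  # constant group: avoid division by zero
    q = np.clip(np.round((groups - w_min) / scale), 0, levels).astype(np.uint8)
    return q, scale, w_min  # w_min acts as the per-group bias


def affine_dequantize(q, scale, bias):
    """Reconstruct approximate weights as q * scale + bias, back in 2-D layout."""
    groups = q.astype(np.float32) * scale + bias
    return groups.reshape(groups.shape[0], -1)


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 128)).astype(np.float32)
q, scale, bias = affine_quantize(w)
w_hat = affine_dequantize(q, scale, bias)
# Each reconstructed weight is within half a quantization step of the original.
print(np.abs(w - w_hat).max())
```

Storing a bias as well as a scale lets each group cover an asymmetric range (for example, mostly positive weights) without wasting integer levels.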
How to Use
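```python
# pip install --upgrade mlx-lm
from mlx_lm import load, generate

# Download (on first use) and load the 4-bit weights and tokenizer.
model, tokenizer = load("JiggityNun/gemma-2b-aps-it-4bit")

# Wrap the passage in the model's chat template; tokenize=False returns a string.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Your text passage here."}],
    tokenize=False,
    add_generation_prompt=True,
)

# Generate up to 256 new tokens and print the segmented propositions.
output = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(output)
```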
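As an alternative to calling `generate` in-process, `mlx-lm` also ships an OpenAI-compatible HTTP server. The sketch below assumes a server started locally with `mlx_lm.server --model JiggityNun/gemma-2b-aps-it-4bit --port 8080`. The host, the port, and the helper names `build_chat_request` and `segment_via_server` are assumptions for illustration. The sketch uses only the Python standard library.

```python
import json
import urllib.request


def build_chat_request(
    passage: str,
    model: str = "JiggityNun/gemma-2b-aps-it-4bit",
    base_url: str = "http://localhost:8080",
    max_tokens: int = 256,
) -> urllib.request.Request:
    # Build (but do not send) a POST to the OpenAI-compatible chat endpoint.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": passage}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def segment_via_server(passage: str, **kwargs) -> str:
    # Send the request and return the assistant message text
    # (OpenAI chat-completions response shape).
    req = build_chat_request(passage, **kwargs)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Keeping request construction separate from sending makes it easy to inspect or log the payload without a running server.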
Model Architecture
| Parameter | Value |
|---|---|
| Architecture | GemmaForCausalLM |
| Hidden size | 2048 |
| Intermediate size | 16384 |
| Num attention heads | 8 |
| Num key-value heads | 1 |
| Num hidden layers | 18 |
| Head dim | 256 |
| Max position embeddings | 8192 |
| Vocab size | 256,000 |
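As a sanity check on the ~1.3 GB figure, the parameter count can be rebuilt from the table above and combined with the quantization settings. This is back-of-the-envelope arithmetic under stated assumptions: tied input and output embeddings, no bias terms in linear layers, two RMSNorm vectors per layer plus a final one, the embedding and all projection matrices quantized, one 16-bit scale and one 16-bit bias per group of 64, and norm weights left in 16-bit. It ignores file metadata, and the helper names are hypothetical.

```python
# Values copied from the Model Architecture table.
VOCAB = 256_000
HIDDEN = 2048
INTERMEDIATE = 16384
N_HEADS = 8
N_KV_HEADS = 1
HEAD_DIM = 256
N_LAYERS = 18


def gemma_param_counts():
    """Return (matrix_params, norm_params) under the assumptions above."""
    embed = VOCAB * HIDDEN                          # tied with the output head
    attn = HIDDEN * (N_HEADS * HEAD_DIM)            # q_proj
    attn += 2 * HIDDEN * (N_KV_HEADS * HEAD_DIM)    # k_proj and v_proj
    attn += (N_HEADS * HEAD_DIM) * HIDDEN           # o_proj
    mlp = 3 * HIDDEN * INTERMEDIATE                 # gate, up, down projections
    norms = N_LAYERS * 2 * HIDDEN + HIDDEN          # per-layer norms + final norm
    matrices = embed + N_LAYERS * (attn + mlp)
    return matrices, norms


def estimated_size_bytes(bits=4, group_size=64, scale_bits=16):
    # Each group stores one scale and one bias, so 4 + 2 * 16 / 64 = 4.5 bits/weight.
    matrices, norms = gemma_param_counts()
    bits_per_weight = bits + 2 * scale_bits / group_size
    return matrices * bits_per_weight / 8 + norms * 2  # norms kept in 16-bit


matrices, norms = gemma_param_counts()
print(f"parameters: {matrices + norms:,}")                         # parameters: 2,506,172,416
print(f"estimated size: {estimated_size_bytes() / 2**30:.2f} GiB")  # estimated size: 1.31 GiB
```

Under these assumptions the weights come to about 1.41 × 10^9 bytes, roughly 1.31 GiB, which matches the quantization table if its "GB" is read as GiB.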
Files
- `model.safetensors`: Quantized model weights
- `config.json`: Model configuration
- `tokenizer.json` / `tokenizer_config.json`: Tokenizer files
- `chat_template.jinja`: Chat template
- `generation_config.json`: Generation configuration
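To confirm that a downloaded copy is complete and carries the expected quantization settings, a small check over the files listed above can help. This sketch assumes the repository has been downloaded to a local directory and that the MLX-converted `config.json` records its settings under a `"quantization"` key (with fields such as `"bits"` and `"group_size"`). Verify that against your own `config.json`. `check_snapshot` is a hypothetical helper.

```python
import json
from pathlib import Path

# File names taken from the Files list.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "chat_template.jinja",
    "generation_config.json",
]


def check_snapshot(model_dir):
    """Return (missing_files, quantization_settings) for a local model directory."""
    root = Path(model_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    quant = None
    config_path = root / "config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text())
        # Assumption: MLX-converted checkpoints store settings under "quantization".
        quant = config.get("quantization")
    return missing, quant
```

For this model you would expect an empty `missing` list and settings matching the quantization table (4 bits, group size 64).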
License
This model is a derivative of Google's Gemma and is distributed under the Gemma Terms of Use.
By using this model, you agree to the Gemma Terms of Use and the Gemma Prohibited Use Policy.
Gemma is provided under and subject to the Gemma Terms of Use found at ai.google.dev/gemma/terms.
Citation
If you use this model, please cite the original Gemma model:
```bibtex
@article{gemma_2024,
  title={Gemma},
  url={https://ai.google.dev/gemma},
  publisher={Google DeepMind},
  year={2024}
}
```