---
library_name: transformers
tags:
- trl
- grpo
license: apache-2.0
datasets:
- openai/gsm8k
language:
- en
base_model:
- google/gemma-3-4b-it
---
|
|
# Gemma-3-4b Reasoning R1 Model Card

Gemma-3-4b Reasoning is a transformer-based language model fine-tuned with GRPO (Group Relative Policy Optimization), following the DeepSeek-R1 methodology. This model card describes the instruction-tuned version, optimized specifically for reasoning tasks.

The entire Gemma-3-4b Reasoning family is released under the permissive Apache 2.0 license. All training scripts and configurations used are publicly accessible.
|
|
|
|
|
## Model Details

### Description

Gemma-3-4b Reasoning is a reasoning-focused fine-tuned model designed to excel at structured, logical problem-solving and mathematical reasoning. Training was performed on the GSM8K dataset using GRPO, enhancing the model's ability to reason step by step and provide structured explanations.

### Training Dataset

- **GSM8K (English)**: a specialized dataset of grade-school math word problems for mathematical and logical reasoning.
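GRPO optimizes the policy against scalar rewards computed for each sampled completion in a group. As an illustration only (the actual reward functions live in the published training scripts), here is a minimal correctness reward of the kind commonly used for GSM8K-style GRPO runs; it assumes the model wraps its final result in an `<answer>` tag, and the function and argument names (`correctness_reward`, `completions`, `answers`) are hypothetical:

```python
import re

def correctness_reward(completions, answers):
    """Score each sampled completion: 1.0 if the number inside its
    <answer> tag matches the reference answer, else 0.0.

    Illustrative sketch; TRL's GRPOTrainer accepts reward functions
    that map a batch of completions to a list of scalar rewards.
    """
    rewards = []
    for completion, reference in zip(completions, answers):
        match = re.search(r"<answer>\s*(-?[\d.,]+)\s*</answer>", completion)
        extracted = match.group(1).replace(",", "") if match else None
        rewards.append(1.0 if extracted == reference else 0.0)
    return rewards
```

For example, a completion ending in `<answer>100</answer>` scored against the reference `"100"` receives a reward of `1.0`, while a malformed or wrong answer receives `0.0`.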
|
|
|
|
|
### Intended Use

#### Direct Use

The model is specifically designed for structured reasoning tasks, including:

- Mathematical and logical reasoning
- Multi-step problem solving
- Instruction-based reasoning

#### Out-of-scope Use

This model should not be used for unethical or malicious activities, or in any other way that breaches legal and ethical standards.
|
|
|
|
|
## How to Use

The model uses structured XML templates for dialogue and reasoning tasks:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "ericrisco/gemma-3-4b-reasoning"

prompt = "A cyclist travels 60 km in 3 hours at a constant speed. If he maintains the same speed, how many kilometers will he travel in 5 hours?"

# Load the tokenizer and model, letting accelerate place the weights
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.bfloat16
)

# Format the prompt with the model's chat template
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Tokenize on the same device as the model (not hardcoded to "cuda")
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)
```
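Since the model emits its output inside structured XML tags, the final answer can be pulled out of `response` with a small parser. The `<reasoning>`/`<answer>` tag names below are an assumption (they are common in R1-style GRPO recipes); adjust the regex to whatever tags the model actually emits:

```python
import re

def extract_answer(response):
    """Return the text inside the first <answer>...</answer> pair, or None.

    The tag name is an assumed convention, not a documented API of this
    model; change it if the template uses different tags.
    """
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return match.group(1).strip() if match else None

example = (
    "<reasoning>60 km / 3 h = 20 km/h; 20 km/h * 5 h = 100 km</reasoning>"
    "<answer>100</answer>"
)
print(extract_answer(example))  # 100
```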
|
|
|
|
|
## Performance

The **Gemma-3-4b Reasoning** model exhibits robust internal **Chain-of-Thought (CoT)** capabilities, consistently producing detailed explanations and structured problem-solving across reasoning tasks.

## Limitations

The model is primarily optimized for **numeric and structured reasoning** and may produce less accurate or unexpected results when applied to unrelated tasks.

## Citations

- *Gemma Multimodal Reasoning Model* by Google
- *GRPO Implementation* by TRL

## Author

**Eric Risco**