---
library_name: transformers
tags:
- trl
- grpo
license: apache-2.0
datasets:
- openai/gsm8k
language:
- en
base_model:
- google/gemma-3-4b-it
---
# Gemma-3-4b Reasoning R1 Model Card
Gemma-3-4b Reasoning is a transformer-based language model fine-tuned with GRPO (Group Relative Policy Optimization), following the DeepSeek-R1 methodology. This model card describes the instruction-tuned version, optimized specifically for reasoning tasks.
The entire Gemma-3-4b Reasoning family is available under a permissive Apache 2.0 license. All training scripts and configurations used are publicly accessible.
## Model Details
### Description
Gemma-3-4b Reasoning is a reasoning-focused fine-tuned model designed to excel in structured, logical problem-solving and mathematical reasoning. The training was performed on the GSM8K dataset using GRPO, enhancing the model's ability to reason step-by-step and provide structured explanations.
### Training Dataset
- **GSM8K (English)**: Specialized dataset for mathematical and logical reasoning problems.
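GSM8K reference solutions end with a line of the form `#### <final answer>`, which makes the ground-truth answer easy to strip out for reward computation. A small illustrative helper (not part of this repo):

```python
def gsm8k_final_answer(solution: str) -> str:
    """GSM8K solutions terminate with '#### <answer>'; return that answer
    with thousands separators removed (e.g. '1,200' -> '1200')."""
    answer = solution.split("####")[-1].strip()
    return answer.replace(",", "")
```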
### Intended Use
#### Direct Use
The model is specifically designed for structured reasoning tasks, including:
- Mathematical and logical reasoning
- Multi-step problem solving
- Instruction-based reasoning
#### Out-of-scope Use
This model should not be used for malicious activities or for any purpose that breaches legal or ethical standards.
## How to Use
The model uses structured XML templates for dialogue and reasoning tasks:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_name = "ericrisco/gemma-3-4b-reasoning"
prompt = "A cyclist travels 60 km in 3 hours at a constant speed. If he maintains the same speed, how many kilometers will he travel in 5 hours?"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name, device_map="auto", torch_dtype=torch.bfloat16
)
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)  # follow device_map placement instead of hardcoding "cuda"
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
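If the fine-tune follows the DeepSeek-R1-style XML template mentioned above, the generated text can be split into its reasoning and answer parts. A hedged sketch — the `<reasoning>`/`<answer>` tag names are an assumption about the training template:

```python
import re


def split_response(text: str) -> dict:
    """Return the reasoning and answer segments of an XML-tagged response.
    A missing tag yields None for that field."""
    def grab(tag: str):
        m = re.search(rf"<{tag}>\s*(.*?)\s*</{tag}>", text, re.DOTALL)
        return m.group(1) if m else None

    return {"reasoning": grab("reasoning"), "answer": grab("answer")}
```

This keeps the chain-of-thought available for inspection while letting downstream code act only on the final answer.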
## Performance
The **Gemma-3-4b Reasoning** model exhibits robust internal **Chain-of-Thought (CoT)** behavior, consistently producing detailed, step-by-step explanations on structured reasoning tasks.
## Limitations
The model is primarily optimized for **numeric and structured reasoning** and might produce less accurate or unexpected results when applied to unrelated tasks.
## Citations
- *Gemma 3* model family by Google DeepMind
- *DeepSeek-R1* reasoning methodology by DeepSeek
- *GRPO* implementation in TRL by Hugging Face
## Author
**Eric Risco**