---
library_name: transformers
tags:
- trl
- grpo
license: apache-2.0
datasets:
- openai/gsm8k
language:
- en
base_model:
- google/gemma-3-4b-it
---
|
|
# Gemma-3-4b Reasoning R1 Model Card

Gemma-3-4b Reasoning is a transformer-based language model fine-tuned with GRPO (Group Relative Policy Optimization), following the DeepSeek-R1 methodology. This model card describes the instruction-tuned version, optimized specifically for reasoning tasks.

The entire Gemma-3-4b Reasoning family is released under the permissive Apache 2.0 license. All training scripts and configurations used are publicly accessible.
|
|
|
|
|
## Model Details

### Description

Gemma-3-4b Reasoning is a reasoning-focused fine-tuned model designed to excel at structured, logical problem-solving and mathematical reasoning. Training was performed on the GSM8K dataset using GRPO, enhancing the model's ability to reason step by step and provide structured explanations.

### Training Dataset

- **GSM8K (English)**: a specialized dataset of grade-school math word problems for mathematical and logical reasoning.
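GRPO optimizes the policy against scalar rewards computed for each sampled completion in a group. As an illustration only (the actual reward functions live in the published training scripts), here is a minimal correctness reward of the kind commonly used for GSM8K-style GRPO runs; it assumes the model wraps its final result in an `<answer>` tag, and the function and argument names (`correctness_reward`, `completions`, `answers`) are hypothetical:

```python
import re

def correctness_reward(completions, answers):
    """Score each sampled completion: 1.0 if the number inside its
    <answer> tag matches the reference answer, else 0.0.

    Illustrative sketch; TRL's GRPOTrainer accepts reward functions
    that map a batch of completions to a list of scalar rewards.
    """
    rewards = []
    for completion, reference in zip(completions, answers):
        match = re.search(r"<answer>\s*(-?[\d.,]+)\s*</answer>", completion)
        extracted = match.group(1).replace(",", "") if match else None
        rewards.append(1.0 if extracted == reference else 0.0)
    return rewards
```

For example, a completion ending in `<answer>100</answer>` scored against the reference `"100"` receives a reward of `1.0`, while a malformed or wrong answer receives `0.0`.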
|
|
|
|
|
### Intended Use

#### Direct Use

The model is specifically designed for structured reasoning tasks, including:

- Mathematical and logical reasoning
- Multi-step problem solving
- Instruction-based reasoning

#### Out-of-scope Use

This model should not be used for unethical or malicious activities, or in any other way that breaches legal and ethical standards.
|
|
|
|
|
## How to Use

The model uses structured XML templates for dialogue and reasoning tasks:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "ericrisco/gemma-3-4b-reasoning"

prompt = "A cyclist travels 60 km in 3 hours at a constant speed. If he maintains the same speed, how many kilometers will he travel in 5 hours?"

# Load the tokenizer and model, letting accelerate place the weights
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.bfloat16
)

# Format the prompt with the model's chat template
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Tokenize on the same device as the model (not hardcoded to "cuda")
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)
```
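Since the model emits its output inside structured XML tags, the final answer can be pulled out of `response` with a small parser. The `<reasoning>`/`<answer>` tag names below are an assumption (they are common in R1-style GRPO recipes); adjust the regex to whatever tags the model actually emits:

```python
import re

def extract_answer(response):
    """Return the text inside the first <answer>...</answer> pair, or None.

    The tag name is an assumed convention, not a documented API of this
    model; change it if the template uses different tags.
    """
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return match.group(1).strip() if match else None

example = (
    "<reasoning>60 km / 3 h = 20 km/h; 20 km/h * 5 h = 100 km</reasoning>"
    "<answer>100</answer>"
)
print(extract_answer(example))  # 100
```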
|
|
|
|
|
## Performance

The **Gemma-3-4b Reasoning** model exhibits robust internal **Chain-of-Thought (CoT)** capabilities, consistently producing detailed explanations and structured problem-solving across reasoning tasks.

## Limitations

The model is primarily optimized for **numeric and structured reasoning** and may produce less accurate or unexpected results when applied to unrelated tasks.

## Citations

- *Gemma Multimodal Reasoning Model* by Google
- *GRPO Implementation* by TRL

## Author

**Eric Risco**