
	---
	base_model: BSC-LT/salamandra-2b-instruct
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- llama
	- trl
	- grpo
	license: apache-2.0
	language:
	- en
	datasets:
	- ericrisco/gsm8k-translated-catalan
	- ericrisco/gsm8k-translated-spanish
	- openai/gsm8k
	---
	# Salamandra 2B Reasoning R1 Model Card

	Salamandra is a highly multilingual model pre-trained from scratch that comes in different sizes. This model card corresponds to the 2B instruction-tuned version, further fine-tuned using GRPO (Group Relative Policy Optimization) and Unsloth.

	To visit the model cards of other Salamandra versions, please refer to the Model Index.

	The entire Salamandra family is released under a permissive Apache 2.0 license. Along with the open weights, all training scripts and configuration files are made publicly available in this GitHub repository.

	## Model Details

	### Description
	This model is a reasoning-focused, transformer-based language model fine-tuned from BSC-LT/salamandra-2b-instruct with GRPO. It has been trained on datasets including:

	- GSM8K (English)
	- GSM8K Translated (Spanish)
	- GSM8K Translated (Catalan)

	This dataset selection exposes the model to multi-step reasoning on grade-school math word problems in English, Spanish, and Catalan. Instead of relying on traditional supervised fine-tuning, GRPO optimizes the model with reinforcement learning: it samples a group of completions for each prompt, scores them with reward functions, and reinforces the completions that score above the group average.
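
The group-relative scoring at the core of GRPO can be illustrated with a small, self-contained sketch. It is not the training code used for this model: the reward function below (1.0 when the last number in a completion matches the reference answer, 0.0 otherwise) and the helper names are assumptions chosen for illustration, and the card does not state which rewards were actually used.

```python
import re
from statistics import mean, pstdev


def extract_final_number(text):
    """Return the last number that appears in the text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None


def correctness_reward(completion, gold_answer):
    """Hypothetical reward: 1.0 if the final number equals the reference answer."""
    predicted = extract_final_number(completion)
    return 1.0 if predicted is not None and predicted == float(gold_answer) else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, a group of four sampled completions (made-up examples).
completions = [
    "18 - 4 = 14, so the answer is 14",
    "The answer is 12",
    "She has 14 apples left",
    "I am not sure",
]
rewards = [correctness_reward(c, "14") for c in completions]
advantages = group_relative_advantages(rewards)

for completion, advantage in zip(completions, advantages):
    print(f"{advantage:+.2f}  {completion}")
```

Completions with a positive advantage are made more likely by the policy update and those with a negative advantage less likely. Because the baseline is the group mean, no separate value model is needed.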

	## Intended Use

	### Direct Use
	The model is designed as a reasoning assistant capable of structured problem-solving across different domains. It can be used for:
	- Logical and mathematical reasoning tasks
	- Multi-step question answering
	- Instruction following in multilingual contexts

	### Out-of-scope Use
	The model is not intended for malicious applications or any activity that violates legal or ethical standards.

	## How to Use

	The instruction-following models use the ChatML template for structured dialogue formatting:
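
For orientation, ChatML wraps each turn in `<|im_start|>` and `<|im_end|>` markers. The helper below is a hypothetical sketch of that layout only (`to_chatml` is not part of any library), and the template shipped with this model's tokenizer may add further content such as a system prompt or the date.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in plain ChatML layout."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(to_chatml([{"role": "user", "content": "At what temperature does water boil?"}]))
```

In practice `tokenizer.apply_chat_template` builds the prompt from the template bundled with the tokenizer, as in the full example that follows: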

	```python
	from datetime import datetime
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	model_id = "ericrisco/salamandra-2b-r1"

	text = "At what temperature does water boil?"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    device_map="auto",
	    torch_dtype=torch.bfloat16,
	)

	message = [ { "role": "user", "content": text } ]
	date_string = datetime.today().strftime('%Y-%m-%d')

	prompt = tokenizer.apply_chat_template(
	    message,
	    tokenize=False,
	    add_generation_prompt=True,
	    date_string=date_string,
	)

	inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
	outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=200)

	print(tokenizer.decode(outputs[0], skip_special_tokens=True))