---
license: apache-2.0
datasets:
- Sweaterdog/Smol-reason2.1
language:
- en
base_model:
- unsloth/Qwen2.5-3B-Instruct-bnb-4bit
---

# 🧠Smol-reason2.1🧠

This is my third GRPO reasoning model. While exploring fine-tuning on my own hardware, I found the process works well with 3B models.

System prompt:

```
You are a reasoning LLM named Smol-reason2.1, developed by Sweaterdog. Respond in the following format:
<think>
...reason in long recursive loops here...
</think>
...answer here...
Start your response with <think>
```
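If you serve the model behind any OpenAI-style chat frontend (an assumption — the model card does not prescribe a serving stack), the prompt above simply goes in the `system` slot. A minimal sketch; the `build_messages` helper is mine, not part of the model:

```python
# Minimal sketch: wiring the Smol-reason2.1 system prompt into an
# OpenAI-style chat message list. Message structure is the standard
# role/content format; nothing here is specific to a particular server.
SYSTEM_PROMPT = (
    "You are a reasoning LLM named Smol-reason2.1, developed by Sweaterdog. "
    "Respond in the following format:\n<think>\n...reason in long recursive "
    "loops here...\n</think>\n...answer here...\nStart your response with <think>"
)

def build_messages(user_text: str) -> list[dict]:
    """Return a chat-message list with the reasoning system prompt first."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]

messages = build_messages("If I give you 15 mangoes...")
```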

In accordance with this output format, the model responds like this:

```
<think>
Okay, let's break down the user's issue.
...more reasoning...
Therefore x should be the answer
</think>
X is the answer because...
```
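Because the reasoning is delimited by `<think>` tags, it is easy to strip before showing only the final answer to a user. A small sketch (the helper name is mine, not part of the model):

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split a Smol-reason2.1 response into (reasoning, answer).

    Returns empty reasoning if no <think> block is found.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if not match:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>\nOkay, let's break it down.\n</think>\nX is the answer because..."
)
```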

# Features

## Flexible reasoning

You can modify the system prompt to change the way the model reasons. By default, it is told to reason about code snippets, which I found works best across all tasks.

## Logical reasoning

This is the first model I have seen that can answer "The Mango Puzzle", which goes like this:

```
If I give you 15 mangoes, and then you give 14 away, then receive 60 more mangoes, how many mangoes did you not sell?
```

The correct answer is `75 mangoes`; most LLMs treat "give away" as a form of sale, so they typically answer `61 mangoes`.
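The arithmetic behind the puzzle, spelled out: giving mangoes away is not a sale, so every mango that passed through your hands counts as "not sold":

```python
# Worked arithmetic for "The Mango Puzzle": nothing is ever sold,
# so every mango received counts as "not sold".
received = 15 + 60   # mangoes you were given, in total
given_away = 14      # given away -- not a sale
sold = 0             # no sale happens anywhere in the puzzle
still_held = received - given_away   # 61: the common wrong answer
not_sold = received - sold           # 75: the puzzle's answer
```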

## Code reasoning

This model is capable of thinking through how to structure a complex coding problem before writing out the entire file.

## Mathematical reasoning

This model is capable of breaking down math equations and checking its own work before responding with an answer.

## Medical reasoning

This model is capable of taking in the symptoms of a disease, as well as the patient's condition, and offering an appropriate diagnosis.

# Design

This model was trained from Qwen2.5 3B on a dataset I put together comprising coding, healthcare, and math data.

To be specific, this model was trained from Smol-reason2, for longer and on a larger dataset of reasoning data from DeepSeek-R1.

This model supports RoPE scaling up to a `65536`-token context, and the Q8_0 quantization can fit on a single GPU at the full context length.
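As a sketch of running the quantized model at the full scaled context with llama.cpp (an assumption — the card names no serving stack; the GGUF filename is a placeholder, and VRAM needs depend on your GPU):

```shell
# Hypothetical invocation: load a Q8_0 GGUF with a 65536-token context.
# Adjust the model path and -ngl (GPU layers to offload) for your setup.
./llama-cli \
  -m ./Smol-reason2.1.Q8_0.gguf \
  -c 65536 \
  -ngl 99 \
  -p "You are a reasoning LLM named Smol-reason2.1..."
```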