---
license: llama3.2
datasets:
- openai/gsm8k
language:
- en
base_model:
- unsloth/Llama-3.2-1B-Instruct
library_name: transformers
tags:
- llama
- think
---

# MiniThink-1B-base

MiniThink-1B is an experiment to reproduce the "Aha!" moment in AI.

It is trained using a modified version of the method from the [Unsloth R1 training blog](https://unsloth.ai/blog/r1-reasoning) and the [notebook provided for training Llama 3.1 8B to learn R1-style reasoning](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb).

MiniThink is a fine-tuned version of the `unsloth/Llama-3.2-1B-Instruct` model.

## Model Details

- **Base Model**: `unsloth/Llama-3.2-1B-Instruct`
- **Training**: Fine-tuned using progressive LoRA (ranks: 16 → 32 → 64) with Unsloth's optimization framework
- **Task**: Mathematical and logical reasoning with explicit, step-by-step thought processes
- **Training Data**: GSM8K dataset enhanced with think-aloud prompting
- **Input Format**: Questions requiring detailed, structured reasoning
- **Output Format**: A comprehensive thinking process enclosed in `<think>` tags, followed by the final answer (a minimal parsing sketch follows this list)
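
Because the final answer always follows the closing `</think>` tag, model output can be split mechanically. A minimal parsing sketch (the helper name and regex are illustrative, not part of the released code):

```python
import re

def parse_minithink_output(text: str) -> tuple[str, str]:
    """Split a MiniThink completion into (reasoning, final answer)."""
    match = re.search(r"<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if match is None:
        # The model drifted from its trained format; treat everything as the answer.
        return "", text.strip()
    return match.group(1).strip(), match.group(2).strip()

reasoning, answer = parse_minithink_output("<think>3 + 2 = 5</think>\n5")
print(answer)  # -> 5
```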

## Dataset used

The model was trained on a modified version of OpenAI's GSM8K dataset, which contains roughly 8K grade-school math word problems with single-number answers.

To improve training results, the dataset was filtered to exclude answers containing comma or period separators, as shown in the sketch below.
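
A minimal sketch of that filtering step, assuming the standard `openai/gsm8k` layout where the gold answer follows a `####` marker (the exact preprocessing used for training may differ):

```python
from datasets import load_dataset

def extract_answer(example):
    # GSM8K stores the gold answer after a "####" marker.
    return {"final_answer": example["answer"].split("####")[-1].strip()}

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(extract_answer)

# Keep only plain integer answers: drop anything with "," or "." separators.
dataset = dataset.filter(
    lambda ex: "," not in ex["final_answer"] and "." not in ex["final_answer"]
)
print(len(dataset))
```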

## System Prompt

The model is trained with the following system prompt to guide its reasoning process:

```python
# Special tokens that delimit the thinking process
THINK_START = "<think>"
THINK_END = "</think>"

SYSTEM_PROMPT = f"""Show your reasoning process using <think> tags, then provide your answer. For example:

Question: "Janet has 3 apples. She buys 2 more. How many apples does she have?"

{THINK_START}
Let me solve this step by step:
- Janet starts with 3 apples
- She buys 2 more apples
- I need to add: 3 + 2 = 5
Wait, let me verify:
- Initial apples: 3
- Added apples: 2
Yes, the total is 5 apples
{THINK_END}

5"""
```

## Usage

The model expects chat-formatted input and responds with a structured breakdown of its reasoning. For example:

**Input:**

Question: "Janet has 3 apples. She buys 2 more. How many apples does she have?"

**Output:**

```
<think>
Let me solve this step by step:
- Janet starts with 3 apples
- She buys 2 more apples
- I need to add: 3 + 2 = 5
Wait, let me verify:
- Initial apples: 3
- Added apples: 2
Yes, the total is 5 apples
</think>
5
```
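
A minimal inference sketch using `transformers` (the repository id and generation settings are illustrative, not prescribed by this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniThink-1B-base"  # illustrative; use this repository's Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Abbreviated here; use the full SYSTEM_PROMPT from the section above.
system_prompt = "Show your reasoning process using <think> tags, then provide your answer."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Janet has 3 apples. She buys 2 more. "
                                "How many apples does she have?"},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```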

## Limitations

- As a 1B-parameter model, its performance is more limited than that of larger models.
- It is optimized for mathematical and logical tasks; complex computations may still occasionally yield errors.
- Always verify critical outputs.

## Training

The model was trained using:

- **Progressive LoRA**: Gradually increasing ranks from 16 to 32 and finally 64
- **Mixed Precision Training**: Utilizing bf16 where supported for optimal performance
- **GRPO (Group Relative Policy Optimization)**: Implemented via the Unsloth framework for guided training (see the configuration sketch after this list)
- **Data**: GSM8K dataset enriched with explicit think-aloud examples
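
A minimal sketch of one training stage, assuming the `unsloth` + `trl` stack from the linked notebook; the reward function, rank schedule driver, and hyperparameters shown are illustrative, not the exact values used:

```python
from datasets import load_dataset
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",
    max_seq_length=1024,
    load_in_4bit=True,
)

# One stage of the progressive schedule; repeat with r=32, then r=64.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

def format_reward(completions, **kwargs):
    # Illustrative reward: pay out for completions that use the <think> format.
    return [1.0 if "<think>" in c and "</think>" in c else 0.0
            for c in completions]

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")  # GRPOTrainer expects "prompt"

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[format_reward],
    args=GRPOConfig(output_dir="outputs", bf16=True, max_steps=250),
    train_dataset=dataset,
)
trainer.train()
```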

## License

This model adheres to the licensing terms of the base Llama-3.2 1B model. Please refer to Meta's Llama-3.2 license for details on usage terms and conditions.

## Framework

Developed using the [Unsloth Framework](https://github.com/unslothai/unsloth), this model leverages GRPO and progressive LoRA for efficient training and fine-tuning.