---
license: llama3.2
datasets:
- openai/gsm8k
language:
- en
base_model:
- unsloth/Llama-3.2-1B-Instruct
library_name: transformers
tags:
- llama
- think
---

# MiniThink-1B-base

MiniThink-1B is an experiment to reproduce the "Aha!" moment in AI.
It is trained using a modified version of the method described in the [Unsloth R1 training blog](https://unsloth.ai/blog/r1-reasoning) and the [notebook provided for training Llama 3.1 8B to learn R1 reasoning](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb).

MiniThink is a fine-tuned version of the `unsloth/Llama-3.2-1B-Instruct` model.

## Model Details

- **Base Model**: `unsloth/Llama-3.2-1B-Instruct`
- **Training**: Fine-tuned using progressive LoRA (ranks: 16 → 32 → 64) with Unsloth's optimization framework
- **Task**: Mathematical and logical reasoning with explicit, step-by-step thought processes
- **Training Data**: GSM8K dataset enhanced with think-aloud prompting
- **Input Format**: Questions requiring detailed, structured reasoning
- **Output Format**: A thinking process enclosed in `<think>` tags, followed by the final answer

## Dataset

The model was trained on a modified version of OpenAI's GSM8K dataset, which contains 8K math word problems with single-number answers.
To improve training results, the dataset was filtered to exclude answers containing comma- or period-separated numbers.
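
The exact filtering script is not published; a minimal sketch of such a filter, assuming GSM8K's `#### <answer>` convention and using illustrative helper names and toy records, could look like this:

```python
import re

# Toy examples in GSM8K's format: the solution ends with "#### <answer>".
# These records are illustrative, not taken from the actual training set.
examples = [
    {"question": "Janet has 3 apples...", "answer": "3 + 2 = 5\n#### 5"},
    {"question": "Two cars at $500 each...", "answer": "500 * 2 = 1000\n#### 1,000"},
    {"question": "Half of 7 is...", "answer": "7 / 2 = 3.5\n#### 3.5"},
]

def final_answer(example):
    """Extract the text after the '####' marker."""
    return example["answer"].split("####")[-1].strip()

def has_plain_integer_answer(example):
    """Keep only answers that are bare integers, rejecting '1,000' or '3.5'."""
    return re.fullmatch(r"-?\d+", final_answer(example)) is not None

filtered = [ex for ex in examples if has_plain_integer_answer(ex)]
print([final_answer(ex) for ex in filtered])  # -> ['5']
```

With the real dataset, the same predicate could be passed to `datasets.Dataset.filter`.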

## System Prompt

The model is trained with the following system prompt to guide its reasoning process:

```python
# Define special tokens for the thinking process
THINK_START = "<think>"
THINK_END = "</think>"

SYSTEM_PROMPT = f"""Show your reasoning process using <think> tags, then provide your answer. For example:

Question: "Janet has 3 apples. She buys 2 more. How many apples does she have?"

{THINK_START}
Let me solve this step by step:
- Janet starts with 3 apples
- She buys 2 more apples
- I need to add: 3 + 2 = 5
Wait, let me verify:
- Initial apples: 3
- Added apples: 2
Yes, the total is 5 apples
{THINK_END}

5"""
```

## Usage

The model expects a chat-like input and responds with a structured breakdown of its reasoning. For example:

**Input:**

Question: "Janet has 3 apples. She buys 2 more. How many apples does she have?"

**Output:**

```
<think>
Let me solve this step by step:
- Janet starts with 3 apples
- She buys 2 more apples
- I need to add: 3 + 2 = 5
Wait, let me verify:
- Initial apples: 3
- Added apples: 2
Yes, the total is 5 apples
</think>
5
```
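
Downstream code usually needs the final answer separated from the reasoning; a minimal post-processing sketch (the helper name is illustrative, not part of the model's API):

```python
import re

# A response in the format shown in the example above.
response = """<think>
Let me solve this step by step:
- Janet starts with 3 apples
- She buys 2 more apples
- I need to add: 3 + 2 = 5
Wait, let me verify:
- Initial apples: 3
- Added apples: 2
Yes, the total is 5 apples
</think>
5"""

def split_response(text):
    """Split a completion into (reasoning, final answer)."""
    match = re.search(r"<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if match is None:
        return None, text.strip()  # model skipped the tags
    return match.group(1).strip(), match.group(2).strip()

reasoning, answer = split_response(response)
print(answer)  # -> 5
```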

## Limitations

- As a 1B-parameter model, its performance is more limited than that of larger models.
- It is optimized for mathematical and logical tasks, but complex computations may occasionally yield errors.
- Always verify critical outputs.

## Training

The model was trained using:
- **Progressive LoRA**: Gradually increasing ranks from 16 to 32 and finally 64
- **Mixed-Precision Training**: Utilizing bf16 where supported for optimal performance
- **GRPO (Group Relative Policy Optimization)**: Implemented via the Unsloth framework for guided training
- **Data**: GSM8K dataset enriched with explicit think-aloud examples
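
GRPO scores groups of sampled completions with reward functions; the Unsloth R1 recipe combines format and correctness rewards, which might be sketched as follows (function names and reward values are illustrative, not the actual training code):

```python
import re

# Accept completions of the form "<think>...</think>" followed by an answer.
THINK_PATTERN = re.compile(r"^<think>.*?</think>\s*(.+)$", re.DOTALL)

def format_reward(completion):
    """Reward completions that wrap their reasoning in <think> tags."""
    return 1.0 if THINK_PATTERN.match(completion.strip()) else 0.0

def correctness_reward(completion, gold_answer):
    """Reward completions whose final answer matches the reference."""
    match = THINK_PATTERN.match(completion.strip())
    if match is None:
        return 0.0
    return 2.0 if match.group(1).strip() == gold_answer else 0.0

good = "<think>3 + 2 = 5</think>\n5"
bad = "The answer is 5."
print(format_reward(good), correctness_reward(good, "5"))  # -> 1.0 2.0
print(format_reward(bad), correctness_reward(bad, "5"))    # -> 0.0 0.0
```

In recent `trl` versions, such callables can be passed to `GRPOTrainer` through its `reward_funcs` argument.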

## License

This model adheres to the licensing terms of the base Llama 3.2 model. Please refer to Meta's Llama 3.2 license for details on usage terms and conditions.

## Framework

Developed using the [Unsloth Framework](https://github.com/unslothai/unsloth), this model leverages GRPO and progressive LoRA optimization for efficient training and fine-tuning of large language models.