Wladastic committed · Commit 9b23760 (verified) · 1 parent: eb6aa72

Update README.md

Files changed (1): README.md (+70 −24)
 
---
license: llama3.2
datasets:
- openai/gsm8k
language:
- en
base_model:
- unsloth/Llama-3.2-1B-Instruct
library_name: transformers
tags:
- llama
- think
---

# MiniThink-1B-base

MiniThink-1B is an experiment to reproduce the "Aha!" moment in AI. It is trained using a modified version of the method from the [Unsloth R1 training blog](https://unsloth.ai/blog/r1-reasoning) and the [notebook provided for training Llama 3.1 8B to learn R1 reasoning](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb).

MiniThink is a fine-tuned version of the `unsloth/Llama-3.2-1B-Instruct` model.
## Model Details

- **Base Model**: `unsloth/Llama-3.2-1B-Instruct`
- **Training**: Fine-tuned using progressive LoRA (ranks: 16 → 32 → 64) with Unsloth's optimization framework
- **Task**: Mathematical and logical reasoning with explicit, step-by-step thought processes
- **Training Data**: GSM8K dataset enhanced with think-aloud prompting
- **Input Format**: Questions requiring detailed, structured reasoning
- **Output Format**: A comprehensive thinking process enclosed in `<think>` tags, followed by the final answer
 
## Dataset Used

The model was trained on a modified version of OpenAI's GSM8K dataset, which contains 8K math word problems with single-number answers. To improve training results, the dataset was filtered to exclude answers containing comma- or period-separated numbers.
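The card does not include the filtering code; a minimal sketch of such a filter, assuming GSM8K's standard `####` final-answer marker (the `has_clean_answer` helper is hypothetical, not from the training code):

```python
import re

def has_clean_answer(answer: str) -> bool:
    """Return True if the final answer (the text after '####') is a plain
    integer, i.e. it contains no comma- or period-separated digits
    such as '1,000' or '3.5'."""
    final = answer.split("####")[-1].strip()
    return re.fullmatch(r"-?\d+", final) is not None

# GSM8K-style answer strings: keep the first, drop the other two
assert has_clean_answer("Janet has 3 + 2 = 5 apples. #### 5")
assert not has_clean_answer("Total cost is #### 1,000")
assert not has_clean_answer("The rate is #### 3.5")
```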
 
## System Prompt

The model is trained with the following system prompt to guide its reasoning process:

```python
# Define special tokens for the thinking process
THINK_START = "<think>"
THINK_END = "</think>"

SYSTEM_PROMPT = f"""Show your reasoning process using <think> tags, then provide your answer. For example:

Question: "Janet has 3 apples. She buys 2 more. How many apples does she have?"

{THINK_START}
Let me solve this step by step:
- Janet starts with 3 apples
- She buys 2 more apples
- I need to add: 3 + 2 = 5
Wait, let me verify:
- Initial apples: 3
- Added apples: 2
Yes, the total is 5 apples
{THINK_END}

5"""
```
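At inference time this system prompt would typically be paired with the user's question through the model's chat template; a minimal sketch (the abbreviated `SYSTEM_PROMPT` and the `build_messages` helper are illustrative, not from the card):

```python
# Abbreviated stand-in for the full SYSTEM_PROMPT shown above
SYSTEM_PROMPT = "Show your reasoning process using <think> tags, then provide your answer."

def build_messages(question: str) -> list[dict]:
    """Build the chat messages in the format the model expects."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f'Question: "{question}"'},
    ]

messages = build_messages("Janet has 3 apples. She buys 2 more. How many apples does she have?")
# The messages would then be rendered with the tokenizer's chat template,
# e.g. tokenizer.apply_chat_template(messages, add_generation_prompt=True)
```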
## Usage

The model expects a chat-like input and responds with a structured breakdown of its reasoning. For example:

**Input:**

Question: "Janet has 3 apples. She buys 2 more. How many apples does she have?"

**Output:**
```
<think>
Let me solve this step by step:
- Janet starts with 3 apples
- She buys 2 more apples
- I need to add: 3 + 2 = 5
Wait, let me verify:
- Initial apples: 3
- Added apples: 2
Yes, the total is 5 apples
</think>

5
```
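Because the reasoning is delimited by `<think>` tags, the final answer can be separated from the trace with a small parser; a sketch (the `parse_response` helper is hypothetical, not part of the model's API):

```python
import re

def parse_response(text: str) -> tuple[str, str]:
    """Split a model response into (thinking, answer) using the <think> tags."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    thinking = match.group(1).strip()
    answer = text[match.end():].strip()
    return thinking, answer

thinking, answer = parse_response("<think>\n3 + 2 = 5\n</think>\n\n5")
assert answer == "5"
assert "3 + 2 = 5" in thinking
```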
 
## Limitations

- As a 1B-parameter model, it is naturally more limited than larger models.
- Optimized for mathematical and logical tasks; complex computations may still occasionally yield errors.
- Always verify critical outputs.
 
## Training

The model was trained using:
- **Progressive LoRA**: Gradually increasing ranks from 16 to 32 and finally 64
- **Mixed Precision Training**: bf16 where supported for optimal performance
- **GRPO (Group Relative Policy Optimization)**: Implemented via the Unsloth framework
- **Data**: GSM8K dataset enriched with explicit think-aloud examples
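GRPO optimizes the policy against reward functions scored on sampled completions. The card does not list the exact rewards used, but in the spirit of the Unsloth notebook they might combine a format check with an answer check; a sketch (function names, regex, and reward values are assumptions):

```python
import re

# Reward a completion that opens with a <think> block and then gives an answer
THINK_PATTERN = re.compile(r"<think>.*?</think>\s*\S+", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the <think>...</think> answer format."""
    return 1.0 if THINK_PATTERN.match(completion.strip()) else 0.0

def correctness_reward(completion: str, gold: str) -> float:
    """2.0 if the text after </think> exactly matches the gold answer."""
    tail = completion.split("</think>")[-1].strip()
    return 2.0 if tail == gold else 0.0

good = "<think>\n3 + 2 = 5\n</think>\n5"
assert format_reward(good) == 1.0
assert correctness_reward(good, "5") == 2.0
assert correctness_reward("<think>hm</think>\n4", "5") == 0.0
```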
 
## License

This model inherits the licensing terms of the base Llama 3.2 model. Please refer to Meta's Llama 3.2 license for details on usage terms and conditions.
## Framework

Developed using the [Unsloth framework](https://github.com/unslothai/unsloth), this model leverages GRPO (Group Relative Policy Optimization) and progressive LoRA for efficient training and fine-tuning of large language models.