---
license: llama3.2
datasets:
- openai/gsm8k
language:
- en
base_model:
- unsloth/Llama-3.2-1B-Instruct
library_name: transformers
tags:
- llama
- think
---

# MiniThink-1B-base

MiniThink-1B is an experiment to reproduce the "Aha!" moment in AI.
It is trained using a modified version of the method described in the [Unsloth R1 training blog](https://unsloth.ai/blog/r1-reasoning) and the [notebook provided for training Llama 3.1 8B to learn R1 reasoning](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb).

MiniThink is a fine-tuned version of the `unsloth/Llama-3.2-1B-Instruct` model.

## Model Details

- **Base Model**: `unsloth/Llama-3.2-1B-Instruct`
- **Training**: Fine-tuned using progressive LoRA (ranks: 16 → 32 → 64) with Unsloth's optimization framework
- **Task**: Mathematical and logical reasoning with explicit, step-by-step thought processes
- **Training Data**: GSM8K dataset enhanced with think-aloud prompting
- **Input Format**: Questions requiring detailed, structured reasoning
- **Output Format**: A thinking process enclosed in `<think>` tags, followed by the final answer

## Dataset

The model was trained on a modified version of OpenAI's GSM8K dataset, which contains 8K math word problems with single-number answers.
To improve training results, the dataset was filtered to exclude answers containing comma- or period-separated numbers.
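
The exact filtering script is not published; a minimal sketch of such a filter, assuming GSM8K's `#### <answer>` convention and using illustrative helper names and toy records, could look like this:

```python
import re

# Toy examples in GSM8K's format: the solution ends with "#### <answer>".
# These records are illustrative, not taken from the actual training set.
examples = [
    {"question": "Janet has 3 apples...", "answer": "3 + 2 = 5\n#### 5"},
    {"question": "Two cars at $500 each...", "answer": "500 * 2 = 1000\n#### 1,000"},
    {"question": "Half of 7 is...", "answer": "7 / 2 = 3.5\n#### 3.5"},
]

def final_answer(example):
    """Extract the text after the '####' marker."""
    return example["answer"].split("####")[-1].strip()

def has_plain_integer_answer(example):
    """Keep only answers that are bare integers, rejecting '1,000' or '3.5'."""
    return re.fullmatch(r"-?\d+", final_answer(example)) is not None

filtered = [ex for ex in examples if has_plain_integer_answer(ex)]
print([final_answer(ex) for ex in filtered])  # -> ['5']
```

With the real dataset, the same predicate could be passed to `datasets.Dataset.filter`.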

## System Prompt

The model is trained with the following system prompt to guide its reasoning process:

```python
# Define special tokens for the thinking process
THINK_START = "<think>"
THINK_END = "</think>"

SYSTEM_PROMPT = f"""Show your reasoning process using <think> tags, then provide your answer. For example:

Question: "Janet has 3 apples. She buys 2 more. How many apples does she have?"

{THINK_START}
Let me solve this step by step:
- Janet starts with 3 apples
- She buys 2 more apples
- I need to add: 3 + 2 = 5
Wait, let me verify:
- Initial apples: 3
- Added apples: 2
Yes, the total is 5 apples
{THINK_END}

5"""
```

## Usage

The model expects a chat-like input and responds with a structured breakdown of its reasoning. For example:

**Input:**

Question: "Janet has 3 apples. She buys 2 more. How many apples does she have?"

**Output:**

```
<think>
Let me solve this step by step:
- Janet starts with 3 apples
- She buys 2 more apples
- I need to add: 3 + 2 = 5
Wait, let me verify:
- Initial apples: 3
- Added apples: 2
Yes, the total is 5 apples
</think>
5
```
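
Downstream code usually needs the final answer separated from the reasoning; a minimal post-processing sketch (the helper name is illustrative, not part of the model's API):

```python
import re

# A response in the format shown in the example above.
response = """<think>
Let me solve this step by step:
- Janet starts with 3 apples
- She buys 2 more apples
- I need to add: 3 + 2 = 5
Wait, let me verify:
- Initial apples: 3
- Added apples: 2
Yes, the total is 5 apples
</think>
5"""

def split_response(text):
    """Split a completion into (reasoning, final answer)."""
    match = re.search(r"<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if match is None:
        return None, text.strip()  # model skipped the tags
    return match.group(1).strip(), match.group(2).strip()

reasoning, answer = split_response(response)
print(answer)  # -> 5
```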

## Limitations

- As a 1B-parameter model, its performance is more limited than that of larger models.
- It is optimized for mathematical and logical tasks, but complex computations may occasionally yield errors.
- Always verify critical outputs.

## Training

The model was trained using:
- **Progressive LoRA**: Gradually increasing ranks from 16 to 32 and finally 64
- **Mixed-Precision Training**: Utilizing bf16 where supported for optimal performance
- **GRPO (Group Relative Policy Optimization)**: Implemented via the Unsloth framework for guided training
- **Data**: GSM8K dataset enriched with explicit think-aloud examples
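
GRPO scores groups of sampled completions with reward functions; the Unsloth R1 recipe combines format and correctness rewards, which might be sketched as follows (function names and reward values are illustrative, not the actual training code):

```python
import re

# Accept completions of the form "<think>...</think>" followed by an answer.
THINK_PATTERN = re.compile(r"^<think>.*?</think>\s*(.+)$", re.DOTALL)

def format_reward(completion):
    """Reward completions that wrap their reasoning in <think> tags."""
    return 1.0 if THINK_PATTERN.match(completion.strip()) else 0.0

def correctness_reward(completion, gold_answer):
    """Reward completions whose final answer matches the reference."""
    match = THINK_PATTERN.match(completion.strip())
    if match is None:
        return 0.0
    return 2.0 if match.group(1).strip() == gold_answer else 0.0

good = "<think>3 + 2 = 5</think>\n5"
bad = "The answer is 5."
print(format_reward(good), correctness_reward(good, "5"))  # -> 1.0 2.0
print(format_reward(bad), correctness_reward(bad, "5"))    # -> 0.0 0.0
```

In recent `trl` versions, such callables can be passed to `GRPOTrainer` through its `reward_funcs` argument.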

## License

This model adheres to the licensing terms of the base Llama 3.2 model. Please refer to Meta's Llama 3.2 license for details on usage terms and conditions.

## Framework

Developed using the [Unsloth Framework](https://github.com/unslothai/unsloth), this model leverages GRPO and progressive LoRA optimization for efficient training and fine-tuning of large language models.