---
license: llama3.2
datasets:
- openai/gsm8k
language:
- en
base_model:
- unsloth/Llama-3.2-1B-Instruct
library_name: transformers
tags:
- llama
- think
---

# MiniThink-1B-base

MiniThink-1B is an experiment to reproduce the "Aha!" moment in AI.

It is trained using a modified version of the method from the [Unsloth R1 training blog](https://unsloth.ai/blog/r1-reasoning) and the [notebook provided for training Llama 3.1 8B to learn R1-style reasoning](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb).

MiniThink is a fine-tuned version of the `unsloth/Llama-3.2-1B-Instruct` model.

## Model Details

- **Base Model**: `unsloth/Llama-3.2-1B-Instruct`
- **Training**: Fine-tuned using progressive LoRA (ranks: 16 → 32 → 64) with Unsloth's optimization framework
- **Task**: Mathematical and logical reasoning with explicit, step-by-step thought processes
- **Training Data**: GSM8K dataset enhanced with think-aloud prompting
- **Input Format**: Questions requiring detailed, structured reasoning
- **Output Format**: A comprehensive thinking process enclosed in `<think>` tags, followed by the final answer (a minimal parsing sketch follows this list)
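
Because the final answer always follows the closing `</think>` tag, model output can be split mechanically. A minimal parsing sketch (the helper name and regex are illustrative, not part of the released code):

```python
import re

def parse_minithink_output(text: str) -> tuple[str, str]:
    """Split a MiniThink completion into (reasoning, final answer)."""
    match = re.search(r"<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if match is None:
        # The model drifted from its trained format; treat everything as the answer.
        return "", text.strip()
    return match.group(1).strip(), match.group(2).strip()

reasoning, answer = parse_minithink_output("<think>3 + 2 = 5</think>\n5")
print(answer)  # -> 5
```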

## Dataset used

The model was trained on a modified version of OpenAI's GSM8K dataset, which contains roughly 8K grade-school math word problems with single-number answers.

To improve training results, the dataset was filtered to exclude answers containing comma or period separators, as shown in the sketch below.
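
A minimal sketch of that filtering step, assuming the standard `openai/gsm8k` layout where the gold answer follows a `####` marker (the exact preprocessing used for training may differ):

```python
from datasets import load_dataset

def extract_answer(example):
    # GSM8K stores the gold answer after a "####" marker.
    return {"final_answer": example["answer"].split("####")[-1].strip()}

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(extract_answer)

# Keep only plain integer answers: drop anything with "," or "." separators.
dataset = dataset.filter(
    lambda ex: "," not in ex["final_answer"] and "." not in ex["final_answer"]
)
print(len(dataset))
```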

## System Prompt

The model is trained with the following system prompt to guide its reasoning process:

```python
# Special tokens that delimit the thinking process
THINK_START = "<think>"
THINK_END = "</think>"

SYSTEM_PROMPT = f"""Show your reasoning process using <think> tags, then provide your answer. For example:

Question: "Janet has 3 apples. She buys 2 more. How many apples does she have?"

{THINK_START}
Let me solve this step by step:
- Janet starts with 3 apples
- She buys 2 more apples
- I need to add: 3 + 2 = 5
Wait, let me verify:
- Initial apples: 3
- Added apples: 2
Yes, the total is 5 apples
{THINK_END}

5"""
```

## Usage

The model expects chat-formatted input and responds with a structured breakdown of its reasoning. For example:

**Input:**

Question: "Janet has 3 apples. She buys 2 more. How many apples does she have?"

**Output:**

```
<think>
Let me solve this step by step:
- Janet starts with 3 apples
- She buys 2 more apples
- I need to add: 3 + 2 = 5
Wait, let me verify:
- Initial apples: 3
- Added apples: 2
Yes, the total is 5 apples
</think>
5
```
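
A minimal inference sketch using `transformers` (the repository id and generation settings are illustrative, not prescribed by this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniThink-1B-base"  # illustrative; use this repository's Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Abbreviated here; use the full SYSTEM_PROMPT from the section above.
system_prompt = "Show your reasoning process using <think> tags, then provide your answer."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Janet has 3 apples. She buys 2 more. "
                                "How many apples does she have?"},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```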

## Limitations

- As a 1B-parameter model, its performance is more limited than that of larger models.
- It is optimized for mathematical and logical tasks; complex computations may still occasionally yield errors.
- Always verify critical outputs.

## Training

The model was trained using:

- **Progressive LoRA**: Gradually increasing ranks from 16 to 32 and finally 64
- **Mixed Precision Training**: Utilizing bf16 where supported for optimal performance
- **GRPO (Group Relative Policy Optimization)**: Implemented via the Unsloth framework for guided training (see the configuration sketch after this list)
- **Data**: GSM8K dataset enriched with explicit think-aloud examples
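
A minimal sketch of one training stage, assuming the `unsloth` + `trl` stack from the linked notebook; the reward function, rank schedule driver, and hyperparameters shown are illustrative, not the exact values used:

```python
from datasets import load_dataset
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",
    max_seq_length=1024,
    load_in_4bit=True,
)

# One stage of the progressive schedule; repeat with r=32, then r=64.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

def format_reward(completions, **kwargs):
    # Illustrative reward: pay out for completions that use the <think> format.
    return [1.0 if "<think>" in c and "</think>" in c else 0.0
            for c in completions]

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")  # GRPOTrainer expects "prompt"

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[format_reward],
    args=GRPOConfig(output_dir="outputs", bf16=True, max_steps=250),
    train_dataset=dataset,
)
trainer.train()
```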

## License

This model adheres to the licensing terms of the base Llama-3.2 1B model. Please refer to Meta's Llama-3.2 license for details on usage terms and conditions.

## Framework

Developed using the [Unsloth Framework](https://github.com/unslothai/unsloth), this model leverages GRPO and progressive LoRA for efficient training and fine-tuning.