| | --- |
| | library_name: transformers |
| | tags: |
| | - trl |
| | - sft |
| | base_model: |
| | - HuggingFaceTB/SmolLM2-1.7B-Instruct |
| | datasets: |
| | - ngxson/MiniThinky-dataset |
| | --- |
| | |
| | # MiniThinky 1.7B (based on SmolLM2) |
| |
|
| | > [!IMPORTANT] |
| | > This checkpoint still has a high loss value, so the model will hallucinate in its responses quite often.
| |
|
| | My first attempt at fine-tuning a small model to add reasoning capability.
| |
|
| | The chat template is the same as Llama 3's, but the response follows this format:
| |
|
| | ``` |
| | <|thinking|>{thinking_process} |
| | <|answer|> |
| | {real_answer} |
| | ``` |
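
Because the reasoning and the final answer arrive in a single string, it can help to split them before showing output to a user. The following is a minimal sketch, not part of the model's tooling: the helper name `split_response` is hypothetical, and the fallback for a missing `<|answer|>` marker is an assumption made because this checkpoint may not always follow the format.

```python
THINKING_TAG = "<|thinking|>"
ANSWER_TAG = "<|answer|>"


def split_response(text: str) -> tuple[str, str]:
    """Split a raw model response into (thinking, answer).

    Hypothetical helper: assumes the format shown above, i.e. the
    thinking tag, the reasoning, the answer tag, then the answer.
    """
    before, sep, after = text.partition(ANSWER_TAG)
    if not sep:
        # Assumption: without an answer marker, treat the whole
        # output as the answer and report no reasoning.
        return "", text.replace(THINKING_TAG, "", 1).strip()
    thinking = before.replace(THINKING_TAG, "", 1).strip()
    return thinking, after.strip()


if __name__ == "__main__":
    raw = "<|thinking|>2 + 2 equals 4.\n<|answer|>\n4"
    thinking, answer = split_response(raw)
    print("thinking:", thinking)
    print("answer:", answer)
```

Splitting on the first `<|answer|>` keeps any later occurrence of the tag inside the answer text rather than discarding it.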
| |
|
| | ## IMPORTANT: System message |
| |
|
| | The model is **very sensitive** to the system message. Make sure you use the following system message (system role) at the beginning of the conversation:
| |
|
| | `You are MiniThinky, a helpful AI assistant. You always think before giving the answer. Use <|thinking|> before thinking and <|answer|> before giving the answer.` |
| |
|
| | --- |
| |
|
| | TODO: include more info here and maybe run some benchmarks. (Please open a discussion if you're interested.)
| |
|