| --- |
| license: apache-2.0 |
| base_model: |
| - mistralai/Mistral-Small-24B-Base-2501 |
| - arcee-ai/Arcee-Blitz |
| datasets: |
| - open-thoughts/OpenThoughts-114k |
| - Undi95/R1-RP-ShareGPT3 |
| --- |
| <p align="left"> |
| <img width="45%" src="V3.png"> |
| </p> |
|
|
|
|
| # Introduction |
This model is my feeble attempt at reproducing the R1-at-home experience, following the (un)official DeepSeek formula: \
`DeepSeek V3 + Unhinged Reasoning = R1` \
Substituting in our own variables, we get: \
`Arcee Blitz V3 Distill + Unhinged R1 Reasoning Traces = MistralSmallV3R` \
| \ |
I'm quite happy with how this model turned out; it's definitely weaker than QwQ for coding and long reasoning problems, but a lot more personable and well-rounded overall.
The faster speed and VRAM savings are quite nice as well, given how long these models can reason and how much context they can burn through.
Stability is satisfactory too, and the model appears to function well even without the super aggressive sampling the base instruct finetune calls for.
|
|
| ## Use Cases |
Just like the base model, MistralSmallV3R is a solid `all-rounder` and should be able to generalize its reasoning capabilities effectively,
as care was taken to train it on
a wide variety of reasoning tasks, including math, coding, and roleplay.
In particular, MistralSmallV3R appears very good at `contextual and emotional reasoning`
compared to the other reasoning models I have tested so far. MistralSmallV3R also tends to spend a good portion of its thoughts considering what the user wants,
hopefully giving this model higher `resistance to poor prompting`.
Compared to other mid-range reasoning models such as QwQ, it
also has markedly `better prose quality` without sounding as much of a try-hard as some of the Gutenberg models.
| This model has also inherited a notable negative bias from the R1 synthetic data it was trained on. \ |
| Supports up to 32k context. \ |
| Should remain usable with only `12GB of VRAM` when quanted to IQ3_M or IQ3_S. |
|
|
## Recommended Settings
| - Template: Mistral Tekken V7 |
| - Temp: 0.1-0.7, depending on use case. |
| - TopP: 0.95 |
| - MinP: 0.05 |
| - Rep Pen: Not needed, but if you do use it keep the range low! |
| - Do not use DRY! |
- System prompt: Trained on SillyTavern defaults. Add `Reason out how you will respond between the <think> and </think> tags.` to the end of your system prompt. Optionally, you may wish to prompt the model to have a positive bias.
| - Add the `<think>` tag to the beginning of the response. |
| - Enable reasoning auto-parse, and remove the newlines from the reasoning formatting prefix and suffix. |
- Low temps allow for longer reasoning.
- Q4_K_M is recommended as the smallest effectively lossless quant.
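If you are driving the model from a script rather than a frontend, the settings above can be sketched roughly like this. This is an illustrative layout only (it does not reproduce the exact Mistral Tekken V7 template, which your backend or frontend should apply), and the parameter names follow common llama.cpp/transformers conventions:

```python
import re

# Sampling settings mirroring the recommendations above.
# No repetition penalty and no DRY, per the notes.
SAMPLER_SETTINGS = {
    "temperature": 0.5,  # 0.1-0.7 depending on use case; lower temps allow longer reasoning
    "top_p": 0.95,
    "min_p": 0.05,
}

REASONING_SUFFIX = "Reason out how you will respond between the <think> and </think> tags."


def build_prompt(system_prompt: str, user_message: str) -> str:
    """Append the reasoning instruction to the system prompt and prefill
    the response with an opening <think> tag, as recommended above.
    Plain-text sketch; wrap with your chat template in practice."""
    system = f"{system_prompt.rstrip()} {REASONING_SUFFIX}"
    return f"{system}\n\n{user_message}\n<think>"


def split_reasoning(completion: str) -> tuple[str, str]:
    """Auto-parse: separate the reasoning trace from the final answer.
    Because <think> was prefilled, the completion starts inside the trace."""
    match = re.match(r"\s*(.*?)</think>\s*(.*)", completion, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", completion.strip()  # no closing tag found; treat it all as answer
```

For example, `split_reasoning("The user wants a greeting.</think>Hello!")` returns the trace and the answer separately, which is what the auto-parse option in your frontend does for you.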
|
|
| ## Benchmarks |
| TODO |
|
|
| ## Dataset |
MistralSmallV3R was trained on an interleaved dataset of 40% LimaRP-R1 and 60% OpenThoughts.
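A 40/60 interleave can be sketched as a seeded weighted draw that preserves each source's internal order. This is an illustrative sketch, not the actual training script:

```python
import random

def interleave_datasets(limarp_r1, openthoughts, weight_a=0.4, seed=0):
    """Mix two sample lists at roughly weight_a : (1 - weight_a),
    keeping each source's internal order, until both are exhausted."""
    rng = random.Random(seed)
    out, i, j = [], 0, 0
    while i < len(limarp_r1) or j < len(openthoughts):
        # Draw from the first source with probability weight_a,
        # or unconditionally once the other source runs dry.
        pick_a = i < len(limarp_r1) and (j >= len(openthoughts) or rng.random() < weight_a)
        if pick_a:
            out.append(limarp_r1[i])
            i += 1
        else:
            out.append(openthoughts[j])
            j += 1
    return out
```

In practice the same effect is usually achieved with a datasets-library interleave utility; the sketch just shows the intended 40/60 mixing behavior.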
|
|
| ## Thanks |
| Major thanks to Arcee AI for their excellent Arcee Blitz finetune and general mergekit nonsense. \ |
| Undi95 for proving the viability of a stable reasoning finetune of MS3. \ |
| OpenThoughts for their dataset of the same name. \ |
| Mistral, for their continuing dedication to the open source community. |