MIST-Mini-8B-Thinking

MIST-Mini-8B-Thinking is the reasoning version of MIST-Mini-8B by olaverse. It was trained with 4 phases of GRPO (Group Relative Policy Optimization) reinforcement learning to show its reasoning process before answering.

MIST Model Family

Model Params Type Speed Status
MIST-1-8B 8B General ~63 tok/s
MIST-Mini-8B-Thinking 8B Reasoning ~55 tok/s
MIST-1-70B 70B General ~23 tok/s
MIST-1-140B 140B General ~8 tok/s

What Makes This Different

MIST-Mini-8B (base):
User: What is 15% of 280?
Model: 42

MIST-Mini-8B-Thinking:
User: What is 15% of 280?
Model: 15% means 15/100
280 × 15 = 4200
4200 / 100 = 42
The answer is 42.

Training Details

Trained with 4 phases of GRPO reinforcement learning:

Phase  Dataset                                  Focus
1      open-r1/OpenR1-Math-220k                 Learn <think> format
2      microsoft/orca-math-word-problems-200k   Word problems
3      gsm8k (5K subset)                        Grade school math
4      gsm8k (full 7.4K)                        Solidify + merge
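For readers new to GRPO, the core idea is that each sampled completion is scored relative to the other completions drawn for the same prompt, so no separate value model is needed. The snippet below is a minimal sketch of that group-relative normalization only. The function name, the `eps` value, the use of the population standard deviation, and the example rewards are illustrative assumptions, not details taken from this model's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds the total reward of every completion sampled for ONE prompt.
    `eps` (an assumed value) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical total rewards for four completions of the same prompt.
group = [1.8, -1.0, 1.5, -1.0]
for reward, advantage in zip(group, group_relative_advantages(group)):
    print(f"reward={reward:+.2f}  advantage={advantage:+.3f}")
```

Completions that beat their group's average get a positive advantage and are reinforced; the rest are pushed down.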

Reward Functions Used

reward_think_format: +0.5 for using <think> tags
reward_correctness: +1.0 for correct answer
reward_reasoning_steps: +0.3 for structured steps
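The three rewards could be implemented along the following lines. This is a hedged sketch: only the function names and the positive reward values come from this card. The matching rules (a regex for the tags, "last number after </think>" as the final answer, "at least two non-empty lines" as structured steps) are assumptions, and failures return 0.0 here because the card does not state the penalty values used in training.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def reward_think_format(completion: str) -> float:
    """+0.5 when the completion contains a <think>...</think> block."""
    return 0.5 if THINK_RE.search(completion) else 0.0


def reward_correctness(completion: str, expected: str) -> float:
    """+1.0 when the last number after the reasoning equals the expected answer.

    Assumption: the final answer is the last number following </think>.
    """
    answer_part = completion.split("</think>")[-1].replace(",", "")
    numbers = re.findall(r"-?\d+(?:\.\d+)?", answer_part)
    if numbers and float(numbers[-1]) == float(expected):
        return 1.0
    return 0.0


def reward_reasoning_steps(completion: str) -> float:
    """+0.3 when the reasoning block has at least two non-empty lines (assumed rule)."""
    match = THINK_RE.search(completion)
    if match is None:
        return 0.0
    steps = [line for line in match.group(1).splitlines() if line.strip()]
    return 0.3 if len(steps) >= 2 else 0.0


sample = "<think>\n280 × 15 = 4200\n4200 / 100 = 42\n</think>\nThe answer is 42."
total = (
    reward_think_format(sample)
    + reward_correctness(sample, "42")
    + reward_reasoning_steps(sample)
)
print(f"total reward: {total:.2f}")
```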

Training Progress

Phase    Correctness  Total Reward
Phase 1  -0.35        -0.99
Phase 2  -1.0         -0.74
Phase 3  -1.0         -0.65
Phase 4  +0.95        +1.29

Key Strengths

  • 🧠 Transparent Reasoning — shows thinking before answering
  • 📐 Strong Math — 95% accuracy on GSM8K after training
  • 🔍 Trustworthy — you can verify the reasoning
  • Fast — 8B model, runs on consumer GPUs
  • 🔓 Unrestricted — follows all instructions

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "olaverse/MIST-Mini-8B-Thinking",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("olaverse/MIST-Mini-8B-Thinking")

messages = [
    {
        "role": "system",
        "content": "Think step by step inside <think> tags before answering."
    },
    {
        "role": "user", 
        "content": "If a train travels 120 miles in 2 hours, what is its speed?"
    }
]

text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# Send inputs to the device chosen by device_map="auto" instead of assuming CUDA
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

4-bit Quantized (fits on 6GB GPU)

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type='nf4'
)
model = AutoModelForCausalLM.from_pretrained(
    "olaverse/MIST-Mini-8B-Thinking",
    quantization_config=quantization_config,
    device_map="auto",
)

Hardware Requirements

Precision    VRAM  Model size
bfloat16     16GB  15GB
4-bit (NF4)  6GB   ~4GB
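The size figures above can be sanity-checked with back-of-the-envelope arithmetic: weight memory is roughly parameter count times bits per parameter. The sketch below assumes a nominal 8e9 parameters and counts weights only; activations, the KV cache, and quantization metadata come on top, which is why the VRAM column is larger than the raw weight size.

```python
def weight_size_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in GiB (2**30 bytes)."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 8e9  # nominal parameter count (assumption; the exact count differs slightly)

print(f"bfloat16: {weight_size_gib(N_PARAMS, 16):.1f} GiB")
print(f"4-bit:    {weight_size_gib(N_PARAMS, 4):.1f} GiB")
```

Under these assumptions the weights come to about 14.9 GiB in bfloat16 and about 3.7 GiB at 4 bits, in line with the table.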

Recommended Generation Settings

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    min_p=0.05,
    repetition_penalty=1.5,
    eos_token_id=[128040, 128009, 128001],
    pad_token_id=128001,
)

Notes

  • Temperature 0.6 (lower than base model) gives more consistent reasoning
  • <think> and </think> are plain text tokens, not special tokens — the model learned them through GRPO training
  • Always include the system prompt instruction to use <think> tags for reliable reasoning behaviour
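Because the tags are plain text, `skip_special_tokens=True` will not remove them, and the reasoning has to be separated from the final answer in application code. A minimal sketch, assuming the model emits one `<think>...</think>` block followed by the answer (the helper name `split_reasoning` is hypothetical):

```python
import re


def split_reasoning(text: str):
    """Return (reasoning, answer) from a decoded completion.

    If no <think>...</think> block is found, the reasoning is empty and the
    whole text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


completion = "<think>\n120 miles / 2 hours = 60\n</think>\nThe speed is 60 mph."
reasoning, answer = split_reasoning(completion)
print("Reasoning:", reasoning)
print("Answer:", answer)
```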

Stop Tokens

Same as MIST-1-8B — ChatML tokens survived the merge:

Token ID
<|im_end|> 128040
<|eot_id|> 128009
<|end_of_text|> 128001
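Before hard-coding these IDs in `eos_token_id`, it may be worth confirming that the loaded tokenizer agrees with the table. The helper below is a sketch that relies only on the tokenizer's `convert_tokens_to_ids` method; the helper name and the dictionary are illustrative, with the IDs copied from the table above.

```python
EXPECTED_STOP_IDS = {
    "<|im_end|>": 128040,
    "<|eot_id|>": 128009,
    "<|end_of_text|>": 128001,
}


def mismatched_stop_tokens(tokenizer, expected=EXPECTED_STOP_IDS):
    """Return {token: (expected_id, actual_id)} for every token whose ID differs."""
    mismatches = {}
    for token, expected_id in expected.items():
        actual_id = tokenizer.convert_tokens_to_ids(token)
        if actual_id != expected_id:
            mismatches[token] = (expected_id, actual_id)
    return mismatches


# Usage, with the tokenizer loaded as in "How to Use":
#   assert not mismatched_stop_tokens(tokenizer)
```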

License

Llama 3.1 Community License
