---
base_model: unsloth/magistral-small-2506
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
license: apache-2.0
language:
- en
library_name: transformers
---

### Highly experimental model; it might not work as expected

# Daemontatox/mini-overthinker

**A highly experimental attempt to fine-tune [Magistral (Mistral)](https://huggingface.co/unsloth/magistral-small-2506) for enhanced staged reasoning with self-reflective thinking patterns.**

---

## Summary

* **Base model**: [`unsloth/magistral-small-2506`](https://huggingface.co/unsloth/magistral-small-2506)
* **Fine-tuned by**: `Daemontatox`
* **Model name**: `Daemontatox/mini-overthinker`
* **License**: Apache 2.0
* **Language**: English
* **Status**: Experimental; *not intended for production use.*

---

## Disclaimer

> This model is **not designed for production**. It is an **experimental prototype** to explore cognitive-loop-style reasoning with reflection. It may behave unpredictably, hallucinate, or fail to follow standard instruction formats. Use only for research and prototyping.

---

## Motivation

This model was fine-tuned to:

* Think in **staged batches**.
* Insert **intermediate reasoning steps**.
* Pause to **self-reflect** on its own outputs.
* Encourage **Theory-of-Mind-like behavior** via structured thinking templates.

Inspired by the *SUPERTHINKER* design used in [`HelpingAI/Dhanishtha-2.0-SUPERTHINKER`](https://huggingface.co/datasets/HelpingAI/Dhanishtha-2.0-SUPERTHINKER), this model attempts a similar multi-phase thought process in a lightweight setup.

> **Special thanks** to the creators of [`HelpingAI/Dhanishtha-2.0-SUPERTHINKER`](https://huggingface.co/datasets/HelpingAI/Dhanishtha-2.0-SUPERTHINKER) for the dataset structure and inspiration behind this staged reasoning approach.

---

## Example Prompt Structure

```text
Q: What are the downsides of AI regulation?

Think Step 1:
<|THINK|> Regulation might slow innovation. It could also centralize power in large companies.

Answer Attempt 1:
<|ANSWER|> Slower innovation and reduced competition.

Reflection:
<|REFLECT|> The points are valid, but lack mention of potential misalignment with global norms.

Final Answer:
<|FINAL|> The main downsides are slower innovation, centralized control, and difficulty in harmonizing global frameworks.
```
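A transcript in this format can be split back into its stages with a small regex helper. This is a sketch, not part of the released model; it assumes the tag set is exactly the four markers shown above.

```python
import re

# Stage markers used in the prompt template above (assumed complete).
STAGE_TAGS = ["<|THINK|>", "<|ANSWER|>", "<|REFLECT|>", "<|FINAL|>"]

def parse_stages(text: str) -> dict:
    """Split a staged transcript into a {tag: content} mapping."""
    tag_pattern = "|".join(re.escape(t) for t in STAGE_TAGS)
    stages = {}
    # Capture each tag and the text up to the next tag (or end of string).
    for match in re.finditer(
        rf"({tag_pattern})\s*(.*?)(?=(?:{tag_pattern})|\Z)", text, re.DOTALL
    ):
        stages[match.group(1)] = match.group(2).strip()
    return stages

example = "<|THINK|> Regulation might slow innovation. <|FINAL|> Slower innovation."
print(parse_stages(example)["<|FINAL|>"])  # Slower innovation.
```

Note that in a full transcript the captured content also includes the plain-text section labels ("Answer Attempt 1:", etc.), since only the `<|...|>` markers are used as delimiters.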

---

## Inference Code (Transformers)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "Daemontatox/mini-overthinker"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Stream tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer)

prompt = """Q: What is intelligence?

Think Step 1:
<|THINK|> Intelligence involves pattern recognition, abstraction, and reasoning.

Answer Attempt 1:
<|ANSWER|> The ability to reason, learn, and adapt.

Reflection:
<|REFLECT|> Lacks mention of creativity and problem-solving aspects.

Final Answer:
<|FINAL|> Intelligence is the ability to reason, learn, adapt, and solve problems creatively.
"""

# Use the model's own device rather than assuming CUDA is available
# (device_map="auto" may place the model on CPU or another accelerator).
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, streamer=streamer)
```
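Once generation finishes, the usable answer is whatever follows the last `<|FINAL|>` marker in the decoded text. A minimal post-processing sketch (pure Python, independent of the model; the truncation behavior is an assumption, since the model may keep emitting stage markers after the final answer):

```python
def extract_final(generated: str, marker: str = "<|FINAL|>") -> str:
    """Return the text after the last `marker`, or '' if it never appeared."""
    if marker not in generated:
        return ""
    tail = generated.rsplit(marker, 1)[1]
    # Cut off at the next stage marker if the model kept going.
    for tag in ("<|THINK|>", "<|ANSWER|>", "<|REFLECT|>"):
        tail = tail.split(tag, 1)[0]
    return tail.strip()

print(extract_final("<|THINK|> hmm <|FINAL|> 42 <|THINK|> more"))  # 42
```

In the inference example above, this would be applied to `tokenizer.decode(outputs[0], skip_special_tokens=True)`.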

---

## Limitations

* Requires **explicit token triggers** (`<|THINK|>`, `<|REFLECT|>`, etc.).
* May **hallucinate** or get stuck in loops.
* Behavior can degrade in **zero-shot** usage.
* Not benchmarked; **no alignment or safety tuning** applied.
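Because the triggers must be present verbatim, a small template builder reduces the chance of a malformed prompt. This helper is hypothetical (not shipped with the model) and simply reproduces the skeleton from the example prompt, leaving the final stage for the model to complete:

```python
def build_staged_prompt(
    question: str, think: str = "", answer: str = "", reflect: str = ""
) -> str:
    """Assemble the staged-reasoning skeleton used by this model.

    Empty stages are left blank for the model to fill in; the prompt ends
    right after <|FINAL|> so generation continues with the final answer.
    """
    return (
        f"Q: {question}\n\n"
        f"Think Step 1:\n<|THINK|> {think}\n\n"
        f"Answer Attempt 1:\n<|ANSWER|> {answer}\n\n"
        f"Reflection:\n<|REFLECT|> {reflect}\n\n"
        f"Final Answer:\n<|FINAL|> "
    )

prompt = build_staged_prompt("What is intelligence?")
```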

---

## Intended For

* Research in **cognitive loops**
* LLM **agent architecture prototyping**
* Simulating **multi-phase reasoning**

---

## Not Recommended For

* Real-world deployment
* Safety-critical tasks
* Answer quality evaluation without verification

---

## Citation

```bibtex
@misc{mini-overthinker2025,
  author       = {Daemontatox},
  title        = {Mini-Overthinker: Experimental Staged Reasoning Model},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Daemontatox/mini-overthinker}},
  note         = {Fine-tuned from unsloth/magistral-small-2506 using ideas from HelpingAI/Dhanishtha-2.0-SUPERTHINKER}
}
```