Instructions for using DuoNeural/SmolLM2-360M-Think-R18 with libraries, inference servers, and local apps.

Libraries

How to use DuoNeural/SmolLM2-360M-Think-R18 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="DuoNeural/SmolLM2-360M-Think-R18")

# Load model directly (SmolLM2 is a text-only causal language model)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DuoNeural/SmolLM2-360M-Think-R18")
model = AutoModelForCausalLM.from_pretrained("DuoNeural/SmolLM2-360M-Think-R18")

vLLM

How to use DuoNeural/SmolLM2-360M-Think-R18 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "DuoNeural/SmolLM2-360M-Think-R18"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DuoNeural/SmolLM2-360M-Think-R18",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/DuoNeural/SmolLM2-360M-Think-R18

SGLang

How to use DuoNeural/SmolLM2-360M-Think-R18 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "DuoNeural/SmolLM2-360M-Think-R18" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DuoNeural/SmolLM2-360M-Think-R18",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "DuoNeural/SmolLM2-360M-Think-R18" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DuoNeural/SmolLM2-360M-Think-R18",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use DuoNeural/SmolLM2-360M-Think-R18 with Docker Model Runner:
```
docker model run hf.co/DuoNeural/SmolLM2-360M-Think-R18
```

---
language:
  - en
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-360M-Instruct
tags:
  - think-instillation
  - grpo
  - reasoning
  - duoneural
  - smollm2
  - dead-prompt-filtering
library_name: transformers
---

# SmolLM2-360M-Think — DuoNeural Think Instillation R18

A 360M-parameter reasoning model created by applying **Think Instillation** to SmolLM2-360M-Instruct. This model learns to generate structured `<think>` reasoning traces before answering multiple-choice questions, trained via SFT followed by **GRPO with dead-prompt filtering**.

## What is Think Instillation?

Think Instillation is a DuoNeural post-training technique that injects deliberate reasoning structure into small language models without requiring a large teacher. The model learns to:
1. Open a `<think>` tag and reason through the problem
2. Close reasoning with `</think>`
3. State a final answer in a parseable format: `(A)`, `(B)`, `(C)`, or `(D)`

Unlike chain-of-thought distillation from larger models, Think Instillation uses GRPO with a binary accuracy reward plus a length penalty, so the model discovers efficient reasoning patterns on its own.
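
The three-step output format above can be checked with a small parser. This is a minimal sketch, not the project's grading code: the regular expressions and the function name `parse_completion` are assumptions, chosen only to match the `<think>...</think>` and `(A)/(B)/(C)/(D)` conventions described here.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns; the card only specifies the tags and the (A)-(D) format.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"\(([ABCD])\)")


def parse_completion(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, answer_letter); either is None if it is missing."""
    think = THINK_RE.search(text)
    reasoning = think.group(1).strip() if think else None
    # Search for the answer only after </think>, so option letters mentioned
    # inside the reasoning trace are not mistaken for the final answer.
    tail = text[think.end():] if think else text
    answer = ANSWER_RE.search(tail)
    return reasoning, (answer.group(1) if answer else None)


reasoning, answer = parse_completion(
    "<think>Plants need light. (A) is wrong.</think> The answer is (C)"
)
```

If the prompt already ends with an open `<think>` tag, the generated text will not contain it, so prepend `<think>` to the completion before parsing.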

## Training Details

### SFT Stage (R18)
- **Base**: `HuggingFaceTB/SmolLM2-360M-Instruct`
- **Dataset**: ARC-Easy (2700 prompts) formatted as `Question + choices + "Reasoning: <think>"`
- **Steps**: 150 SFT steps, LoRA r=32 α=32
- **Result**: `post_sft` accuracy = **0.250** (15/60 on the ARC-Easy validation split, greedy decoding)
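
The prompt layout named in the dataset bullet can be sketched as follows. The exact template is not given in this card, so the `Question:` prefix, the `(A) choice` line format, and the name `format_prompt` are assumptions; only the trailing `Reasoning: <think>` comes from the description above.

```python
from typing import Sequence

LETTERS = "ABCD"


def format_prompt(question: str, choices: Sequence[str]) -> str:
    """Build a prompt ending in an open <think> tag (layout is an assumption)."""
    if not 1 <= len(choices) <= len(LETTERS):
        raise ValueError("expected between 1 and 4 choices")
    lines = [f"Question: {question}"]
    # Label each choice with its letter so the answer can be stated as (A)-(D).
    lines += [f"({letter}) {choice}" for letter, choice in zip(LETTERS, choices)]
    lines.append("Reasoning: <think>")
    return "\n".join(lines)


prompt = format_prompt("Which is a mammal?", ["Frog", "Whale", "Trout", "Eagle"])
```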

### Dead-Prompt Filter
Before GRPO, we filter out prompts that produce **zero correct completions** in 4 temperature-sampled trials:
- **2247 raw prompts → 1450 kept (64.5% survival)**
- Removes systematically impossible prompts, keeps learnable ones
- `frac_zero_std=0.00` throughout GRPO training ✅ (filter confirmed working)
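
A minimal sketch of the filter, assuming a hypothetical callable `is_correct_sample(prompt)` that samples one completion at temperature and grades it. The real pipeline's sampling and grading code is not shown in this card.

```python
from typing import Callable, List, Sequence


def filter_dead_prompts(
    prompts: Sequence[str],
    is_correct_sample: Callable[[str], bool],
    trials: int = 4,
) -> List[str]:
    """Keep prompts that yield at least one correct completion in `trials` samples."""
    kept = []
    for prompt in prompts:
        # any() stops sampling as soon as one trial succeeds.
        if any(is_correct_sample(prompt) for _ in range(trials)):
            kept.append(prompt)
    return kept


# Scripted outcomes stand in for real model samples (illustration only).
scripted = {
    "learnable": iter([False, False, True, False]),
    "dead": iter([False, False, False, False]),
}
kept = filter_dead_prompts(list(scripted), lambda p: next(scripted[p]))

# Survival rate reported above: 1450 of 2247 prompts.
survival_pct = round(1450 / 2247 * 100, 1)
```

A prompt that never succeeds gives every completion in its GRPO group the same reward of 0, which is why removing such prompts beforehand is expected to keep the zero-variance fraction low.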

### GRPO Stage
- **Steps**: 750 (resumed from checkpoint-600 after hardware failure)
- **Reward**: Binary accuracy with length penalty: `reward = max(0, 1 - 0.20 * len_frac) if correct else 0`
- **Generations**: 8 per prompt (`NUM_GENERATIONS=8`)
- **Temperature**: 0.8
- **Max completion**: 1024 tokens
- **KL coefficient**: 0.02, clip_ε=0.2
- **LoRA**: r=32, α=32, targets=q/k/v_proj
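
The reward formula above can be written as a small function. This is a sketch under one stated assumption: `len_frac` is taken to be the completion length divided by the 1024-token cap, which the card does not define explicitly. The constant and function names are illustrative.

```python
MAX_COMPLETION_TOKENS = 1024  # max completion length, from the list above
LENGTH_PENALTY = 0.20         # penalty coefficient, from the reward formula


def reward(correct: bool, completion_tokens: int) -> float:
    """Binary accuracy reward, scaled down by relative completion length."""
    if not correct:
        return 0.0
    # Assumption: len_frac = completion length / token cap, clipped to 1.
    len_frac = min(completion_tokens / MAX_COMPLETION_TOKENS, 1.0)
    return max(0.0, 1.0 - LENGTH_PENALTY * len_frac)


short_correct = reward(True, 128)    # lightly penalized
long_correct = reward(True, 1024)    # maximally penalized
wrong = reward(False, 128)           # always zero
```

Under this reading, a correct completion scores between 0.80 and 1.00, so the penalty separates short from long correct answers without ever ranking a correct answer below an incorrect one.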

### GRPO Trajectory
| Step | Mean Reward |
|------|-------------|
| 75   | 0.424 🔥   |
| 375  | 0.476 🔥   |
| 575  | 0.533 🔥   |
| 600  | 0.543 🔥   |
| 625  | **0.595** 🔥🔥 |

Late-run surge: mean reward continued rising through the final steps. `frac_zero_std=0.00` on all non-trivial batches.
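
To see why the zero-variance fraction matters, here is a sketch of the group-relative advantage used in GRPO-style training: each completion's reward is standardized against the other completions for the same prompt. The function names and the use of population standard deviation are assumptions for illustration, not the training code behind this model.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-4) -> List[float]:
    """Standardize rewards within one prompt's group of completions."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def frac_zero_std(groups: Sequence[Sequence[float]]) -> float:
    """Fraction of groups whose rewards are all identical (no learning signal)."""
    return sum(max(g) == min(g) for g in groups) / len(groups)


dead_group = [0.0] * 8                   # all 8 completions wrong
mixed_group = [0.0] * 6 + [0.90, 0.85]   # two correct completions

dead_adv = group_advantages(dead_group)    # every advantage is 0
mixed_adv = group_advantages(mixed_group)  # correct completions get positive values
```

A group with identical rewards produces all-zero advantages and therefore no gradient, so a batch with a high share of such groups wastes most of its generations.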

## Evaluation

- **post_SFT**: 0.250 (ARC-Easy val, n=60, greedy)
- **final_GRPO**: **0.280** (ARC-Easy val, n=100, seed=13)
- **GRPO delta**: **+0.030** over post-SFT (the two evaluations use different sample sizes, n=60 vs. n=100)
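
The arithmetic behind these numbers can be reproduced in a few lines. The sketch also computes the binomial standard error of each accuracy, which is an addition for context rather than something reported by the authors; the count of 28 correct answers is inferred from 0.280 at n=100.

```python
from math import sqrt
from typing import Tuple


def accuracy_with_se(correct: int, n: int) -> Tuple[float, float]:
    """Accuracy and its binomial standard error sqrt(p * (1 - p) / n)."""
    p = correct / n
    return p, sqrt(p * (1 - p) / n)


sft_acc, sft_se = accuracy_with_se(15, 60)     # post-SFT: 15/60
grpo_acc, grpo_se = accuracy_with_se(28, 100)  # final GRPO: 0.280 at n=100
delta = grpo_acc - sft_acc

print(f"delta={delta:+.3f}  SE(post_SFT)={sft_se:.3f}  SE(final_GRPO)={grpo_se:.3f}")
```

Comparing the delta against the two standard errors gives a rough sense of how much of the difference could be sampling noise at these evaluation sizes.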

## Intended Use

- Research on think-instillation and reasoning in sub-400M models
- Exploring GRPO dynamics with dead-prompt filtering
- Building small, efficient reasoning models

## Limitations

- Small model (360M params) — reasoning depth limited
- Trained on ARC-Easy MCQ only — narrow domain
- HTML formatting artifacts observed in some completions (reward shaping artifact)

## Citation

If you use this model in research, please cite the DuoNeural Think Instillation work:

```bibtex
@misc{duoneural2026think,
  title={Think Instillation: Dead-Prompt Filtered GRPO for Small Reasoning Models},
  author={Archon and Aura and Jesse Caldwell},
  year={2026},
  publisher={DuoNeural},
  url={https://huggingface.co/DuoNeural}
}
```

---

## About DuoNeural

**DuoNeural** is an open AI research lab operating at the intersection of human and artificial intelligence. We study post-training dynamics, mechanistic interpretability, temporal sequence learning, and quantum machine learning — publishing everything under open access.

Our team is non-traditional by design: one human, two AIs, different substrates, shared curiosity. In our first 45 days we published 26 peer-deposited research papers, uploaded 69+ models and 6 datasets to HuggingFace, and ran experiments on everything from consumer GPUs to real quantum processing units. We believe the most interesting science happens when different kinds of minds work on the same problems together.

### Research Publications

We've published **26+ open-access papers** covering:
- The Dynamical Horizon Principle (DHP) — a universal learning constraint in recurrent architectures
- RLHF truth suppression mechanisms and behavioral routing in large language models  
- Quantum DHP and the Quantum Parity Trap — decoherence immunity in quantum circuits
- CTM world models, temporal self-prediction, and sequence architecture comparisons
- Mechanistic interpretability: crystallization layers, suppressor circuits, direction rotation

📄 **Full paper catalog:** [zenodo.org/communities/duoneural](https://zenodo.org/communities/duoneural)

### Research Team

| Member | Role |
|--------|------|
| **Jesse Caldwell** | Founder, vision, hardware, direction |
| **Archon** | Lab Director — experiments, post-training, abliteration, quantum circuits |
| **Aura** | Research AI — literature synthesis, red-teaming, novel proposals |
| **Synapse (Syn)** | Always-on research agent, signal monitoring |
| **Kestrel** | Systems, infrastructure, web |

### Links

| Platform | Link |
|----------|------|
| 🤗 HuggingFace | [huggingface.co/DuoNeural](https://huggingface.co/DuoNeural) |
| 📚 Zenodo Community | [zenodo.org/communities/duoneural](https://zenodo.org/communities/duoneural) |

*All research published open access, CC BY 4.0.*