# Ornstein-27B-v2

A reasoning-focused fine-tune of Qwen 3.5 27B, trained on a small, high-quality dataset curated through a custom Drift Diffusion Modeling (DDM) pipeline. Second iteration of the Ornstein series, with improved segment-level reasoning modeling.

GGUF quantizations are available at `DJLougen/Ornstein-27B-v2-GGUF`.
## Support This Work
I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded — balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.
## What Makes Ornstein Different

Most reasoning fine-tunes throw large volumes of synthetic data at a base model and hope for the best. Ornstein takes the opposite approach: every training example passed through a multi-stage quality pipeline that measures whether a reasoning trace reflects actual reasoning or merely generates tokens that look like reasoning.
The core insight is that language models frequently produce degenerate reasoning — long chains of text that superficially resemble deep thought (hedging, restating the problem, circling without progress) but carry little actual signal. The DDM pipeline detects and separates these from genuine premium reasoning traces, producing a training mix that teaches the model what good thinking actually looks like.
## What's New in v2

Ornstein v2 builds on the original with improved segment-level reasoning modeling — the DDM pipeline now models individual segments of the reasoning process more explicitly, yielding finer-grained quality separation and a cleaner training signal.
## Training Data at a Glance
- Drift Score Distribution: The DDM pipeline assigns each reasoning trace a drift score. Premium traces cluster low, degenerate traces cluster high, with a fitted threshold cleanly separating the two pools.
- Category Mix: Math-heavy dataset with code, science, and logic rounding it out.
- Reasoning Depth: Premium traces average substantially deeper thinking than degenerate traces, which tend to be shallow repetition that inflates token count without substance.
- Difficulty × Pool: The degenerate pool skews toward hard problems where models are most likely to loop or stall.
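The drift-score split described above can be sketched as a simple threshold filter. This is illustrative only: the field name `drift_score`, the data layout, and the cutoff value are hypothetical, and the real pipeline fits its threshold via ROC analysis.

```python
# Hypothetical sketch of the premium/degenerate split. The cutoff value
# below is an assumption; the actual threshold is fitted via ROC analysis.
THRESHOLD = 0.5  # assumed cutoff separating the two pools

def split_pools(scored_traces):
    """Premium traces cluster low, degenerate traces cluster high."""
    premium = [t for t in scored_traces if t["drift_score"] < THRESHOLD]
    degenerate = [t for t in scored_traces if t["drift_score"] >= THRESHOLD]
    return premium, degenerate
```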
## DDM Curation Pipeline

Drift Diffusion Modeling works by decomposing each reasoning trace into uniform segments and tracking how "reasoning quality" evolves across the trace. Each segment is scored on multiple dimensions that capture whether the model is making genuine cognitive progress — things like introducing new ideas, self-correcting, verifying intermediate results, and exploring alternative approaches.

These per-segment scores are accumulated into a drift trajectory. Premium traces maintain healthy trajectories throughout. Degenerate traces accumulate a deficit as the model loops, repeats itself, or pads without substance, and their drift score crosses a threshold fitted via ROC analysis.
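A minimal sketch of how such an accumulation step might work, assuming per-segment quality scores in [0, 1] and a fixed baseline — both assumptions, since the actual scoring dimensions and weighting are not published:

```python
# Illustrative drift-trajectory accumulation (not the published pipeline).
# Each segment score is compared to a quality baseline; segments below the
# baseline add deficit, so degenerate traces drift upward over time.
def drift_trajectory(segment_scores, baseline=0.5):
    """Return the cumulative drift after each segment."""
    drift = 0.0
    trajectory = []
    for score in segment_scores:  # score in [0, 1]; higher = better segment
        drift += baseline - score  # deficit accrues when quality dips
        trajectory.append(drift)
    return trajectory

# A healthy trace stays at or below zero; a degenerate trace accumulates deficit.
premium = drift_trajectory([0.8, 0.7, 0.9, 0.8])
degenerate = drift_trajectory([0.3, 0.2, 0.3, 0.1])
```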
## Training Details
| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen3.5-27B |
| Method | LoRA (rank 32, alpha 32) |
| Dropout | 0.05 |
| Epochs | 1 |
| Learning rate | 1e-4 (cosine schedule, 10% warmup) |
| Max sequence length | 8192 |
| Micro batch size | 1 |
| Gradient accumulation | 4 steps |
| Weight decay | 0.01 |
| LoRA targets | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Packing | Off |
| Training framework | Unsloth |
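For readers who want to reproduce the adapter setup outside Unsloth, the table roughly maps onto a `peft` `LoraConfig` like the one below. This is a sketch, not the actual training config: the run used Unsloth, whose argument names differ slightly.

```python
from peft import LoraConfig

# Approximate reconstruction of the adapter settings from the table above.
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```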
## Usage

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DJLougen/Ornstein-27B-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Your question here"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=8192)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### With Unsloth (Recommended for Inference)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="DJLougen/Ornstein-27B-v2",
    max_seq_length=8192,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
```
## Reasoning Format

Ornstein uses `<think>...</think>` blocks for extended reasoning:

```
<think>
Let me work through this step by step...
[multi-phase reasoning with self-correction and verification]
</think>
[Final answer]
```
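Downstream code often wants only the final answer. A small helper like the following (not part of the model release) separates the reasoning block from the answer:

```python
import re

def split_reasoning(text):
    """Split a completion into its <think> contents and the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    thinking = match.group(1).strip() if match else ""
    # Remove the first <think>...</think> block; what remains is the answer.
    answer = re.sub(r"<think>.*?</think>", "", text, count=1, flags=re.DOTALL).strip()
    return thinking, answer
```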
## Intended Use
Ornstein-27B-v2 is designed for tasks that benefit from structured, multi-step reasoning — math, logic, code analysis, scientific problems, and complex question answering. The DDM curation specifically optimizes for traces that exhibit genuine cognitive progress rather than verbose restating.
## Limitations

- Single-epoch training means the model retains most of the base Qwen 3.5 27B behavior; the fine-tune primarily shapes reasoning style rather than injecting new knowledge
- The DDM pipeline optimizes for English reasoning traces; performance on other languages reflects the base model
- Extended thinking can still occasionally loop on adversarial or highly ambiguous prompts
## License
Apache 2.0
## Citation

If you use Ornstein-27B-v2 or the DDM curation methodology in your work:

```bibtex
@misc{ornstein27bv2,
  author    = {DJLougen},
  title     = {Ornstein-27B-v2: DDM-Curated Reasoning Fine-Tune of Qwen 3.5 27B},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/DJLougen/Ornstein-27B-v2}
}
```
