---
language:
- en
license: apache-2.0
tags:
- causal-lm
- reasoning
- thought-experiments
- chain-of-thought
- sft
- dpo
- alignment
- small-language-model
- custom-architecture
base_model: tensorfiend/DotLM-165M
datasets:
- tensorfiend/SimpleThoughts
pipeline_tag: text-generation
library_name: transformers
---
# DotLM

DotLM is a minimal 165M-parameter transformer trained from scratch entirely on the SimpleThoughts dataset. It uses explicit `<think>...</think>` chain-of-thought traces to reason through intuitive physics, logic, causal inference, and other everyday phenomena before producing an answer.
## Model Details

### Architecture
| Parameter | Value |
|---|---|
| Parameters | ~165M |
| Layers | 24 |
| Model dimension | 768 |
| FFN hidden dim | 2048 (SwiGLU) |
| Attention heads | 6 |
| KV heads (GQA) | 2 |
| Head dimension | 128 |
| Context length | 4096 tokens |
| Vocabulary size | 16,384 (BPE) |
| Positional encoding | RoPE (θ = 10,000) |
| Normalization | RMSNorm (ε = 1e-6) |
| Tied embeddings | Yes |
Key design choices: Grouped-Query Attention (GQA) with 3:1 head ratio for efficient KV memory, SwiGLU activations, pre-norm architecture, and bf16 mixed-precision training throughout.
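To make the GQA layout concrete, here is a minimal sketch (not the actual DotLM implementation) of how 2 KV heads are shared across 6 query heads, using the dimensions from the table above:

```python
import torch
import torch.nn.functional as F

# DotLM's attention shape: 6 query heads, 2 KV heads (3:1 GQA), head dim 128
n_heads, n_kv_heads, head_dim, seq_len = 6, 2, 128, 16
group = n_heads // n_kv_heads  # 3 query heads share each KV head

q = torch.randn(1, n_heads, seq_len, head_dim)
k = torch.randn(1, n_kv_heads, seq_len, head_dim)
v = torch.randn(1, n_kv_heads, seq_len, head_dim)

# Expand each KV head to serve its group of query heads; only the
# 2-head K/V tensors need to be cached, cutting KV memory by 3x.
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 6, 16, 128])
```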
## Training Pipeline
The model was trained sequentially across four stages using the DotLM framework:
| Stage | Dataset | Samples | Objective |
|---|---|---|---|
| Pretraining | SimpleThoughts/pretrain | 352,214 | Next-token prediction |
| SFT | SimpleThoughts/sft | 25,788 | ChatML instruction following |
| Alignment | SimpleThoughts/alignment | 7,172 | Reference-free DPO (SimPO-style) |
| Reasoning | SimpleThoughts/reasoning | 6,300 | Chain-of-thought with `<think>` traces |
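The alignment stage uses a reference-free, SimPO-style objective. A hedged sketch of the length-normalized loss, assuming the standard SimPO formulation (per-token average log-probability as the implicit reward, a scaling factor beta, and a target margin gamma; the values below are illustrative, not the training hyperparameters):

```python
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_len, rejected_len,
               beta=2.0, gamma=0.5):
    """Reference-free preference loss: no frozen reference model needed."""
    # Length-normalized sequence log-probs act as implicit rewards
    r_chosen = beta * chosen_logps / chosen_len
    r_rejected = beta * rejected_logps / rejected_len
    # Push the chosen-vs-rejected reward gap past the margin gamma
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()

loss = simpo_loss(torch.tensor([-10.0]), torch.tensor([-24.0]),
                  torch.tensor(20.0), torch.tensor(20.0))
print(loss.item())
```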
### Special Tokens

| Token | Purpose |
|---|---|
| `<\|im_start\|>` | Start of turn (BOS) |
| `<\|im_end\|>` | End of turn |
| `<think>` | Begin reasoning trace |
| `</think>` | End reasoning trace |
| `<endoftext>` | End of sequence (EOS) |
| `<pad>` | Padding |
## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "tensorfiend/DotLM-165M"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
).to(device)

# ChatML prompt with the reasoning prefix pre-opened
user_query = "If a ball is placed inside a box and the box is sealed, where is the ball?"
prompt = f"<|im_start|>user\n{user_query}<|im_end|>\n<|im_start|>assistant\n<think>"

inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_k=50,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
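Since the completion interleaves a reasoning trace with the final answer, a small helper (illustrative, not part of the released code) can split the decoded text on the closing `</think>` tag:

```python
def split_reasoning(text: str):
    """Return (thoughts, answer) from a '<think>...</think>answer' completion."""
    if "</think>" in text:
        thoughts, _, answer = text.partition("</think>")
        return thoughts.replace("<think>", "").strip(), answer.strip()
    # No closing tag: treat the whole completion as the answer
    return "", text.strip()

thoughts, answer = split_reasoning(
    "<think>The box is sealed, so the ball stays put.</think>Inside the box."
)
print(answer)  # Inside the box.
```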
## Prompt Format

DotLM uses the ChatML format with an explicit reasoning prefix:

```
<|im_start|>user
{your question}<|im_end|>
<|im_start|>assistant
<think>
{model reasons here}
</think>
{final answer}
```
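A tiny helper that builds this prompt string programmatically (an assumption for convenience; the repo may also ship a chat template, in which case `tokenizer.apply_chat_template` would be preferable):

```python
def build_prompt(question: str) -> str:
    """Wrap a question in ChatML and pre-open the <think> reasoning tag."""
    return (
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n<think>"
    )

print(build_prompt("Why does ice float on water?"))
```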
## Performance & Limitations
- Scale: At 165M parameters, DotLM is a research-scale model. It is not competitive with large-scale LLMs on general benchmarks.
- Domain: The model is specialized on thought experiments — intuitive physics, causal reasoning, spatial reasoning, theory of mind, and related domains. It may underperform on unrelated topics.
- Reasoning quality: The chain-of-thought traces are coherent on in-distribution thought experiments but may hallucinate or ramble on out-of-distribution inputs.
- Context: Maximum context length is 4,096 tokens.
- Safety: No RLHF safety training was applied. Not suitable for deployment in user-facing products without additional safety measures.
## Training Details

Check out the blog for training details: DotLM - An end-to-end trained 165M model (coming soon)
## Related Resources
- Dataset: SimpleThoughts
- Training code: DotLM (coming soon)
## Citation

```bibtex
@misc{dotlm2026,
  author    = {Shanmukh},
  title     = {DotLM-165M: A Minimal Reasoning Language Model Trained on Thought Experiments},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/tensorfiend/DotLM-165M}
}
```