---
language:
  - en
license: apache-2.0
tags:
  - causal-lm
  - reasoning
  - thought-experiments
  - chain-of-thought
  - sft
  - dpo
  - alignment
  - small-language-model
  - custom-architecture
base_model: tensorfiend/DotLM-165M
datasets:
  - tensorfiend/SimpleThoughts
pipeline_tag: text-generation
library_name: transformers
---

# DotLM

DotLM is a minimal 165M-parameter transformer trained from scratch entirely on the SimpleThoughts dataset. It uses explicit `<think>...</think>` chain-of-thought traces to reason through intuitive physics, logic, causal inference, and other everyday phenomena before producing an answer.

## Model Details

### Architecture

| Parameter | Value |
|---|---|
| Parameters | ~165M |
| Layers | 24 |
| Model dimension | 768 |
| FFN hidden dim | 2048 (SwiGLU) |
| Attention heads | 6 |
| KV heads (GQA) | 2 |
| Head dimension | 128 |
| Context length | 4096 tokens |
| Vocabulary size | 16,384 (BPE) |
| Positional encoding | RoPE (θ = 10,000) |
| Normalization | RMSNorm (ε = 1e-6) |
| Tied embeddings | Yes |

**Key design choices:** Grouped-Query Attention (GQA) with a 3:1 query-to-KV head ratio for reduced KV-cache memory, SwiGLU activations, a pre-norm architecture, and bf16 mixed-precision training throughout.
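As a sanity check, the headline parameter count can be reproduced from the table above. The sketch below assumes bias-free linear projections and a single tied embedding matrix; those are common choices for this style of architecture but are assumptions, not details confirmed by this card:

```python
# Back-of-the-envelope parameter count for the DotLM-165M config.
# Assumes bias-free projections and tied input/output embeddings
# (assumptions, not confirmed by the model card).

d_model, n_layers = 768, 24
n_heads, n_kv_heads, head_dim = 6, 2, 128
ffn_hidden, vocab = 2048, 16_384

q_proj = d_model * n_heads * head_dim          # 768 x 768
kv_proj = 2 * d_model * n_kv_heads * head_dim  # K and V, 768 x 256 each
o_proj = n_heads * head_dim * d_model
attn = q_proj + kv_proj + o_proj

# SwiGLU uses three matrices: gate, up, and down projections.
ffn = 3 * d_model * ffn_hidden

norms = 2 * d_model                            # two RMSNorms per block
per_layer = attn + ffn + norms

embeddings = vocab * d_model                   # tied, so counted once
total = n_layers * per_layer + embeddings + d_model  # + final norm

print(f"{total / 1e6:.1f}M parameters")  # ~163.6M, i.e. "~165M"
```

The GQA savings show up in `kv_proj`: with 2 KV heads instead of 6, the K and V matrices are a third of the size of the query projection.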

### Training Pipeline

The model was trained sequentially across four stages using the DotLM framework:

| Stage | Dataset | Samples | Objective |
|---|---|---|---|
| Pretraining | SimpleThoughts/pretrain | 352,214 | Next-token prediction |
| SFT | SimpleThoughts/sft | 25,788 | ChatML instruction following |
| Alignment | SimpleThoughts/alignment | 7,172 | Reference-free DPO (SimPO-style) |
| Reasoning | SimpleThoughts/reasoning | 6,300 | Chain-of-thought with `<think>` traces |
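For the alignment stage, the card names reference-free DPO in the SimPO style, i.e. a preference loss built from length-normalized policy log-probabilities with a target margin, and no reference model. A minimal sketch of that objective for a single preference pair (variable names and the hyperparameter values are illustrative, not taken from the actual training code):

```python
import math

def simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
               beta=2.0, gamma=0.5):
    """SimPO-style reference-free preference loss for one pair.

    logp_*: summed token log-probabilities of each response under the policy.
    len_*:  response lengths in tokens, used for length normalization.
    beta, gamma: reward scale and target margin (illustrative values).
    """
    # Length-normalized average log-probability is the implicit reward.
    r_chosen = beta * logp_chosen / len_chosen
    r_rejected = beta * logp_rejected / len_rejected
    margin = r_chosen - r_rejected - gamma
    # -log sigmoid(margin), written as log(1 + e^{-margin}).
    return math.log(1.0 + math.exp(-margin))

# The loss shrinks as the chosen response becomes relatively more likely
# per token than the rejected one.
print(simpo_loss(logp_chosen=-10.0, logp_rejected=-40.0,
                 len_chosen=10, len_rejected=20))
```

The length normalization is what distinguishes SimPO from vanilla DPO: it prevents the objective from simply favoring longer responses, and it removes the need to keep a frozen reference model in memory.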

### Special Tokens

| Token | Purpose |
|---|---|
| `<\|im_start\|>` | Start of turn (BOS) |
| `<\|im_end\|>` | End of turn |
| `<think>` | Begin reasoning trace |
| `</think>` | End reasoning trace |
| `<endoftext>` | End of sequence (EOS) |
| `<pad>` | Padding |

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "tensorfiend/DotLM-165M"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
).to(device)

user_query = "If a ball is placed inside a box and the box is sealed, where is the ball?"

# ChatML prompt ending in an open <think> tag so the model begins by reasoning.
prompt = f"<|im_start|>user\n{user_query}<|im_end|>\n<|im_start|>assistant\n<think>"

inputs = tokenizer(prompt, return_tensors="pt").to(device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_k=50,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
)

# Keep special tokens so the <think>...</think> trace stays visible.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
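Because generation is seeded with an open `<think>` tag, the decoded output contains the reasoning trace followed by the final answer. A small helper for separating the two (hypothetical, not part of the repo):

```python
def split_reasoning(text):
    """Split a decoded DotLM completion into (reasoning, answer).

    Expects the completion to contain a `</think>` marker: everything
    before it is the trace, everything after is the final answer.
    """
    reasoning, sep, answer = text.partition("</think>")
    if not sep:                       # no closing tag: treat it all as answer
        return "", text.strip()
    # Drop the opening tag and any trailing special tokens.
    reasoning = reasoning.replace("<think>", "").strip()
    for tok in ("<|im_end|>", "<endoftext>"):
        answer = answer.replace(tok, "")
    return reasoning, answer.strip()

demo = "<think>The box moves, the ball moves with it.</think>Inside the box.<|im_end|>"
print(split_reasoning(demo))
# ('The box moves, the ball moves with it.', 'Inside the box.')
```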

## Prompt Format

DotLM uses the ChatML format with an explicit reasoning prefix:

```text
<|im_start|>user
{your question}<|im_end|>
<|im_start|>assistant
<think>
{model reasons here}
</think>
{final answer}
```
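The single-turn format above generalizes to multi-turn conversations by concatenating ChatML turns. A hypothetical helper (not part of the repo; the tokenizer may also ship a chat template that does this for you):

```python
def build_prompt(messages):
    """Render a list of {'role', 'content'} dicts into DotLM's ChatML
    format, ending with an open assistant turn and <think> prefix."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n<think>")
    return "".join(parts)

prompt = build_prompt([{"role": "user", "content": "Where is the ball?"}])
print(prompt)
```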

## Performance & Limitations

- **Scale:** At 165M parameters, DotLM is a research-scale model; it is not competitive with large-scale LLMs on general benchmarks.
- **Domain:** The model is specialized for thought experiments (intuitive physics, causal reasoning, spatial reasoning, theory of mind, and related domains) and may underperform on unrelated topics.
- **Reasoning quality:** Chain-of-thought traces are coherent on in-distribution thought experiments but may hallucinate or ramble on out-of-distribution inputs.
- **Context:** Maximum context length is 4,096 tokens.
- **Safety:** No RLHF safety training was applied; the model is not suitable for deployment in user-facing products without additional safety measures.

## Training Details

Check out the blog for training details: *DotLM - An end-to-end trained 165M model* (coming soon).

## Related Resources

## Citation

```bibtex
@misc{dotlm2026,
  author    = {Shanmukh},
  title     = {DotLM-165M: A Minimal Reasoning Language Model Trained on Thought Experiments},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/tensorfiend/DotLM-165M}
}
```

## License

This model is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).