---
language:
- en
license: apache-2.0
tags:
- causal-lm
- reasoning
- thought-experiments
- chain-of-thought
- sft
- dpo
- alignment
- small-language-model
- custom-architecture
base_model: tensorfiend/DotLM-165M
datasets:
- tensorfiend/SimpleThoughts
pipeline_tag: text-generation
library_name: transformers
---

# DotLM

DotLM is a minimal 165M-parameter transformer trained from scratch, entirely on the [SimpleThoughts](https://huggingface.co/datasets/tensorfiend/SimpleThoughts) dataset. It uses explicit `...` chain-of-thought traces to reason through intuitive physics, logic, causal inference, and other everyday phenomena before producing an answer.

## Model Details

### Architecture

| Parameter | Value |
|---|---|
| Parameters | ~165M |
| Layers | 24 |
| Model dimension | 768 |
| FFN hidden dim | 2048 (SwiGLU) |
| Attention heads | 6 |
| KV heads (GQA) | 2 |
| Head dimension | 128 |
| Context length | 4096 tokens |
| Vocabulary size | 16,384 (BPE) |
| Positional encoding | RoPE (θ = 10,000) |
| Normalization | RMSNorm (ε = 1e-6) |
| Tied embeddings | Yes |

**Key design choices:** Grouped-Query Attention (GQA) with a 3:1 query-to-KV head ratio for efficient KV memory, SwiGLU activations, a pre-norm architecture, and bf16 mixed-precision training throughout.
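As a sanity check, the parameter count and KV-cache footprint implied by the table above can be worked out directly. This is a rough sketch, not the exact count: biases, RMSNorm weights, and the bf16-cache assumption are mine, and the card's figures are authoritative.

```python
# Architecture values taken from the table above.
vocab, d_model, layers = 16_384, 768, 24
heads, kv_heads, head_dim = 6, 2, 128
ffn_dim = 2048

embed = vocab * d_model                          # tied input/output embedding, counted once
attn = (d_model * heads * head_dim               # Q projection
        + 2 * d_model * kv_heads * head_dim      # K and V projections (only 2 KV heads: GQA)
        + heads * head_dim * d_model)            # output projection
ffn = 3 * d_model * ffn_dim                      # SwiGLU: gate, up, and down matrices
total = embed + layers * (attn + ffn)            # norms/biases omitted (negligible)
print(f"{total / 1e6:.1f}M parameters")          # -> 163.6M parameters, i.e. "~165M"

# KV cache at the full 4096-token context, assuming bf16 (2 bytes per value).
# With 2 KV heads instead of 6, the cache is 3x smaller than full MHA.
kv_bytes = 2 * layers * kv_heads * head_dim * 4096 * 2   # K and V tensors
print(f"{kv_bytes / 2**20:.0f} MiB KV cache")            # -> 96 MiB KV cache
```

The GQA arithmetic makes the "efficient KV memory" claim concrete: the 3:1 head ratio cuts the per-request cache from ~288 MiB to ~96 MiB at full context.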
### Training Pipeline

The model was trained sequentially across four stages using the [DotLM framework](https://github.com/shanmukh05/DotLM):

| Stage | Dataset | Samples | Objective |
|---|---|---|---|
| Pretraining | SimpleThoughts/pretrain | 352,214 | Next-token prediction |
| SFT | SimpleThoughts/sft | 25,788 | ChatML instruction following |
| Alignment | SimpleThoughts/alignment | 7,172 | Reference-free DPO (SimPO-style) |
| Reasoning | SimpleThoughts/reasoning | 6,300 | Chain-of-thought with `` traces |

### Special Tokens

| Token | Purpose |
|---|---|
| `<\|im_start\|>` | Start of turn (BOS) |
| `<\|im_end\|>` | End of turn |
| `` | Begin reasoning trace |
| `` | End reasoning trace |
| `` | End of sequence (EOS) |
| `` | Padding |

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "tensorfiend/DotLM-165M"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
).to(device)

user_query = "If a ball is placed inside a box and the box is sealed, where is the ball?"
prompt = f"<|im_start|>user\n{user_query}<|im_end|>\n<|im_start|>assistant\n"

inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_k=50,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```

### Prompt Format

DotLM uses the ChatML format with an explicit reasoning prefix:

```
<|im_start|>user
{your question}<|im_end|>
<|im_start|>assistant
{model reasons here}
{final answer}
```

## Performance & Limitations

- Scale: At 165M parameters, DotLM is a research-scale model. It is not competitive with large-scale LLMs on general benchmarks.
- Domain: The model is specialized on thought experiments: intuitive physics, causal reasoning, spatial reasoning, theory of mind, and related domains. It may underperform on unrelated topics.
- Reasoning quality: The chain-of-thought traces are coherent on in-distribution thought experiments but may hallucinate or ramble on out-of-distribution inputs.
- Context: The maximum context length is 4,096 tokens.
- Safety: No RLHF safety training was applied. The model is not suitable for deployment in user-facing products without additional safety measures.

## Training Details

Check out the blog post for training details: [DotLM - An end-to-end trained 165M model](https://www.tensorwrites.com/) (coming soon).

## Related Resources

- Dataset: [SimpleThoughts](https://huggingface.co/datasets/tensorfiend/SimpleThoughts)
- Training code: [DotLM](https://github.com/shanmukh05/DotLM) (coming soon)

## Citation

```bibtex
@misc{dotlm2026,
  author = {Shanmukh},
  title = {DotLM-165M: A Minimal Reasoning Language Model Trained on Thought Experiments},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/tensorfiend/DotLM-165M}
}
```

## License

Released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).