How to use from the llama-cpp-python library
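# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the Q6_K GGUF file from the Hugging Face Hub and load it.
llm = Llama.from_pretrained(
    repo_id="NotHereNorThere/AtomCoT-135M",
    filename="model-Q6_K.gguf",
)

# Run a single chat turn; the call returns an OpenAI-style completion dict.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)

# Print only the assistant's reply text.
print(response["choices"][0]["message"]["content"])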

AtomCoT-135M

SmolLM2-135M-Instruct fine-tuned on OpenThoughts data for chain-of-thought reasoning.

The smallest possible unit of attempted CoT transfer. Emphasis on attempted.

What it is

QLoRA fine-tune of SmolLM2-135M-Instruct exploring the lower bound of coherent reasoning behavior. If Qwen3 600M was surprisingly good for its size, AtomCoT-135M is the control group that explains why 600M is surprising.

Training

| Setting | Value |
| --- | --- |
| Base model | HuggingFaceTB/SmolLM2-135M-Instruct |
| Method | QLoRA (4-bit NF4, LoRA r=16) |
| Dataset | OpenThoughts 2k subset |
| Hardware | RTX 4060 8GB |
| Attention | FlashAttention 2 |
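
The settings above map onto a QLoRA setup roughly as follows. This is a sketch, not the training script that was actually used: it assumes the Hugging Face `transformers`, `peft`, and `bitsandbytes` stack, and the `lora_alpha`, `lora_dropout`, target modules, and bfloat16 compute dtype are illustrative guesses that this card does not state.

```python
# Values taken from the Training table.
CARD_SETTINGS = {
    "base_model": "HuggingFaceTB/SmolLM2-135M-Instruct",
    "quant_type": "nf4",
    "lora_r": 16,
    "attn_implementation": "flash_attention_2",
}

# ASSUMPTIONS: these values are not given in the card and are placeholders.
ASSUMED_SETTINGS = {
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}


def build_qlora_configs():
    """Return (BitsAndBytesConfig, LoraConfig) built from the settings above."""
    # Imported lazily so the settings can be inspected without the GPU stack.
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type=CARD_SETTINGS["quant_type"],
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumption, not from the card
    )
    lora_config = LoraConfig(
        r=CARD_SETTINGS["lora_r"],
        lora_alpha=ASSUMED_SETTINGS["lora_alpha"],
        lora_dropout=ASSUMED_SETTINGS["lora_dropout"],
        target_modules=ASSUMED_SETTINGS["target_modules"],
        task_type="CAUSAL_LM",
    )
    return bnb_config, lora_config
```

In a setup like this, the base model would typically be loaded with `AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config, attn_implementation="flash_attention_2")` and then wrapped with `peft.get_peft_model(model, lora_config)`.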

Eval results

| Prompt | Result | Notes |
| --- | --- | --- |
| Bat & ball ($1.10 problem) | ❌ Wrong | Invented phantom variables, concluded ball costs "0.115x the bat" |
| Jug problem (3gal + 5gal = 4gal) | ❌ Wrong | Started measuring jug diameter in feet, concluded answer is "1 foot above the top" |
| 6/12 fish drowning | ❌ Wrong | Rewrote the question mid-answer to include the solution, then stopped |
| 15% of 200 | ❌ Mega Wrong | Correctly got 0.15, then multiplied 200 by 10,000 and got 1,500,000 |
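
For reference, the expected answers to three of these prompts can be checked mechanically. The sketch below assumes the classic wordings (bat and ball cost $1.10 together and the bat costs $1.00 more than the ball; measure exactly 4 gallons using a 3-gallon and a 5-gallon jug), since the exact prompt text is not reproduced in this card. The function names are hypothetical.

```python
from collections import deque


def bat_and_ball_cents(total=110, difference=100):
    """Ball price in cents, given ball + bat = total and bat - ball = difference."""
    return (total - difference) // 2


def percent_of(percent, value):
    """Integer percentage of an integer value (exact for these inputs)."""
    return value * percent // 100


def solve_jugs(cap_a=3, cap_b=5, target=4):
    """Breadth-first search over (a, b) fill levels; returns a shortest state path."""
    start = (0, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        a, b = queue.popleft()
        if target in (a, b):
            # Walk parent links back to the start to rebuild the path.
            path, state = [], (a, b)
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        pour_ab = min(a, cap_b - b)  # amount that fits when pouring a into b
        pour_ba = min(b, cap_a - a)  # amount that fits when pouring b into a
        moves = [
            (cap_a, b), (a, cap_b),      # fill either jug
            (0, b), (a, 0),              # empty either jug
            (a - pour_ab, b + pour_ab),  # pour a into b
            (a + pour_ba, b - pour_ba),  # pour b into a
        ]
        for nxt in moves:
            if nxt not in parent:
                parent[nxt] = (a, b)
                queue.append(nxt)
    return None  # target is unreachable


print(bat_and_ball_cents())  # 5 (cents)
print(percent_of(15, 200))   # 30
print(solve_jugs()[-1])      # (3, 4): 4 gallons in the 5-gallon jug
```

Under those assumptions the ball costs 5 cents, 15% of 200 is 30, and the jug puzzle resolves in six fill, empty, or pour moves.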

Honest assessment

The CoT skeleton transferred perfectly. Almost every response starts with structured steps, attempts to show work, and maintains the visual format of reasoning. The content inside that skeleton is complete chaos.

135M parameters is enough to learn what reasoning looks like but not enough to actually execute it. The model wears the costume without understanding the role. Notable failure modes include inventing intermediate steps from nowhere, conflating unrelated units and quantities, and in one case rewriting the question itself to contain the answer before giving up.

This is not a failure of fine-tuning. This is the parameter count floor.

Key finding

The jump from 135M to 500M (tested against Qwemini-0.5B-Alpha) is where something qualitatively real switches on. AtomCoT produces structured nonsense. Qwemini produces structured correct answers on simple problems. With the same training approach and the same dataset style, roughly 3.7x the parameters makes the difference between a reasoning costume and actual reasoning.

Use cases

No.

GGUF
Model size: 0.1B params
Architecture: llama
Quantization: 6-bit
