# Progressive Cognitive Architecture – 3B Dream LoRA (English)

> ⚠️ **Inverse scaling case study** – Qwen2.5-3B trained with progressive 4-phase training plus SVD Dream Pruning. Demonstrates that compression techniques effective on small models can hurt larger ones.
## Results

| Metric | Score |
|---|---|
| Composite Score | 66.0 |
| Exact Accuracy | 56.2% ± 4.2 |
| Adversarial Robustness | 34.0% ± 6.0 |
| Delegation Accuracy | 100.0% ± 0.0 |
| Delegation Rate | 85.3% ± 3.1 |
| Magnitude Sense (OoM ±1) | 100.0% ± 0.0 |
| Catastrophic Errors | 41.3% ± 13.7 |
Results are mean ± std over 3 seeds (42, 43, 44), with 50 samples × 5 dimensions per seed.

## ⚠️ Inverse Scaling Effect

This model demonstrates a key finding of the research: Dream pruning helps the 1.5B model but hurts the 3B model.
| Metric | 3B Dream (this) | 3B Flat | 1.5B Dream |
|---|---|---|---|
| Composite | 66.0 | 78.5 | 87.6 |
| Adversarial | 34.0% | 84.7% | 84.0% |
| Catastrophic | 41.3% | 0.0% | 0.0% |
**Hypothesis:** the LoRA-to-base-weight ratio explains this. Rank-16 LoRA adapters account for a larger fraction of the 1.5B model's parameters than of the 3B model's. SVD compression (16 → 8) on the larger model therefore leaves adapters too weak to steer behavior reliably: strong enough to interfere, too weak to guide. An adaptive compression ratio (e.g., rank 16 → 12 for 3B) would likely resolve this.
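The ratio argument can be checked with back-of-the-envelope numbers. In the sketch below the hidden sizes and layer counts are approximate public configs (assumptions, not values from this repo), and every target projection is treated as a square hidden×hidden matrix, ignoring grouped-query attention:

```python
def lora_param_count(hidden: int, layers: int, rank: int, n_targets: int = 4) -> int:
    # Each adapted projection adds two low-rank factors:
    # (hidden x rank) for lora_B and (rank x hidden) for lora_A.
    return layers * n_targets * 2 * hidden * rank

# Approximate base-model configs (assumed, for illustration only)
models = {
    "Qwen2.5-1.5B": {"hidden": 1536, "layers": 28, "base": 1.5e9},
    "Qwen2.5-3B":   {"hidden": 2048, "layers": 36, "base": 3.0e9},
}

for name, cfg in models.items():
    frac = lora_param_count(cfg["hidden"], cfg["layers"], 16) / cfg["base"]
    print(f"{name}: rank-16 adapters are {frac:.3%} of base weights")
```

With these rough numbers the rank-16 adapters are about 0.37% of the 1.5B base but only about 0.31% of the 3B base, and halving the rank halves both figures, which is consistent with the hypothesis that the pruned 3B adapters fall below the steering threshold.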
## Progressive Cognitive Architecture

A bio-inspired 4-phase training methodology:
| Phase | Name | What happens |
|---|---|---|
| 1 | Foundation | Learn exact arithmetic via LoRA fine-tuning |
| 2 | Consolidation | SVD Dream Pruning (rank 16 → 8) compresses knowledge into intuition |
| 3 | Delegation | Learn complexity-aware routing: compute internally vs. delegate to a tool |
| 4 | Orchestration | Full pipeline: intuit → route → tool → validate |
**Guiding principle:** knowledge doesn't disappear; it collapses into attractors. Intuition is the compressed residue of experience.
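The Phase 3 routing idea can be sketched in a few lines; the digit-count rule and threshold below are hypothetical stand-ins for the trained policy, used only to illustrate complexity-aware delegation:

```python
import re

def route(problem: str, digit_threshold: int = 3) -> str:
    """Delegate to an external tool when any operand exceeds the digit threshold."""
    operands = re.findall(r"\d+", problem)
    if any(len(op) > digit_threshold for op in operands):
        return "delegate"   # too complex: hand off to a calculator tool
    return "internal"       # small enough to compute in-model

print(route("7 * 8"))        # → internal
print(route("342 * 6789"))   # → delegate
```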
## Dream Pruning (SVD Low-Rank Factorization)

Instead of zeroing out small weights (magnitude pruning), Dream Pruning uses singular value decomposition (SVD) to reduce the effective rank of the LoRA matrices from 16 to 8. This preserves the principal directions ("logical connections") while discarding noise, analogous to memory consolidation during sleep.
```
W = U · Σ · Vᵀ   →   W' = U[:, :k] · Σ[:k, :k] · Vᵀ[:k, :]   (k = 8)
```
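The truncation above can be reproduced on an adapter's effective update matrix. A minimal NumPy sketch (dimensions, scale, and variable names are illustrative, not taken from the actual adapters):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, k = 256, 16, 8          # illustrative feature dim, LoRA rank, pruned rank

# A rank-16 LoRA update delta_W = B @ A (shapes follow the usual PEFT layout)
A = 0.1 * rng.standard_normal((r, d))   # lora_A: (rank, in_features)
B = 0.1 * rng.standard_normal((d, r))   # lora_B: (out_features, rank)
delta_W = B @ A

# Dream pruning: keep only the top-k singular directions of the update
U, S, Vt = np.linalg.svd(delta_W, full_matrices=False)
delta_W_pruned = (U[:, :k] * S[:k]) @ Vt[:k, :]

# Refactor the truncated update back into a rank-8 adapter pair
B_new = U[:, :k] * np.sqrt(S[:k])
A_new = np.sqrt(S[:k])[:, None] * Vt[:k, :]
assert np.allclose(B_new @ A_new, delta_W_pruned)

rel_err = np.linalg.norm(delta_W - delta_W_pruned) / np.linalg.norm(delta_W)
print(f"pruned rank: {np.linalg.matrix_rank(delta_W_pruned)}, relative error: {rel_err:.3f}")
```

By the Eckart–Young theorem, the truncated SVD is the best rank-k approximation of the update in Frobenius norm, which is what distinguishes this from magnitude pruning.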
## Training Configuration

| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-3B |
| LoRA Rank | 16 (→ 8 after SVD) |
| LoRA Alpha | 32 |
| LoRA Targets | q_proj, k_proj, v_proj, o_proj |
| Dropout | 0.05 |
| Training Data | ~6,000 English arithmetic examples |
| Hardware | NVIDIA T4 16GB |
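The adapter settings in the table map directly onto a PEFT `LoraConfig`; a minimal sketch of the pre-pruning rank-16 configuration, using only the values listed above:

```python
from peft import LoraConfig

# Rank-16 configuration used before SVD pruning reduces the rank to 8
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```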
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B", device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B")

model = PeftModel.from_pretrained(
    base_model,
    "dexmac/progressive-cognitive-qwen3b-dream-lora",
    subfolder="lora_adapters",
)

messages = [{"role": "user", "content": "Solve: 342 * 67"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True so the low temperature actually takes effect
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Related Models
- 1.5B Dream LoRA β Same architecture, smaller model (best overall)
- 3B Flat LoRA β 3B control (outperforms this)
- Results Dataset β Raw evaluation data
- GitHub β Full source code
## Citation
```bibtex
@software{progressive_cognitive_2026,
  author  = {Dex Mac},
  title   = {Progressive Cognitive Architecture for LLMs},
  year    = {2026},
  url     = {https://github.com/dexmac221/progressive-cognitive},
  version = {1.0.0}
}
```
## License

Apache 2.0