# Progressive Cognitive Architecture – 3B Dream LoRA (English)

> ⚠️ **Inverse scaling case study** – Qwen2.5-3B trained with progressive 4-phase training plus SVD Dream Pruning. Demonstrates that compression techniques effective on small models can hurt larger ones.
## Results

| Metric | Score |
|---|---|
| Composite Score | 66.0 |
| Exact Accuracy | 56.2% ± 4.2 |
| Adversarial Robustness | 34.0% ± 6.0 |
| Delegation Accuracy | 100.0% ± 0.0 |
| Delegation Rate | 85.3% ± 3.1 |
| Magnitude Sense (OoM ±1) | 100.0% ± 0.0 |
| Catastrophic Errors | 41.3% ± 13.7 |
Results are mean ± std over 3 seeds (42, 43, 44), with 50 samples × 5 dimensions per seed.

## ⚠️ Inverse Scaling Effect

This model demonstrates a key finding of the research: Dream pruning helps the 1.5B model but hurts the 3B model.
| Metric | 3B Dream (this) | 3B Flat | 1.5B Dream |
|---|---|---|---|
| Composite | 66.0 | 78.5 | 87.6 |
| Adversarial | 34.0% | 84.7% | 84.0% |
| Catastrophic | 41.3% | 0.0% | 0.0% |
**Hypothesis:** the LoRA-to-base-weight ratio explains this. Rank-16 LoRA adapters account for a larger fraction of the 1.5B model's parameters than of the 3B model's. SVD compression (16 → 8) on the larger model therefore leaves adapters too weak to steer behavior reliably: strong enough to interfere, too weak to guide. An adaptive compression ratio (e.g., rank 16 → 12 for 3B) would likely resolve this.
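The ratio argument can be checked with back-of-the-envelope numbers. In the sketch below the hidden sizes and layer counts are approximate public configs (assumptions, not values from this repo), and every target projection is treated as a square hidden×hidden matrix, ignoring grouped-query attention:

```python
def lora_param_count(hidden: int, layers: int, rank: int, n_targets: int = 4) -> int:
    # Each adapted projection adds two low-rank factors:
    # (hidden x rank) for lora_B and (rank x hidden) for lora_A.
    return layers * n_targets * 2 * hidden * rank

# Approximate base-model configs (assumed, for illustration only)
models = {
    "Qwen2.5-1.5B": {"hidden": 1536, "layers": 28, "base": 1.5e9},
    "Qwen2.5-3B":   {"hidden": 2048, "layers": 36, "base": 3.0e9},
}

for name, cfg in models.items():
    frac = lora_param_count(cfg["hidden"], cfg["layers"], 16) / cfg["base"]
    print(f"{name}: rank-16 adapters are {frac:.3%} of base weights")
```

With these rough numbers the rank-16 adapters are about 0.37% of the 1.5B base but only about 0.31% of the 3B base, and halving the rank halves both figures, which is consistent with the hypothesis that the pruned 3B adapters fall below the steering threshold.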
## Progressive Cognitive Architecture

A bio-inspired 4-phase training methodology:
| Phase | Name | What happens |
|---|---|---|
| 1 | Foundation | Learn exact arithmetic via LoRA fine-tuning |
| 2 | Consolidation | SVD Dream Pruning (rank 16 → 8) compresses knowledge into intuition |
| 3 | Delegation | Learn complexity-aware routing: compute internally vs. delegate to a tool |
| 4 | Orchestration | Full pipeline: intuit → route → tool → validate |
**Guiding principle:** knowledge doesn't disappear; it collapses into attractors. Intuition is the compressed residue of experience.
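The Phase 3 routing idea can be sketched in a few lines; the digit-count rule and threshold below are hypothetical stand-ins for the trained policy, used only to illustrate complexity-aware delegation:

```python
import re

def route(problem: str, digit_threshold: int = 3) -> str:
    """Delegate to an external tool when any operand exceeds the digit threshold."""
    operands = re.findall(r"\d+", problem)
    if any(len(op) > digit_threshold for op in operands):
        return "delegate"   # too complex: hand off to a calculator tool
    return "internal"       # small enough to compute in-model

print(route("7 * 8"))        # → internal
print(route("342 * 6789"))   # → delegate
```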
## Dream Pruning (SVD Low-Rank Factorization)

Instead of zeroing out small weights (magnitude pruning), Dream Pruning uses singular value decomposition (SVD) to reduce the effective rank of the LoRA matrices from 16 to 8. This preserves the principal directions ("logical connections") while discarding noise, analogous to memory consolidation during sleep.
```
W = U · Σ · Vᵀ   →   W' = U[:, :k] · Σ[:k, :k] · Vᵀ[:k, :]   (k = 8)
```
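The truncation above can be reproduced on an adapter's effective update matrix. A minimal NumPy sketch (dimensions, scale, and variable names are illustrative, not taken from the actual adapters):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, k = 256, 16, 8          # illustrative feature dim, LoRA rank, pruned rank

# A rank-16 LoRA update delta_W = B @ A (shapes follow the usual PEFT layout)
A = 0.1 * rng.standard_normal((r, d))   # lora_A: (rank, in_features)
B = 0.1 * rng.standard_normal((d, r))   # lora_B: (out_features, rank)
delta_W = B @ A

# Dream pruning: keep only the top-k singular directions of the update
U, S, Vt = np.linalg.svd(delta_W, full_matrices=False)
delta_W_pruned = (U[:, :k] * S[:k]) @ Vt[:k, :]

# Refactor the truncated update back into a rank-8 adapter pair
B_new = U[:, :k] * np.sqrt(S[:k])
A_new = np.sqrt(S[:k])[:, None] * Vt[:k, :]
assert np.allclose(B_new @ A_new, delta_W_pruned)

rel_err = np.linalg.norm(delta_W - delta_W_pruned) / np.linalg.norm(delta_W)
print(f"pruned rank: {np.linalg.matrix_rank(delta_W_pruned)}, relative error: {rel_err:.3f}")
```

By the Eckart–Young theorem, the truncated SVD is the best rank-k approximation of the update in Frobenius norm, which is what distinguishes this from magnitude pruning.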
## Training Configuration

| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-3B |
| LoRA Rank | 16 (→ 8 after SVD) |
| LoRA Alpha | 32 |
| LoRA Targets | q_proj, k_proj, v_proj, o_proj |
| Dropout | 0.05 |
| Training Data | ~6,000 English arithmetic examples |
| Hardware | NVIDIA T4 16GB |
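The adapter settings in the table map directly onto a PEFT `LoraConfig`; a minimal sketch of the pre-pruning rank-16 configuration, using only the values listed above:

```python
from peft import LoraConfig

# Rank-16 configuration used before SVD pruning reduces the rank to 8
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```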
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B", device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B")

model = PeftModel.from_pretrained(
    base_model,
    "dexmac/progressive-cognitive-qwen3b-dream-lora",
    subfolder="lora_adapters",
)

messages = [{"role": "user", "content": "Solve: 342 * 67"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True so the low temperature actually takes effect
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Related Models
- 1.5B Dream LoRA β Same architecture, smaller model (best overall)
- 3B Flat LoRA β 3B control (outperforms this)
- Results Dataset β Raw evaluation data
- GitHub β Full source code
## Citation
```bibtex
@software{progressive_cognitive_2026,
  author  = {Dex Mac},
  title   = {Progressive Cognitive Architecture for LLMs},
  year    = {2026},
  url     = {https://github.com/dexmac221/progressive-cognitive},
  version = {1.0.0}
}
```
## License

Apache 2.0