Upload README.md with huggingface_hub
README.md
CHANGED
@@ -5,6 +5,7 @@ library_name: peft
 base_model: Qwen/Qwen2.5-1.5B
 tags:
 - lora
 - cognitive-architecture
 - progressive-learning
 - dream-pruning
@@ -18,66 +19,94 @@ datasets:
 pipeline_tag: text-generation
 ---

-#

-**

-##

-

-
-2. **Consolidation** - SVD Dream Pruning compresses exact circuits into intuition (rank 16→8), then fine-tune on approximation (1,500 examples)
-3. **Delegation** - Learn when to delegate to a calculator tool vs compute internally (1,500 examples)
-4. **Orchestration** - Full pipeline: intuition → routing → tool → validation (1,000 examples)

-##

-

-

-
-|--------|-----------|-----------|------|
-| Exact Accuracy | 58.6% ± 2.9 | 60.6% ± 3.8 | 18.2% ± 2.9 |
-| Number Sense | 60.0% ± 0.8 | 0.0% | 57.0% ± 1.4 |
-| Metacognition (delegation) | **100.0%** | 0.0% | 84.9% |
-| Sensible Errors | 81.3% | - | - |

-**Key insight**: Flat-LoRA wins on raw accuracy but *destroys* number sense and metacognition. Dream-LoRA preserves both while achieving comparable accuracy.

-##

 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from peft import PeftModel

-base_model = AutoModelForCausalLM.from_pretrained(
 tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")

-
-

-
-
-
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```

-##

--
--
-- **Dream
--
-- **Hardware**: NVIDIA T4 (16GB VRAM) on Hugging Face Spaces
-- **Training time**: ~45 minutes

-##

-- **Article**: [What if AI Models Learned Like Humans Do?](https://medium.com/towards-artificial-intelligence/what-if-ai-models-learned-like-humans-do-c69c19f29d0c)
-- **GitHub**: [dexmac221/progressive-cognitive](https://github.com/dexmac221/progressive-cognitive)

-##

 Apache 2.0

 base_model: Qwen/Qwen2.5-1.5B
 tags:
 - lora
+- peft
 - cognitive-architecture
 - progressive-learning
 - dream-pruning
 pipeline_tag: text-generation
 ---

+# Progressive Cognitive Architecture: Dream-LoRA with SVD Pruning (Italian)

+**Main Italian model**: Qwen2.5-1.5B trained with a 4-phase progressive cognitive architecture plus **SVD Dream Pruning** (rank 16→8).

+## Results

+| Metric | Dream-LoRA (this model) | Progressive-LoRA | Flat-LoRA |
+|---------|---------------------|------------------|-----------|
+| Exact Accuracy | **58.6% ± 2.9** | 37.0% ± 0.5 | 60.6% |
+| Number Sense | **60.0% ± 0.8** | 57.7% ± 0.5 | 0.0% |
+| Metacognition | **100.0%** | 98.5% | 0.0% |

+Switching from magnitude pruning to SVD Dream Pruning significantly improved exact accuracy (+21.6 pp, from 37.0% to 58.6%) while preserving number sense and metacognition.

+## Progressive Cognitive Architecture

+A bio-inspired 4-phase training methodology:

+| Phase | Name | What happens |
+|-------|------|-------------|
+| 1 | **Foundation** | Learn exact arithmetic via LoRA fine-tuning |
+| 2 | **Consolidation** | SVD Dream Pruning (rank 16→8) compresses knowledge into intuition |
+| 3 | **Delegation** | Learn complexity-aware routing: compute internally vs. delegate to tool |
+| 4 | **Orchestration** | Full pipeline: intuit → route → tool → validate |

+**Guiding Principle:** *Knowledge doesn't disappear; it collapses into attractors. Intuition is the compressed residue of experience.*
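
The phase table ends in an intuit → route → tool → validate loop. The sketch below shows one way such a loop could be written in plain Python for multiplication. Everything in it is hypothetical: the function names (`estimate`, `should_delegate`, `calculator`, `validate`, `solve`), the digit-count routing heuristic, and the 50% tolerance are illustration-only assumptions, not code from the repository. In the trained adapter these decisions are learned rather than hard-coded.

```python
# Hypothetical sketch of an intuit -> route -> tool -> validate loop.
# All names and thresholds are illustrative assumptions.

def estimate(a: int, b: int) -> int:
    """Intuition: round each operand to one significant digit, then multiply."""
    def round_1sig(n: int) -> int:
        if n == 0:
            return 0
        magnitude = 10 ** (len(str(abs(n))) - 1)
        return round(n / magnitude) * magnitude
    return round_1sig(a) * round_1sig(b)


def should_delegate(a: int, b: int, max_digits: int = 2) -> bool:
    """Routing: delegate when either operand has more than `max_digits` digits."""
    return max(len(str(abs(a))), len(str(abs(b)))) > max_digits


def calculator(a: int, b: int) -> int:
    """Tool: exact arithmetic."""
    return a * b


def validate(result: int, guess: int, tolerance: float = 0.5) -> bool:
    """Validation: accept a result whose relative distance to the estimate is small."""
    if guess == 0:
        return result == 0
    return abs(result - guess) / abs(guess) <= tolerance


def solve(a: int, b: int) -> dict:
    guess = estimate(a, b)
    delegated = should_delegate(a, b)
    # `a * b` stands in for the model's internal computation on easy inputs.
    result = calculator(a, b) if delegated else a * b
    return {
        "estimate": guess,
        "delegated": delegated,
        "result": result,
        "plausible": validate(result, guess),
    }


print(solve(342, 67))
print(solve(7, 8))
```

The estimate doubles as a sanity check: a tool result that lands far from the rough guess is flagged as implausible instead of being accepted blindly.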

+## Dream Pruning (Low-Rank SVD Factorization)
+
+Instead of zeroing out small weights, Dream Pruning uses **SVD decomposition** to reduce the effective rank of the LoRA matrices from 16 to 8. It preserves the principal directions ("logical connections") while discarding noise, analogous to memory consolidation during sleep.
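
As a rough illustration of the rank reduction described above, the sketch below truncates a LoRA update from rank 16 to rank 8 with NumPy. It assumes the standard LoRA parameterisation, where the weight update is the product `B @ A`, and uses random stand-in matrices with toy dimensions. The function name `svd_prune` and the choice to split the singular values evenly between the two new factors are assumptions made for this example, not the repository's implementation.

```python
import numpy as np


def svd_prune(A: np.ndarray, B: np.ndarray, new_rank: int):
    """Return rank-`new_rank` factors (A_new, B_new) approximating B @ A.

    A has shape (r, d_in) and B has shape (d_out, r), as in standard LoRA.
    """
    delta = B @ A  # full update, shape (d_out, d_in), rank <= r
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    # Keep only the strongest `new_rank` singular directions.
    U, S, Vt = U[:, :new_rank], S[:new_rank], Vt[:new_rank, :]
    sqrt_s = np.sqrt(S)
    B_new = U * sqrt_s            # scales each column, shape (d_out, new_rank)
    A_new = sqrt_s[:, None] * Vt  # scales each row, shape (new_rank, d_in)
    return A_new, B_new


rng = np.random.default_rng(0)
d_in, d_out, r = 64, 48, 16  # toy sizes, not the model's real dimensions
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))

A8, B8 = svd_prune(A, B, new_rank=8)
delta, delta8 = B @ A, B8 @ A8
rel_err = np.linalg.norm(delta - delta8) / np.linalg.norm(delta)
print(A8.shape, B8.shape, f"relative error: {rel_err:.3f}")
```

Truncating the SVD gives the best rank-8 approximation of the update in the Frobenius norm (the Eckart-Young theorem), which is why it discards less information than zeroing individual small weights. Random matrices have a flat spectrum, so the error printed here is a pessimistic toy figure; a trained adapter may concentrate more of its energy in the leading directions.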
+
+## Configuration
+
+| Parameter | Value |
+|-----------|--------|
+| Base Model | Qwen/Qwen2.5-1.5B |
+| LoRA Rank | 16 (→ 8 after SVD) |
+| LoRA Alpha | 32 |
+| LoRA Targets | q_proj, k_proj, v_proj, o_proj |
+| Pruning Type | SVD Low-Rank Factorization |
+| Data Language | Italian |
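
To give a feel for what these settings mean in adapter size, the sketch below counts LoRA parameters at rank 16 and rank 8 for the four target modules. The projection shapes (hidden size 1536, two key/value heads of dimension 128, 28 decoder layers) are assumed values for Qwen2.5-1.5B and should be checked against the model's `config.json`. Whether the published adapter is physically stored at rank 8 or kept at rank 16 with an effective rank of 8 is not stated in the card, so the rank-8 figure is only indicative.

```python
# Assumed Qwen2.5-1.5B attention shapes (verify against the model's config.json).
HIDDEN = 1536        # assumed hidden size
KV_DIM = 2 * 128     # assumed: 2 key/value heads x head dimension 128
NUM_LAYERS = 28      # assumed number of decoder layers

# (d_in, d_out) for each LoRA target listed in the configuration table.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_param_count(rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) to each target module."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGETS.values())
    return per_layer * NUM_LAYERS


for rank in (16, 8):
    print(f"rank {rank}: {lora_param_count(rank):,} adapter parameters")

# Standard LoRA scales the update by alpha / r.
alpha = 32
print("scaling at rank 16:", alpha / 16)
```

Under these assumptions the adapter size is linear in the rank, so halving the rank halves the number of adapter parameters.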
+
+## Quick Start

 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from peft import PeftModel

+base_model = AutoModelForCausalLM.from_pretrained(
+    "Qwen/Qwen2.5-1.5B", device_map="auto", torch_dtype="auto"
+)
 tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")

+model = PeftModel.from_pretrained(
+    base_model,
+    "dexmac/progressive-cognitive-dream-lora",
+    subfolder="lora_adapters"
+)

+messages = [{"role": "user", "content": "Risolvi: 342 * 67"}]
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.1)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```

+## Related Models

+- [Progressive-LoRA (IT)](https://huggingface.co/dexmac/progressive-cognitive-lora): first prototype, with magnitude pruning
+- [Flat-LoRA (IT)](https://huggingface.co/dexmac/progressive-cognitive-baseline-lora): control trained without phases
+- [**1.5B Dream (EN)**](https://huggingface.co/dexmac/progressive-cognitive-dream-lora-en): best model (English, composite score 87.6)
+- [GitHub](https://github.com/dexmac221/progressive-cognitive): full source code

+## Citation
+
+```bibtex
+@software{progressive_cognitive_2026,
+  author = {Dex Mac},
+  title = {Progressive Cognitive Architecture for LLMs},
+  year = {2026},
+  url = {https://github.com/dexmac221/progressive-cognitive},
+  version = {1.0.0}
+}
+```


+## License

 Apache 2.0
+