Upload README.md

Files changed (1)

README.md +22 -9

@@ -46,21 +46,34 @@ This enables **60 total refinement steps** (30 layers × 2 steps each) throughou
 Evaluated on LM-Evaluation-Harness:
-| Task | Metric | Asterisk-Pi | Asterisk (Base) | Δ |
-|------|--------|-------------|-----------------|---|
-| **ARC-Challenge** | acc_norm | **0.3038** | 0.2884 | +0.0154 |
-| **ARC-Easy** | acc_norm | **0.5412** | 0.5450 | -0.0038 |
-| **HellaSwag** | acc_norm | **0.4207** | 0.4430 | -0.0223 |
-| **PIQA** | acc_norm | **0.6703** | 0.6770 | -0.0067 |
-| **WinoGrande** | acc | **0.5391** | 0.5210 | +0.0181 |
 ### Analysis
-π-Flow shows improvements on:
 - **ARC-Challenge** (+1.54%): More challenging reasoning benefits from iterative refinement
 - **WinoGrande** (+1.81%): Multi-step resolution helps with pronoun disambiguation
-Mixed results on simpler tasks suggest π-flow adds reasoning depth that's most beneficial for complex multi-step problems.
 ## Architecture

 Evaluated on LM-Evaluation-Harness:
+| Task | Metric | Asterisk-Pi<br>(173.7M) | Asterisk<br>(171.2M) | SmolLM2-135M<br>(135.6M) | Gemma-3-270m-it<br>(270M) | Δ vs Asterisk | Δ vs SmolLM2 | Δ vs Gemma-3 |
+|------|--------|-------------|-----------------|--------------|----------------|---------------|--------------|--------------|
+| **ARC-Challenge** | acc_norm | **0.3038** | 0.2884 | 0.2773 | 0.2730 | +0.0154 | **+0.0265** | **+0.0308** |
+| **ARC-Easy** | acc_norm | 0.5412 | **0.5450** | 0.4899 | 0.5059 | -0.0038 | **+0.0513** | **+0.0353** |
+| **HellaSwag** | acc_norm | 0.4207 | **0.4430** | 0.4293 | 0.3937 | -0.0223 | -0.0086 | **+0.0270** |
+| **PIQA** | acc_norm | 0.6703 | **0.6770** | 0.6632 | 0.6692 | -0.0067 | **+0.0071** | +0.0011 |
+| **WinoGrande** | acc | **0.5391** | 0.5210 | 0.5154 | 0.5257 | +0.0181 | **+0.0237** | +0.0134 |
 ### Analysis
+**π-Flow improvements over base Asterisk (2 of 5 tasks; ARC-Easy, HellaSwag, and PIQA are slightly lower):**
 - **ARC-Challenge** (+1.54%): More challenging reasoning benefits from iterative refinement
 - **WinoGrande** (+1.81%): Multi-step resolution helps with pronoun disambiguation
+**Improvements over SmolLM2-135M base (4 of 5 tasks; HellaSwag is 0.86% lower):**
+- **ARC-Challenge** (+2.65%): Hybrid architecture + π-flow improves complex reasoning
+- **ARC-Easy** (+5.13%): Strong gains on elementary science questions
+- **WinoGrande** (+2.37%): Better pronoun disambiguation through iterative refinement
+- **PIQA** (+0.71%): Modest gains on physical commonsense
+**Outperforming Gemma-3-270m-it (with 96M fewer parameters):**
+- **ARC-Challenge** (+3.08%): Better reasoning despite being roughly 36% smaller
+- **ARC-Easy** (+3.53%): Clear advantage on elementary science
+- **HellaSwag** (+2.70%): Stronger commonsense reasoning
+- **WinoGrande** (+1.34%): Better coreference resolution
+- **PIQA** (+0.11%): Comparable physical reasoning
+**Key insight**: Asterisk-Pi (173.7M params) scores higher than the larger Gemma-3-270m-it (270M params) on all five tasks, although the PIQA margin (+0.11%) is negligible. This suggests that the hybrid ASPP-Attention architecture with π-flow refinement is parameter-efficient, with the largest gains over Gemma-3 on ARC-Easy, ARC-Challenge, and HellaSwag.
 ## Architecture
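
The Δ columns and the parameter comparison in the updated table can be re-derived from the raw scores. The following is a minimal sketch for checking that arithmetic, not part of the README or the evaluation harness. The names `SCORES`, `PARAMS_M`, `deltas`, and `param_gap` are hypothetical, and the numbers are transcribed by hand from the table in this diff.

```python
# Scores transcribed from the results table in this diff (acc_norm, or acc for WinoGrande).
SCORES = {
    "ARC-Challenge": {"Asterisk-Pi": 0.3038, "Asterisk": 0.2884,
                      "SmolLM2-135M": 0.2773, "Gemma-3-270m-it": 0.2730},
    "ARC-Easy":      {"Asterisk-Pi": 0.5412, "Asterisk": 0.5450,
                      "SmolLM2-135M": 0.4899, "Gemma-3-270m-it": 0.5059},
    "HellaSwag":     {"Asterisk-Pi": 0.4207, "Asterisk": 0.4430,
                      "SmolLM2-135M": 0.4293, "Gemma-3-270m-it": 0.3937},
    "PIQA":          {"Asterisk-Pi": 0.6703, "Asterisk": 0.6770,
                      "SmolLM2-135M": 0.6632, "Gemma-3-270m-it": 0.6692},
    "WinoGrande":    {"Asterisk-Pi": 0.5391, "Asterisk": 0.5210,
                      "SmolLM2-135M": 0.5154, "Gemma-3-270m-it": 0.5257},
}

# Parameter counts in millions, as stated in the table header.
PARAMS_M = {"Asterisk-Pi": 173.7, "Asterisk": 171.2,
            "SmolLM2-135M": 135.6, "Gemma-3-270m-it": 270.0}


def deltas(target="Asterisk-Pi"):
    """Return {task: {baseline: target score minus baseline score}}, rounded to 4 decimals."""
    return {
        task: {name: round(row[target] - score, 4)
               for name, score in row.items() if name != target}
        for task, row in SCORES.items()
    }


def param_gap(small, large):
    """Return (absolute gap in millions, gap as a percentage of the larger model)."""
    diff = PARAMS_M[large] - PARAMS_M[small]
    return round(diff, 1), round(100 * diff / PARAMS_M[large], 1)


if __name__ == "__main__":
    for task, row in deltas().items():
        print(task, row)
    print(param_gap("Asterisk-Pi", "Gemma-3-270m-it"))
```

Note that the deltas are absolute differences in accuracy, so a value such as 0.0154 corresponds to 1.54 percentage points rather than a 1.54% relative change.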