OzTianlu committed on
Commit 21b4315 · verified · 1 Parent(s): c90fe04

Upload README.md

Files changed (1)
  1. README.md +22 -9
README.md CHANGED
@@ -46,21 +46,34 @@ This enables **60 total refinement steps** (30 layers × 2 steps each) throughou
 
  Evaluated on LM-Evaluation-Harness:
 
- | Task | Metric | Asterisk-Pi | Asterisk (Base) | Δ |
- |------|--------|-------------|-----------------|---|
- | **ARC-Challenge** | acc_norm | **0.3038** | 0.2884 | +0.0154 |
- | **ARC-Easy** | acc_norm | **0.5412** | 0.5450 | -0.0038 |
- | **HellaSwag** | acc_norm | **0.4207** | 0.4430 | -0.0223 |
- | **PIQA** | acc_norm | **0.6703** | 0.6770 | -0.0067 |
- | **WinoGrande** | acc | **0.5391** | 0.5210 | +0.0181 |
 
  ### Analysis
 
- π-Flow shows improvements on:
  - **ARC-Challenge** (+1.54%): More challenging reasoning benefits from iterative refinement
  - **WinoGrande** (+1.81%): Multi-step resolution helps with pronoun disambiguation
 
- Mixed results on simpler tasks suggest π-flow adds reasoning depth that's most beneficial for complex multi-step problems.
 
  ## Architecture
 
 
  Evaluated on LM-Evaluation-Harness:
 
+ | Task | Metric | Asterisk-Pi<br>(173.7M) | Asterisk<br>(171.2M) | SmolLM2-135M<br>(135.6M) | Gemma-3-270m-it<br>(270M) | Δ vs Asterisk | Δ vs SmolLM2 | Δ vs Gemma-3 |
+ |------|--------|-------------|-----------------|--------------|----------------|---------------|--------------|--------------|
+ | **ARC-Challenge** | acc_norm | **0.3038** | 0.2884 | 0.2773 | 0.2730 | +0.0154 | **+0.0265** | **+0.0308** |
+ | **ARC-Easy** | acc_norm | 0.5412 | **0.5450** | 0.4899 | 0.5059 | -0.0038 | **+0.0513** | **+0.0353** |
+ | **HellaSwag** | acc_norm | 0.4207 | **0.4430** | 0.4293 | 0.3937 | -0.0223 | -0.0086 | **+0.0270** |
+ | **PIQA** | acc_norm | 0.6703 | **0.6770** | 0.6632 | 0.6692 | -0.0067 | **+0.0071** | +0.0011 |
+ | **WinoGrande** | acc | **0.5391** | 0.5210 | 0.5154 | 0.5257 | +0.0181 | **+0.0237** | +0.0134 |
 
  ### Analysis
 
+ **π-Flow improvements over base Asterisk:**
  - **ARC-Challenge** (+1.54%): More challenging reasoning benefits from iterative refinement
  - **WinoGrande** (+1.81%): Multi-step resolution helps with pronoun disambiguation
 
+ **Improvements over SmolLM2-135M base:**
+ - **ARC-Challenge** (+2.65%): Hybrid architecture + π-flow improves complex reasoning
+ - **ARC-Easy** (+5.13%): Largest gain, on elementary science questions
+ - **WinoGrande** (+2.37%): Better pronoun disambiguation through iterative refinement
+ - **PIQA** (+0.71%): Modest gains on physical commonsense
+
+ **Outperforming Gemma-3-270m-it (with 96M fewer parameters):**
+ - **ARC-Challenge** (+3.08%): Higher accuracy despite having about 36% fewer parameters
+ - **ARC-Easy** (+3.53%): Clear advantage on elementary science
+ - **HellaSwag** (+2.70%): Stronger commonsense reasoning
+ - **WinoGrande** (+1.34%): Better coreference resolution
+ - **PIQA** (+0.11%): Comparable physical reasoning
+
+ **Key insight**: Asterisk-Pi (173.7M params) scores higher than the larger Gemma-3-270m-it (270M params) on all five tasks, although the PIQA margin (+0.0011) is negligible. This suggests that the hybrid ASPP-Attention architecture with π-flow refinement delivers better accuracy per parameter, with the largest margins on ARC-Easy and ARC-Challenge.
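The deltas and parameter comparison quoted in the analysis can be rechecked with a short script. This is a minimal sketch, not part of the README: the scores and parameter counts are copied from the results table, while the helper names `deltas` and `param_reduction` are hypothetical. Note that the figures given in parentheses in the bullets (for example +2.65%) are absolute accuracy differences, i.e. percentage points.

```python
# Scores copied from the results table (acc_norm, except WinoGrande, which is acc).
SCORES = {
    "ARC-Challenge": {"Asterisk-Pi": 0.3038, "Asterisk": 0.2884,
                      "SmolLM2-135M": 0.2773, "Gemma-3-270m-it": 0.2730},
    "ARC-Easy":      {"Asterisk-Pi": 0.5412, "Asterisk": 0.5450,
                      "SmolLM2-135M": 0.4899, "Gemma-3-270m-it": 0.5059},
    "HellaSwag":     {"Asterisk-Pi": 0.4207, "Asterisk": 0.4430,
                      "SmolLM2-135M": 0.4293, "Gemma-3-270m-it": 0.3937},
    "PIQA":          {"Asterisk-Pi": 0.6703, "Asterisk": 0.6770,
                      "SmolLM2-135M": 0.6632, "Gemma-3-270m-it": 0.6692},
    "WinoGrande":    {"Asterisk-Pi": 0.5391, "Asterisk": 0.5210,
                      "SmolLM2-135M": 0.5154, "Gemma-3-270m-it": 0.5257},
}

# Parameter counts in millions, as listed in the table header.
PARAMS_M = {"Asterisk-Pi": 173.7, "Asterisk": 171.2,
            "SmolLM2-135M": 135.6, "Gemma-3-270m-it": 270.0}


def deltas(model="Asterisk-Pi"):
    """Absolute accuracy difference of `model` versus each other model, per task."""
    return {
        task: {other: round(row[model] - score, 4)
               for other, score in row.items() if other != model}
        for task, row in SCORES.items()
    }


def param_reduction(small, large):
    """Fraction by which `small` has fewer parameters than `large`."""
    return 1.0 - PARAMS_M[small] / PARAMS_M[large]


if __name__ == "__main__":
    for task, row in deltas().items():
        print(task, {name: f"{d:+.4f}" for name, d in row.items()})
    print(f"Asterisk-Pi vs Gemma-3-270m-it: "
          f"{param_reduction('Asterisk-Pi', 'Gemma-3-270m-it'):.1%} fewer parameters")
```

Keeping the raw scores in one dictionary makes it easy to regenerate the Δ columns if any model is re-evaluated.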
 
  ## Architecture
 