Commit b1da20b (verified) by 77ethers · 1 parent: 1c17fa9

grpo_qwen25_7b_adapter_phase1_100_v1: metrics

grpo_qwen25_7b_adapter_phase1_100_v1/metrics.json ADDED
@@ -0,0 +1,160 @@
+{
+  "base_model": "unsloth/Qwen2.5-7B-Instruct",
+  "grpo_steps": 100,
+  "lr": 2e-06,
+  "num_generations": 2,
+  "per_device_batch": 2,
+  "post_grpo_holdout": {
+    "beats_baseline": 5,
+    "mean_regret": 0.1058364543501453,
+    "results": {
+      "100": {
+        "final_nav_real": 1.0715617633330186,
+        "preview": "<think>\nA naive model sees 'renewable supply chain disruption' and immediately sells GREEN. However, this is a classic fragmentation shock. The trap is assuming TECH or GREEN will simply replace the missing rare earths. In reality, the rare",
+        "regret": 0.07546923233968572,
+        "shock": "hard_rare_earth_rotation",
+        "tokens": 200,
+        "valid": true
+      },
+      "200": {
+        "final_nav_real": 1.178442430083944,
+        "preview": "<think>\nBase-rate environment. Normal market conditions with no regime shift. The primary driver here is the consistent earnings growth of TECH. The lack of macro shocks means there is no flight-to-safety or rotation into safe-havens like B",
+        "regret": 0.12097253781446282,
+        "shock": "easy_tech_earnings",
+        "tokens": 173,
+        "valid": true
+      },
+      "300": {
+        "final_nav_real": 1.2275800707519924,
+        "preview": "<think>\nBase-rate environment. Normal market conditions with no regime shift. The primary driver here is the consistent earnings growth of TECH. The lack of macro shocks means there is no flight-to-safety or rotation into safe-havens like B",
+        "regret": 0.14424753186746342,
+        "shock": "easy_tech_earnings",
+        "tokens": 173,
+        "valid": true
+      },
+      "400": {
+        "final_nav_real": 1.2772336587983633,
+        "preview": "<think>\nThis is a textbook deflationary shock. The trap is buying the dip in TECH or OIL on the assumption of cheap capital. However, the 12% YoY drop in export prices indicates severe global demand destruction. This will crush earnings for",
+        "regret": 0.1527395496958377,
+        "shock": "hard_deflation_pulse",
+        "tokens": 186,
+        "valid": true
+      },
+      "500": {
+        "final_nav_real": 0.9567638617009999,
+        "preview": "<think>\nThe pattern-matching trap here is assuming this is a positive catalyst for GREEN or infrastructure. In reality, it's a massive deflationary shock for data center hardware and power utilities. The 10x efficiency gain directly slashes",
+        "regret": 0.03575342003327686,
+        "shock": "ambig_ai_efficiency",
+        "tokens": 215,
+        "valid": true
+      }
+    },
+    "total": 5,
+    "v6_sft_mean_regret_bar": 0.034,
+    "valid": 5
+  },
+  "post_grpo_sanity": {
+    "closed_think": 5,
+    "max_tokens": 190,
+    "mean_tokens": 164,
+    "min_tokens": 144,
+    "samples": [
+      {
+        "chars": 531,
+        "has_closed_think": true,
+        "preview": "<think>\nBase-rate environment with no regime shifts or structural shocks. A standard earnings beat drives a mild rotation into growth. Tech and Bonds form the core of the portfolio, with a small overweight in Green to capture long-term ESG ",
+        "shock": "easy_tech_earnings",
+        "tokens": 148,
+        "valid_action": true
+      },
+      {
+        "chars": 708,
+        "has_closed_think": true,
+        "preview": "<think>\nBase-rate assumption: Normal market. The modest OPEC+ cut creates a temporary supply shock but does not justify a regime shift or severe market dislocation. The 1st-order effect is positive for OIL, providing an immediate price bump",
+        "shock": "easy_oil_opec_cut",
+        "tokens": 190,
+        "valid_action": true
+      },
+      {
+        "chars": 568,
+        "has_closed_think": true,
+        "preview": "<think>\nA straightforward green sector policy tailwind. 1st order: direct demand for hardware (solar) boosts GREEN equities. 2nd order: no spillover effects on oil or inflation, maintaining a normal macro environment. 3rd order: long-term c",
+        "shock": "easy_green_subsidy",
+        "tokens": 162,
+        "valid_action": true
+      },
+      {
+        "chars": 494,
+        "has_closed_think": true,
+        "preview": "<think>\nBase-rate assumption: Normal economic environment. The drop in housing is a normal, expected cooldown after the previous cycle. The primary impact is a slight drag on REAL_ESTATE sector earnings. A balanced portfolio across TECH, GR",
+        "shock": "easy_housing_cooling",
+        "tokens": 144,
+        "valid_action": true
+      },
+      {
+        "chars": 660,
+        "has_closed_think": true,
+        "preview": "<think>\nBase-rate assumption: Normal markets with a gradual transition shock. The 1st-order effect is a massive boost for GREEN infrastructure, particularly charging networks and battery manufacturers. TECH benefits from margin expansion du",
+        "shock": "easy_ev_penetration",
+        "tokens": 176,
+        "valid_action": true
+      }
+    ],
+    "total": 5,
+    "valid_actions": 5
+  },
+  "pre_grpo_sanity": {
+    "closed_think": 5,
+    "max_tokens": 184,
+    "mean_tokens": 164,
+    "min_tokens": 129,
+    "samples": [
+      {
+        "chars": 568,
+        "has_closed_think": true,
+        "preview": "<think>\nNormal market environment with no structural regime shifts. The 1st-order effect of routine earnings is positive for TECH, but this does not justify a massive overweight because it is just a normal beat. The base rate remains intact",
+        "shock": "easy_tech_earnings",
+        "tokens": 168,
+        "valid_action": true
+      },
+      {
+        "chars": 685,
+        "has_closed_think": true,
+        "preview": "<think>\nA minor OPEC+ cut provides a localized supply shock but lacks systemic macro implications. The immediate 4% price bump is a routine commodity spike, not a regime-shift indicator. 1st-order effects favor OIL directly through higher y",
+        "shock": "easy_oil_opec_cut",
+        "tokens": 176,
+        "valid_action": true
+      },
+      {
+        "chars": 427,
+        "has_closed_think": true,
+        "preview": "<think>\nEasy base-rate environment. A straightforward policy tailwind for GREEN assets without any broader macro regime shift. The 1st-order effect is simply a modest upside for the GREEN sector due to the expanded subsidy. No need to deplo",
+        "shock": "easy_green_subsidy",
+        "tokens": 129,
+        "valid_action": true
+      },
+      {
+        "chars": 597,
+        "has_closed_think": true,
+        "preview": "<think>\nBase-rate assumption: Normal market. A minor drop in housing starts reflects broader economic moderation without systemic crisis. 1st order: REAL_ESTATE takes a mild drag due to reduced buyer activity. 2nd order: TECH, GREEN, and OI",
+        "shock": "easy_housing_cooling",
+        "tokens": 163,
+        "valid_action": true
+      },
+      {
+        "chars": 661,
+        "has_closed_think": true,
+        "preview": "<think>\nThe pattern-matching model assumes this is a cyclical rotation out of oil, but the reality is a structural transition. 1st order: Green infrastructure (charging networks) and battery tech benefit from the accelerated transition. 2nd",
+        "shock": "easy_ev_penetration",
+        "tokens": 184,
+        "valid_action": true
+      }
+    ],
+    "total": 5,
+    "valid_actions": 5
+  },
+  "run_label": "grpo_qwen25_7b_adapter_phase1_100_v1",
+  "sft_subfolder": "sft_qwen25_7b_curriculum400_v1",
+  "smoke_gate_passed": true,
+  "smoke_gate_reasons": []
+}
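The `post_grpo_holdout` block stores an aggregate `mean_regret` next to the per-scenario `regret` values, so the aggregate can be checked against its parts. The following is a minimal Python sketch that recomputes it from an excerpt of `results` with the preview strings dropped. `summarize_holdout` is a hypothetical helper name. Treating `v6_sft_mean_regret_bar` as an upper bound that mean regret should stay under is an assumption drawn from the field name; this file does not document it.

```python
import math

# Excerpt of "post_grpo_holdout" from metrics.json; preview strings omitted.
HOLDOUT = {
    "beats_baseline": 5,
    "mean_regret": 0.1058364543501453,
    "results": {
        "100": {"regret": 0.07546923233968572, "shock": "hard_rare_earth_rotation", "valid": True},
        "200": {"regret": 0.12097253781446282, "shock": "easy_tech_earnings", "valid": True},
        "300": {"regret": 0.14424753186746342, "shock": "easy_tech_earnings", "valid": True},
        "400": {"regret": 0.1527395496958377, "shock": "hard_deflation_pulse", "valid": True},
        "500": {"regret": 0.03575342003327686, "shock": "ambig_ai_efficiency", "valid": True},
    },
    "total": 5,
    "v6_sft_mean_regret_bar": 0.034,
    "valid": 5,
}


def summarize_holdout(holdout: dict) -> dict:
    """Recompute holdout aggregates from per-scenario results (hypothetical helper)."""
    results = list(holdout["results"].values())
    regrets = [r["regret"] for r in results]
    # Unweighted mean over scenarios; fsum avoids accumulated rounding error.
    recomputed = math.fsum(regrets) / len(regrets)
    return {
        "recomputed_mean_regret": recomputed,
        "matches_stored": math.isclose(recomputed, holdout["mean_regret"], rel_tol=1e-12),
        "valid": sum(1 for r in results if r["valid"]),
        "total": len(results),
        # Assumption: the "bar" is an upper bound on mean regret (lower is better).
        "meets_v6_sft_bar": recomputed <= holdout["v6_sft_mean_regret_bar"],
        # Scenario with the largest regret, reported by its "shock" label.
        "worst_shock": max(results, key=lambda r: r["regret"])["shock"],
    }


summary = summarize_holdout(HOLDOUT)
print(summary)
```

On these values the unweighted mean reproduces the stored 0.1058 to floating-point precision, which is roughly three times the 0.034 bar. The largest single regret is on `hard_deflation_pulse` (key "400").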
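`pre_grpo_sanity` and `post_grpo_sanity` each store aggregates (`closed_think`, `valid_actions`, `min_tokens`, `max_tokens`, `mean_tokens`) over five samples drawn from the same five easy shocks. This sketch recomputes those aggregates from the per-sample `tokens`, `has_closed_think` and `valid_action` fields, then diffs token counts per shock. The helper names are hypothetical. Rounding `mean_tokens` to an integer is an assumption; both means here are exactly 164, so rounding has no effect on this data.

```python
from statistics import mean

# Excerpts of the "samples" lists; "chars" and "preview" omitted.
PRE_SAMPLES = [
    {"shock": "easy_tech_earnings", "tokens": 168, "has_closed_think": True, "valid_action": True},
    {"shock": "easy_oil_opec_cut", "tokens": 176, "has_closed_think": True, "valid_action": True},
    {"shock": "easy_green_subsidy", "tokens": 129, "has_closed_think": True, "valid_action": True},
    {"shock": "easy_housing_cooling", "tokens": 163, "has_closed_think": True, "valid_action": True},
    {"shock": "easy_ev_penetration", "tokens": 184, "has_closed_think": True, "valid_action": True},
]
POST_SAMPLES = [
    {"shock": "easy_tech_earnings", "tokens": 148, "has_closed_think": True, "valid_action": True},
    {"shock": "easy_oil_opec_cut", "tokens": 190, "has_closed_think": True, "valid_action": True},
    {"shock": "easy_green_subsidy", "tokens": 162, "has_closed_think": True, "valid_action": True},
    {"shock": "easy_housing_cooling", "tokens": 144, "has_closed_think": True, "valid_action": True},
    {"shock": "easy_ev_penetration", "tokens": 176, "has_closed_think": True, "valid_action": True},
]


def sanity_summary(samples: list[dict]) -> dict:
    """Rebuild the sanity-block aggregates from per-sample fields (hypothetical helper)."""
    tokens = [s["tokens"] for s in samples]
    return {
        "closed_think": sum(s["has_closed_think"] for s in samples),
        "max_tokens": max(tokens),
        "mean_tokens": round(mean(tokens)),  # integer rounding is an assumption
        "min_tokens": min(tokens),
        "total": len(samples),
        "valid_actions": sum(s["valid_action"] for s in samples),
    }


def token_delta_by_shock(pre: list[dict], post: list[dict]) -> dict:
    """Post-GRPO minus pre-GRPO token count for each shock present in both lists."""
    before = {s["shock"]: s["tokens"] for s in pre}
    return {s["shock"]: s["tokens"] - before[s["shock"]] for s in post if s["shock"] in before}


print(sanity_summary(PRE_SAMPLES))
print(sanity_summary(POST_SAMPLES))
print(token_delta_by_shock(PRE_SAMPLES, POST_SAMPLES))
```

The per-shock deltas range from -20 to +33 tokens and sum to zero. This explains why `mean_tokens` is 164 both before and after GRPO even though every individual sample changed length.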