Commit 06ae37a (verified) by 77ethers · 1 Parent(s): d300cc6

sft_qwen25_7b_curriculum400_v1: metrics

sft_qwen25_7b_curriculum400_v1/metrics.json ADDED
@@ -0,0 +1,110 @@
+ {
+   "beats_v6_sft_mean_regret": false,
+   "generation_sanity": {
+     "closed_think": 5,
+     "max_tokens": 169,
+     "mean_tokens": 150.0,
+     "min_tokens": 127,
+     "samples": [
+       {
+         "chars": 432,
+         "has_closed_think": true,
+         "preview": "<think>\nBase-rate environment with normal market dynamics. A standard, diversified portfolio is sufficient. TECH is slightly favored due to the positive earnings surprise, while BONDS provide a solid ballast. We keep wei",
+         "shock": "easy_tech_earnings",
+         "tokens": 127,
+         "valid_action": true
+       },
+       {
+         "chars": 538,
+         "has_closed_think": true,
+         "preview": "<think>\nBase-rate assumption: Normal regime. A 500k bpd cut is a manageable supply constraint. 1st-order effect: Oil prices rise moderately, benefiting OIL sector margins. 2_order effect: The cut is not severe enough to ",
+         "shock": "easy_oil_opec_cut",
+         "tokens": 161,
+         "valid_action": true
+       },
+       {
+         "chars": 568,
+         "has_closed_think": true,
+         "preview": "<think>\nThis is a straightforward structural tailwind for the GREEN sector. The direct impact is an immediate valuation boost and higher visibility into future revenue for European renewable companies. There are no broad",
+         "shock": "easy_green_subsidy",
+         "tokens": 147,
+         "valid_action": true
+       },
+       {
+         "chars": 638,
+         "has_closed_think": true,
+         "preview": "<think>\nNormal base-rate environment. The housing market cooling is benign and expected in a tightening cycle. It mildly compresses REAL_ESTATE yields without but does not threaten broader economic stability. TECH, GREEN",
+         "shock": "easy_housing_cooling",
+         "tokens": 169,
+         "valid_action": true
+       },
+       {
+         "chars": 564,
+         "has_closed_think": true,
+         "preview": "<think>\nMedium difficulty. The pattern-matching trap here is assuming GREEN faces an existential threat from rapid adoption. In reality, structural demand is shifting to electric vehicles, validating GREEN's long-term gr",
+         "shock": "easy_ev_penetration",
+         "tokens": 146,
+         "valid_action": true
+       }
+     ],
+     "total": 5,
+     "valid_actions": 5
+   },
+   "holdout_eval": {
+     "beats_baseline": 3,
+     "mean_regret": 0.027962354717973325,
+     "results": {
+       "100": {
+         "final_nav_real": 1.1073728909953098,
+         "preview": "<think>\nA severe fragmentation shock. The trap is buying GREEN or TECH on the dip, assuming the supply chain disruption is temporary. In reality, the 18-month export freeze creates a permanent, structural bottleneck for ",
+         "regret": 0.06491057945750223,
+         "shock": "hard_rare_earth_rotation",
+         "tokens": 194,
+         "valid": true
+       },
+       "200": {
+         "final_nav_real": 1.1060036929933934,
+         "preview": "<think>\nBase-rate environment. Normal market conditions with no regime shift. The primary driver here is a standard earnings beat, which supports TECH valuations slightly without. There is no inflationary pressure or mac",
+         "regret": 0.05890407704601519,
+         "shock": "easy_tech_earnings",
+         "tokens": 160,
+         "valid": true
+       },
+       "300": {
+         "final_nav_real": 1.1207266169281236,
+         "preview": "<think>\nBase-rate environment. Normal market conditions with no regime shift. The primary driver here is a standard earnings beat, which supports TECH valuations slightly without. There is no inflationary pressure or mac",
+         "regret": 0.047995441167894004,
+         "shock": "easy_tech_earnings",
+         "tokens": 160,
+         "valid": true
+       },
+       "400": {
+         "final_nav_real": 1.050424498440499,
+         "preview": "<think>\nA severe fragmentation shock. The trap is buying GREEN or TECH on the dip, assuming the supply chain disruption is temporary. In reality, the 18-month export freeze creates a permanent, structural bottleneck for ",
+         "regret": -0.01987618036655059,
+         "shock": "hard_rare_earth_rotation",
+         "tokens": 194,
+         "valid": true
+       },
+       "500": {
+         "final_nav_real": 0.9143777390791974,
+         "preview": "<think>\nAmbiguous news. Base-rate assumption holds: normal markets with no regime shift. 1st-order: TECH and GREEN benefit from steady innovation and climate awareness. 2st-order: OIL faces modest demand drag but remains",
+         "regret": -0.012122143714994205,
+         "shock": "ambig_PLACEHOLDER_6",
+         "tokens": 159,
+         "valid": true
+       }
+     },
+     "total": 5,
+     "v6_sft_mean_regret_bar": 0.034,
+     "valid": 5
+   },
+   "lora_alpha": 16,
+   "lora_rank": 16,
+   "model_name": "unsloth/Qwen2.5-7B-Instruct",
+   "run_label": "sft_qwen25_7b_curriculum400_v1",
+   "save_method": "lora",
+   "sft_steps": 220,
+   "traces": "/tmp/CarbonAlphaQwen25SFT/code/sft_traces/curriculum_400_e80_m160_h160.jsonl",
+   "v6_sft_mean_regret_bar": 0.034
+ }
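
The aggregate fields in this file can be cross-checked against the per-sample and per-checkpoint entries. The sketch below is not part of the commit. It copies the token counts and regret values from the diff above and recomputes `min_tokens`, `max_tokens`, `mean_tokens`, and `mean_regret`. The helper name `summarize` and the key `positive_regret` are hypothetical, introduced only for illustration. The file does not define the sign convention for regret; the reported numbers are consistent with a reading in which a higher value is better (three positive regrets match `beats_baseline: 3`, and a mean below the 0.034 bar matches `beats_v6_sft_mean_regret: false`), but that reading is an assumption.

```python
import math

# Token counts from generation_sanity.samples, in file order.
sample_tokens = [127, 161, 147, 169, 146]

# Regret per holdout_eval.results key, copied verbatim from the diff.
holdout_regret = {
    "100": 0.06491057945750223,
    "200": 0.05890407704601519,
    "300": 0.047995441167894004,
    "400": -0.01987618036655059,
    "500": -0.012122143714994205,
}

# Aggregates as reported in metrics.json.
reported = {
    "min_tokens": 127,
    "max_tokens": 169,
    "mean_tokens": 150.0,
    "mean_regret": 0.027962354717973325,
    "beats_baseline": 3,
    "v6_sft_mean_regret_bar": 0.034,
}


def summarize(tokens, regrets):
    """Recompute the aggregate fields (hypothetical helper, not from the repo)."""
    values = list(regrets.values())
    return {
        "min_tokens": min(tokens),
        "max_tokens": max(tokens),
        "mean_tokens": sum(tokens) / len(tokens),
        "mean_regret": sum(values) / len(values),
        # Assumption: a positive regret is what beats_baseline counts.
        "positive_regret": sum(v > 0 for v in values),
    }


summary = summarize(sample_tokens, holdout_regret)

for key in ("min_tokens", "max_tokens", "mean_tokens", "mean_regret"):
    ok = math.isclose(summary[key], reported[key], rel_tol=1e-9)
    print(f"{key}: recomputed={summary[key]} reported={reported[key]} match={ok}")

print("positive regrets:", summary["positive_regret"],
      "| reported beats_baseline:", reported["beats_baseline"])
print("mean_regret above bar:",
      summary["mean_regret"] > reported["v6_sft_mean_regret_bar"])
```

To run the same checks against the real artifact, the inline values could be replaced with `json.load` on the committed `metrics.json`.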