77ethers committed on
Commit
c8f6283
·
verified ·
1 Parent(s): 9773bd5

Reorganize blog links and visuals

Files changed (1)
  1. BLOG_CARBONALPHA.md +45 -24
BLOG_CARBONALPHA.md CHANGED
@@ -1,5 +1,22 @@
1
  # CarbonAlpha: Teaching a 7B Model to Manage a Carbon-Budgeted Portfolio Through Macro Shocks
2
 
3
  ## Why This Problem
4
 
5
  ESG-mandated capital is now measured in tens of trillions, and the mandates are getting teeth.
@@ -14,10 +31,6 @@ And a generic LLM with a clever prompt cannot do it either — not reliably. It
14
 
15
  CarbonAlpha is our attempt to build the missing thing: a small, fine-tuned, RL-trained reasoning agent that lives inside a real portfolio environment with a real carbon budget, and learns to allocate through it.
16
 
17
- ![CarbonAlpha demo dashboard showing macro shocks, portfolio NAV, carbon budget, and allocation state](assets/blog/carbonalpha-demo-dashboard.png)
18
-
19
- *The demo frames CarbonAlpha as a live portfolio agent: one macro shock enters, the model reasons, and the environment turns that reasoning into allocations, carbon usage, NAV, and benchmark-relative outcomes.*
20
-
21
  ## The Core Bet
22
 
23
  CarbonAlpha is not a price predictor.
@@ -60,10 +73,6 @@ The harness makes the model answerable to consequences.
60
 
61
  In that sense, CarbonAlpha is not just a fine-tuned LLM. It is an **evaluation harness for carbon-aware portfolio reasoning** and a **training harness for turning macro theses into scored actions**.
62
 
63
- ![CarbonAlpha harness diagram showing simulator, action schema, guardrails, carbon budget, benchmark, reward function, validation, and evaluation](assets/blog/carbonalpha-harness.png)
64
-
65
- *Reliability comes from the system around the model: the simulator, schema, carbon budget, benchmark, reward function, validation loop, and evaluation set.*
66
-
67
  ## The Environment
68
 
69
  We built CarbonAlpha as an OpenEnv environment.
@@ -114,10 +123,6 @@ Hard shocks test second- and third-order reasoning. A rare-earth export restrict
114
 
115
  This shock structure became both the environment curriculum and the data curriculum.
116
 
117
- ![CarbonAlpha curriculum progression from easy to ambiguous to hard macro shocks](assets/blog/carbonalpha-curriculum.png)
118
-
119
- *The training curriculum moves from clean first-order cases to ambiguous mixed-signal events and then hard, nonlinear macro shocks.*
120
-
121
  ## The Training Dataset
122
 
123
  The training data was not just a bag of examples. We built it as a curriculum.
@@ -345,10 +350,6 @@ The reward stack connects directly to portfolio management:
345
 
346
  This is the core engineering move: the model is not rewarded for sounding like a portfolio manager. It is rewarded for producing actions that survive the simulator.
347
 
348
- ![GRPO simulation arena showing candidate portfolio actions scored by reward components](assets/blog/carbonalpha-grpo-arena.png)
349
-
350
- *GRPO turns candidate trade tickets into scored outcomes. The best completions are the ones that execute cleanly, respect carbon limits, reduce drawdown, and beat the benchmark.*
351
-
352
  ## Why Regret Is the Right Financial Signal
353
 
354
  Raw return is not enough. If every asset rises, a model can look good by accident.
@@ -389,10 +390,6 @@ beats baseline: 5/5
389
 
390
  We also tested a Qwen3-4B-Base branch. It passed the mechanical GRPO smoke gate, but did not beat the Qwen2.5 model. So for the demo, Qwen2.5-7B remains the stronger candidate.
391
 
392
- ![CarbonAlpha outperforming equal-weight while staying within carbon budget](assets/blog/carbonalpha-results-race.png)
393
-
394
- *The desired behavior is not just higher return. CarbonAlpha must outperform while staying inside a hard carbon budget and surviving inflation, transition shocks, physical climate risk, hedge bleed, and drawdown pressure.*
395
-
396
  ## How We Evaluated It
397
 
398
  We used three evaluation layers because no single metric tells the whole story.
@@ -545,10 +542,6 @@ The demo is designed to make the training delta visible.
545
 
546
  You can choose or edit a macro headline, then click **Plan Portfolio**. CarbonAlpha reasons live and produces an allocation.
547
 
548
- ![Live CarbonAlpha demo dashboard showing a Q7 macro headline, model reasoning, locked allocation, carbon timeline, NAV versus benchmark, and reward breakdown](assets/blog/carbonalpha-demo-live-screenshot.jpg)
549
-
550
- *A live demo run: the user edits a quarter headline, replans from that point, and watches the allocation, carbon path, NAV, and reward components update together.*
551
-
552
  The interface shows:
553
 
554
  - model reasoning;
@@ -563,6 +556,10 @@ The important thing is not just the final answer. It is watching how the trained
563
 
564
  A base model may produce plausible prose. The trained model is more likely to produce a valid action that respects the environment.
565
 
566
  ## What Still Fails
567
 
568
  The model is not perfect, and the eval caught useful weaknesses.
@@ -590,3 +587,27 @@ CarbonAlpha is one version of that idea: a small reasoning model trained to mana
590
  - Salesforce, ["Agent Harness: The Infrastructure for Reliable AI"](https://www.salesforce.com/agentforce/ai-agents/agent-harness/)
591
  - Future of Being Human, ["What we miss when we talk about AI Harnesses"](https://www.futureofbeinghuman.com/p/what-we-miss-when-we-talk-about-ai-harnesses)
592
  - rmax.ai, ["Harness Engineering Is the Primary Lever for Agent Reliability in 2025-2026"](https://rmax.ai/notes/harness-new-model-agent-systems-2026/)
 
1
  # CarbonAlpha: Teaching a 7B Model to Manage a Carbon-Budgeted Portfolio Through Macro Shocks
2
 
3
+ ![Live CarbonAlpha demo dashboard showing a Q7 macro headline, model reasoning, locked allocation, carbon timeline, NAV versus benchmark, and reward breakdown](assets/blog/carbonalpha-demo-live-screenshot.jpg)
4
+
5
+ *The live CarbonAlpha demo: edit a macro headline, re-plan from that quarter, and watch allocation, carbon budget, NAV, and reward components move together.*
6
+
7
+ ## Submission Links
8
+
9
+ - Live demo Space: [77ethers-carbonalpha-demo.hf.space](https://77ethers-carbonalpha-demo.hf.space/)
10
+ - Hugging Face Space repo: [huggingface.co/spaces/77ethers/CarbonAlpha-demo](https://huggingface.co/spaces/77ethers/CarbonAlpha-demo)
11
+ - Hugging Face model repo: [huggingface.co/77ethers/CarbonAlpha](https://huggingface.co/77ethers/CarbonAlpha)
12
+ - Final GRPO adapter: [grpo_qwen25_7b_adapter_phase1_100_v1](https://huggingface.co/77ethers/CarbonAlpha/tree/main/grpo_qwen25_7b_adapter_phase1_100_v1)
13
+ - SFT warm-start adapter: [sft_qwen25_7b_curriculum400_v1](https://huggingface.co/77ethers/CarbonAlpha/tree/main/sft_qwen25_7b_curriculum400_v1)
14
+ - Training dataset repo: [huggingface.co/datasets/77ethers/CarbonAlpha-train](https://huggingface.co/datasets/77ethers/CarbonAlpha-train)
15
+ - Final Colab notebook: [carbonalpha_final_pipeline.ipynb](https://colab.research.google.com/github/capabl-machines/gridops/blob/round-2/notebooks/carbonalpha_final_pipeline.ipynb)
16
+ - GitHub branch: [capabl-machines/gridops/tree/round-2](https://github.com/capabl-machines/gridops/tree/round-2)
17
+ - Model card: [README.md on Hugging Face](https://huggingface.co/77ethers/CarbonAlpha/blob/main/README.md)
18
+ - Training evidence: [loss plot](https://huggingface.co/77ethers/CarbonAlpha/blob/main/assets/loss_curve.png), [reward plot](https://huggingface.co/77ethers/CarbonAlpha/blob/main/assets/reward_curve.png), [raw GRPO log](https://huggingface.co/77ethers/CarbonAlpha/blob/main/training_logs/qwen25_grpo_phase1_100_v1.log)
19
+
20
  ## Why This Problem
21
 
22
  ESG-mandated capital is now measured in tens of trillions, and the mandates are getting teeth.
 
31
 
32
  CarbonAlpha is our attempt to build the missing thing: a small, fine-tuned, RL-trained reasoning agent that lives inside a real portfolio environment with a real carbon budget, and learns to allocate through it.
33
 
34
  ## The Core Bet
35
 
36
  CarbonAlpha is not a price predictor.
 
73
 
74
  In that sense, CarbonAlpha is not just a fine-tuned LLM. It is an **evaluation harness for carbon-aware portfolio reasoning** and a **training harness for turning macro theses into scored actions**.
75
 
76
  ## The Environment
77
 
78
  We built CarbonAlpha as an OpenEnv environment.
 
123
 
124
  This shock structure became both the environment curriculum and the data curriculum.
125
 
126
  ## The Training Dataset
127
 
128
  The training data was not just a bag of examples. We built it as a curriculum.
 
350
 
351
  This is the core engineering move: the model is not rewarded for sounding like a portfolio manager. It is rewarded for producing actions that survive the simulator.
352
 
353
  ## Why Regret Is the Right Financial Signal
354
 
355
  Raw return is not enough. If every asset rises, a model can look good by accident.
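
A minimal sketch of why a benchmark-relative signal is more informative than raw return. It assumes the benchmark is an equal-weight portfolio (the post compares CarbonAlpha against equal-weight elsewhere). The function names, asset names, and numbers are hypothetical and are not taken from the CarbonAlpha codebase; the project's actual regret definition may differ.

```python
from typing import Mapping


def equal_weight_return(asset_returns: Mapping[str, float]) -> float:
    """Return of a hypothetical benchmark holding every asset in equal proportion."""
    return sum(asset_returns.values()) / len(asset_returns)


def benchmark_relative_return(
    weights: Mapping[str, float], asset_returns: Mapping[str, float]
) -> float:
    """Portfolio return minus the equal-weight benchmark return.

    Assets missing from `weights` are treated as having zero allocation.
    A negative value means the allocation lagged the benchmark.
    """
    portfolio = sum(weights.get(a, 0.0) * r for a, r in asset_returns.items())
    return portfolio - equal_weight_return(asset_returns)


# Illustrative numbers only: every asset rises during the period.
asset_returns = {"tech": 0.10, "oil": 0.20, "green": 0.30}

# A concentrated allocation still earns a positive raw return...
weights = {"tech": 1.0}
raw = sum(weights.get(a, 0.0) * r for a, r in asset_returns.items())

# ...but it trails the equal-weight benchmark.
relative = benchmark_relative_return(weights, asset_returns)
```

Here the raw return is positive while the benchmark-relative figure is negative, which is the "looks good by accident" case the paragraph above describes.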
 
390
 
391
  We also tested a Qwen3-4B-Base branch. It passed the mechanical GRPO smoke gate, but did not beat the Qwen2.5 model. So for the demo, Qwen2.5-7B remains the stronger candidate.
392
 
393
  ## How We Evaluated It
394
 
395
  We used three evaluation layers because no single metric tells the whole story.
 
542
 
543
  You can choose or edit a macro headline, then click **Plan Portfolio**. CarbonAlpha reasons live and produces an allocation.
544
 
545
  The interface shows:
546
 
547
  - model reasoning;
 
556
 
557
  A base model may produce plausible prose. The trained model is more likely to produce a valid action that respects the environment.
558
 
559
+ ![CarbonAlpha training progression screenshot comparing GRPO, SFT, and base Qwen responses to the same prompt](assets/blog/carbonalpha-training-progression-screenshot.png)
560
+
561
+ *The demo also exposes the training progression directly: GRPO, SFT, and base Qwen answer the same macro prompt side by side.*
562
+
563
  ## What Still Fails
564
 
565
  The model is not perfect, and the eval caught useful weaknesses.
 
587
  - Salesforce, ["Agent Harness: The Infrastructure for Reliable AI"](https://www.salesforce.com/agentforce/ai-agents/agent-harness/)
588
  - Future of Being Human, ["What we miss when we talk about AI Harnesses"](https://www.futureofbeinghuman.com/p/what-we-miss-when-we-talk-about-ai-harnesses)
589
  - rmax.ai, ["Harness Engineering Is the Primary Lever for Agent Reliability in 2025-2026"](https://rmax.ai/notes/harness-new-model-agent-systems-2026/)
590
+
591
+ ## Visual Appendix
592
+
593
+ The images below are AI-generated concept visuals used to explain the system narrative. The real demo screenshots are shown at the top of the blog and in the demo section above.
594
+
595
+ ![CarbonAlpha demo dashboard concept showing macro shocks, portfolio NAV, carbon budget, and allocation state](assets/blog/carbonalpha-demo-dashboard.png)
596
+
597
+ *Concept 1: CarbonAlpha as a live portfolio agent.*
598
+
599
+ ![CarbonAlpha harness diagram showing simulator, action schema, guardrails, carbon budget, benchmark, reward function, validation, and evaluation](assets/blog/carbonalpha-harness.png)
600
+
601
+ *Concept 2: the harness around the model.*
602
+
603
+ ![CarbonAlpha curriculum progression from easy to ambiguous to hard macro shocks](assets/blog/carbonalpha-curriculum.png)
604
+
605
+ *Concept 3: the easy, ambiguous, and hard curriculum.*
606
+
607
+ ![GRPO simulation arena showing candidate portfolio actions scored by reward components](assets/blog/carbonalpha-grpo-arena.png)
608
+
609
+ *Concept 4: GRPO as a simulation arena for candidate allocations.*
610
+
611
+ ![CarbonAlpha outperforming equal-weight while staying within carbon budget](assets/blog/carbonalpha-results-race.png)
612
+
613
+ *Concept 5: the target behavior: outperform while staying inside the carbon budget.*