Reorganize blog links and visuals
BLOG_CARBONALPHA.md CHANGED (+45 -24)
|
@@ -1,5 +1,22 @@
|
|
| 1 |
# CarbonAlpha: Teaching a 7B Model to Manage a Carbon-Budgeted Portfolio Through Macro Shocks
|
| 2 |
|
| 3 |
## Why This Problem
|
| 4 |
|
| 5 |
ESG-mandated capital is now measured in tens of trillions, and the mandates are getting teeth.
|
|
@@ -14,10 +31,6 @@ And a generic LLM with a clever prompt cannot do it either — not reliably. It
|
|
| 14 |
|
| 15 |
CarbonAlpha is our attempt to build the missing thing: a small, fine-tuned, RL-trained reasoning agent that lives inside a real portfolio environment with a real carbon budget, and learns to allocate through it.
|
| 16 |
|
| 17 |
-

|
| 18 |
-
|
| 19 |
-
*The demo frames CarbonAlpha as a live portfolio agent: one macro shock enters, the model reasons, and the environment turns that reasoning into allocations, carbon usage, NAV, and benchmark-relative outcomes.*
|
| 20 |
-
|
| 21 |
## The Core Bet
|
| 22 |
|
| 23 |
CarbonAlpha is not a price predictor.
|
|
@@ -60,10 +73,6 @@ The harness makes the model answerable to consequences.
|
|
| 60 |
|
| 61 |
In that sense, CarbonAlpha is not just a fine-tuned LLM. It is an **evaluation harness for carbon-aware portfolio reasoning** and a **training harness for turning macro theses into scored actions**.
|
| 62 |
|
| 63 |
-

|
| 64 |
-
|
| 65 |
-
*Reliability comes from the system around the model: the simulator, schema, carbon budget, benchmark, reward function, validation loop, and evaluation set.*
|
| 66 |
-
|
| 67 |
## The Environment
|
| 68 |
|
| 69 |
We built CarbonAlpha as an OpenEnv environment.
|
|
@@ -114,10 +123,6 @@ Hard shocks test second- and third-order reasoning. A rare-earth export restrict
|
|
| 114 |
|
| 115 |
This shock structure became both the environment curriculum and the data curriculum.
|
| 116 |
|
| 117 |
-

|
| 118 |
-
|
| 119 |
-
*The training curriculum moves from clean first-order cases to ambiguous mixed-signal events and then hard, nonlinear macro shocks.*
|
| 120 |
-
|
| 121 |
## The Training Dataset
|
| 122 |
|
| 123 |
The training data was not just a bag of examples. We built it as a curriculum.
|
|
@@ -345,10 +350,6 @@ The reward stack connects directly to portfolio management:
|
|
| 345 |
|
| 346 |
This is the core engineering move: the model is not rewarded for sounding like a portfolio manager. It is rewarded for producing actions that survive the simulator.
|
| 347 |
|
| 348 |
-

|
| 349 |
-
|
| 350 |
-
*GRPO turns candidate trade tickets into scored outcomes. The best completions are the ones that execute cleanly, respect carbon limits, reduce drawdown, and beat the benchmark.*
|
| 351 |
-
|
| 352 |
## Why Regret Is the Right Financial Signal
|
| 353 |
|
| 354 |
Raw return is not enough. If every asset rises, a model can look good by accident.
|
|
@@ -389,10 +390,6 @@ beats baseline: 5/5
|
|
| 389 |
|
| 390 |
We also tested a Qwen3-4B-Base branch. It passed the mechanical GRPO smoke gate, but did not beat the Qwen2.5 model. So for the demo, Qwen2.5-7B remains the stronger candidate.
|
| 391 |
|
| 392 |
-

|
| 393 |
-
|
| 394 |
-
*The desired behavior is not just higher return. CarbonAlpha must outperform while staying inside a hard carbon budget and surviving inflation, transition shocks, physical climate risk, hedge bleed, and drawdown pressure.*
|
| 395 |
-
|
| 396 |
## How We Evaluated It
|
| 397 |
|
| 398 |
We used three evaluation layers because no single metric tells the whole story.
|
|
@@ -545,10 +542,6 @@ The demo is designed to make the training delta visible.
|
|
| 545 |
|
| 546 |
You can choose or edit a macro headline, then click **Plan Portfolio**. CarbonAlpha reasons live and produces an allocation.
|
| 547 |
|
| 548 |
-

|
| 549 |
-
|
| 550 |
-
*A live demo run: the user edits a quarter headline, replans from that point, and watches the allocation, carbon path, NAV, and reward components update together.*
|
| 551 |
-
|
| 552 |
The interface shows:
|
| 553 |
|
| 554 |
- model reasoning;
|
|
@@ -563,6 +556,10 @@ The important thing is not just the final answer. It is watching how the trained
|
|
| 563 |
|
| 564 |
A base model may produce plausible prose. The trained model is more likely to produce a valid action that respects the environment.
|
| 565 |
|
| 566 |
## What Still Fails
|
| 567 |
|
| 568 |
The model is not perfect, and the eval caught useful weaknesses.
|
|
@@ -590,3 +587,27 @@ CarbonAlpha is one version of that idea: a small reasoning model trained to mana
|
|
| 590 |
- Salesforce, ["Agent Harness: The Infrastructure for Reliable AI"](https://www.salesforce.com/agentforce/ai-agents/agent-harness/)
|
| 591 |
- Future of Being Human, ["What we miss when we talk about AI Harnesses"](https://www.futureofbeinghuman.com/p/what-we-miss-when-we-talk-about-ai-harnesses)
|
| 592 |
- rmax.ai, ["Harness Engineering Is the Primary Lever for Agent Reliability in 2025-2026"](https://rmax.ai/notes/harness-new-model-agent-systems-2026/)
|
| 1 |
# CarbonAlpha: Teaching a 7B Model to Manage a Carbon-Budgeted Portfolio Through Macro Shocks
|
| 2 |
|
| 3 |
+

|
| 4 |
+
|
| 5 |
+
*The live CarbonAlpha demo: edit a macro headline, re-plan from that quarter, and watch allocation, carbon budget, NAV, and reward components move together.*
|
| 6 |
+
|
| 7 |
+
## Submission Links
|
| 8 |
+
|
| 9 |
+
- Live demo Space: [77ethers-carbonalpha-demo.hf.space](https://77ethers-carbonalpha-demo.hf.space/)
|
| 10 |
+
- Hugging Face Space repo: [huggingface.co/spaces/77ethers/CarbonAlpha-demo](https://huggingface.co/spaces/77ethers/CarbonAlpha-demo)
|
| 11 |
+
- Hugging Face model repo: [huggingface.co/77ethers/CarbonAlpha](https://huggingface.co/77ethers/CarbonAlpha)
|
| 12 |
+
- Final GRPO adapter: [grpo_qwen25_7b_adapter_phase1_100_v1](https://huggingface.co/77ethers/CarbonAlpha/tree/main/grpo_qwen25_7b_adapter_phase1_100_v1)
|
| 13 |
+
- SFT warm-start adapter: [sft_qwen25_7b_curriculum400_v1](https://huggingface.co/77ethers/CarbonAlpha/tree/main/sft_qwen25_7b_curriculum400_v1)
|
| 14 |
+
- Training dataset repo: [huggingface.co/datasets/77ethers/CarbonAlpha-train](https://huggingface.co/datasets/77ethers/CarbonAlpha-train)
|
| 15 |
+
- Final Colab notebook: [carbonalpha_final_pipeline.ipynb](https://colab.research.google.com/github/capabl-machines/gridops/blob/round-2/notebooks/carbonalpha_final_pipeline.ipynb)
|
| 16 |
+
- GitHub branch: [capabl-machines/gridops/tree/round-2](https://github.com/capabl-machines/gridops/tree/round-2)
|
| 17 |
+
- Model card: [README.md on Hugging Face](https://huggingface.co/77ethers/CarbonAlpha/blob/main/README.md)
|
| 18 |
+
- Training evidence: [loss plot](https://huggingface.co/77ethers/CarbonAlpha/blob/main/assets/loss_curve.png), [reward plot](https://huggingface.co/77ethers/CarbonAlpha/blob/main/assets/reward_curve.png), [raw GRPO log](https://huggingface.co/77ethers/CarbonAlpha/blob/main/training_logs/qwen25_grpo_phase1_100_v1.log)
|
| 19 |
+
|
| 20 |
## Why This Problem
|
| 21 |
|
| 22 |
ESG-mandated capital is now measured in tens of trillions, and the mandates are getting teeth.
|
|
|
|
| 31 |
|
| 32 |
CarbonAlpha is our attempt to build the missing thing: a small, fine-tuned, RL-trained reasoning agent that lives inside a real portfolio environment with a real carbon budget, and learns to allocate through it.
|
| 33 |
|
| 34 |
## The Core Bet
|
| 35 |
|
| 36 |
CarbonAlpha is not a price predictor.
|
|
|
|
| 73 |
|
| 74 |
In that sense, CarbonAlpha is not just a fine-tuned LLM. It is an **evaluation harness for carbon-aware portfolio reasoning** and a **training harness for turning macro theses into scored actions**.
|
| 75 |
|
| 76 |
## The Environment
|
| 77 |
|
| 78 |
We built CarbonAlpha as an OpenEnv environment.
|
|
|
|
| 123 |
|
| 124 |
This shock structure became both the environment curriculum and the data curriculum.
|
| 125 |
|
| 126 |
## The Training Dataset
|
| 127 |
|
| 128 |
The training data was not just a bag of examples. We built it as a curriculum.
|
|
|
|
| 350 |
|
| 351 |
This is the core engineering move: the model is not rewarded for sounding like a portfolio manager. It is rewarded for producing actions that survive the simulator.
|
| 352 |
|
| 353 |
## Why Regret Is the Right Financial Signal
|
| 354 |
|
| 355 |
Raw return is not enough. If every asset rises, a model can look good by accident.
|
|
|
|
| 390 |
|
| 391 |
We also tested a Qwen3-4B-Base branch. It passed the mechanical GRPO smoke gate, but did not beat the Qwen2.5 model. So for the demo, Qwen2.5-7B remains the stronger candidate.
|
| 392 |
|
| 393 |
## How We Evaluated It
|
| 394 |
|
| 395 |
We used three evaluation layers because no single metric tells the whole story.
|
|
|
|
| 542 |
|
| 543 |
You can choose or edit a macro headline, then click **Plan Portfolio**. CarbonAlpha reasons live and produces an allocation.
|
| 544 |
|
| 545 |
The interface shows:
|
| 546 |
|
| 547 |
- model reasoning;
|
|
|
|
| 556 |
|
| 557 |
A base model may produce plausible prose. The trained model is more likely to produce a valid action that respects the environment.
|
| 558 |
|
| 559 |
+

|
| 560 |
+
|
| 561 |
+
*The demo also exposes the training progression directly: GRPO, SFT, and base Qwen answer the same macro prompt side by side.*
|
| 562 |
+
|
| 563 |
## What Still Fails
|
| 564 |
|
| 565 |
The model is not perfect, and the eval caught useful weaknesses.
|
|
|
|
| 587 |
- Salesforce, ["Agent Harness: The Infrastructure for Reliable AI"](https://www.salesforce.com/agentforce/ai-agents/agent-harness/)
|
| 588 |
- Future of Being Human, ["What we miss when we talk about AI Harnesses"](https://www.futureofbeinghuman.com/p/what-we-miss-when-we-talk-about-ai-harnesses)
|
| 589 |
- rmax.ai, ["Harness Engineering Is the Primary Lever for Agent Reliability in 2025-2026"](https://rmax.ai/notes/harness-new-model-agent-systems-2026/)
|
| 590 |
+
|
| 591 |
+
## Visual Appendix
|
| 592 |
+
|
| 593 |
+
The images below are AI-generated concept visuals used to explain the system narrative. The real demo screenshots are shown at the top of the blog and in the demo section above.
|
| 594 |
+
|
| 595 |
+

|
| 596 |
+
|
| 597 |
+
*Concept 1: CarbonAlpha as a live portfolio agent.*
|
| 598 |
+
|
| 599 |
+

|
| 600 |
+
|
| 601 |
+
*Concept 2: the harness around the model.*
|
| 602 |
+
|
| 603 |
+

|
| 604 |
+
|
| 605 |
+
*Concept 3: the easy, ambiguous, and hard curriculum.*
|
| 606 |
+
|
| 607 |
+

|
| 608 |
+
|
| 609 |
+
*Concept 4: GRPO as a simulation arena for candidate allocations.*
|
| 610 |
+
|
| 611 |
+

|
| 612 |
+
|
| 613 |
+
*Concept 5: the target behavior: outperform while staying inside the carbon budget.*
|