---
language:
- en
license: apache-2.0
library_name: peft
tags:
- forecasting
- prediction
- reinforcement-learning
- grpo
- lora
- mixture-of-experts
- politics
- trump
- future-as-label
datasets:
- LightningRodLabs/WWTD-2025
base_model: openai/gpt-oss-120b
pipeline_tag: text-generation
model-index:
- name: Trump-Forecaster
  results:
  - task:
      type: text-generation
      name: Probabilistic Forecasting
    dataset:
      name: WWTD-2025
      type: LightningRodLabs/WWTD-2025
      split: test
    metrics:
    - type: brier_score
      value: 0.194
      name: Brier Score
    - type: ece
      value: 0.079
      name: Expected Calibration Error
---
|
|
|
|
|
# Trump-Forecaster |
|
|
|
|
|
### RL-Tuned gpt-oss-120b for Predicting Trump Administration Actions |
|
|
|
|
|
Starting from nothing but 5 search queries, we used the [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk) to automatically generate [2,108 forecasting questions](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025) from news articles, label them using real outcomes, and train this model via RL. **No expertise required. No manual labeling. No domain-specific engineering.** The result beats GPT-5 on held-out questions. |
|
|
|
|
|
You can do this in any domain — just change the search queries. See [how we built the dataset](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025). |
|
|
|
|
|
This repo contains a **LoRA adapter** for [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b). A standalone `merge.py` script is included to merge it into a full model. |
|
|
|
|
|
--- |
|
|
|
|
|
## Results |
|
|
|
|
|
Evaluated on 682 held-out test questions under two conditions: with news context, and without context (question only). The no-context condition reveals whether the model knows what it doesn't know: untrained models project false confidence, while RL training reduces that overconfidence.
|
|
|
|
|
| Model | Brier (With Context) | BSS | Brier (No Context) | BSS | ECE (With Context) | ECE (No Context) |
|-------|:---:|:---:|:---:|:---:|:---:|:---:|
| GPT-5 | 0.200 | +0.14 | 0.258 | -0.11 | 0.091 | 0.191 |
| gpt-oss-120b (base) | 0.213 | +0.08 | 0.260 | -0.12 | 0.111 | 0.190 |
| **Trump-Forecaster** | **0.194** | **+0.16** | **0.242** | **-0.04** | **0.079** | **0.164** |
|
|
|
|
|
 |
|
|
|
|
|
 |
|
|
|
|
|
 |
|
|
|
|
|
### Metrics |
|
|
|
|
|
- **Brier Score**: Mean squared error between predicted probability and outcome (0 or 1). Lower is better. **Brier Skill Score (BSS)** expresses this as improvement over always predicting the base rate; positive means the model learned something useful beyond historical frequency.
- **Expected Calibration Error (ECE)**: Measures whether predicted probabilities match observed frequencies. "70%" predictions should resolve "yes" about 70% of the time. Lower is better. All three metrics are sketched in code below.
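
A minimal NumPy sketch of these definitions (the 10-bin equal-width ECE is an assumption; the evaluation harness may bin differently):

```python
import numpy as np

def brier(p, y):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    return np.mean((p - y) ** 2)

def brier_skill_score(p, y):
    """Improvement over always predicting the base rate; positive is better."""
    base_rate = np.mean(y)
    reference = brier(np.full(len(y), base_rate), y)
    return 1.0 - brier(p, y) / reference

def ece(p, y, n_bins=10):
    """Expected calibration error with equal-width probability bins."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # Weight each bin's confidence/accuracy gap by its share of samples.
            total += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return total
```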
|
|
|
|
|
--- |
|
|
|
|
|
## Training |
|
|
|
|
|
- **Base model**: [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (120B MoE, 5.1B active params, 128 experts with top-4 routing)
- **Method**: GRPO with a Brier-score reward via [Tinker](https://tinker.computer); a reward sketch follows this list
- **LoRA rank**: 32
- **Learning rate**: 4e-5
- **Batch size**: 32, group size 8
- **Training steps**: 50
- **Max tokens**: 16,384
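
The exact reward shaping lives in the training pipeline, but the core idea is simple: each rollout is scored by the negative Brier score of its parsed prediction, and GRPO standardizes rewards within each group of 8 rollouts sampled for the same question. A minimal sketch (the `-1.0` penalty for unparseable outputs is an assumption):

```python
import re
import numpy as np

def brier_reward(completion: str, outcome: int) -> float:
    """Negative Brier score of the probability parsed from <answer> tags."""
    match = re.search(r"<answer>\s*([\d.]+)", completion)
    if match is None:
        return -1.0  # assumed penalty: worst possible Brier score
    try:
        p = min(max(float(match.group(1)), 0.0), 1.0)
    except ValueError:
        return -1.0
    return -((p - outcome) ** 2)

def grpo_advantages(rewards: list[float]) -> np.ndarray:
    """GRPO advantage: rewards standardized within one group of rollouts."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-6)
```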
|
|
|
|
|
--- |
|
|
|
|
|
## Usage |
|
|
|
|
|
This repo contains a LoRA adapter trained with [Tinker](https://tinker.computer). The adapter uses Tinker's module naming convention, so it requires a merge step before inference. A standalone `merge.py` script is included. |
|
|
|
|
|
### Merge into full model |
|
|
|
|
|
```bash
pip install torch transformers safetensors tqdm huggingface-hub
python merge.py --output ./trump-forecaster-merged
```
|
|
|
|
|
This downloads the base model, dequantizes to bf16, applies the LoRA adapter, and saves the merged model. |
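
Under the hood the merge is the standard LoRA update, `W' = W + (alpha / r) * B @ A`, applied to each adapted matrix. A single-layer sketch for intuition (illustrative only; `merge.py` handles Tinker's module naming, and `lora_alpha` here is a placeholder for the value stored in the adapter config):

```python
import torch

def merge_lora_layer(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
                     rank: int = 32, lora_alpha: float = 32.0) -> torch.Tensor:
    """Fold a LoRA update into a base weight: W' = W + (alpha / r) * B @ A.

    W: (out_features, in_features) base weight, dequantized to bf16
    A: (rank, in_features), B: (out_features, rank)
    """
    scale = lora_alpha / rank
    return W.to(torch.bfloat16) + scale * (B.to(torch.bfloat16) @ A.to(torch.bfloat16))
```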
|
|
|
|
|
### Inference |
|
|
|
|
|
```python
import sglang as sgl

# Serve the merged model; tensor parallelism across 2 GPUs.
engine = sgl.Engine(
    model_path="./trump-forecaster-merged",
    tokenizer_path="openai/gpt-oss-120b",
    trust_remote_code=True,
    dtype="bfloat16",
    tp_size=2,
)

news_context = "... relevant news articles ..."

prompt = f"""You are a forecasting expert. Given the question and context below, predict the probability that the answer is "Yes".

Question: Will Trump impose 25% tariffs on all goods from Canada by February 1, 2025?

Context:
{news_context}

Respond with your reasoning, then give your final answer as a probability between 0 and 1 inside <answer></answer> tags."""

# Stop at the closing tag so generation ends once the probability is emitted.
output = engine.generate(prompt, sampling_params={"max_new_tokens": 4096, "stop": ["</answer>"]})
print(output["text"])
```
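
Because generation stops at `</answer>`, the probability is the number following the final `<answer>` tag. A minimal extraction step, continuing from the snippet above:

```python
import re

# The stop sequence truncates the completion at "</answer>", so the
# probability is whatever follows the last "<answer>" tag.
matches = re.findall(r"<answer>\s*([\d.]+)", output["text"])
probability = float(matches[-1]) if matches else None
print(f"P(yes) = {probability}")
```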
|
|
|
|
|
--- |
|
|
|
|
|
## Links |
|
|
|
|
|
- **Dataset**: [LightningRodLabs/WWTD-2025](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025)
- **Training platform**: [Tinker](https://tinker.computer)
- **Data generation**: [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk)
- **Future-as-Label paper**: [arxiv:2601.06336](https://arxiv.org/abs/2601.06336)
- **Outcome-based RL paper**: [arxiv:2505.17989](https://arxiv.org/abs/2505.17989)
|
|
|