LightningRodLabs/Trump-Forecaster

@@ -2,7 +2,7 @@
 language:
 - en
 license: apache-2.0
-library_name: transformers
 tags:
 - forecasting
 - prediction
@@ -40,7 +40,9 @@ model-index:
 ### RL-Tuned gpt-oss-120b for Predicting Trump Administration Actions
-We fine-tuned [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) with reinforcement learning to predict Trump administration actions. Trained on the [WWTD-2025](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025) dataset of 2,108 binary forecasting questions generated with the [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk), the model beats GPT-5 on held-out forecasting questions.
 [Dataset](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025) · [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk) · [Future-as-Label paper](https://arxiv.org/abs/2601.06336) · [Outcome-based RL paper](https://arxiv.org/abs/2505.17989)
@@ -53,8 +55,8 @@ Evaluated on 682 held-out test questions under two conditions: with news context
 | Model | Brier (With Context) | BSS | Brier (No Context) | BSS | ECE (With Context) | ECE (No Context) |
 |-------|:---:|:---:|:---:|:---:|:---:|:---:|
 | GPT-5 | 0.200 | +0.14 | 0.258 | -0.11 | 0.091 | 0.191 |
-| gpt-oss-120b | 0.213 | +0.08 | 0.260 | -0.12 | 0.111 | 0.190 |
-| **gpt-oss-120b RL (this model)** | **0.194** | **+0.16** | **0.242** | **-0.04** | **0.079** | **0.164** |
 ![Brier Skill Score](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025/resolve/main/brier_skill_score.png)
@@ -83,16 +85,31 @@ Evaluated on 682 held-out test questions under two conditions: with news context
 ## Usage
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-model = AutoModelForCausalLM.from_pretrained(
-    "LightningRodLabs/Trump-Forecaster",
-    torch_dtype="auto",
-    device_map="auto",
     trust_remote_code=True,
 )
-tokenizer = AutoTokenizer.from_pretrained("LightningRodLabs/Trump-Forecaster", trust_remote_code=True)
 prompt = """You are a forecasting expert. Given the question and context below, predict the probability that the answer is "Yes".
@@ -100,18 +117,26 @@ Question: Will Trump impose 25% tariffs on all goods from Canada by February 1,
 Respond with your reasoning, then give your final answer as a probability between 0 and 1 inside <answer></answer> tags."""
-inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
-For faster inference with the MoE architecture, use [SGLang](https://github.com/sgl-project/sglang):
 ```python
-import sglang as sgl
-engine = sgl.Engine(model_path="LightningRodLabs/Trump-Forecaster", trust_remote_code=True, dtype="bfloat16")
-output = engine.generate(prompt, sampling_params={"max_new_tokens": 4096, "stop": ["</answer>"]})
 ```
 ---

 language:
 - en
 license: apache-2.0
+library_name: peft
 tags:
 - forecasting
 - prediction
 ### RL-Tuned gpt-oss-120b for Predicting Trump Administration Actions
+We fine-tuned [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) with reinforcement learning to predict Trump administration actions. Trained on the [WWTD-2025](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025) dataset of 2,108 binary forecasting questions generated with the [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk), Trump-Forecaster beats GPT-5 on held-out forecasting questions.
+This repo contains a **LoRA adapter** (5.3 GB) for gpt-oss-120b. A standalone `merge.py` script is included to produce a full merged model if needed.
 [Dataset](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025) · [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk) · [Future-as-Label paper](https://arxiv.org/abs/2601.06336) · [Outcome-based RL paper](https://arxiv.org/abs/2505.17989)
 | Model | Brier (With Context) | BSS | Brier (No Context) | BSS | ECE (With Context) | ECE (No Context) |
 |-------|:---:|:---:|:---:|:---:|:---:|:---:|
 | GPT-5 | 0.200 | +0.14 | 0.258 | -0.11 | 0.091 | 0.191 |
+| gpt-oss-120b (base) | 0.213 | +0.08 | 0.260 | -0.12 | 0.111 | 0.190 |
+| **Trump-Forecaster** | **0.194** | **+0.16** | **0.242** | **-0.04** | **0.079** | **0.164** |
 ![Brier Skill Score](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025/resolve/main/brier_skill_score.png)
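For readers who want to reproduce the table's metrics on their own predictions, here is a minimal sketch of Brier score, Brier Skill Score (BSS), and Expected Calibration Error (ECE). Two points are assumptions, since this excerpt does not state them: the BSS reference forecast is taken to be the constant base rate of the outcomes, and ECE uses 10 equal-width bins. The table is at least consistent with a reference Brier score near 0.23 (for example, 1 - 0.194/0.232 ≈ 0.16). The function names are illustrative and not part of the repo.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes, reference_prob=None):
    """1 - BS / BS_ref. Assumption: the reference is a constant base-rate forecast."""
    if reference_prob is None:
        reference_prob = sum(outcomes) / len(outcomes)
    reference = brier_score([reference_prob] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / reference


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted mean gap between average forecast and observed frequency per bin.

    Assumption: equal-width bins over [0, 1]; the model card may bin differently.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    ece = 0.0
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            mean_y = sum(y for _, y in members) / len(members)
            ece += len(members) / len(probs) * abs(mean_p - mean_y)
    return ece


if __name__ == "__main__":
    probs = [0.9, 0.2, 0.7, 0.4]   # toy forecasts, not data from the benchmark
    outcomes = [1, 0, 1, 0]
    print(round(brier_score(probs, outcomes), 3))                 # 0.075
    print(round(brier_skill_score(probs, outcomes), 3))           # 0.7
    print(round(expected_calibration_error(probs, outcomes), 3))  # 0.25
```

Lower is better for Brier score and ECE. A positive BSS means the forecasts beat the reference, and a negative BSS means they do worse than it.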
 ## Usage
+This repo contains a LoRA adapter trained with [Tinker](https://tinker.computer). The adapter uses Tinker's module naming convention, so it requires a merge step before inference. A standalone `merge.py` script is included.
+### Merge into full model
+```bash
+pip install torch transformers safetensors tqdm huggingface-hub
+python merge.py --output ./trump-forecaster-merged
+```
+This downloads the base model (MXFP4, ~30 GB), dequantizes it to bf16, applies the LoRA adapter, and saves the merged model (~300 GB in bf16). The merge needs about 300 GB of RAM but no GPU.
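To illustrate what "applies the LoRA adapter" means arithmetically, here is a minimal sketch of the standard LoRA merge for a single weight matrix: `W_merged = W + (alpha / rank) * (B @ A)`. This is not the contents of `merge.py`, which also handles MXFP4 dequantization and Tinker's module naming. The helper names `matmul` and `merge_lora` are hypothetical, and plain Python lists stand in for tensors.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold a LoRA update into a dense weight.

    weight: (out x in), lora_b: (out x rank), lora_a: (rank x in).
    Returns weight + (alpha / rank) * (lora_b @ lora_a).
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


if __name__ == "__main__":
    weight = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 base weight
    lora_a = [[1.0, 2.0]]              # rank-1 down-projection (1 x 2)
    lora_b = [[0.5], [1.0]]            # rank-1 up-projection (2 x 1)
    print(merge_lora(weight, lora_a, lora_b, alpha=2.0, rank=1))
    # [[2.0, 2.0], [2.0, 5.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged checkpoint can be loaded like an ordinary model.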
+### Inference with the merged model
+With [SGLang](https://github.com/sgl-project/sglang) (recommended for MoE):
 ```python
+import sglang as sgl
+engine = sgl.Engine(
+    model_path="./trump-forecaster-merged",
+    tokenizer_path="openai/gpt-oss-120b",
     trust_remote_code=True,
+    dtype="bfloat16",
+    tp_size=2,  # needs 2x 80GB GPUs for bf16
 )
 prompt = """You are a forecasting expert. Given the question and context below, predict the probability that the answer is "Yes".
 Respond with your reasoning, then give your final answer as a probability between 0 and 1 inside <answer></answer> tags."""
+output = engine.generate(prompt, sampling_params={"max_new_tokens": 4096, "stop": ["</answer>"]})
+print(output["text"])
 ```
+Or with transformers:
 ```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model = AutoModelForCausalLM.from_pretrained(
+    "./trump-forecaster-merged",
+    torch_dtype="auto",
+    device_map="auto",
+    trust_remote_code=True,
+)
+tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b", trust_remote_code=True)
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
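Both snippets return free-form text, so the probability still has to be pulled out of the `<answer>` tags. The sketch below shows one way to do that; `extract_probability` is a hypothetical helper, not part of the repo. It assumes the closing tag may be missing (generation that stops on `</answer>` may not include the stop string in the output) and that the decoded text may still contain the prompt, so it skips the empty `<answer></answer>` in the instructions and takes the last numeric match.

```python
import re

# Matches "<answer>0.35" with or without a closing tag. The empty
# "<answer></answer>" from the prompt has no digits, so it is not matched.
ANSWER_RE = re.compile(r"<answer>\s*([0-9]*\.?[0-9]+)\s*(?:</answer>)?")


def extract_probability(text):
    """Return the last probability found in <answer> tags, or None if absent or invalid."""
    matches = ANSWER_RE.findall(text)
    if not matches:
        return None
    value = float(matches[-1])
    # Reject values outside [0, 1] rather than silently clipping them.
    return value if 0.0 <= value <= 1.0 else None


if __name__ == "__main__":
    print(extract_probability("Tariffs look unlikely. <answer>0.35"))        # 0.35
    print(extract_probability("inside <answer></answer> tags. <answer>0.8</answer>"))  # 0.8
    print(extract_probability("No final answer was given."))                 # None
```

Returning `None` for missing or out-of-range answers lets the caller decide whether to retry the generation or drop the question, instead of scoring a made-up value.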
 ---