Upload README.md with huggingface_hub
README.md CHANGED
@@ -23,6 +23,16 @@ A trained judgment layer for autonomous scientific workflows. Starting from `Qwe
 
 The target capability is not general reasoning or autonomous science. It is the decision-making core that determines whether a larger scientific system behaves intelligently when search, evidence, cost, and belief updates are all coupled: selecting which candidate to investigate, evaluating whether evidence should be trusted or escalated, and revising hypotheses as conflicting results accumulate.
 
+## Release Links
+
+- **Paper PDF:** [Training Scientific Judgment with Verified Environments for Autonomous Science](https://github.com/Dynamical-Systems-Research/training-scientific-judgment/blob/main/paper/training-scientific-judgment.pdf)
+- **Blog post:** [Training Scientific Judgment](https://dynamicalsystems.ai/blog/training-scientific-judgment)
+- **Public repo:** [Dynamical-Systems-Research/training-scientific-judgment](https://github.com/Dynamical-Systems-Research/training-scientific-judgment)
+- **Released evaluation bundle:** [repo `data/open_world/`](https://github.com/Dynamical-Systems-Research/training-scientific-judgment/tree/main/data/open_world)
+- **Search assets:** [`Dynamical-Systems/crystalite-base`](https://huggingface.co/Dynamical-Systems/crystalite-base), [`Dynamical-Systems/crystalite-balanced`](https://huggingface.co/Dynamical-Systems/crystalite-balanced)
+
+This model is the released **scientific-judgment policy** used in the final paper and blog post. The associated Crystalite checkpoints are released as supporting search-side assets that document the provenance of the open-world campaigns. The default public reproducibility path uses the frozen serialized campaign bundle from the public repo.
+
 ## Training
 
 **Base model:** Qwen3-30B-A3B-Instruct-2507 (30B total / 3B active MoE)
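For concreteness, the evaluation loop behind the released bundle can be sketched in a few lines: load serialized campaigns from a bundle directory and score hypothesis accuracy, the fraction of episodes whose highest-posterior hypothesis matches the oracle ground truth. This is a minimal illustration only; the record fields `posteriors` and `oracle_hypothesis` and the one-JSON-file-per-campaign layout are assumptions for the sketch, not the released bundle's actual schema.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical episode records: the field names ("posteriors",
# "oracle_hypothesis") are illustrative assumptions, not the
# released bundle's documented schema.
episodes = [
    {"posteriors": {"H1": 0.7, "H2": 0.3}, "oracle_hypothesis": "H1"},
    {"posteriors": {"H1": 0.2, "H2": 0.8}, "oracle_hypothesis": "H1"},
    {"posteriors": {"H1": 0.1, "H2": 0.9}, "oracle_hypothesis": "H2"},
]

# Serialize one campaign per JSON file, mimicking a frozen bundle directory.
bundle_dir = Path(tempfile.mkdtemp()) / "open_world"
bundle_dir.mkdir(parents=True)
for i, ep in enumerate(episodes):
    (bundle_dir / f"campaign_{i:03d}.json").write_text(json.dumps(ep))

def hypothesis_accuracy(records):
    """Fraction of episodes where the highest-posterior hypothesis
    matches the oracle ground truth after all evidence rounds."""
    hits = sum(
        max(r["posteriors"], key=r["posteriors"].get) == r["oracle_hypothesis"]
        for r in records
    )
    return hits / len(records)

# Reload the frozen bundle and score it, as an evaluation harness would.
loaded = [json.loads(p.read_text()) for p in sorted(bundle_dir.glob("*.json"))]
acc = hypothesis_accuracy(loaded)
print(f"hypothesis accuracy: {acc:.3f}")  # prints "hypothesis accuracy: 0.667"
```

The same scoring function applies unchanged whether the bundle holds 3 toy campaigns or the full frozen set; only the directory contents differ.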
@@ -46,6 +56,8 @@ This release corresponds to the **step-100 merged checkpoint**.
 
 Primary metric: **hypothesis accuracy** -- the fraction of episodes in which the model's highest-posterior hypothesis matches the oracle ground truth after all evidence rounds.
 
+The final public release is paired with a frozen open-world bundle containing `300` serialized campaigns in total; the primary paper evaluation is reported on the pruned, reachable held-out set of `29` campaigns.
+
 ### Held-out learning curve (29 open-world environments, pass@1)
 
 | Checkpoint | Hypothesis Accuracy | Mean Reward | Parse Rate |