Add model card for How2Judge

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +35 -0
README.md ADDED
@@ -0,0 +1,35 @@
+ ---
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - how2everything
+ - evaluation
+ - llm-judge
+ ---
+
+ # How2Judge
+
+ **How2Judge** is an open 8B judge model introduced in the paper [How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs](https://huggingface.co/papers/2602.08808).
+
+ It is designed to reliably score goal-conditioned "how-to" procedures generated by LLMs. Distilled from a frontier model, How2Judge detects "critical failures" (such as missing prerequisites, incorrect step ordering, or omissions) that would prevent a user from successfully achieving a goal. In evaluation, it achieves 80.5% agreement with human annotators, providing a low-cost and reproducible alternative to human evaluation or frontier-model judging.
+
+ ## Resources
+
+ - **Paper:** [How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs](https://huggingface.co/papers/2602.08808)
+ - **GitHub Repository:** [lilakk/how2everything](https://github.com/lilakk/how2everything)
+ - **Project Blog:** [Allen Institute for AI - How2Everything](https://allenai.org/blog/how2everything)
+
+ ## Citation
+
+ ```bibtex
+ @misc{chang2026how2everythingminingwebhowto,
+   title={How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs},
+   author={Yapei Chang and Kyle Lo and Mohit Iyyer and Luca Soldaini},
+   year={2026},
+   eprint={2602.08808},
+   archivePrefix={arXiv},
+   primaryClass={cs.LG},
+   url={https://arxiv.org/abs/2602.08808},
+ }
+ ```
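A usage snippet could round out this card. Below is a minimal sketch with 🤗 Transformers; note that the repository id `allenai/How2Judge` and the judge prompt format are placeholders assumed for illustration, not details confirmed by the card or paper:

```python
def build_judge_prompt(goal: str, steps: list[str]) -> str:
    """Format a goal and candidate procedure for the judge.

    NOTE: this prompt layout is a hypothetical example; consult the
    How2Everything repository for the format the model was trained on.
    """
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
    return (
        f"Goal: {goal}\n"
        f"Procedure:\n{numbered}\n\n"
        "Does this procedure contain a critical failure (missing prerequisite, "
        "incorrect step ordering, or omission) that would prevent the user "
        "from achieving the goal? Give a verdict and a brief explanation."
    )


def judge(goal: str, steps: list[str]) -> str:
    """Score a procedure with the judge model (downloads an 8B checkpoint)."""
    # Imported lazily so build_judge_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "allenai/How2Judge"  # hypothetical repo id; replace with the real one
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user", "content": build_judge_prompt(goal, steps)}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

The prompt helper is kept separate from model loading so the formatting can be inspected or batched independently of the (large) checkpoint download.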