Add model card for How2Judge

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +35 -0
README.md ADDED
@@ -0,0 +1,35 @@
+ ---
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - how2everything
+ - evaluation
+ - llm-judge
+ ---
+
+ # How2Judge
+
+ **How2Judge** is an open 8B judge model introduced in the paper [How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs](https://huggingface.co/papers/2602.08808).
+
+ It is designed to reliably score goal-conditioned "how-to" procedures generated by LLMs. Distilled from a frontier model, How2Judge detects "critical failures" (such as missing prerequisites, incorrect step ordering, or omissions) that would prevent a user from successfully achieving a goal. In evaluation, it achieves 80.5% agreement with human annotators, providing a low-cost and reproducible alternative to human evaluation or frontier-model judging.
+
+ ## Resources
+
+ - **Paper:** [How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs](https://huggingface.co/papers/2602.08808)
+ - **GitHub Repository:** [lilakk/how2everything](https://github.com/lilakk/how2everything)
+ - **Project Blog:** [Allen Institute for AI - How2Everything](https://allenai.org/blog/how2everything)
+
+ ## Citation
+
+ ```bibtex
+ @misc{chang2026how2everythingminingwebhowto,
+   title={How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs},
+   author={Yapei Chang and Kyle Lo and Mohit Iyyer and Luca Soldaini},
+   year={2026},
+   eprint={2602.08808},
+   archivePrefix={arXiv},
+   primaryClass={cs.LG},
+   url={https://arxiv.org/abs/2602.08808},
+ }
+ ```
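A usage snippet could round out this card. Below is a minimal sketch with 🤗 Transformers; note that the repository id `allenai/How2Judge` and the judge prompt format are placeholders assumed for illustration, not details confirmed by the card or paper:

```python
def build_judge_prompt(goal: str, steps: list[str]) -> str:
    """Format a goal and candidate procedure for the judge.

    NOTE: this prompt layout is a hypothetical example; consult the
    How2Everything repository for the format the model was trained on.
    """
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
    return (
        f"Goal: {goal}\n"
        f"Procedure:\n{numbered}\n\n"
        "Does this procedure contain a critical failure (missing prerequisite, "
        "incorrect step ordering, or omission) that would prevent the user "
        "from achieving the goal? Give a verdict and a brief explanation."
    )


def judge(goal: str, steps: list[str]) -> str:
    """Score a procedure with the judge model (downloads an 8B checkpoint)."""
    # Imported lazily so build_judge_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "allenai/How2Judge"  # hypothetical repo id; replace with the real one
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user", "content": build_judge_prompt(goal, steps)}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

The prompt helper is kept separate from model loading so the formatting can be inspected or batched independently of the (large) checkpoint download.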