Commit 23d2514 by TongZheng1999 · verified · 1 Parent(s): cab2878

Update README.md

Files changed (1)
  1. README.md +48 -0

README.md CHANGED
@@ -8,3 +8,51 @@ pinned: false
  ---
 
  Edit this `README.md` markdown file to author your organization card.
+
+ ---
+ title: README
+ emoji: 🔥
+ colorFrom: indigo
+ colorTo: purple
+ sdk: static
+ pinned: false
+ ---
+
+ # AutoTTS
+
+ **LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling**
+
+ An environment-driven framework that automatically discovers test-time scaling (TTS) strategies. It shifts the human role from hand-crafting branching, pruning, and stopping heuristics to constructing the environments in which those strategies are discovered.
+
+ ## 📄 Paper
+
+ **[LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling](https://github.com/zhengkid/AutoTTS)**
+
+ Tong Zheng¹, Haolin Liu², Chengsong Huang³, Huiwen Bao, Sheng Zhang¹, Rui Liu¹, Runpeng Dai⁴, Ruibo Chen¹, Chenxi Liu¹, Tianyi Xiong¹, Xidong Wu⁵, Hongming Zhang⁶, Heng Huang¹
+
+ ¹University of Maryland ²University of Virginia ³Washington University in St. Louis ⁴University of North Carolina ⁵Google ⁶Meta
+
+ ## ✨ Highlights
+
+ - **Environment-driven discovery**: Reframes TTS strategy design as an automated search problem over a structured control space, rather than a matter of hand-crafting heuristics.
+ - **Offline replay environment**: Pre-collects reasoning trajectories and probe signals so candidate controllers can be evaluated cheaply without repeated LLM calls.
+ - **Beta parameterization**: Collapses all internal hyperparameters into a single scalar β, making the search tractable and reducing overfitting.
+ - **Execution trace feedback**: Fine-grained traces help the explorer agent diagnose *why* a controller fails, not just whether it failed.
+ - **Affordable**: The entire discovery process costs only **$39.9** and takes **160 minutes**.
+ - **Strong results**: Discovered controllers improve the accuracy–cost Pareto frontier over strong hand-crafted baselines (SC@64, ASC, ESC, Parallel-Probe) and generalize to held-out benchmarks (AIME25, HMMT25, GPQA-Diamond) and model scales (Qwen3-0.6B/1.7B/4B/8B, DeepSeek-R1-Distill-Llama-8B).
+
+ ## 🔗 Links
+
+ - 💻 **Code**: [github.com/zhengkid/AutoTTS](https://github.com/zhengkid/AutoTTS)
+
+ ## 📝 Citation
+
+ If you find our work useful, please cite:
+
+ ```bibtex
+ @article{zheng2026autotts,
+ title={LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling},
+ author={Zheng, Tong and Liu, Haolin and Huang, Chengsong and Bao, Huiwen and Zhang, Sheng and Liu, Rui and Dai, Runpeng and Chen, Ruibo and Liu, Chenxi and Xiong, Tianyi and Wu, Xidong and Zhang, Hongming and Huang, Heng},
+ year={2026}
+ }
+ ```
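
Note on the Highlights in this change: the "Offline replay environment" and "Beta parameterization" bullets describe evaluating a candidate controller against cached samples, with one scalar β as the only knob. The sketch below is one possible reading of that idea, not the AutoTTS implementation. The `Trajectory` fields, the early-stopping rule, the way β maps to a minimum sample count and a vote-share threshold, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One pre-collected sample (hypothetical fields): final answer and token cost."""
    answer: str
    tokens: int


def replay_controller(samples, beta):
    """Replay cached samples in order; stop once the leading answer is dominant.

    beta in [0, 1] is the single knob. Larger beta demands more samples and a
    larger vote share before stopping. This mapping is an illustrative
    assumption. `samples` must be non-empty.
    """
    min_samples = 1 + round(beta * (len(samples) - 1))
    share_needed = 0.5 + 0.5 * beta
    votes = Counter()
    cost = 0
    for n, sample in enumerate(samples, start=1):
        votes[sample.answer] += 1
        cost += sample.tokens
        top_answer, top_count = votes.most_common(1)[0]
        if n >= min_samples and top_count / n >= share_needed:
            break  # confident enough: skip the remaining cached samples
    return top_answer, cost


def evaluate(dataset, beta):
    """Return (accuracy, mean token cost) over cached data; no LLM calls are made."""
    correct = 0
    total_cost = 0
    for samples, gold in dataset:
        answer, cost = replay_controller(samples, beta)
        correct += answer == gold
        total_cost += cost
    return correct / len(dataset), total_cost / len(dataset)


# Made-up cache: two questions, four cached samples each, plus the gold answer.
dataset = [
    ([Trajectory("7", 100), Trajectory("7", 120),
      Trajectory("3", 90), Trajectory("7", 110)], "7"),
    ([Trajectory("2", 80), Trajectory("5", 100),
      Trajectory("5", 95), Trajectory("5", 105)], "5"),
]

for beta in (0.0, 0.5, 1.0):
    accuracy, mean_cost = evaluate(dataset, beta)
    print(f"beta={beta:.1f} accuracy={accuracy:.2f} mean_tokens={mean_cost:.1f}")
```

Sweeping β over the cached data yields one (accuracy, cost) point per value, which is the kind of trade-off curve the "Strong results" bullet compares against baselines. In this toy rule, β = 0 reduces to taking the first sample and β = 1 to a full majority vote over all cached samples.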