TongZheng1999 committed 15 days ago

Commit 23d2514 (verified) · 1 Parent(s): cab2878

Update README.md

Files changed (1)

README.md +48 -0

@@ -8,3 +8,51 @@ pinned: false
 ---
 Edit this `README.md` markdown file to author your organization card.
+---
+title: README
+emoji: 🔥
+colorFrom: indigo
+colorTo: purple
+sdk: static
+pinned: false
+---
+# AutoTTS
+**LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling**
+An environment-driven framework that automatically discovers test-time scaling (TTS) strategies. It shifts the human role from hand-crafting branching, pruning, and stopping heuristics to constructing the environments in which those strategies are discovered.
+## 📄 Paper
+**[LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling](https://github.com/zhengkid/AutoTTS)**
+Tong Zheng¹, Haolin Liu², Chengsong Huang³, Huiwen Bao, Sheng Zhang¹, Rui Liu¹, Runpeng Dai⁴, Ruibo Chen¹, Chenxi Liu¹, Tianyi Xiong¹, Xidong Wu⁵, Hongming Zhang⁶, Heng Huang¹
+¹University of Maryland  ²University of Virginia  ³Washington University in St. Louis  ⁴University of North Carolina  ⁵Google  ⁶Meta
+## ✨ Highlights
+- **Environment-driven discovery**: Reframes TTS strategy design as an automated search problem over a structured control space, rather than hand-crafted heuristics.
+- **Offline replay environment**: Pre-collects reasoning trajectories and probe signals so candidate controllers can be evaluated cheaply without repeated LLM calls.
+- **Beta parameterization**: Collapses all internal hyperparameters into a single scalar β, making the search tractable and reducing overfitting.
+- **Execution trace feedback**: Fine-grained traces help the explorer agent diagnose *why* a controller fails, not just whether it failed.
+- **Affordable**: The entire discovery process costs only **$39.9** and takes **160 minutes**.
+- **Strong results**: Discovered controllers improve the accuracy–cost Pareto frontier over strong handcrafted baselines (SC@64, ASC, ESC, Parallel-Probe) and generalize to held-out benchmarks (AIME25, HMMT25, GPQA-Diamond) and model scales (Qwen3-0.6B/1.7B/4B/8B, DeepSeek-R1-Distill-Llama-8B).
+## 🔗 Links
+- 💻 **Code**: [github.com/zhengkid/AutoTTS](https://github.com/zhengkid/AutoTTS)
+## 📝 Citation
+If you find our work useful, please cite:
+```bibtex
+@article{zheng2026autotts,
+  title={LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling},
+  author={Zheng, Tong and Liu, Haolin and Huang, Chengsong and Bao, Huiwen and Zhang, Sheng and Liu, Rui and Dai, Runpeng and Chen, Ruibo and Liu, Chenxi and Xiong, Tianyi and Wu, Xidong and Zhang, Hongming and Huang, Heng},
+  year={2026}
+}
+```