Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models Paper • 2307.10236 • Published Jul 16, 2023
TESTEVAL: Benchmarking Large Language Models for Test Case Generation Paper • 2406.04531 • Published Jun 6, 2024
Evaluating LLMs on Sequential API Call Through Automated Test Generation Paper • 2507.09481 • Published Jul 13, 2025
Risk Assessment Framework for Code LLMs via Leveraging Internal States Paper • 2504.14640 • Published Apr 20, 2025