Proof Assistant Projects Collection Digesting proof assistant libraries for AI ingestion. • 84 items • Updated 11 days ago • 3
When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity Paper • 2509.20293 • Published Sep 24, 2025 • 8
Is GPT-OSS Good? A Comprehensive Evaluation of OpenAI's Latest Open Source Models Paper • 2508.12461 • Published Aug 17, 2025 • 2
Active Learning Methods for Efficient Data Utilization and Model Performance Enhancement Paper • 2504.16136 • Published Apr 21, 2025