Wav2Gloss: Generating Interlinear Glossed Text from Speech Paper • 2403.13169 • Published Mar 19, 2024
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models Paper • 2406.09282 • Published Jun 13, 2024
ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration Paper • 2409.09506 • Published Sep 14, 2024 • 4
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks Paper • 2411.05361 • Published Nov 8, 2024 • 5
OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder Paper • 2507.14129 • Published Jul 18, 2025 • 11
POWSM: A Phonetic Open Whisper-Style Speech Foundation Model Paper • 2510.24992 • Published Oct 28, 2025 • 4
PRiSM: Benchmarking Phone Realization in Speech Models Paper • 2601.14046 • Published 19 days ago • 6
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think Paper • 2505.10185 • Published May 15, 2025 • 26
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published Dec 18, 2024 • 51
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published Dec 6, 2024 • 46
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published Dec 4, 2024 • 47
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs Paper • 2411.14199 • Published Nov 21, 2024 • 33
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation Paper • 2410.17250 • Published Oct 22, 2024 • 14
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21, 2024 • 44
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Paper • 2410.14669 • Published Oct 18, 2024 • 39
Harnessing Webpage UIs for Text-Rich Visual Understanding Paper • 2410.13824 • Published Oct 17, 2024 • 30