Layer by layer, module by module: Choose both for optimal OOD probing of ViT Paper • 2603.05280 • Published 29 days ago
Leveraging Gradients for Unsupervised Accuracy Estimation under Distribution Shift Paper • 2401.08909 • Published Jan 17, 2024
Provable Benefits of In-Tool Learning for Large Language Models Paper • 2508.20755 • Published Aug 28, 2025 • 11
CauKer: classification time series foundation models can be pretrained on synthetic data only Paper • 2508.02879 • Published Aug 4, 2025
Vision Transformer Finetuning Benefits from Non-Smooth Components Paper • 2602.06883 • Published Feb 6 • 4
Optimal Self-Consistency for Efficient Reasoning with Large Language Models Paper • 2511.12309 • Published Nov 15, 2025
Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods Paper • 2502.01384 • Published Feb 3, 2025 • 2
LLMs as In-Context Meta-Learners for Model and Hyperparameter Selection Paper • 2510.26510 • Published Oct 30, 2025 • 2