kaizuberbuehler's Collection: LM Capabilities and Scaling
Compression Represents Intelligence Linearly
Paper
• 2404.09937
• Published • 28
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper
• 2404.06395
• Published • 24
Long-context LLMs Struggle with Long In-context Learning
Paper
• 2404.02060
• Published • 37
Are large language models superhuman chemists?
Paper
• 2404.01475
• Published • 19
FlowMind: Automatic Workflow Generation with LLMs
Paper
• 2404.13050
• Published • 34
Capabilities of Gemini Models in Medicine
Paper
• 2404.18416
• Published • 25
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Paper
• 2405.12107
• Published • 29
On the Planning Abilities of Large Language Models -- A Critical Investigation
Paper
• 2305.15771
• Published • 1
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Paper
• 2406.09170
• Published • 27
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Paper
• 2406.09411
• Published • 19
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Paper
• 2406.07394
• Published • 29
GEB-1.3B: Open Lightweight Large Language Model
Paper
• 2406.09900
• Published • 21
Mixture of A Million Experts
Paper
• 2407.04153
• Published • 5
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
Paper
• 2404.05405
• Published • 10
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper
• 2408.06195
• Published • 73
Attention Heads of Large Language Models: A Survey
Paper
• 2409.03752
• Published • 92
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
Paper
• 2409.16191
• Published • 41
Making Text Embedders Few-Shot Learners
Paper
• 2409.15700
• Published • 29
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
Paper
• 2406.14546
• Published • 3
Are Your LLMs Capable of Stable Reasoning?
Paper
• 2412.13147
• Published • 93
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Paper
• 2501.01257
• Published • 51
ProgCo: Program Helps Self-Correction of Large Language Models
Paper
• 2501.01264
• Published • 26
Paper
• 2412.04315
• Published • 19
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
Paper
• 2411.17691
• Published • 13
PokerBench: Training Large Language Models to become Professional Poker Players
Paper
• 2501.08328
• Published • 19
Do generative video models learn physical principles from watching videos?
Paper
• 2501.09038
• Published • 34
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Paper
• 2501.12370
• Published • 11
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Paper
• 2501.16975
• Published • 32
Large Language Models Think Too Fast To Explore Effectively
Paper
• 2501.18009
• Published • 23
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
Paper
• 2502.08946
• Published • 192
Scaling Embedding Layers in Language Models
Paper
• 2502.01637
• Published • 24
Great Models Think Alike and this Undermines AI Oversight
Paper
• 2502.04313
• Published • 33
Scaling Pre-training to One Hundred Billion Data for Vision Language Models
Paper
• 2502.07617
• Published • 29
Gemstones: A Model Suite for Multi-Faceted Scaling Laws
Paper
• 2502.06857
• Published • 24
Distillation Scaling Laws
Paper
• 2502.08606
• Published • 47
NoLiMa: Long-Context Evaluation Beyond Literal Matching
Paper
• 2502.05167
• Published • 16
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
Paper
• 2502.13063
• Published • 74
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Paper
• 2502.17262
• Published • 22
Gemini Robotics: Bringing AI into the Physical World
Paper
• 2503.20020
• Published • 31
Implicit Reasoning in Transformers is Reasoning through Shortcuts
Paper
• 2503.07604
• Published • 23
Shifting Long-Context LLMs Research from Input to Output
Paper
• 2503.04723
• Published • 22
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation
Paper
• 2503.04872
• Published • 15
Inside-Out: Hidden Factual Knowledge in LLMs
Paper
• 2503.15299
• Published • 56
A Comprehensive Survey on Long Context Language Modeling
Paper
• 2503.17407
• Published • 49
RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning
Paper
• 2505.15034
• Published • 5
Inference-Time Scaling for Generalist Reward Modeling
Paper
• 2504.02495
• Published • 58
Scaling Analysis of Interleaved Speech-Text Language Models
Paper
• 2504.02398
• Published • 31
How Many Instructions Can LLMs Follow at Once?
Paper
• 2507.11538
• Published • 2
Rethinking Reflection in Pre-Training
Paper
• 2504.04022
• Published • 80
Scaling Laws for Native Multimodal Models
Paper
• 2504.07951
• Published • 30
BitNet b1.58 2B4T Technical Report
Paper
• 2504.12285
• Published • 85
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
Paper
• 2504.10514
• Published • 48
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations
Paper
• 2504.13816
• Published • 18