tmarechaux's Collections
Language Modeling Is Compression
Paper
• 2309.10668
• Published • 85
Small-scale proxies for large-scale Transformer training instabilities
Paper
• 2309.14322
• Published • 22
Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
Paper
• 2309.15129
• Published • 7
Vision Transformers Need Registers
Paper
• 2309.16588
• Published • 86
The Consensus Game: Language Model Generation via Equilibrium Search
Paper
• 2310.09139
• Published • 14
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise
Paper
• 2212.11685
• Published • 2
Levels of AGI for Operationalizing Progress on the Path to AGI
Paper
• 2311.02462
• Published • 36
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper
• 2402.17764
• Published • 627
Scaling Instructable Agents Across Many Simulated Worlds
Paper
• 2404.10179
• Published • 28
Your Transformer is Secretly Linear
Paper
• 2405.12250
• Published • 157
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Paper
• 2407.01392
• Published • 44
softmax is not enough (for sharp out-of-distribution)
Paper
• 2410.01104
• Published • 1
Paper
• 2410.05258
• Published • 182
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Paper
• 2410.02707
• Published • 47
mHC: Manifold-Constrained Hyper-Connections
Paper
• 2512.24880
• Published • 321