Thomas Betton

tbetton

18 11

thomasbtnfr

AI & ML interests

None yet

Recent Activity

liked a model 19 days ago

zai-org/GLM-5.2-FP8

upvoted an article about 2 months ago

Training-Free Reasoning at 88.89% on GPQA Diamond: How Darwin Family Hit Frontier Scores Without a Single Gradient Step

upvoted an article about 2 months ago

EMO: Pretraining mixture of experts for emergent modularity


Organizations

upvoted 3 articles about 2 months ago

Article

Training-Free Reasoning at 88.89% on GPQA Diamond: How Darwin Family Hit Frontier Scores Without a Single Gradient Step

FINAL-Bench • May 15 • 18

Article

EMO: Pretraining mixture of experts for emergent modularity

allenai • May 8 • 38

Article

Unlocking asynchronicity in continuous batching

ror, pcuenq, ariG23498 • May 14 • 61

upvoted an article 4 months ago

Article

Ulysses Sequence Parallelism: Training with Million-Token Contexts

kashif, stas • Mar 9 • 30

upvoted 2 articles 6 months ago

Article

Mixture of Experts Explained

osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.15k

Article

DualPipe Explained: A Comprehensive Guide to DualPipe That Anyone Can Understand—Even Without a Distributed Training Background

NormalUhr • Feb 28, 2025 • 19

upvoted an article 7 months ago

Article

Improving Prompt Consistency with Structured Generations

willkurt, remi, clefourrier • Apr 30, 2024 • 68

upvoted an article 12 months ago

Article

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

toslali-ibm, mirinflim, qgallouedec, esnible, rganti, mudhakar • Jun 3, 2025 • 101

upvoted an article about 1 year ago

Article

Training and Finetuning Sparse Embedding Models with Sentence Transformers

tomaarsen, arthurbresnu • Jul 1, 2025 • 138

upvoted a paper about 1 year ago

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6, 2025 • 191

upvoted 3 articles over 1 year ago

Article

From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning

NormalUhr • Feb 4, 2025 • 17

Article

4D masks support in Transformers

poedator • Jan 8, 2024 • 31

Article

Efficient LLM Pretraining: Packed Sequences and Masked Attention

sirluk • Oct 7, 2024 • 71

upvoted a paper over 1 year ago

JudgeBench: A Benchmark for Evaluating LLM-based Judges

Paper • 2410.12784 • Published Oct 16, 2024 • 47
