Edd

Erland

AI & ML interests

None yet

Recent Activity

updated a model 4 days ago

Erland/mini-glm-moe

updated a model 4 days ago

Erland/mini-minimax-m2

updated a dataset 4 days ago

Erland/Reverse-Text-SFT

Organizations

None yet

upvoted an article 28 days ago

Article

Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL

aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, lvwerra, sergiopaniego (+6) • 30 days ago • 42

upvoted a paper about 2 months ago

COPUS: Co-adaptive Parallelism and Batch Size Selection in Large Language Model Training

Paper • 2604.26687 • Published Apr 29 • 2

upvoted 3 papers 3 months ago

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Paper • 2604.04707 • Published Apr 6 • 204

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

Paper • 2604.04921 • Published Apr 6 • 116

LinguDistill: Recovering Linguistic Ability in Vision-Language Models via Selective Cross-Modal Distillation

Paper • 2604.00829 • Published Apr 1 • 8

upvoted a paper 6 months ago

Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

Paper • 2512.14681 • Published Dec 16, 2025 • 44

upvoted a collection 7 months ago

Ministral 3

Mistral Ministral 3: new multimodal models in Base, Instruct, and Reasoning variants, available in 3B, 8B, and 14B sizes. • 36 items • Updated 10 days ago • 35

upvoted 2 articles 7 months ago

Article

Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers

ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez (+5) • Sep 11, 2025 • 188

Article

Continuous batching from first principles

ror, ArthurZ, mcpotato (+1) • Nov 25, 2025 • 411

upvoted a paper 9 months ago

Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale

Paper • 2509.14008 • Published Sep 17, 2025 • 90

upvoted a paper 10 months ago

Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic

Paper • 2509.01363 • Published Sep 1, 2025 • 62

upvoted a collection 10 months ago

ByteDance Papers

ByteDance papers collection • 142 items • Updated 3 days ago • 35

upvoted a paper 10 months ago

Predicting the Order of Upcoming Tokens Improves Language Modeling

Paper • 2508.19228 • Published Aug 26, 2025 • 23

upvoted an article 10 months ago

Article

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

tngtech • Apr 16, 2025 • 81

upvoted a collection 12 months ago

Indonesian Text Similarity Dataset

This collection contains curated text similarity datasets that are available as Hugging Face datasets • 16 items • Updated Jul 11, 2025 • 6

upvoted an article 12 months ago

Article

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

toslali-ibm, mirinflim, qgallouedec, esnible, rganti, mudhakar (+4) • Jun 3, 2025 • 101

upvoted a collection 12 months ago

Gemma 3n

Google Gemma 3n models, all versions including Dynamic GGUF, 4-bit, 16-bit and formats! • 10 items • Updated 10 days ago • 28

upvoted 3 papers about 1 year ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 265

Softpick: No Attention Sink, No Massive Activations with Rectified Softmax

Paper • 2504.20966 • Published Apr 29, 2025 • 31

The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs

Paper • 2504.17768 • Published Apr 24, 2025 • 14