Kale

Zyn123

AI & ML interests

None yet

Recent Activity

upvoted an article 2 months ago

Efficient LLM Pretraining: Packed Sequences and Masked Attention

upvoted a paper 7 months ago

Less is More: Recursive Reasoning with Tiny Networks

upvoted a paper 8 months ago

Set Block Decoding is a Language Model Inference Accelerator

Organizations

None yet

upvoted an article 2 months ago

Article

Efficient LLM Pretraining: Packed Sequences and Masked Attention

sirluk • Oct 7, 2024 • 71

upvoted a paper 7 months ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 514

upvoted a paper 8 months ago

Set Block Decoding is a Language Model Inference Accelerator

Paper • 2509.04185 • Published Sep 4, 2025 • 54

upvoted an article about 1 year ago

Article

🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It?

Kseniase • Mar 17, 2025 • 357

upvoted 4 articles over 1 year ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

eliebak, lvwerra, lewtun • Jan 28, 2025 • 889

Article

Open-R1: Update #1

open-r1 • Feb 2, 2025 • 305

Article

Mastering Tensor Dimensions in Transformers

not-lain • Jan 12, 2025 • 172

Article

Deriving DPO's Loss

hba123 • Dec 24, 2024 • 30

liked 2 models over 1 year ago

Tiiny/SmallThinker-3B-Preview

Text Generation • 3B • Updated Jan 16, 2025 • 790 • 415

onnx-community/moonshine-base-ONNX

Automatic Speech Recognition • Updated Jan 18, 2025 • 8.87k • 34

upvoted 2 articles over 1 year ago

Article

Decoding Strategies in Large Language Models

mlabonne • Oct 29, 2024 • 113

Article

Fine-tune Llama 2 with DPO

kashif, ybelkada, lvwerra • Aug 8, 2023 • 69

liked a model over 1 year ago

EmergentMethods/gliner_medium_news-v2.1

Token Classification • 0.2B • Updated Jan 12 • 280 • 82

upvoted 5 articles over 1 year ago

Article

How to build a custom text classifier without days of human labeling

sdiazlor • Oct 17, 2024 • 57

Article

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

medmekk, marcsun13, lvwerra, pcuenq, osanseviero, thomwolf • Sep 18, 2024 • 280

Article

Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers

sanchit-gandhi • Nov 3, 2022 • 371

Article

Llama-3.1-Storm-8B: Improved SLM with Self-Curation + Model Merging

akjindal53244 • Aug 19, 2024 • 79

Article

Merge Large Language Models with mergekit

mlabonne • Jan 9, 2024 • 155

upvoted an article almost 2 years ago

Article

TGI Multi-LoRA: Deploy Once, Serve 30 Models

derek-thomas, dmaniloff, drbh • Jul 18, 2024 • 63

liked a model almost 2 years ago

csdc-atl/dialogue-rewriter

Updated Oct 16, 2023 • 8 • 16
