Joseph Lee

jiosephlee

AI & ML interests

None yet

Recent Activity

updated a dataset 1 day ago

jiosephlee/starling-transfer-shared-eval-same-species-v2-no-source-value

published a dataset 1 day ago

jiosephlee/starling-transfer-shared-eval-same-species-v2-no-source-value

updated a dataset 1 day ago

jiosephlee/starling-transfer-shared-eval-same-species-v2-source-value

Organizations

None yet

upvoted an article 4 months ago

Article

From GRPO to DAPO and GSPO: What, Why, and How

NormalUhr • Aug 9, 2025 • 128

upvoted 2 articles 5 months ago

Article

SFT with vLLM Downstream Evaluation: A VRAM-Efficient Pipeline (arm64)

AlioLeuchtmann • Jan 11 • 3

Article

Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective

Jan 27 • 80

upvoted 2 articles 6 months ago

Article

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

toslali-ibm, mirinflim, qgallouedec, esnible, rganti, mudhakar • Jun 3, 2025 • 101

Article

Tool Use, Unified

Rocketknight1 • Aug 12, 2024 • 121

upvoted 3 articles 12 months ago

Article

Fixing Gradient Accumulation

lysandre, ArthurZ, muellerzr, ydshieh, BenjaminB, pcuenq • Oct 16, 2024 • 66

Article

Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2

RQlee, ArthurZ, achikundu, lwtr, rganti, mayank-mishra • Aug 21, 2024 • 41

Article

Efficient LLM Pretraining: Packed Sequences and Masked Attention

sirluk • Oct 7, 2024 • 71

upvoted 2 articles about 1 year ago

Article

Good answers are not necessarily factual answers: an analysis of hallucination in leading LLMs

davidberenstein1957 • May 7, 2025 • 42

Article

Saving Memory Using Padding-Free Transformer Layers during Finetuning

mayank-mishra • Jun 11, 2024 • 21
