SAISON17 (Sejung Son)

upvoted an article 8 months ago

Article

Why Did MiniMax M2 End Up as a Full Attention Model?

MiniMax-AI

•

Oct 30, 2025

• 80

upvoted 3 articles 9 months ago

Article

Vision Language Model Alignment in TRL ⚡️

sergiopaniego, merve, qgallouedec, kashif, ariG23498

•

Aug 7, 2025

• 112

Article

Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers

ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez

•

Sep 11, 2025

• 188

Article

Gaia2 and ARE: Empowering the community to study agents

clefourrier, gregmialz, mlcu, mortimerp9, XciD, tfrere, evijit, RomainFroger, dheeraj7596, CarolinePascal, upiter

•

Sep 22, 2025

• 136

upvoted 3 articles 10 months ago

Article

What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware

RakshitAralimatti

•

Aug 8, 2025

• 36

Article

KV Caching Explained: Optimizing Transformer Inference Efficiency

not-lain

•

Jan 30, 2025

• 351

Article

ChatML vs Harmony: Understanding the new Format from OpenAI 🔍

kuotient

•

Aug 9, 2025

• 59

upvoted 3 articles 11 months ago

Article

Decoding Strategies in Large Language Models

mlabonne

•

Oct 29, 2024

• 114

Article

Mixture of Experts Explained

osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq

•

Dec 11, 2023

• 1.15k

Article

Assisted Generation: a new direction toward low-latency text generation

joaogante

•

May 11, 2023

• 79

upvoted an article about 1 year ago

Article

Fine-tuning Llama 2 70B using PyTorch FSDP

smangrul, sgugger, lewtun, philschmid

•

Sep 13, 2023

• 32

upvoted a collection about 1 year ago

Llama 4

Collection

Llama 4 release • 13 items • Updated Apr 29, 2025 • 739
