chewkokwah (Chew Kok Wah)

upvoted an article 6 months ago

Article

What makes good reasoning data

MiniMax-AI • Oct 30, 2025 • 45

upvoted 3 articles 7 months ago

Article

What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware

RakshitAralimatti • Aug 8, 2025 • 36

Article

Transformers v5: Simple model definitions powering the AI ecosystem

lysandre, ArthurZ, cyrilvallez, reach-vb • Dec 1, 2025 • 312

Article

Announcing New Hugging Face and KerasHub integration

ariG23498 • Jul 10, 2024 • 6

upvoted a collection 7 months ago

MathArena Outputs

Collection

Outputs of models on the MathArena Benchmark. • 30 items • Updated May 4 • 1

upvoted an article 8 months ago

Article

Ultra-Long Sequence Parallelism: Ulysses + Ring-Attention Technical Principles and Implementation

exploding-gradients • Sep 16, 2025 • 21

upvoted a paper 8 months ago

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published Oct 13, 2025 • 183

upvoted an article 8 months ago

Article

On the Shifting Global Compute Landscape

huggingface • Oct 29, 2025 • 62

upvoted 2 papers 8 months ago

INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

Paper • 2510.25602 • Published Oct 29, 2025 • 81

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 343

upvoted 3 articles 8 months ago

Article

Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training

smohammadi, siro1, winglian, marcsun13, djsaunde • Aug 8, 2025 • 99

Article

Sentence Transformers is joining Hugging Face!

tomaarsen • Oct 22, 2025 • 88

Article

Make LLM Fine-tuning 2x faster with Unsloth and 🤗 TRL

danielhanchen • Jan 10, 2024 • 77

upvoted an article 9 months ago

Article

How to Run a Hugging Face Model in JAX (Part 1)

qihqi • Jul 20, 2025 • 31

upvoted an article 10 months ago

Article

Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers

ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez • Sep 11, 2025 • 188

upvoted a paper 10 months ago

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

Paper • 2410.07985 • Published Oct 10, 2024 • 32

upvoted a paper 12 months ago

MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

Paper • 2507.16812 • Published Jul 22, 2025 • 65

upvoted an article 12 months ago

Article

OpenReasoning-Nemotron: A Family of State-of-the-Art Distilled Reasoning Models

nvidia • Jul 18, 2025 • 51

upvoted a paper 12 months ago

OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique

Paper • 2507.09075 • Published Jul 11, 2025 • 19

upvoted an article 12 months ago

Article

Ettin Suite: SoTA Paired Encoders and Decoders

orionweller, kdricci, mmarone, NohTow, dlawrie, vandurme • Jul 16, 2025 • 81
