- Custom Kernels for All from Codex and Claude — burtenshaw, sayakpaul, ariG23498, evalstate (+2) • Feb 13 • 75
- Mixture of Experts (MoE) Explained — osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq (+4) • Dec 11, 2023 • 82
- Mixture of Experts (MoEs) in Transformers — ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap (+5) • Feb 26 • 159
- Scaling Mixture of Experts: Architecture Search for Billion-Parameter Language Models — kshitijthakkar • Feb 9 • 1
- Community Evals: Because We're Done Trusting Black-Box Leaderboards Over the Community — burtenshaw, SaylorTwift, kramp, merve, davanstrien, nielsr, julien-c (+5) • Feb 4 • 89
- We Got Claude to Build CUDA Kernels and Teach Open Models! — burtenshaw, evalstate, merve, pcuenq (+2) • Jan 28 • 156
- Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective — LinkedIn • Jan 27 • 74
- Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek — huggingface • Jan 27 • 45
- Tricks from OpenAI gpt-oss That You 🫵 Can Also Use in transformers — ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez (+5) • Sep 11, 2025 • 14
- Supercharge Your OCR Workflows with Open Models — merve, ariG23498, davanstrien, hynky, andito, reach-vb, pcuenq (+5) • Oct 21, 2025 • 14