Jadon

jadodev

phase

AI & ML interests

Machine Learning, Programming Language Theory, Category Theory, Quantum Computing

Recent Activity

liked a model about 1 month ago

nvidia/Gemma-4-31B-IT-NVFP4

liked a model about 1 month ago

tencent/Sequential-Hidden-Decoding-8B-n8-Instruct

upvoted a paper about 2 months ago

Virtual Width Networks

Organizations

None yet

liked 2 models about 1 month ago

nvidia/Gemma-4-31B-IT-NVFP4

Text Generation • 21B • Updated 7 days ago • 2.23M • 475

tencent/Sequential-Hidden-Decoding-8B-n8-Instruct

Text Generation • 13B • Updated Mar 31 • 102 • 8

upvoted a paper about 2 months ago

Virtual Width Networks

Paper • 2511.11238 • Published Nov 14, 2025 • 39

liked a model about 2 months ago

ByteDance/Ouro-1.4B

Text Generation • Updated Jan 18 • 54.9k • 92

liked a Space about 2 months ago

The Smol Training Playbook

📚

3.17k

The secrets to building world-class LLMs

liked a model about 2 months ago

HuggingFaceTB/FineMath-Llama-3B

3B • Updated Nov 27, 2025 • 65 • 22

liked 3 datasets about 2 months ago

upvoted a paper 6 months ago

Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset

Paper • 2508.15096 • Published Aug 20, 2025 • 9

liked a model about 1 year ago

deepseek-ai/DeepSeek-V3-0324

Text Generation • 685B • Updated Mar 27, 2025 • 549k • 3.11k

updated a collection about 2 years ago

transformer

Collection

2 items • Updated Apr 7, 2024

upvoted a paper about 2 years ago

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2, 2024 • 108

liked 2 models about 2 years ago

mlabonne/phixtral-4x2_8

Text Generation • Updated Jan 15, 2024 • 95 • 209

NousResearch/Nous-Hermes-2-Mistral-7B-DPO

Text Generation • 7B • Updated Apr 30, 2024 • 1.74k • 218

upvoted a paper about 2 years ago

Teaching Large Language Models to Reason with Reinforcement Learning

Paper • 2403.04642 • Published Mar 7, 2024 • 48

updated a collection about 2 years ago

transformer

Collection

2 items • Updated Apr 7, 2024

upvoted a paper about 2 years ago

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6, 2024 • 190

liked a model about 2 years ago

HuggingFaceH4/zephyr-7b-alpha

Text Generation • 7B • Updated Oct 16, 2024 • 4.09k • 1.12k
