Article: Tricks from OpenAI gpt-oss you can use with transformers • Sep 11, 2025
Article: Accelerate ND-Parallel: A Guide to Efficient Multi-GPU Training • Aug 8, 2025
Paper: Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning • arXiv:2510.25992 • Published Oct 29, 2025
Article: Ultra-Long Sequence Parallelism: Ulysses + Ring-Attention Technical Principles and Implementation • Sep 16, 2025
Paper: Towards General Agentic Intelligence via Environment Scaling • arXiv:2509.13311 • Published Sep 16, 2025
Collection: SuperBPE • SuperBPE tokenizers and models trained with them • 8 items • Updated 12 days ago
Collection: LFM2 • LFM2 is a new generation of hybrid models designed for on-device deployment • 28 items • Updated 12 days ago
Collection: Hybrid Linear Attention Research • All 1.3B & 340M hybrid linear-attention experiments • 62 items • Updated Sep 11, 2025
Collection: Avey 1 Research Preview • 1.5B preview models trained on 100B tokens of FineWeb, plus an instruct-tuned version on smoltalk • 3 items • Updated Jun 16, 2025
Collection: V-JEPA 2 • A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025
Collection: Falcon-H1 • Falcon-H1 family of hybrid-head (Transformer-SSM) language models, including 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B variants (pretrained & instruction-tuned) • 33 items • Updated 12 days ago
Collection: Kimina Prover Preview • State-of-the-art models for formal mathematical reasoning • 5 items • Updated Apr 28, 2025