🔄 In a Training Loop

Louis Ulmer

lulmer

AI & ML interests

NLP (semantic search, topic generation), computer vision (object detection), diffusion models

Recent Activity

upvoted a paper 20 days ago

KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks

Paper • 2606.03458 • Published 23 days ago • 65

liked 2 models about 2 months ago

deepseek-ai/DeepSeek-V4-Pro

Text Generation • 862B • Updated 3 days ago • 2.05M • 5.05k

poolside/Laguna-XS.2

Text Generation • 33B • Updated 6 days ago • 234k • 312

liked a model 5 months ago

zai-org/GLM-4.7-Flash

Text Generation • 31B • Updated Jan 29 • 2.08M • 1.76k

New activity in stas/openwebtext-10k 5 months ago

Convert dataset to Parquet

#3 opened 5 months ago by lulmer

upvoted a paper 5 months ago

MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 53

liked a dataset 6 months ago

khaihernlow/financial-reports-sec

Updated Jan 6, 2023 • 221 • 2

upvoted a paper 10 months ago

A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10, 2025 • 193

upvoted a paper 11 months ago

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Paper • 2507.10524 • Published Jul 14, 2025 • 74

upvoted an article 12 months ago

Article

Bringing Fusion Down to Earth: ML for Stellarator Optimization

cgeorgiaw • Jul 2, 2025 • 81

liked a model about 1 year ago

black-forest-labs/FLUX.1-dev

Text-to-Image • Updated Jun 27, 2025 • 1.11M • 13.3k

New activity in Qwen/Qwen2.5-VL-7B-Instruct about 1 year ago

Exception: Could not find the transformer layer class to wrap in the model.

👍 4

#2 opened over 1 year ago by atishay-scribe

upvoted an article about 1 year ago

Article

🐯 Liger GRPO meets TRL

shisahni, kashif, smohammadi, ShirinYamani, m0m0chen, liberty4321 • May 25, 2025 • 54

liked a Space about 1 year ago

The Ultra-Scale Playbook 🌌 • 3.9k

The ultimate guide to training LLM on large GPU Clusters

upvoted an article about 1 year ago

Article

Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance

tiiuae • May 21, 2025 • 39

liked 5 datasets about 1 year ago
