💼 Hiring

Florian Zimmermeister

flozi00

AI & ML interests

ASR, German LLM

Recent Activity

upvoted an article 9 days ago

Party is over: regularizing ColBERT models to fix efficient ANN methods

liked a dataset 10 days ago

MultiLlasa/Kartoffelphon-2.5M-de-ger

liked a model 10 days ago

microsoft/FastContext-1.0-4B-RL


Organizations

A\Ware

upvoted an article 9 days ago

Article

Party is over: regularizing ColBERT models to fix efficient ANN methods

lightonai • 10 days ago • 23

upvoted 2 papers 24 days ago

SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer

Paper • 2605.30409 • Published 30 days ago • 41

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Paper • 2606.02437 • Published 26 days ago • 233

upvoted a paper about 2 months ago

Self-Distillation Enables Continual Learning

Paper • 2601.19897 • Published Jan 27 • 41

upvoted 2 papers 3 months ago

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Paper • 2604.05091 • Published Apr 6 • 47

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

Paper • 2604.04921 • Published Apr 6 • 116

upvoted a collection 3 months ago

Mistral Small 4

Collection

A state-of-the-art model, open-weight, with a granular Mixture-of-Experts architecture that fuses instruct, reasoning and agentic skills. • 3 items • Updated Mar 16 • 75

upvoted an article 4 months ago

Article

Spend 80% of Your LLM Compute on Data, Not Training

maxidl • Feb 14 • 2

upvoted a collection 4 months ago

Qwen3.5

Collection

21 items • Updated Mar 9 • 1.69k

upvoted a paper 4 months ago

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published Feb 5 • 356

upvoted 2 articles 5 months ago

Article

Open Responses: What you need to know

evalstate, burtenshaw, merve, pcuenq • Jan 15 • 112

Article

We Got Claude to Build CUDA Kernels and teach open models!

burtenshaw, evalstate, merve, pcuenq • Jan 28 • 158

upvoted 2 papers 5 months ago

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 233

Recursive Language Models

Paper • 2512.24601 • Published Dec 31, 2025 • 99

upvoted 2 papers 6 months ago

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 328

Parallax: Efficient LLM Inference Service over Decentralized Environment

Paper • 2509.26182 • Published Sep 30, 2025 • 1

upvoted a collection 6 months ago

Audio2Face-3D

Collection

Open-weight Audio2Face-3D and Audio2Emotion networks and a sample dataset for training and evaluation • 7 items • Updated 15 days ago • 19

upvoted 2 articles 7 months ago

Article

Continuous batching from first principles

ror, ArthurZ, mcpotato • Nov 25, 2025 • 411

Article

🌳 QAT: The Art of Growing a Bonsai Model

onekq • Nov 9, 2025 • 15

upvoted a paper 7 months ago

INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

Paper • 2510.25602 • Published Oct 29, 2025 • 81
