Michael Fromm

mfromm

https://fromm-m.github.io/fromm/

AI & ML interests

NLP, LLM, ConvAI

Recent Activity

authored a paper 8 days ago

KletterMix: Climbing Toward High-Quality German Pretraining Data

upvoted a collection 8 days ago

KletterMix

4 items • Updated 21 days ago • 6

upvoted a paper 8 days ago

KletterMix: Climbing Toward High-Quality German Pretraining Data

Paper • 2606.03773 • Published 23 days ago • 21

upvoted a collection 5 months ago

Nemotron-Pre-Training-Datasets

Large scale pre-training datasets used in the Nemotron family of models. • 15 items • Updated 13 days ago • 169

upvoted a paper 8 months ago

Tokenizer Choice For LLM Training: Negligible or Crucial?

Paper • 2310.08754 • Published Oct 12, 2023 • 3

upvoted an article 12 months ago

SmolLM3: smol, multilingual, long-context reasoner

Article • eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 780

upvoted a paper about 1 year ago

Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models

Paper • 2505.22232 • Published May 28, 2025 • 18

upvoted a collection over 1 year ago

EU20-Benchmarks

Evaluation Benchmarks for 20 European languages. • 5 items • Updated Oct 11, 2024 • 9