LeMat-Rho: High-Fidelity Charge Density Dataset for Atomistic Materials Modeling • LeMaterial • 6 days ago
FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages • davanstrien • Jul 8, 2025 • 35
SmolLM3: smol, multilingual, long-context reasoner • eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 780
FineWeb2-C: Help Build Better Language Models in Your Language • davanstrien • Dec 23, 2024 • 21