Hugging Face Changelog Introducing Buckets: S3-like storage on the Hub 2 days ago • 101
Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets Paper • 2602.22207 • Published 15 days ago • 42
Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI +4 21 days ago • 483
Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs Paper • 2502.19413 • Published Feb 26, 2025 • 22
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs Paper • 2504.04030 • Published Apr 5, 2025 • 3
Article LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling 28 days ago • 49
Open Coding Agents Specialization Collection Ai2 Open Coding Agents - Django, Sphinx, Sympy Data • 6 items • Updated 30 days ago • 3
Multilingual PII & De-Identification Collection Multilingual models for extracting PII entities and de-identifying clinical text, with support for HIPAA and GDPR compliance. • 245 items • Updated 2 days ago • 22
Article compar:IA ranking: from user votes to a participatory ranking of models Nov 3, 2025 • 7
compar:IA: The French Government's LLM arena to collect French-language human prompts and preference data Paper • 2602.06669 • Published Feb 6 • 7
Article Community Evals: Because we're done trusting black-box leaderboards over the community +5 Feb 4 • 88
Instruction Pretrained Experiments Collection Experiments associated with the paper 'Continued Pretraining and Interpretability-Based Evaluation for Low-Resource Languages: A Galician Case Study' • 3 items • Updated Dec 11, 2025 • 1
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published Jan 30 • 109