view article Article Welcome Gemma 4: Frontier multimodal intelligence on device +5 merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2 • 909
view article Article Getting Started with Sentiment Analysis using Python federicopascual • Feb 2, 2022 • 75
EmoPillars Collection This collection contains models and a dataset for fine-grained context-aware and context-less emotion classification. • 7 items • Updated Apr 25, 2025 • 4
view article Article Introducing AI Sheets: a tool to work with datasets using open AI models! +4 dvilasuero, Ameeeee, frascuchon, damianpumar, lvwerra, thomwolf • Aug 8, 2025 • 109
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17, 2025 • 264
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published Jun 5, 2025 • 62
🧠 Traditional Chinese Reasoning Datasets Collection A curated collection of datasets designed to evaluate and train reasoning capabilities in Traditional Chinese across various domains. • 3 items • Updated May 6 • 9
view article Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality +2 saurabhdash, olivernan, ArashAhmadian, johndang-cohere • Mar 4, 2025 • 78
Breeze 2 Family Collection Llama-Breeze2 is a multi-modal language model family specifically intended for Traditional Chinese use. BreezyVoice is a Taiwan Mandarin TTS • 6 items • Updated Feb 26, 2025 • 20
Cosmos-Tokenizer1 Collection ⚠️ This collection is archived. 👉 https://huggingface.co/collections/nvidia/cosmos3 • 22 items • Updated 13 days ago • 44
AIMv2 Collection A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 16 items • Updated Mar 2 • 84
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated May 5, 2025 • 309
view article Article Efficient Deep Learning: A Comprehensive Overview of Optimization Techniques 👐 📚 Isayoften • Aug 26, 2024 • 92
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases Paper • 2407.12784 • Published Jul 17, 2024 • 51
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated May 5, 2025 • 251
view article Article SmolLM - blazingly fast and remarkably powerful +1 loubnabnl, anton-l, eliebak • Jul 16, 2024 • 459
Nemotron 4 340B Collection Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated 13 days ago • 164