view article Article Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval +1 Mar 22, 2024 • 126
PII & De-Identification Collection Models for extracting PII entities and de-identifying clinical text, with support for HIPAA and GDPR compliance. • 33 items • Updated 14 days ago • 27
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis Paper • 2506.06276 • Published Jun 6, 2025 • 26
SoulX-Podcast: Towards Realistic Long-form Podcasts with Dialectal and Paralinguistic Diversity Paper • 2510.23541 • Published Oct 27, 2025 • 13
C2S-Scale-Gemma-Models Collection C2S-Scale Gemma models trained using the Cell2Sentence framework, described in the C2S-Scale paper. • 2 items • Updated Oct 13, 2025 • 12
mmBERT: a modern multilingual encoder Collection mmBERT is trained on 3T tokens from over 1800 languages, showing SoTA scores on benchmarks and exceptional low-resource performance • 16 items • Updated Sep 9, 2025 • 50
view article Article Welcome EmbeddingGemma, Google's new efficient embedding model +4 Sep 4, 2025 • 272
💫StarVector Models Collection StarVector is a multimodal LLM for Scalable Vector Graphics (SVG) generation, producing structured SVG code directly from images and text. • 2 items • Updated Mar 20, 2025 • 96
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated Dec 23, 2025 • 309
Llama 3.2 3B & 1B GGUF Quants Collection Llama.cpp compatible quants for Llama 3.2 3B and 1B Instruct models. • 4 items • Updated Sep 26, 2024 • 46
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation Paper • 2310.16656 • Published Oct 25, 2023 • 52