Article Multimodal Embedding & Reranker Models with Sentence Transformers tomaarsen • Apr 9 • 59
Article Welcome Gemma 4: Frontier multimodal intelligence on device +5 merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2 • 892
Alignment Makes Language Models Normative, Not Descriptive Paper • 2603.17218 • Published Mar 17 • 46
Health AI Developer Foundations (HAI-DEF) Collection Groups models released for use in health AI by Google. Read more about HAI-DEF at http://goo.gle/hai-def • 22 items • Updated Mar 12 • 217
PubMedQA: A Dataset for Biomedical Research Question Answering Paper • 1909.06146 • Published Sep 13, 2019 • 4
Article AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models imomayiz • Sep 16, 2025 • 19
Article Open-R1: a fully open reproduction of DeepSeek-R1 +1 eliebak, lvwerra, lewtun • Jan 28, 2025 • 889
Article Topic 27: What are Chain-of-Agents and Chain-of-RAG? Kseniase • Feb 13, 2025 • 18
Biomedical NLP papers Collection Papers posted on @ArxivHealthcareNLP@sigmoid.social (Clinical, Healthcare & Biomedical NLP) • 183 items • Updated Jan 24, 2025 • 43
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions Paper • 2402.18060 • Published Feb 28, 2024 • 2
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated Dec 10, 2025 • 169
Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper • 2305.18290 • Published May 29, 2023 • 66
Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models +1 loubnabnl, anton-l, davanstrien • Mar 20, 2024 • 113
Article SmolLM - blazingly fast and remarkably powerful +1 loubnabnl, anton-l, eliebak • Jul 16, 2024 • 455
BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains Paper • 2402.10373 • Published Feb 15, 2024 • 10
Educational Resources for Medical LLMs Collection Curated medical LLM datasets and models for use in curricular content, particularly for medical professionals (e.g. medical students). • 15 items • Updated Dec 1, 2023 • 6
Healthcare Bias Eval Datasets Collection Benchmarks and other datasets that can be used to evaluate bias in healthcare settings. • 5 items • Updated Dec 9, 2023 • 1