Green-VLA: Staged Vision-Language-Action Model for Generalist Robots Paper • 2602.00919 • Published Jan 31 • 315
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective Paper • 2506.17930 • Published Jun 22, 2025 • 19
DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 15 items • Updated 3 days ago • 534
KVAE 1.0 Collection KVAE 1.0 tokenizers for images (KVAE-2D-1.0) and video (KVAE-3D-1.0) are distributed under the MIT license (commercial use is possible). • 2 items • Updated Dec 14, 2025 • 7
MedGemma Release Collection Collection of Gemma 3 variants built for performance on medical text and image comprehension, to accelerate building healthcare-based AI applications. • 9 items • Updated 1 day ago • 452
VideoPrism Collection VideoPrism is a foundational video encoder that enables state-of-the-art performance on a large variety of video understanding tasks. • 5 items • Updated 1 day ago • 17
Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models Paper • 2512.00590 • Published Nov 29, 2025 • 51
Saiga GGUF Collection Russian fine-tunes of different base LLMs in the GGUF format compatible with llama.cpp • 8 items • Updated Apr 27, 2025 • 38
DTrOCR: Decoder-only Transformer for Optical Character Recognition Paper • 2308.15996 • Published Aug 30, 2023 • 4
Transformer-Based Approach for Joint Handwriting and Named Entity Recognition in Historical documents Paper • 2112.04189 • Published Dec 8, 2021 • 3
Handwritten and Printed Text Segmentation: A Signature Case Study Paper • 2307.07887 • Published Jul 15, 2023 • 1
WriteViT: Handwritten Text Generation with Vision Transformer Paper • 2505.13235 • Published May 19, 2025 • 1