kolar0 (Kat)

upvoted a paper 2 months ago

ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks

Paper • 2603.27862 • Published Mar 29 • 32

upvoted a paper 4 months ago

DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation

Paper • 2601.22153 • Published Jan 29 • 75

upvoted a paper 5 months ago

Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

Paper • 2601.04720 • Published Jan 8 • 59

upvoted an article 6 months ago

Article

We Got Claude to Fine-Tune an Open Source LLM

burtenshaw, evalstate • Dec 4, 2025 • 629

upvoted an article 7 months ago

Article

Neuro SAN Is All You Need — A Data-Driven Multi-Agent Orchestration Framework (extended)

danyoung • Jun 17, 2025 • 5

upvoted a paper 8 months ago

RAG-Anything: All-in-One RAG Framework

Paper • 2510.12323 • Published Oct 14, 2025 • 82

upvoted a collection 10 months ago

MedGemma Release

Collection

Collection of Gemma 3 variants for performance on medical text and image comprehension to accelerate building healthcare-based AI applications. • 9 items • Updated Mar 12 • 501

upvoted 2 papers 11 months ago

Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 55

Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10, 2025 • 108

upvoted a paper about 1 year ago

MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21, 2025 • 99

upvoted 3 articles about 1 year ago

Article

Vision Language Models (Better, faster, stronger)


merve, sergiopaniego, ariG23498, pcuenq, andito • May 12, 2025 • 613

Article

Introducing GGUF-my-LoRA

ngxson • Nov 1, 2024 • 23

Article

Tiny Agents: an MCP-powered agent in 50 lines of code

julien-c • Apr 25, 2025 • 308

Kat

AI & ML interests

Organizations


