Misc - a samsam55 Collection

Misc

updated May 25

UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG

Paper • 2510.03663 • Published Oct 4, 2025 • 17
LLM-guided Hierarchical Retrieval

Paper • 2510.13217 • Published Oct 15, 2025 • 21
AnyUp: Universal Feature Upsampling

Paper • 2510.12764 • Published Oct 14, 2025 • 13
katanemo/Arch-Router-1.5B

Text Generation • 2B • Updated Apr 2 • 2.88k • 267
nvidia/Audio2Face-3D-v3.0

Updated Oct 21, 2025 • 262 • 82
Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning

Paper • 2510.20150 • Published Oct 23, 2025 • 7
WildDet3D: Scaling Promptable 3D Detection in the Wild

Paper • 2604.08626 • Published Apr 9 • 248
Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language

Paper • 2604.19667 • Published Apr 21 • 23
The Last Harness You'll Ever Build

Paper • 2604.21003 • Published Apr 22 • 6
From Context to Skills: Can Language Models Learn from Context Skillfully?

Paper • 2604.27660 • Published May 3 • 171
From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills

Paper • 2604.24026 • Published Apr 27 • 22
Sapiens2

Paper • 2604.21681 • Published Apr 23 • 22
Co-Director: Agentic Generative Video Storytelling

Paper • 2604.24842 • Published Apr 27 • 16
Recursive Multi-Agent Systems

Paper • 2604.25917 • Published Apr 28 • 286
openai/privacy-filter

Token Classification • 1B • Updated Apr 22 • 300k • 1.67k
Qwen/WebWorld-8B

Text Generation • 8B • Updated May 8 • 1.3k • 62
IAAR-Shanghai/MemPrivacy-1.7B-SFT

Text Generation • 2B • Updated May 15 • 1.12k • 30
facebook/sam3

Mask Generation • 0.9B • Updated Nov 20, 2025 • 1.72M • 2.33k
openbmb/MiniCPM-V-4.6

Image-Text-to-Text • 1B • Updated 23 days ago • 802k • 1.13k
Code as Agent Harness

Paper • 2605.18747 • Published May 18 • 223
amazon/chronos-2

Time Series Forecasting • 0.1B • Updated 22 days ago • 15.4M • 338
Falconsai/nsfw_image_detection

Image Classification • 85.8M • Updated Apr 6, 2025 • 9.8M • 1.11k
autogluon/chronos-bolt-small

Time Series Forecasting • 47.7M • Updated Oct 30, 2025 • 13.9M • 44
pyannote/voice-activity-detection

Automatic Speech Recognition • Updated May 10, 2024 • 3.42M • 237
TrustSafeAI/RADAR-Vicuna-7B

Text Classification • Updated Nov 7, 2023 • 1.34M • 13
rizvandwiki/gender-classification

Image Classification • 85.8M • Updated May 18, 2023 • 1.46M • 60
Qwen/Qwen3-Reranker-0.6B

Text Ranking • 0.6B • Updated Apr 16 • 2.01M • 367
meta-llama/Prompt-Guard-86M

Text Classification • 0.3B • Updated Nov 12, 2025 • 2.27M • 348
Tongyi-MAI/Z-Image-Turbo

Text-to-Image • Updated Jan 30 • 891k • 4.88k