Sam
samsam55
AI & ML interests
None yet
Recent Activity
updated a collection about 1 month ago: Self Improving
updated a collection about 1 month ago: Misc
updated a collection about 1 month ago: Misc
Organizations
None yet
Coding Agents (Games)
Datasets
Run on CPU Optimizations
World View Creation (out painting 3D)
Coding LLMs
TTS & Speech to Text
- Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
  Paper • 2510.03117 • Published • 12
- ResembleAI/chatterbox
  Text-to-Speech • Updated • 2.2M • 1.66k
- Phonikud/phonikud
  0.3B • Updated • 28 • 1
- UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
  Paper • 2510.13344 • Published • 65
Agents
Video Generation & Pipelines
Reinforcement Learning Etc..
Self Improving
- VISTA: A Test-Time Self-Improving Video Generation Agent
  Paper • 2510.15831 • Published • 24
- Build Your Personalized Research Group: A Multiagent Framework for Continual and Interactive Science Automation
  Paper • 2510.15624 • Published • 15
- SkillOpt: Executive Strategy for Self-Evolving Agent Skills
  Paper • 2605.23904 • Published • 247
Deep Search
Computer Use
Visual Multi Modal LLM
- NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints
  Paper • 2510.08565 • Published • 22
- Detect Anything via Next Point Prediction
  Paper • 2510.12798 • Published • 53
- PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
  Paper • 2510.14528 • Published • 129
- DeepEyesV2: Toward Agentic Multimodal Model
  Paper • 2511.05271 • Published • 47
Misc
- UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG
  Paper • 2510.03663 • Published • 17
- LLM-guided Hierarchical Retrieval
  Paper • 2510.13217 • Published • 21
- AnyUp: Universal Feature Upsampling
  Paper • 2510.12764 • Published • 13
- katanemo/Arch-Router-1.5B
  Text Generation • 2B • Updated • 2.88k • 267
3D Models & Modeling
- Towards Scalable and Consistent 3D Editing
  Paper • 2510.02994 • Published • 6
- UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
  Paper • 2509.24817 • Published • 9
- NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
  Paper • 2510.15019 • Published • 65
- Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
  Paper • 2510.15869 • Published • 50
Skills