Image Generators are Generalist Vision Learners Paper • 2604.20329 • Published 21 days ago • 20
MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines Paper • 2603.06679 • Published Mar 30 • 6
AVO: Agentic Variation Operators for Autonomous Evolutionary Search Paper • 2603.24517 • Published Mar 25 • 11
V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising Paper • 2603.16792 • Published Mar 17 • 3
SE-Bench: Benchmarking Self-Evolution with Knowledge Internalization Paper • 2602.04811 • Published Feb 4 • 2
Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis Paper • 2601.14253 • Published Jan 20 • 10
V-DPM: 4D Video Reconstruction with Dynamic Point Maps Paper • 2601.09499 • Published Jan 14 • 11
UM-Text: A Unified Multimodal Model for Image Understanding Paper • 2601.08321 • Published Jan 13 • 12
ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation Paper • 2601.03955 • Published Jan 7 • 3
FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation Paper • 2512.24724 • Published Dec 31, 2025 • 9
Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow Paper • 2512.24766 • Published Dec 31, 2025 • 9
What matters for Representation Alignment: Global Information or Spatial Structure? Paper • 2512.10794 • Published Dec 11, 2025 • 9
ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models Paper • 2512.07843 • Published Nov 24, 2025 • 22
Post 27163 Want to iterate on a Hugging Face Space with an LLM? Now you can easily convert any entire HF repo (Model, Dataset or Space) to a text file and feed it to a language model! multimodalart/repo2txt
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published Oct 9, 2025 • 39
Post 18376 Self-Forcing - a real-time video distilled model from Wan 2.1 by @adobe is out, and they open sourced it. I've built a live real-time demo on Spaces: multimodalart/self-forcing