Generative AI For Audio

community

Recent Activity

akhaliq submitted a paper 26 days ago

Image Generators are Generalist Vision Learners

akhaliq submitted a paper about 2 months ago

MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines

akhaliq submitted a paper about 2 months ago

AVO: Agentic Variation Operators for Autonomous Evolutionary Search

submitted a paper to Daily Papers 26 days ago

Image Generators are Generalist Vision Learners

Paper • 2604.20329 • Published 28 days ago • 20

submitted 2 papers to Daily Papers about 2 months ago

MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines

Paper • 2603.06679 • Published Mar 30 • 6

AVO: Agentic Variation Operators for Autonomous Evolutionary Search

Paper • 2603.24517 • Published Mar 25 • 11

submitted 2 papers to Daily Papers 2 months ago

V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising

Paper • 2603.16792 • Published Mar 17 • 3

Multimodal OCR: Parse Anything from Documents

Paper • 2603.13032 • Published Mar 13 • 43

authored a paper 2 months ago

Any to Full: Prompting Depth Anything for Depth Completion in One Stage

Paper • 2603.05711 • Published Mar 5 • 2

submitted a paper to Daily Papers 3 months ago

SE-Bench: Benchmarking Self-Evolution with Knowledge Internalization

Paper • 2602.04811 • Published Feb 4 • 2

submitted a paper to Daily Papers 3 months ago

UniAudio 2.0: A Unified Audio Language Model with Text-Aligned Factorized Audio Tokenization

Paper • 2602.04683 • Published Feb 4 • 3

submitted 3 papers to Daily Papers 4 months ago

Visual Personalization Turing Test

Paper • 2601.22680 • Published Jan 30 • 2

Causal World Modeling for Robot Control

Paper • 2601.21998 • Published Jan 29 • 31

Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis

Paper • 2601.14253 • Published Jan 20 • 10

authored a paper 4 months ago

HeartMuLa: A Family of Open Sourced Music Foundation Models

Paper • 2601.10547 • Published Jan 15 • 49

submitted a paper to Daily Papers 4 months ago

V-DPM: 4D Video Reconstruction with Dynamic Point Maps

Paper • 2601.09499 • Published Jan 14 • 11

submitted a paper to Daily Papers 4 months ago

HeartMuLa: A Family of Open Sourced Music Foundation Models

Paper • 2601.10547 • Published Jan 15 • 49

submitted 2 papers to Daily Papers 4 months ago

UM-Text: A Unified Multimodal Model for Image Understanding

Paper • 2601.08321 • Published Jan 13 • 20

ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation

Paper • 2601.03955 • Published Jan 7 • 3

submitted 4 papers to Daily Papers 5 months ago

FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation

Paper • 2512.24724 • Published Dec 31, 2025 • 9

Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow

Paper • 2512.24766 • Published Dec 31, 2025 • 9

What matters for Representation Alignment: Global Information or Spatial Structure?

Paper • 2512.10794 • Published Dec 11, 2025 • 9

Towards a Science of Scaling Agent Systems

Paper • 2512.08296 • Published Dec 9, 2025 • 17