Submitted by Bai LiChen · MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model · catnip