Video - a Carlosvirella100 Collection

CAMV

Video

updated Dec 23, 2025

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

Paper • 2311.12631 • Published Nov 21, 2023 • 14
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 62
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

Paper • 2504.01956 • Published Apr 2, 2025 • 41
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding

Paper • 2506.23219 • Published Jun 29, 2025 • 7
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8, 2025 • 45
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky

Paper • 2507.03336 • Published Jul 4, 2025 • 7
GTA1: GUI Test-time Scaling Agent

Paper • 2507.05791 • Published Jul 8, 2025 • 27
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS

Paper • 2507.07136 • Published Jul 9, 2025 • 40
Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective

Paper • 2507.08801 • Published Jul 11, 2025 • 32
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Paper • 2507.10524 • Published Jul 14, 2025 • 74
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

Paper • 2507.12415 • Published Jul 16, 2025 • 43
OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique

Paper • 2507.09075 • Published Jul 11, 2025 • 19
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once

Paper • 2507.10541 • Published Jul 14, 2025 • 30
Lizard: An Efficient Linearization Framework for Large Language Models

Paper • 2507.09025 • Published Jul 11, 2025 • 19
AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs

Paper • 2507.08616 • Published Jul 11, 2025 • 15
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes

Paper • 2507.11407 • Published Jul 15, 2025 • 62
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner

Paper • 2507.13332 • Published Jul 17, 2025 • 49
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization

Paper • 2507.12142 • Published Jul 16, 2025 • 36
FLEXITOKENS: Flexible Tokenization for Evolving Language Models

Paper • 2507.12720 • Published Jul 17, 2025 • 10
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities

Paper • 2507.13158 • Published Jul 17, 2025 • 24
Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers

Paper • 2507.08422 • Published Jul 11, 2025 • 36
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

Paper • 2507.15061 • Published Jul 20, 2025 • 61
Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling

Paper • 2507.11061 • Published Jul 15, 2025 • 37
Gaussian Splatting with Discretized SDF for Relightable Assets

Paper • 2507.15629 • Published Jul 21, 2025 • 23
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Paper • 2507.16784 • Published Jul 22, 2025 • 124
Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention

Paper • 2507.17745 • Published Jul 23, 2025 • 36
Deep Researcher with Test-Time Diffusion

Paper • 2507.16075 • Published Jul 21, 2025 • 68
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Paper • 2507.22827 • Published Oct 20, 2025 • 100
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective

Paper • 2507.23632 • Published Jul 31, 2025 • 6
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models

Paper • 2508.00819 • Published Aug 1, 2025 • 64
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

Paper • 2508.00414 • Published Aug 1, 2025 • 96
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 276
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

Paper • 2508.03686 • Published Aug 5, 2025 • 39
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2, 2025 • 240
Efficient Agents: Building Effective Agents While Reducing Cost

Paper • 2508.02694 • Published Jul 24, 2025 • 86
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5, 2025 • 141
CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search

Paper • 2508.02091 • Published Aug 4, 2025 • 13
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 265
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

Paper • 2508.05731 • Published Aug 7, 2025 • 27
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

Paper • 2508.07629 • Published Aug 11, 2025 • 43
Adapting Vision-Language Models Without Labels: A Comprehensive Survey

Paper • 2508.05547 • Published Aug 7, 2025 • 11
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8, 2025 • 212
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent

Paper • 2508.06600 • Published Aug 8, 2025 • 42
Reinforcement Learning in Vision: A Survey

Paper • 2508.08189 • Published Aug 11, 2025 • 30
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11, 2025 • 50
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

Paper • 2508.09736 • Published Aug 13, 2025 • 58
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens

Paper • 2508.05305 • Published Aug 7, 2025 • 48
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models

Paper • 2508.09968 • Published Aug 13, 2025 • 15
A Survey on Diffusion Language Models

Paper • 2508.10875 • Published Aug 14, 2025 • 34
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

Paper • 2508.14896 • Published Aug 20, 2025 • 23
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

Paper • 2508.10395 • Published Aug 14, 2025 • 42
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Paper • 2508.14444 • Published Aug 20, 2025 • 50
Deep Think with Confidence

Paper • 2508.15260 • Published Aug 21, 2025 • 91
Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization

Paper • 2508.14811 • Published Aug 20, 2025 • 42
UQ: Assessing Language Models on Unsolved Questions

Paper • 2508.17580 • Published Aug 25, 2025 • 15
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation

Paper • 2508.17472 • Published Aug 24, 2025 • 26
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Paper • 2508.18773 • Published Aug 26, 2025 • 16
Autoregressive Universal Video Segmentation Model

Paper • 2508.19242 • Published Aug 26, 2025 • 29
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies

Paper • 2508.20072 • Published Aug 27, 2025 • 32
Self-Rewarding Vision-Language Model via Reasoning Decomposition

Paper • 2508.19652 • Published Aug 27, 2025 • 85
SpotEdit: Evaluating Visually-Guided Image Editing Methods

Paper • 2508.18159 • Published Aug 25, 2025 • 3
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks

Paper • 2508.15804 • Published Aug 14, 2025 • 15
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published Aug 28, 2025 • 90
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published Aug 24, 2025 • 80
Mixture of Contexts for Long Video Generation

Paper • 2508.21058 • Published Aug 28, 2025 • 35
VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 171
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

Paper • 2508.20096 • Published Aug 27, 2025 • 37
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles

Paper • 2508.16072 • Published Aug 22, 2025 • 4
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Paper • 2508.21112 • Published Aug 28, 2025 • 78
UItron: Foundational GUI Agent with Advanced Perception and Planning

Paper • 2508.21767 • Published Aug 29, 2025 • 12
Efficient Code Embeddings from Code Generation Models

Paper • 2508.21290 • Published Aug 29, 2025 • 21
TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training

Paper • 2508.17677 • Published Aug 25, 2025 • 14
CLIPSym: Delving into Symmetry Detection with CLIP

Paper • 2508.14197 • Published Aug 19, 2025 • 8
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

Paper • 2509.02544 • Published Sep 2, 2025 • 128
Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation

Paper • 2509.00428 • Published Aug 30, 2025 • 19
Symbolic Graphics Programming with Large Language Models

Paper • 2509.05208 • Published Sep 5, 2025 • 47
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence

Paper • 2509.12203 • Published Sep 15, 2025 • 20
Locality in Image Diffusion Models Emerges from Data Statistics

Paper • 2509.09672 • Published Sep 11, 2025 • 13
LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios

Paper • 2509.09926 • Published Sep 12, 2025 • 14
Single-stream Policy Optimization

Paper • 2509.13232 • Published Sep 16, 2025 • 36
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

Paper • 2509.10441 • Published Sep 12, 2025 • 31
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents

Paper • 2509.13309 • Published Sep 16, 2025 • 67
Towards General Agentic Intelligence via Environment Scaling

Paper • 2509.13311 • Published Sep 16, 2025 • 72
Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation

Paper • 2509.10687 • Published Sep 12, 2025 • 7
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation

Paper • 2509.15212 • Published Sep 18, 2025 • 22
AToken: A Unified Tokenizer for Vision

Paper • 2509.14476 • Published Sep 17, 2025 • 37
Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22, 2025 • 154
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

Paper • 2509.21268 • Published Sep 25, 2025 • 104
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26, 2025 • 165
Fine-tuning Done Right in Model Editing

Paper • 2509.22072 • Published Sep 26, 2025 • 28
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29, 2025 • 147
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning

Paper • 2509.22601 • Published Sep 26, 2025 • 30
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models

Paper • 2509.26628 • Published Sep 30, 2025 • 17
More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models

Paper • 2509.25848 • Published Sep 30, 2025 • 81
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

Paper • 2509.25849 • Published Sep 30, 2025 • 49
VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing

Paper • 2509.22651 • Published Sep 26, 2025 • 23
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

Paper • 2509.22414 • Published Sep 26, 2025 • 22
LongCodeZip: Compress Long Context for Code Language Models

Paper • 2510.00446 • Published Oct 1, 2025 • 108
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping

Paper • 2509.21880 • Published Sep 26, 2025 • 54
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Paper • 2510.05034 • Published Oct 6, 2025 • 51
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models

Paper • 2510.03561 • Published Oct 3, 2025 • 25
Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs

Paper • 2509.25771 • Published Sep 30, 2025 • 11
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention

Paper • 2510.04212 • Published Oct 5, 2025 • 26
Efficient Intent Detection with Dual Sentence Encoders

Paper • 2003.04807 • Published Mar 10, 2020 • 2
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published Oct 13, 2025 • 182
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs

Paper • 2510.09201 • Published Oct 10, 2025 • 50
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning

Paper • 2510.12693 • Published Oct 14, 2025 • 28
QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation

Paper • 2512.19134 • Published Dec 22, 2025 • 32
