papers - a passagereptile455 Collection

updated 2 days ago

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

Paper • 2503.14734 • Published Mar 18, 2025 • 8
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Paper • 2401.02117 • Published Jan 4, 2024 • 33
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2, 2025 • 162
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding

Paper • 2506.16035 • Published Jun 19, 2025 • 89
Deep Researcher with Test-Time Diffusion

Paper • 2507.16075 • Published Jul 21, 2025 • 69
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm

Paper • 2507.18553 • Published Jul 24, 2025 • 42
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

Paper • 2507.19478 • Published Jul 25, 2025 • 33
CLEAR: Error Analysis via LLM-as-a-Judge Made Easy

Paper • 2507.18392 • Published Jul 24, 2025 • 20
PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving

Paper • 2507.17596 • Published Jul 23, 2025 • 7
Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement

Paper • 2507.18742 • Published Jul 24, 2025 • 6
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI

Paper • 2507.10510 • Published Jul 14, 2025 • 5
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

Paper • 2507.19457 • Published Jul 25, 2025 • 34
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report

Paper • 2507.16534 • Published Jul 22, 2025 • 9
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17, 2025 • 264
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1, 2025 • 257
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025 • 320
Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10, 2025 • 161
MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4, 2025 • 168
Kwai Keye-VL Technical Report

Paper • 2507.01949 • Published Jul 2, 2025 • 133
GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21, 2025 • 135
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Paper • 2507.16784 • Published Jul 22, 2025 • 125
T-LoRA: Single Image Diffusion Model Customization Without Overfitting

Paper • 2507.05964 • Published Jul 8, 2025 • 121
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization

Paper • 2507.14683 • Published Jul 19, 2025 • 137
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

Paper • 2410.10813 • Published Oct 14, 2024 • 16
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Paper • 2506.11928 • Published Jun 13, 2025 • 25
Defeating Prompt Injections by Design

Paper • 2503.18813 • Published Mar 24, 2025 • 25
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents

Paper • 2505.22954 • Published May 29, 2025 • 15
Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis

Paper • 2505.11581 • Published May 16, 2025 • 3
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Paper • 2408.06292 • Published Aug 12, 2024 • 128
Evaluating Large Language Models Trained on Code

Paper • 2107.03374 • Published Jul 7, 2021 • 11
Self-Refine: Iterative Refinement with Self-Feedback

Paper • 2303.17651 • Published Mar 30, 2023 • 2
Gorilla: Large Language Model Connected with Massive APIs

Paper • 2305.15334 • Published May 24, 2023 • 7
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace

Paper • 2303.17580 • Published Mar 30, 2023 • 15
Communicative Agents for Software Development

Paper • 2307.07924 • Published Jul 16, 2023 • 6
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework

Paper • 2308.08155 • Published Aug 16, 2023 • 11
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

Paper • 2509.09677 • Published Sep 11, 2025 • 37
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7, 2025 • 112
Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6, 2025 • 191
Inference-Time Scaling for Generalist Reward Modeling

Paper • 2504.02495 • Published Apr 3, 2025 • 58
BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues

Paper • 2501.10836 • Published Jan 18, 2025 • 1
Executable Code Actions Elicit Better LLM Agents

Paper • 2402.01030 • Published Feb 1, 2024 • 195
DynaSaur: Large Language Agents Beyond Predefined Actions

Paper • 2411.01747 • Published Nov 4, 2024 • 37
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

Paper • 2401.00812 • Published Jan 1, 2024 • 12
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Paper • 2510.24702 • Published Oct 28, 2025 • 32
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLM

Paper • 2509.18058 • Published Sep 22, 2025 • 12
Speculative Safety-Aware Decoding

Paper • 2508.17739 • Published Aug 25, 2025
Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs

Paper • 2508.10029 • Published Aug 8, 2025
Context Misleads LLMs: The Role of Context Filtering in Maintaining Safe Alignment of LLMs

Paper • 2508.10031 • Published Aug 9, 2025
Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs

Paper • 2508.20333 • Published Aug 28, 2025
Mitigating Jailbreaks with Intent-Aware LLMs

Paper • 2508.12072 • Published Aug 16, 2025
D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models

Paper • 2509.17938 • Published Sep 22, 2025 • 4
A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness

Paper • 2509.14297 • Published Sep 17, 2025
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 518
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation

Paper • 2412.21199 • Published Dec 30, 2024 • 13
Solving Inequality Proofs with Large Language Models

Paper • 2506.07927 • Published Jun 9, 2025 • 20
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization

Paper • 2510.24592 • Published Oct 28, 2025 • 17
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 79
GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 249
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance

Paper • 2506.03828 • Published Jun 4, 2025 • 20
MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 121
Next-Embedding Prediction Makes Strong Vision Learners

Paper • 2512.16922 • Published Dec 18, 2025 • 91
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 330
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4, 2025 • 261
Helios: Real Real-Time Long Video Generation Model

Paper • 2603.04379 • Published Mar 4 • 190
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation

Paper • 2604.11804 • Published Apr 13 • 72
ATANT: An Evaluation Framework for AI Continuity

Paper • 2604.06710 • Published Apr 8 • 1
IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs

Paper • 2604.10539 • Published Apr 12 • 3
SHARE: Social-Humanities AI for Research and Education

Paper • 2604.11152 • Published Apr 13 • 1
How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models

Paper • 2604.04385 • Published Apr 13 • 1
SPASM: Stable Persona-driven Agent Simulation for Multi-turn Dialogue Generation

Paper • 2604.09212 • Published Apr 10 • 3
Counting to Four is still a Chore for VLMs

Paper • 2604.10039 • Published Apr 11 • 2
ADD for Multi-Bit Image Watermarking

Paper • 2604.11491 • Published Apr 13 • 3
Continuous Adversarial Flow Models

Paper • 2604.11521 • Published Apr 13 • 12
Polyglot Teachers: Evaluating Language Models for Multilingual Synthetic Data Generation

Paper • 2604.11290 • Published Apr 13 • 4
CocoaBench: Evaluating Unified Digital Agents in the Wild

Paper • 2604.11201 • Published Apr 13 • 37
CodeTracer: Towards Traceable Agent States

Paper • 2604.11641 • Published Apr 13 • 38
Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models

Paper • 2604.10949 • Published Apr 13 • 40
Zero-shot World Models Are Developmentally Efficient Learners

Paper • 2604.10333 • Published Apr 11 • 7
Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models

Paper • 2604.02340 • Published Apr 11 • 9
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

Paper • 2604.11778 • Published Apr 13 • 10
SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

Paper • 2604.09557 • Published Feb 10 • 13
Efficient RL Training for LLMs with Experience Replay

Paper • 2604.08706 • Published Apr 9 • 23
From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models

Paper • 2604.09459 • Published Apr 13 • 14
Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator

Paper • 2604.08121 • Published Apr 9 • 44
Strips as Tokens: Artist Mesh Generation with Native UV Segmentation

Paper • 2604.09132 • Published Apr 10 • 56
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Paper • 2604.10098 • Published Apr 11 • 82
Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization

Paper • 2604.11259 • Published Apr 13 • 12
Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks

Paper • 2604.11753 • Published Apr 13 • 16
TRACE: Capability-Targeted Agentic Training

Paper • 2604.05336 • Published Apr 7 • 15
Panoptic Pairwise Distortion Graph

Paper • 2604.11004 • Published Apr 13 • 2
Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory

Paper • 2604.11544 • Published Apr 13 • 4
TAIHRI: Task-Aware 3D Human Keypoints Localization for Close-Range Human-Robot Interaction

Paper • 2604.08921 • Published Apr 10 • 2
SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?

Paper • 2604.10718 • Published Apr 12 • 4
DiningBench: A Hierarchical Multi-view Benchmark for Perception and Reasoning in the Dietary Domain

Paper • 2604.10425 • Published Apr 12 • 3
BMdataset: A Musicologically Curated LilyPond Dataset

Paper • 2604.10628 • Published Apr 12 • 2
Learning Long-term Motion Embeddings for Efficient Kinematics Generation

Paper • 2604.11737 • Published Apr 13 • 6
Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration

Paper • 2604.11446 • Published Apr 13 • 4
SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context

Paper • 2604.11716 • Published Apr 13 • 5
Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind

Paper • 2604.11666 • Published Apr 13 • 4
Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series

Paper • 2604.10799 • Published Apr 12 • 6
Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach

Paper • 2604.11547 • Published Apr 13 • 5
TorchUMM: A Unified Multimodal Model Codebase for Evaluation, Analysis, and Post-training

Paper • 2604.10784 • Published Apr 12 • 7
SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting

Paper • 2604.10688 • Published Apr 12 • 27
Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation

Paper • 2604.10030 • Published Apr 11 • 15
Solving Physics Olympiad via Reinforcement Learning on Physics Simulators

Paper • 2604.11805 • Published Apr 13 • 16
Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs

Paper • 2604.10480 • Published Apr 12 • 20
Introspective Diffusion Language Models

Paper • 2604.11035 • Published Apr 13 • 25
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

Paper • 2604.10905 • Published Apr 13 • 29
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

Paper • 2604.11297 • Published Apr 13 • 144
QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation

Paper • 2604.08570 • Published Mar 25 • 126
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Paper • 2401.05566 • Published Jan 10, 2024 • 30
The Art of Scaling Reinforcement Learning Compute for LLMs

Paper • 2510.13786 • Published Oct 15, 2025 • 34
Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

Paper • 2606.02373 • Published Jun 1 • 59
Next-Latent Prediction Transformers Learn Compact World Models

Paper • 2511.05963 • Published Nov 8, 2025 • 3
CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty

Paper • 2601.22027 • Published Jan 29 • 87