KnowledgeXLab@Shanghai AI Lab

community

https://github.com/KnowledgeXLab

Papers

Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks

authored a paper 5 months ago

The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios

Paper • 2601.08173 • Published Jan 13, 2026 • 9

submitted a paper to Daily Papers 5 months ago

The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios

Paper • 2601.08173 • Published Jan 13, 2026 • 9

authored 7 papers 8 months ago

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

Paper • 2404.05225 • Published Apr 8, 2024 • 2

ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data

Paper • 2407.12358 • Published Jul 17, 2024 • 1

Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback

Paper • 2507.20766 • Published Jul 28, 2025 • 1

Interleaving Reasoning for Better Text-to-Image Generation

Paper • 2509.06945 • Published Sep 8, 2025 • 16

IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?

Paper • 2509.24709 • Published Sep 29, 2025 • 7

RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection

Paper • 2509.26048 • Published Sep 30, 2025 • 7

Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks

Paper • 2510.08002 • Published Oct 9, 2025 • 24

authored 2 papers 9 months ago

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

Paper • 2309.16292 • Published Sep 28, 2023 • 1

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

Paper • 2311.05332 • Published Nov 9, 2023 • 11

authored 2 papers 9 months ago

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

Paper • 2311.05332 • Published Nov 9, 2023 • 11

DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds

Paper • 2306.06023 • Published Jun 9, 2023

authored a paper 9 months ago

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

Paper • 2307.07162 • Published Jul 14, 2023

authored a paper 9 months ago

M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition

Paper • 2401.11649 • Published Jan 22, 2024 • 3

authored a paper 9 months ago

OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving

Paper • 2402.03830 • Published Feb 6, 2024 • 2

authored a paper 9 months ago

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Paper • 2406.08418 • Published Jun 12, 2024 • 33

authored a paper 9 months ago

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

Paper • 2406.11633 • Published Jun 17, 2024 • 1