- RynnBrain
  📉 5 • Query images or videos with visual and text prompts

- Alibaba-DAMO-Academy/RynnBrain-2B
  Image-Text-to-Text • 2B • Updated • 431 • 24
- Alibaba-DAMO-Academy/RynnBrain-8B
  Image-Text-to-Text • 9B • Updated • 407 • 11
- Alibaba-DAMO-Academy/RynnBrain-30B-A3B
  Image-Text-to-Text • 17B • Updated • 421 • 14
Collections
Collections including paper arxiv:2602.14979
- Describe Anything: Detailed Localized Image and Video Captioning
  Paper • 2504.16072 • Published • 64
- EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment
  Paper • 2410.09604 • Published
- Geospatial Mechanistic Interpretability of Large Language Models
  Paper • 2505.03368 • Published • 12
- Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
  Paper • 2505.02836 • Published • 8

- Wolf: Captioning Everything with a World Summarization Framework
  Paper • 2407.18908 • Published • 32
- Mixture of Nested Experts: Adaptive Processing of Visual Tokens
  Paper • 2407.19985 • Published • 37
- TPDiff: Temporal Pyramid Video Diffusion Model
  Paper • 2503.09566 • Published • 45
- DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
  Paper • 2506.07464 • Published • 14

- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 58
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 52
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 45
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 63

- Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation
  Paper • 2507.02608 • Published • 22
- Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
  Paper • 2512.17351 • Published • 28
- RynnBrain: Open Embodied Foundation Models
  Paper • 2602.14979 • Published • 31

- UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
  Paper • 2410.14059 • Published • 63
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
  Paper • 2503.05179 • Published • 46
- Token-Efficient Long Video Understanding for Multimodal LLMs
  Paper • 2503.04130 • Published • 96
- GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
  Paper • 2503.10639 • Published • 53

- Foundation Models in Robotics: Applications, Challenges, and the Future
  Paper • 2312.07843 • Published • 16
- Neural Fields in Robotics: A Survey
  Paper • 2410.20220 • Published • 5
- Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset
  Paper • 2410.22325 • Published • 10
- Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning
  Paper • 2410.21845 • Published • 16