OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens
Abstract
The OmniLottie framework generates high-quality vector animations from multi-modal instructions using a specialized Lottie tokenizer and pretrained vision-language models.
OmniLottie is a versatile framework that generates high-quality vector animations from multi-modal instructions. For flexible control over motion and visual content, we focus on Lottie, a lightweight JSON format that represents both shapes and animation behaviors. However, raw Lottie JSON files contain extensive invariant structural metadata and formatting tokens, posing significant challenges for learning vector animation generation. We therefore introduce a carefully designed Lottie tokenizer that transforms JSON files into structured sequences of commands and parameters representing shapes, animation functions, and control parameters. This tokenizer enables us to build OmniLottie upon pretrained vision-language models that follow multi-modal interleaved instructions and generate high-quality vector animations. To further advance research in vector animation generation, we curate MMLottie-2M, a large-scale dataset of professionally designed vector animations paired with textual and visual annotations. Extensive experiments validate that OmniLottie produces vivid, semantically aligned vector animations that adhere closely to multi-modal human instructions.
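The core idea of the tokenizer, converting verbose Lottie JSON into a flat sequence of command and parameter tokens while dropping invariant structural metadata, can be sketched roughly as follows. This is a minimal illustrative sketch: the token names, the toy Lottie snippet, and the `tokenize` function are our own assumptions, not the paper's actual vocabulary or implementation.

```python
# Toy Lottie-like animation: one shape layer holding a rectangle and an
# animated position property with two keyframes. Top-level keys such as
# "v" (version) and "fr" (frame rate) are structural metadata the
# tokenizer should discard. (Hypothetical example, not from the paper.)
LOTTIE = {
    "v": "5.7.0", "fr": 30, "ip": 0, "op": 60, "w": 512, "h": 512,
    "layers": [{
        "ty": 4, "nm": "rect-layer",
        "shapes": [{"ty": "rc", "s": {"k": [100, 80]}, "p": {"k": [0, 0]}, "r": {"k": 8}}],
        "ks": {"p": {"a": 1, "k": [
            {"t": 0,  "s": [256, 256]},
            {"t": 60, "s": [256, 400]},
        ]}},
    }],
}

def tokenize(anim: dict) -> list[str]:
    """Flatten a Lottie-like dict into <COMMAND> tokens followed by their
    numeric parameters, skipping bookkeeping keys entirely."""
    toks = []
    for layer in anim.get("layers", []):
        toks.append("<LAYER>")
        for shape in layer.get("shapes", []):
            toks.append(f"<SHAPE:{shape['ty']}>")
            for key in ("s", "p", "r"):  # size / position / roundness
                if key in shape:
                    val = shape[key]["k"]
                    vals = val if isinstance(val, list) else [val]
                    toks += [f"<{key.upper()}>"] + [str(v) for v in vals]
        pos = layer.get("ks", {}).get("p", {})
        if pos.get("a") == 1:  # animated property: emit one <KEY> per keyframe
            for kf in pos["k"]:
                toks += ["<KEY>", str(kf["t"])] + [str(v) for v in kf["s"]]
    return toks

tokens = tokenize(LOTTIE)
print(tokens)
# e.g. ['<LAYER>', '<SHAPE:rc>', '<S>', '100', '80', ..., '<KEY>', '60', '256', '400']
```

The resulting sequence carries only the shape commands, control parameters, and keyframes, which is the kind of compact representation a pretrained vision-language model can plausibly be trained on.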
Community
OmniLottie is the first family of end-to-end multimodal Lottie generators that leverage pre-trained Vision-Language Models (VLMs), capable of generating complex and detailed Lottie animations from multi-modal instructions including text, images, and video. We also introduce MMLottie-2M, a multimodal dataset of two million richly annotated Lottie animations, along with a standardized evaluation protocol for multi-modal vector animation generation tasks.
Wow, this is brilliant. The same technique can be applied to other formats to train LLMs with low overhead and better efficiency.
Kudos!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- VINO: A Unified Visual Generator with Interleaved OmniModal Context (2026)
- DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning (2026)
- Omni-Video 2: Scaling MLLM-Conditioned Diffusion for Unified Video Generation and Editing (2026)
- Tele-Omni: a Unified Multimodal Framework for Video Generation and Editing (2026)
- VecGlypher: Unified Vector Glyph Generation with Language Models (2026)
- Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models (2026)
- SIGMA: Selective-Interleaved Generation with Multi-Attribute Tokens (2026)
arXivLens breakdown of this paper: https://arxivlens.com/PaperView/Details/omnilottie-generating-vector-animations-via-parameterized-lottie-tokens-4704-7a682626
- Executive Summary
- Detailed Breakdown
- Practical Applications