Lance 3B Video - GGUF

This repository contains the quantized GGUF weights for Lance 3B Video, originally developed by ByteDance Research.

Lance is a lightweight, native unified multimodal model that supports image and video understanding, generation, and editing within a single framework. Quantizing the weights to GGUF makes the model far more accessible for local inference: the files in this repository range from 4.96 GB to 7.62 GB, whereas the original unquantized model requires about 40GB of VRAM.

📊 Hardware Compatibility & Files

These GGUF files are designed for efficient CPU/GPU offloading. Choose the quantization level that best fits your available memory and desired quality.

| Quantization | Filename | Size | Description |
|---|---|---|---|
| 4-bit | Lance_3B_Video-Q4_K_M.gguf | 4.96 GB | Optimal balance of speed and VRAM. Great for mid-range hardware. |
| 5-bit | Lance_3B_Video-Q5_K_M.gguf | 5.53 GB | Higher precision with minimal size increase. |
| 6-bit | Lance_3B_Video-Q6_K.gguf | 6.12 GB | Near-unquantized quality for higher-end local setups. |
| 8-bit | Lance_3B_Video-Q8_0.gguf | 7.62 GB | Maximum quality, requiring the most memory among these options. |
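
As a rough aid for choosing among the files above, the sketch below picks the largest listed file that fits a given memory budget. The file names and sizes are copied from the table; the `pick_quant` helper and its `headroom_gb` default of 2 GB (space reserved for activations and other runtime buffers) are illustrative assumptions, not measured requirements for this model.

```python
# File sizes in GB, copied from the table above.
QUANT_FILES = {
    "Lance_3B_Video-Q4_K_M.gguf": 4.96,
    "Lance_3B_Video-Q5_K_M.gguf": 5.53,
    "Lance_3B_Video-Q6_K.gguf": 6.12,
    "Lance_3B_Video-Q8_0.gguf": 7.62,
}


def pick_quant(available_gb, headroom_gb=2.0):
    """Return the largest listed file that fits, or None if none does.

    headroom_gb is a hypothetical safety margin, not a documented figure.
    """
    budget = available_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_FILES.items() if size <= budget]
    if not fitting:
        return None
    # max() compares by size first, so this selects the largest fitting file.
    return max(fitting)[1]


if __name__ == "__main__":
    for memory_gb in (6.0, 8.0, 12.0):
        print(memory_gb, "->", pick_quant(memory_gb))
```

Actual memory use depends on resolution, frame count, and the runtime, so treat the result as a starting point and adjust the headroom after a test run.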

🌟 Model Overview

Rather than relying on massive parameter scaling, Lance achieves state-of-the-art results through collaborative multi-task training and a dual-stream Mixture-of-Experts (MoE) architecture.

  • Efficient at 3B Scale: With only 3 billion active parameters, Lance delivers robust performance across video generation, editing, and understanding benchmarks.
  • Trained from Scratch: Built with a staged multi-task recipe and trained entirely from scratch on a budget of 128 A100 GPUs.
  • Unified Architecture: Uses a dual-stream architecture on shared interleaved multimodal sequences. It employs modality-aware rotary positional encoding to prevent interference between different types of visual tokens.

Supported Tasks for this Video Variant:

  1. Text-to-Video Generation: Generate highly detailed, physics-aware, and coherent videos from text prompts.
  2. Video Editing: Perform background transformation, object replacement, and style changes while maintaining temporal consistency.
  3. Multi-turn Consistency Editing: Chain multiple edits together on the same video seamlessly.
  4. Video Understanding: Analyze video content for QA and detailed captioning.

🚀 How to Use

(Note: GGUF support for multimodal diffusion/generation models is evolving rapidly. You will typically use these weights inside custom nodes for UI frameworks rather than in standard text-based LLM runners.)

Recommended Ecosystems:

  • ComfyUI: Look for custom nodes that support GGUF unified multimodal or Lance architectures (similar to how Wan/Hunyuan/Flux GGUFs are currently implemented).
  • Custom Inference Scripts: The weights can be loaded through specialized Python wrappers that use the ggml backend for multimodal pipelines.
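
Before wiring a file into either ecosystem, it can help to confirm that the download is an intact GGUF file. The sketch below reads only the fixed-size GGUF header using the Python standard library. It assumes a little-endian file at GGUF version 2 or later, where both counts are 64-bit; the `read_gguf_header` name is illustrative and is not part of any Lance or ggml tooling.

```python
import struct


def read_gguf_header(path):
    """Read the fixed GGUF header fields without loading any tensors.

    Assumes little-endian GGUF v2 or later: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata key/value count.
    """
    with open(path, "rb") as f:
        raw = f.read(24)
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


if __name__ == "__main__":
    import sys

    # Usage: python check_gguf.py Lance_3B_Video-Q4_K_M.gguf
    if len(sys.argv) > 1:
        print(read_gguf_header(sys.argv[1]))
```

A `ValueError` here usually points to a truncated or mislabeled download. A valid header does not guarantee that a particular node or wrapper supports the Lance architecture.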

Citation

If you use this model or the original architecture in your research, please cite the original ByteDance paper:

@article{fu2026lance,
  title={Lance: Unified Multimodal Modeling by Multi-Task Synergy},
  author={Fu, Fengyi and Wu, Shaojin and Guo, Jianzhu and others},
  journal={arXiv preprint arXiv:2605.18678},
  year={2026}
}