# UMT5-XXL Encoder for Burn

UMT5-XXL encoder weights converted to Burn's native `.bpk` format.
## Model Description
UMT5 (Unified Multilingual T5) is a multilingual encoder-decoder model trained on the mC4 corpus. This repository contains only the encoder portion, which is commonly used for text encoding in video generation models like LongCat and WanVideo.
## Model Details
- Architecture: Transformer Encoder
- Hidden Size: 4096
- Num Layers: 24
- Num Attention Heads: 64
- Feed-Forward Size: 10240
- Vocabulary Size: 256384
- Parameters: ~4.7B (encoder only)
- Precision: BF16
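The parameter count above can be sanity-checked from the other numbers in the list. A back-of-the-envelope sketch, assuming the standard (U)mT5 encoder layout (four `d_model × d_model` attention projections plus a gated feed-forward with two input projections and one output projection per layer; layer norms are negligible and omitted):

```rust
// Rough parameter count for the UMT5-XXL encoder, derived from the
// architecture numbers listed above. The layer layout is an assumption
// based on the standard (U)mT5 design, not read from the checkpoint.
fn umt5_xxl_encoder_params() -> u64 {
    let d_model: u64 = 4096; // Hidden Size
    let d_ff: u64 = 10240; // Feed-Forward Size
    let n_layers: u64 = 24; // Num Layers
    let vocab: u64 = 256_384; // Vocabulary Size

    let attention = 4 * d_model * d_model; // Q, K, V, O projections
    let feed_forward = 2 * d_model * d_ff + d_ff * d_model; // gated GeGLU: wi_0, wi_1, wo
    let per_layer = attention + feed_forward; // ~193M per layer
    let embedding = vocab * d_model; // shared token-embedding table, ~1.05B

    n_layers * per_layer + embedding
}

fn main() {
    let total = umt5_xxl_encoder_params();
    println!("total params (incl. embedding): {:.2}B", total as f64 / 1e9);
}
```

The 24 layers alone come to about 4.63B parameters, close to the ~4.7B quoted above; including the ~1.05B embedding table the count is roughly 5.68B.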
## Source

Original weights from Kijai/WanVideo_comfy (`umt5-xxl-enc-bf16.safetensors`).
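A BF16 checkpoint stores two bytes per parameter, so a downloaded file can be sanity-checked against its expected size. A minimal sketch (the ~5.68B figure assumes the full encoder including the ~1.05B token-embedding table; the exact on-disk size will differ slightly due to headers):

```rust
use std::fs;

// BF16 is 16 bits, i.e. 2 bytes per parameter.
fn expected_bf16_bytes(param_count: u64) -> u64 {
    param_count * 2
}

fn main() {
    // ~5.68B parameters (24 layers + embedding table) => roughly 11.4 GB.
    let approx = expected_bf16_bytes(5_680_000_000);
    println!("expected size: ~{:.1} GB", approx as f64 / 1e9);

    // Compare against the actual file if it has been downloaded.
    if let Ok(meta) = fs::metadata("umt5-xxl-enc-bf16.safetensors") {
        println!("actual size: {:.1} GB", meta.len() as f64 / 1e9);
    }
}
```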
## Usage with umt5-burn

```rust
use burn::backend::candle::{Candle, CandleDevice};
use half::bf16;
use umt5_burn::{UMT5Config, UMT5Encoder};

type Backend = Candle<bf16, i64>;

fn main() {
    let device = CandleDevice::Metal(0); // or CandleDevice::Cpu
    let config = UMT5Config::xxl();
    let mut encoder: UMT5Encoder<Backend> = config.init(&device);
    encoder.load_weights("umt5-xxl-enc-bf16.bpk").unwrap();
    // Use encoder...
}
```
## Files

- `umt5-xxl-enc-bf16.bpk` - Burn native format weights (recommended for Burn)
- `config.json` - Model configuration
## License
Same as the original UMT5 model - Apache 2.0.
## Citation
If you use this model, please cite the original UMT5 paper:
```bibtex
@inproceedings{chung2023unimax,
  title={UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining},
  author={Chung, Hyung Won and others},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2023}
}
```