SeamlessM4T-v2 T2TT Lite Model

A lighter model extracted from facebook/seamless-m4t-v2-large that contains only the T2TT (Text-to-Text Translation) components.

Original Model: facebook/seamless-m4t-v2-large

Official Documentation: SeamlessM4T-v2 Documentation

Note: This package only reorganizes publicly available weights from Meta's original model for T2TT usage. No new training or fine-tuning is introduced. All rights of the model and weights belong to their original owner.

Supported Features

T2TT (Text-to-Text Translation): Multilingual text translation
96 Languages: Supports text translation between 96 languages

Included Components

Model Weights

text_encoder: Text encoder
text_decoder: Text decoder
shared.weight: Shared word embeddings
lang_embed: Language embeddings

Model Size

Original Model: ~8.6 GB
Lite Model: ~5.1 GB
Removed Weights: 1219 (speech_encoder, t2u_model, vocoder)
Space Saved: ~3.5 GB
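
The reduction above amounts to dropping every parameter that belongs to a removed component. The following is a minimal sketch of that filtering step, not the script actually used to build this package. It assumes that parameter names begin with the component names listed above; `keep_t2tt_keys` and the example parameter names are hypothetical.

```python
# Prefixes of the components reported as removed in the list above.
# Assumption: parameter names start with the component name plus a dot.
REMOVED_PREFIXES = ("speech_encoder.", "t2u_model.", "vocoder.")


def keep_t2tt_keys(keys):
    """Return the parameter names that do not belong to a removed component."""
    return [k for k in keys if not k.startswith(REMOVED_PREFIXES)]


# Hypothetical parameter names, for illustration only.
example_keys = [
    "shared.weight",
    "text_encoder.layers.0.self_attn.q_proj.weight",
    "text_decoder.layers.0.ffn.fc1.weight",
    "speech_encoder.encoder.layers.0.ffn1.weight",
    "t2u_model.model.encoder.layers.0.weight",
    "vocoder.hifi_gan.conv_pre.weight",
]

kept = keep_t2tt_keys(example_keys)
print(kept)  # only the shared, text_encoder and text_decoder entries remain
```

The same predicate could be applied to the keys of a real state dict before saving it, provided the actual key names follow the assumed prefixes.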

Usage Examples

1. Basic T2TT: Text-to-Text Translation

from transformers import SeamlessM4Tv2Model, AutoProcessor

# Load model
model = SeamlessM4Tv2Model.from_pretrained("jaman21/seamless-m4t-v2-t2tt")
processor = AutoProcessor.from_pretrained("jaman21/seamless-m4t-v2-t2tt")

# Translate text
text_inputs = processor(text="Hello, how are you?", src_lang="eng", return_tensors="pt")
output_tokens = model.generate(**text_inputs, tgt_lang="fra", generate_speech=False)
translated_text = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
print(translated_text)  # "Bonjour, comment allez-vous?"
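
The three steps above (encode, generate, decode) can be wrapped in a small helper so they are not repeated for every call. This is a convenience sketch, not part of the Transformers API: the name `translate` and its signature are hypothetical, and it expects the already-loaded `model` and `processor` from the example above.

```python
def translate(model, processor, text, src_lang, tgt_lang, **generate_kwargs):
    """Translate one string with an already-loaded model and processor.

    Extra keyword arguments (for example num_beams) are passed to generate().
    """
    inputs = processor(text=text, src_lang=src_lang, return_tensors="pt")
    output = model.generate(
        **inputs,
        tgt_lang=tgt_lang,
        generate_speech=False,  # text-only output
        **generate_kwargs,
    )
    # output[0] holds the generated token ids, one row per input text.
    token_ids = output[0].tolist()[0]
    return processor.decode(token_ids, skip_special_tokens=True)
```

With the objects loaded earlier, a call would look like `translate(model, processor, "Hello, how are you?", "eng", "fra")`.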

2. Advanced Generation Strategies

# Beam search for better quality (slower)
text_inputs = processor(text="The quick brown fox jumps", src_lang="eng", return_tensors="pt")
outputs = model.generate(
    **text_inputs,
    tgt_lang="jpn",
    generate_speech=False,
    num_beams=5,              # Use beam search
    max_new_tokens=256,
    early_stopping=True
)

# Sampling for more diverse output
outputs = model.generate(
    **text_inputs,
    tgt_lang="kor",
    generate_speech=False,
    do_sample=True,           # Enable sampling
    top_k=50,
    top_p=0.95,
    temperature=0.8           # Must be > 0: values below 1.0 make output more deterministic, values above 1.0 more random (affects translation quality)
)
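
To make the sampling parameters above more concrete, here is a simplified pure-Python illustration of how `temperature`, `top_k` and `top_p` reshape a next-token distribution. It is a teaching sketch under the assumption that the filters are applied in that order, and it is not the library's implementation; `sampling_distribution` is a hypothetical name.

```python
import math


def sampling_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} for the tokens a sampler could draw."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")

    # Temperature: divide logits before the softmax (lower = sharper).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    norm_k = sum(probs[i] for i in order)

    # Top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i] / norm_k
        if mass >= top_p:
            break

    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


logits = [2.0, 1.0, 0.0]
sharp = sampling_distribution(logits, temperature=0.5)
flat = sampling_distribution(logits, temperature=1.5)
print(sharp[0] > flat[0])  # True: lower temperature concentrates probability
```

Lower temperatures and tighter `top_k` / `top_p` values push sampling toward the single most likely token, which is usually what translation needs.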

3. Batch Processing Multiple Texts

# Process multiple texts at once
texts = [
    "Hello, how are you?",
    "What is your name?",
    "Nice to meet you!"
]

text_inputs = processor(text=texts, src_lang="eng", return_tensors="pt", padding=True)
output_tokens = model.generate(**text_inputs, tgt_lang="ita", generate_speech=False)

# Decode all outputs (output_tokens[0] holds the generated token ids, one row per input text)
translations = processor.batch_decode(output_tokens[0], skip_special_tokens=True)
for orig, trans in zip(texts, translations):
    print(f"{orig} -> {trans}")
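
For lists much longer than the three sentences above, passing everything to the model in one call may use more memory than is available. One common approach, sketched here as an assumption rather than a recommendation from the original model authors, is to split the list into fixed-size chunks and run the batch example once per chunk. The helper name `chunked` is hypothetical.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


texts = [f"Sentence {i}" for i in range(5)]
batches = list(chunked(texts, batch_size=2))
print([len(b) for b in batches])  # [2, 2, 1]
```

Each element of `batches` can then be passed as `text=` to the processor exactly as in the batch example.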

4. Control Generation Length and Quality

text_inputs = processor(text="Translate this sentence", src_lang="eng", return_tensors="pt")

# Higher quality but more computationally expensive
high_quality_output = model.generate(
    **text_inputs,
    tgt_lang="rus",
    generate_speech=False,
    num_beams=5,              # Beam search
    max_new_tokens=512,       # Allow longer output
    length_penalty=1.0,       # Default value; > 0 favors longer outputs, < 0 favors shorter ones
    early_stopping=True,
    use_cache=True            # Accelerate generation
)

# Faster generation speed, acceptable quality
fast_output = model.generate(
    **text_inputs,
    tgt_lang="rus",
    generate_speech=False,
    num_beams=1,              # Greedy decoding: fastest, usually at some cost in quality
    max_new_tokens=256,
    use_cache=True
)
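
The effect of `length_penalty` during beam search can be illustrated with a small calculation. The sketch below assumes that a finished hypothesis is scored as its cumulative log-probability divided by its length raised to `length_penalty`; the function name `beam_score` and the example numbers are hypothetical.

```python
def beam_score(sum_logprobs, length, length_penalty=1.0):
    """Score a finished beam hypothesis (higher is better)."""
    return sum_logprobs / (length ** length_penalty)


# Two made-up hypotheses: a short one and a longer, lower-probability one.
short = {"sum_logprobs": -4.0, "length": 4}
long_ = {"sum_logprobs": -6.0, "length": 8}

for penalty in (0.0, 1.0):
    s = beam_score(short["sum_logprobs"], short["length"], penalty)
    l = beam_score(long_["sum_logprobs"], long_["length"], penalty)
    winner = "short" if s > l else "long"
    print(f"length_penalty={penalty}: short={s:.2f}, long={l:.2f} -> {winner}")
```

With `length_penalty=0.0` the raw log-probabilities are compared and the short hypothesis wins; with `1.0` the scores are normalized by length and the longer hypothesis is preferred.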

5. GPU/CPU Usage

import torch

# Move model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Process inputs on the same device
text_inputs = processor(text="Hello", src_lang="eng", return_tensors="pt")
text_inputs = {k: v.to(device) for k, v in text_inputs.items()}

# Generate
with torch.inference_mode():  # Disables autograd tracking; typically slightly faster than torch.no_grad()
    outputs = model.generate(**text_inputs, tgt_lang="cmn", generate_speech=False)

License

Same as the original model: CC-BY-NC-4.0

For commercial use, please refer to Meta's licensing terms.

References

Safetensors: model size 1B params, tensor type F32

Seamless: Multilingual Expressive and Streaming Speech Translation. arXiv:2312.05187, published Dec 8, 2023.