Part of the MoM (Mixture of Models) family for vLLM Semantic Router.
A LoRA adapter fine-tuned on mmbert-32k-yarn (a 307M-parameter ModernBERT variant with a 32K context window and support for 1800+ languages) that classifies a user prompt's intent into the appropriate response modality:
| Label | Description | Routed To | Example |
|---|---|---|---|
| AR | Text-only response | Autoregressive LLM (e.g., Llama, Qwen) | "What is the capital of France?" |
| DIFFUSION | Image generation | Diffusion model (e.g., Flux, SDXL) | "A cyberpunk city at night, neon lights" |
| BOTH | Text + image response | Both AR + Diffusion pipeline | "Explain photosynthesis and show a diagram" |
```python
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load base model + LoRA adapter
base_model = AutoModelForSequenceClassification.from_pretrained(
    "llm-semantic-router/mmbert-32k-yarn", num_labels=3
)
model = PeftModel.from_pretrained(base_model, "llm-semantic-router/mmbert32k-modality-router-lora")
tokenizer = AutoTokenizer.from_pretrained("llm-semantic-router/mmbert32k-modality-router-lora")

# Label mapping
labels = {0: "AR", 1: "DIFFUSION", 2: "BOTH"}

# Classify prompts
prompts = [
    "What are the benefits of exercise?",
    "A serene Japanese garden with cherry blossoms, watercolor style",
    "Explain how neural networks work and generate a diagram showing the architecture",
]

model.eval()
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    pred = torch.argmax(outputs.logits, dim=-1).item()
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    print(f"Prompt: {prompt[:60]}...")
    print(f"  -> {labels[pred]} (confidence: {probs[pred]:.3f})")
    print()
```
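The loop above scores one prompt at a time; the same results can be obtained in a single batched forward pass. A minimal sketch, reusing the `model`, `tokenizer`, `labels`, and `prompts` defined above:

```python
# Batched variant: pad all prompts to a common length and run one forward pass.
inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
for prompt, p in zip(prompts, probs):
    pred = int(torch.argmax(p))
    print(f"{labels[pred]:9s} ({p[pred].item():.3f})  {prompt[:60]}")
```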
For easier deployment without the PEFT dependency, use the merged version, `llm-semantic-router/mmbert32k-modality-router-merged`:
```python
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="llm-semantic-router/mmbert32k-modality-router-merged",
)

result = pipe("Draw a picture of a sunset over mountains")
print(result)  # [{'label': 'DIFFUSION', 'score': 0.97}]
```
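In a router it is often worth treating low-confidence predictions conservatively. A minimal sketch on top of the `pipe` defined above (the 0.7 threshold and the fall-back-to-AR policy are illustrative assumptions, not part of the model):

```python
def route_label(prompt: str, threshold: float = 0.7) -> str:
    """Return the routing label, falling back to text-only (AR)
    when the classifier's top score is below the threshold."""
    result = pipe(prompt)[0]  # e.g. {'label': 'DIFFUSION', 'score': 0.97}
    return result["label"] if result["score"] >= threshold else "AR"

print(route_label("Draw a picture of a sunset over mountains"))  # DIFFUSION
```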
- **Base model:** `llm-semantic-router/mmbert-32k-yarn` (307M parameters, ModernBERT + YaRN RoPE)
- **LoRA target modules:** `attn.Wqkv`, `attn.Wo`, `mlp.Wi`, `mlp.Wo`
- **Modules saved with the adapter:** `classifier`, `score`
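A sketch of how this adapter configuration could be expressed with PEFT. The target and saved modules are those listed above; the rank, alpha, and dropout values are assumptions not stated on this card:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained(
    "llm-semantic-router/mmbert-32k-yarn", num_labels=3
)
lora_config = LoraConfig(
    task_type="SEQ_CLS",
    # Target modules as listed on this card:
    target_modules=["attn.Wqkv", "attn.Wo", "mlp.Wi", "mlp.Wo"],
    # Keep the classification head fully trainable and saved with the adapter:
    modules_to_save=["classifier", "score"],
    r=16,               # assumed rank, not stated on this card
    lora_alpha=32,      # assumed scaling, not stated on this card
    lora_dropout=0.05,  # assumed dropout, not stated on this card
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```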
The model is trained on a curated combination of 10 public datasets plus seed examples:

**DIFFUSION prompts:**

| Dataset | Size | Description |
|---|---|---|
| Gustavosta/Stable-Diffusion-Prompts | 80K | Curated Stable Diffusion prompts |
| FredZhang7/stable-diffusion-prompts-2.47M | 2.47M | Large-scale SD prompt collection |
| nateraw/parti-prompts | 1.6K | Google Parti benchmark prompts |
| fal/image-generation-prompts | 1K+ | Diverse image generation prompts |
| allenai/WildChat (mined) | - | Real user prompts with image-generation intent |
**AR (text-only) prompts:**

| Dataset | Size | Description |
|---|---|---|
| OpenAssistant/oasst2 | 135K | Multilingual instruction conversations |
| tatsu-lab/alpaca | 52K | Stanford instruction-following |
| databricks/databricks-dolly-15k | 15K | Categorized instructions |
| stingning/ultrachat | 1.5M | Multi-turn conversations |
| allenai/WildChat (mined) | - | Real user text-only prompts |
**BOTH (text + image) prompts:**

| Dataset | Size | Description |
|---|---|---|
| mqliu/InterleavedBench | 7K+ | Gold-standard interleaved text+image prompts (EMNLP 2024) |
| allenai/WildChat (mined) | - | Real user multimodal prompts |
| Seed examples | 40+ | Curated diverse domain examples |
**Overall metrics:**

| Metric | Value |
|---|---|
| Accuracy | 0.9686 |
| F1 (weighted) | 0.9686 |
| Eval Loss | 0.0435 |
**Per-class metrics:**

| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| AR | 0.956 | 0.967 | 0.962 |
| DIFFUSION | 0.974 | 0.979 | 0.977 |
| BOTH | 0.983 | 0.951 | 0.967 |
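For reference, the weighted F1 above averages the per-class F1 scores weighted by each class's support. A minimal sketch of the computation with scikit-learn (the label arrays below are placeholders, not the actual evaluation data):

```python
from sklearn.metrics import accuracy_score, f1_score

# Placeholder predictions; the actual eval split is not published on this card.
y_true = [0, 1, 2, 1, 0, 2]  # 0=AR, 1=DIFFUSION, 2=BOTH
y_pred = [0, 1, 2, 1, 1, 2]

print(f"Accuracy:      {accuracy_score(y_true, y_pred):.4f}")
print(f"F1 (weighted): {f1_score(y_true, y_pred, average='weighted'):.4f}")
```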
This model is designed for routing LLM requests in multi-model serving systems such as vLLM Semantic Router. It enables a single lightweight classification pass to decide whether each request should be served by a text LLM, a diffusion model, or both, as in the dispatch sketch below.
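A minimal dispatch sketch on top of the `route_label` helper above. The backend functions are hypothetical stubs standing in for whatever serving endpoints you run:

```python
# Hypothetical backend stubs; replace with your actual serving endpoints
# (e.g. a vLLM text endpoint and a Flux/SDXL image endpoint).
def call_text_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your text LLM endpoint")

def call_diffusion_model(prompt: str) -> bytes:
    raise NotImplementedError("wire this to your diffusion endpoint")

def handle_request(prompt: str) -> dict:
    """Dispatch a prompt to the backend(s) implied by its predicted modality."""
    label = route_label(prompt)  # defined in the sketch above
    response = {}
    if label in ("AR", "BOTH"):
        response["text"] = call_text_llm(prompt)
    if label in ("DIFFUSION", "BOTH"):
        response["image"] = call_diffusion_model(prompt)
    return response
```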
```bibtex
@misc{modality-router-2025,
  title={Modality Router: Smart Output Modality Selection for Multi-Model Serving},
  author={vLLM Semantic Router Team},
  year={2025},
  url={https://huggingface.co/llm-semantic-router/mmbert32k-modality-router-lora}
}
```
Base model: jhu-clsp/mmBERT-base