AMoE: Agglomerative MoE Vision Foundation Models
CVPR 2026. A family of vision encoders distilled from DINOv3 and SigLIP2, available in MoE and dense variants.
Accepted at CVPR 2026
The small dense variant of the AMoE model family, with 0.07B parameters.
```python
import torch
from PIL import Image
from transformers import AutoModel, AutoImageProcessor

model_id = "tiiuae/amoe-dense-S"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).to("cuda", dtype=torch.bfloat16)
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt").to("cuda")
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

with torch.no_grad():
    outputs = model(**inputs)

# Available feature spaces: 'amoe' (512d), 'siglip2' (1152d), 'dinov3' (1024d)
patch_features = outputs["patch_features"]["amoe"]         # (Batch, Tokens, 512)
summary_features = outputs["summary_features"]["siglip2"]  # (Batch, 1152)
```
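The patch features come back as a flat token sequence; for dense tasks (segmentation, correspondence) they are typically reshaped into a spatial grid. A minimal sketch, assuming a square input resized to 224×224 and the model's 16×16 patch size (so 14×14 = 196 patch tokens, with no extra class token in the patch output; random data stands in for real model outputs):

```python
import torch

# Stand-in for outputs["patch_features"]["amoe"]: (Batch, Tokens, Dim)
B, D = 1, 512
side = 224 // 16  # 14 patches per side at 16x16 patch size
patch_features = torch.randn(B, side * side, D)

# (B, Tokens, D) -> (B, D, H, W) spatial feature map
fmap = patch_features.transpose(1, 2).reshape(B, D, side, side)
print(fmap.shape)  # torch.Size([1, 512, 14, 14])
```

If the processor resizes to a different resolution, or the output includes summary/register tokens in the patch sequence, the token count and grid side will differ accordingly.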
| Property | Value |
|---|---|
| Architecture | Dense |
| Parameters | 0.07B |
| Layers | 12 |
| Hidden Dim | 512 |
| FFN Dim | 2048 |
| Patch Size | 16x16 |
| Teachers | DINOv3, SigLIP2 |
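As a rough sanity check on the table, the core transformer parameter count can be estimated from the listed dimensions (a back-of-envelope sketch only; it ignores the patch embedding, norms, biases, and any teacher-projection heads, which make up the remainder of the ~0.07B total):

```python
# Estimate core transformer parameters from the table above.
layers, d, ffn = 12, 512, 2048
attn = 4 * d * d    # Q, K, V, and output projections
mlp = 2 * d * ffn   # FFN up- and down-projections
core = layers * (attn + mlp)
print(f"{core / 1e6:.1f}M core transformer parameters")  # 37.7M
```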
```bibtex
@article{chaybouti2025amoe,
  title={AMoE: Agglomerative Mixture-of-Experts Vision Foundation Models},
  author={Chaybouti, Sofian and Narayan, Sanath and Dahou, Yasser and Le Khac, Phuc H. and Singh, Ankit and Huynh, Ngoc Dung and Para, Wamiq Reyaz and Kuehne, Hilde and Hacid, Hakim},
  journal={arXiv preprint arXiv:2512.20157},
  year={2025}
}
```