Multimodal Autoregressive Pre-training of Large Vision Encoders
Paper • 2411.14402 • Published
How to use timm/aimv2_large_patch14_448.apple_pt with timm:

```python
import timm

# Load the pretrained AIM-v2 encoder from the Hugging Face Hub
model = timm.create_model("hf_hub:timm/aimv2_large_patch14_448.apple_pt", pretrained=True)
```

How to use timm/aimv2_large_patch14_448.apple_pt with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-feature-extraction", model="timm/aimv2_large_patch14_448.apple_pt")
```
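Feature-extraction output from a vision encoder like this is per-patch, not a single vector per image. As a rough sketch of the shapes involved (the grid arithmetic follows from the 448 px input and 14 px patches in the model name; the 1024-dim width and the dummy array are illustrative assumptions, not actual pipeline output):

```python
import numpy as np

# Patch grid implied by the model name: 448 px input, 14 px patches
grid = 448 // 14            # 32 patches per side
num_tokens = grid * grid    # 1024 patch tokens

# Dummy per-token features; the 1024-dim width is an assumption for illustration
feats = np.zeros((num_tokens, 1024))

# A common way to reduce token features to a single image embedding: mean-pool
embedding = feats.mean(axis=0)
print(grid, num_tokens, embedding.shape)  # 32 1024 (1024,)
```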
```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("timm/aimv2_large_patch14_448.apple_pt", dtype="auto")
```

timm compatible AIM-v2 (https://huggingface.co/papers/2411.14402) image encoder weights from https://huggingface.co/apple/aimv2-large-patch14-448