2026LPCV-Track1-MobileCLIP2-B-Best
jn12/2026LPCV-Track1-MobileCLIP2-B-Best is the ONNX export of the current best MobileCLIP2-B checkpoint from this LPCV 2026 Track 1 image-to-text retrieval project.
The full project code is available here:
https://github.com/jn12-29/LPCV-Track1-EfficientAI
That repository contains the complete model training pipeline, together with dataset preparation, ONNX export, local evaluation, and deployment-oriented evaluation code.
This model repository provides separate image and text encoders in ONNX format, so they can be evaluated locally with ONNX Runtime or compiled further for Qualcomm device workflows.
Model overview
- Base architecture: MobileCLIP2-B
- Task: image-to-text retrieval
- Export format: ONNX
- Runtime target: local ONNX evaluation and Qualcomm deployment flow
Repository contents
This repository currently provides exported encoder files:
- image_encoder.onnx
- image_encoder.onnx.data
- text_encoder.onnx
- text_encoder.onnx.data
These files can be consumed directly by the local evaluation pipeline in the project repository.
Download
hf download jn12/2026LPCV-Track1-MobileCLIP2-B-Best \
--local-dir ./pretrained/2026LPCV-Track1-MobileCLIP2-B-Best
Expected local layout:
pretrained/2026LPCV-Track1-MobileCLIP2-B-Best/
├── image_encoder.onnx
├── image_encoder.onnx.data
├── text_encoder.onnx
└── text_encoder.onnx.data
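Before loading the encoders, it can help to confirm that the download produced the layout above. The following is a minimal sketch using only the standard library; the helper name `missing_model_files` is hypothetical and not part of the project code.

```python
from pathlib import Path

# File names taken from the expected local layout listed above.
EXPECTED_FILES = (
    "image_encoder.onnx",
    "image_encoder.onnx.data",
    "text_encoder.onnx",
    "text_encoder.onnx.data",
)


def missing_model_files(model_dir):
    """Return the expected file names that are absent from model_dir (hypothetical helper)."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("./pretrained/2026LPCV-Track1-MobileCLIP2-B-Best")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All encoder files are present.")
```

Each `.onnx` file is listed together with a `.onnx.data` companion, so the check treats all four files as required.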
Quick usage
Evaluate locally with ONNX Runtime
Install dependencies:
pip install onnxruntime pillow numpy torch torchvision transformers
hf download openai/clip-vit-base-patch32
Run evaluation with plain ONNX Runtime:
from pathlib import Path

import numpy as np
import onnxruntime as ort
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms
from transformers import CLIPTokenizer

MODEL_DIR = Path("./pretrained/2026LPCV-Track1-MobileCLIP2-B-Best")
IMAGE_PATHS = [
    "examples/image1.jpg",
    "examples/image2.jpg",
]
TEXTS = [
    "a red bus on the street",
    "a group of people near a building",
    "a dog running on grass",
]


def preprocess_image(image_path: str) -> np.ndarray:
    transform = transforms.Compose(
        [
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ]
    )
    image = Image.open(image_path).convert("RGB")
    image_tensor = transform(image).unsqueeze(0)
    return image_tensor.numpy().astype(np.float32)


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def recall_at_k(image_features: np.ndarray, text_features: np.ndarray, positives, k: int) -> float:
    similarities = image_features @ text_features.T
    topk = np.argsort(-similarities, axis=1)[:, :k]
    hits = 0
    for i, gt in enumerate(positives):
        if any(j in gt for j in topk[i]):
            hits += 1
    return hits / len(positives)


image_session = ort.InferenceSession(
    str(MODEL_DIR / "image_encoder.onnx"),
    providers=["CPUExecutionProvider"],
)
text_session = ort.InferenceSession(
    str(MODEL_DIR / "text_encoder.onnx"),
    providers=["CPUExecutionProvider"],
)

tokenizer = CLIPTokenizer.from_pretrained(
    "openai/clip-vit-base-patch32",
    local_files_only=True,
)
tokenizer.add_special_tokens({"cls_token": tokenizer.eos_token})

image_embeddings = []
for image_path in IMAGE_PATHS:
    image_input = preprocess_image(image_path)
    image_output = image_session.run(None, {"image": image_input})[0]
    image_embeddings.append(image_output[0])
image_embeddings = l2_normalize(np.stack(image_embeddings, axis=0))

text_embeddings = []
for text in TEXTS:
    token_ids = tokenizer(
        [text],
        padding="max_length",
        truncation=True,
        max_length=77,
        return_tensors="pt",
    )["input_ids"].numpy().astype(np.int32)
    text_output = text_session.run(None, {"text": token_ids})[0]
    text_embeddings.append(text_output[0])
text_embeddings = l2_normalize(np.stack(text_embeddings, axis=0))

# Example ground-truth mapping:
# image 0 matches text 0, image 1 matches text 1.
positive_text_indices = [{0}, {1}]

r_at_1 = recall_at_k(image_embeddings, text_embeddings, positive_text_indices, k=1)
r_at_2 = recall_at_k(image_embeddings, text_embeddings, positive_text_indices, k=2)
print(f"Recall@1: {r_at_1:.4f}")
print(f"Recall@2: {r_at_2:.4f}")
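Recall@k only reports an aggregate score. To see which captions are actually retrieved for each image, the same cosine-similarity ranking can be exposed directly. The sketch below uses small synthetic embeddings in place of encoder outputs; the function name `top_k_texts` and the `eps` guard against zero-norm vectors are additions for illustration and are not part of the project code.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length; eps avoids division by zero for all-zero rows."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def top_k_texts(image_features, text_features, k):
    """Return (indices, scores) of the k most similar texts for each image."""
    sims = l2_normalize(image_features) @ l2_normalize(text_features).T
    order = np.argsort(-sims, axis=1)[:, :k]           # best match first
    scores = np.take_along_axis(sims, order, axis=1)   # cosine similarity of each match
    return order, scores


# Synthetic 2-D embeddings standing in for the outputs of the two ONNX encoders.
image_features = np.array([[1.0, 0.0], [0.0, 2.0]])
text_features = np.array([[3.0, 0.1], [0.2, 1.0], [1.0, 1.0]])

indices, scores = top_k_texts(image_features, text_features, k=2)
for i, (idx_row, score_row) in enumerate(zip(indices, scores)):
    ranked = ", ".join(f"text {j} ({s:.3f})" for j, s in zip(idx_row, score_row))
    print(f"image {i}: {ranked}")
```

With real data, `image_features` and `text_features` would be the stacked encoder outputs from the script above, and the returned indices would point into `TEXTS`.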
Preprocessing and tokenization
This repository follows the preprocessing used by the project codebase:
- images are resized to 224x224
- pixel values are scaled to [0, 1] by dividing by 255
- ImageNet mean/std normalization is not applied
- text tokenization uses CLIPTokenizer from openai/clip-vit-base-patch32
- token sequences use max_length=77
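The image side of these rules can be written without torchvision. The following is a sketch, assuming the image has already been resized to 224x224 and loaded as an RGB `uint8` array in height-width-channel order; the channel-first batch layout mirrors what `transforms.ToTensor()` produces in the evaluation script, and the function name `to_model_input` is hypothetical.

```python
import numpy as np


def to_model_input(rgb_uint8):
    """Convert a (224, 224, 3) uint8 RGB array to a (1, 3, 224, 224) float32 batch."""
    if rgb_uint8.shape != (224, 224, 3):
        raise ValueError(f"expected shape (224, 224, 3), got {rgb_uint8.shape}")
    scaled = rgb_uint8.astype(np.float32) / 255.0      # scale pixel values to [0, 1]
    chw = np.transpose(scaled, (2, 0, 1))              # HWC -> CHW
    # No ImageNet mean/std normalization is applied, per the list above.
    return np.ascontiguousarray(chw[np.newaxis, ...])  # add the batch dimension
```

The resize step is left out on purpose: different libraries use different default interpolation filters, so matching the project's resize exactly is best done with the same `transforms.Resize((224, 224))` call used in the script.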
Before running local evaluation, make sure the tokenizer is available in the local Hugging Face cache:
hf download openai/clip-vit-base-patch32
Training context
The exported ONNX files come from the LPCV 2026 Track 1 training workflow built around:
- MobileCLIP2-B as the base model
- contrastive JSONL training data with positives and hard negatives
- local PyTorch fine-tuning
- ONNX export for deployment-oriented evaluation
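This card does not state the exact training loss, so the following is only a generic illustration of how positives and hard negatives are commonly combined in a contrastive objective. It is an InfoNCE-style sketch in NumPy, not the project's implementation; the function name, the tensor shapes, and the `temperature=0.07` default are all assumptions.

```python
import numpy as np


def info_nce_with_hard_negatives(image_emb, pos_text_emb, neg_text_emb, temperature=0.07):
    """
    Illustrative loss (assumed, not taken from the project code).

    image_emb:    (B, D) L2-normalized image embeddings
    pos_text_emb: (B, D) L2-normalized embeddings of the matching captions
    neg_text_emb: (B, N, D) L2-normalized embeddings of N hard negatives per image

    Each image is scored against its positive caption (class 0) and its own
    hard negatives; the result is the mean cross-entropy of picking class 0.
    """
    pos_logits = np.sum(image_emb * pos_text_emb, axis=-1, keepdims=True)  # (B, 1)
    neg_logits = np.einsum("bd,bnd->bn", image_emb, neg_text_emb)          # (B, N)
    logits = np.concatenate([pos_logits, neg_logits], axis=1) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)                    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[:, 0].mean())
```

The loss approaches zero when the positive caption is much closer to the image than every hard negative, and equals log(1 + N) when the positive and all N negatives score identically.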
The corresponding image-source dataset is available at:
https://huggingface.co/datasets/jn12/VG100K4CL
Intended use
Use this model if you want to:
- reproduce local ONNX evaluation from this repository
- benchmark the exported retrieval model
- integrate the encoders into a deployment pipeline
This repository is not intended to be a generic sentence-embedding model release or a universal CLIP drop-in replacement.
Citation
If you use this model, please cite the Hugging Face repository and the project code:
Authors:
Hui Xie, Jinyang Du, Jiacheng Wang, Xiaoze Ge, Fengjun Zhong, Yejun Zeng, Ruihao Gong#, Xiaoning Liu, Shenghao Jin, Jinyang Guo#, Xianglong Liu
@misc{mobileclip2b_lpcv2026,
  title = {2026LPCV-Track1-MobileCLIP2-B-Best},
  author = {Hui Xie and Jinyang Du and Jiacheng Wang and Xiaoze Ge and Fengjun Zhong and Yejun Zeng and Ruihao Gong and Xiaoning Liu and Shenghao Jin and Jinyang Guo and Xianglong Liu},
  year = {2026},
  howpublished = {\url{https://huggingface.co/jn12/2026LPCV-Track1-MobileCLIP2-B-Best}}
}
Project repository:
https://github.com/jn12-29/LPCV-Track1-EfficientAI
Model tree for jn12/2026LPCV-Track1-MobileCLIP2-B-Best
Base model
timm/MobileCLIP2-B-OpenCLIP