Mobile-O-0.5B is a compact unified vision–language–diffusion model that performs both multimodal understanding (VQA, OCR, reasoning) and image generation within a single architecture, designed for mobile and edge deployment.
| Spec | Detail |
|---|---|
| Total Parameters | 1.6B |
| Image Resolution | 512×512 |
| Image Generation | ~3 seconds on iPhone |
| Visual Understanding | ~0.4 seconds on iPhone |
| Memory Footprint | < 2GB |
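
As a rough sanity check on the numbers in the table above, the sketch below estimates the size of the weights alone for 1.6B parameters at a few common precisions. The card does not state which precision the on-device weights use, so the byte widths here are assumptions for illustration, and the estimate ignores activations and any runtime overhead.

```python
# Back-of-the-envelope weight size for the parameter count in the spec table.
# The precisions listed are ASSUMPTIONS; the model card does not specify one.
TOTAL_PARAMS = 1.6e9  # "Total Parameters" from the spec table

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Return the weight size in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, width in BYTES_PER_PARAM.items():
        print(f"{name}: {weight_gb(TOTAL_PARAMS, width):.1f} GB")
```

Under these assumptions, weights stored at 2 bytes per parameter would come to about 3.2 GB, so the stated footprint of under 2 GB would only be reachable with narrower weights or with not all components resident at once. Which of these applies is not stated in the card.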
| Task | Input → Output |
|---|---|
| 💬 Conversational AI | Text → Text |
| 👁️ Image Understanding | Image + Text → Text |
| 🖼️ Image Generation | Text → Image |
| ✏️ Image Editing | Image + Text → Image |
```python
from huggingface_hub import snapshot_download

# Download only the merged checkpoint folder into ./checkpoints
snapshot_download(
    repo_id="Amshaker/Mobile-O-0.5B",
    repo_type="model",
    local_dir="checkpoints",
    allow_patterns=["final_merged_model_23620/*"],
)
```
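
Before running inference, it can help to confirm that the download actually produced the checkpoint folder the scripts expect. The helper below is a minimal sketch, not part of the Mobile-O repository: `resolve_checkpoint` is a hypothetical name, and the only facts it relies on are the `local_dir` and folder name used in the download call above.

```python
from pathlib import Path


def resolve_checkpoint(local_dir: str = "checkpoints",
                       subfolder: str = "final_merged_model_23620") -> Path:
    """Return the checkpoint folder, or raise if it is missing or empty.

    Hypothetical helper: the defaults mirror the download call above.
    """
    ckpt = Path(local_dir) / subfolder
    if not ckpt.is_dir():
        raise FileNotFoundError(f"Checkpoint folder not found: {ckpt}")
    if not any(ckpt.iterdir()):
        raise FileNotFoundError(f"Checkpoint folder is empty: {ckpt}")
    return ckpt


if __name__ == "__main__":
    try:
        print(f"Using checkpoint at {resolve_checkpoint()}")
    except FileNotFoundError as err:
        print(err)
```

The returned path can then be passed as the `--model_path` argument of the inference scripts.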
```bash
# Image understanding (image + text -> text)
python infer_und.py \
    --model_path checkpoints/final_merged_model_23620/ \
    --image_path assets/cute_cat.png \
    --prompt "What is in the image?"

# Image generation (text -> image)
python infer_gen.py \
    --model_path checkpoints/final_merged_model_23620/ \
    --prompt "A vibrant tropical rainforest scene with a scarlet macaw perched on a moss-covered branch"

# Image editing (image + text -> image)
python infer_edit.py \
    --model_path checkpoints/final_merged_model_23620/ \
    --image_path assets/cute_cat.png \
    --prompt "Make the cat wear a hat"
```
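
The three commands above differ only in the script name and in whether an image is passed. A minimal sketch of a wrapper that builds the matching argument list is shown below. The script names and flags are copied from the commands above; the task labels, `build_command`, and the wrapper itself are hypothetical and not part of the Mobile-O repository.

```python
import shlex
import sys
from typing import List, Optional

# Script names come from the commands above; the task keys are made up here.
SCRIPTS = {
    "understand": "infer_und.py",
    "generate": "infer_gen.py",
    "edit": "infer_edit.py",
}
# Tasks whose commands above include --image_path.
NEEDS_IMAGE = {"understand", "edit"}


def build_command(task: str,
                  prompt: str,
                  model_path: str = "checkpoints/final_merged_model_23620/",
                  image_path: Optional[str] = None) -> List[str]:
    """Build the argv list for one of the three inference scripts."""
    if task not in SCRIPTS:
        raise ValueError(f"Unknown task {task!r}; expected one of {sorted(SCRIPTS)}")
    if task in NEEDS_IMAGE and image_path is None:
        raise ValueError(f"Task {task!r} requires image_path")

    cmd = [sys.executable, SCRIPTS[task], "--model_path", model_path]
    if task in NEEDS_IMAGE:
        cmd += ["--image_path", image_path]
    cmd += ["--prompt", prompt]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without the repo.
    cmd = build_command("edit", "Make the cat wear a hat",
                        image_path="assets/cute_cat.png")
    print(shlex.join(cmd))
```

To actually run a command, the returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, where the scripts live.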
Mobile-O consists of three main components:
Trained in three stages:
| Resource | Link |
|---|---|
| 🤗 Mobile-O-1.5B | Model |
| 🤗 Mobile-O-0.5B-iOS | iOS Components |
| 📱 iOS App Source Code | Mobile-O-App |
@article{shaker2026mobileo,
title={Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device},
author={Shaker, Abdelrahman and Heakl, Ahmed and Muhammad, Jaseel and Thawkar, Ritesh and Thawakar, Omkar and Li, Senmao and Cholakkal, Hisham and Reid, Ian and Xing, Eric P. and Khan, Salman and Khan, Fahad Shahbaz},
journal={arXiv preprint arXiv:XXXX.XXXXX},
year={2026}
}
Released under CC BY-NC 4.0. For research purposes only.