---
license: cc-by-nc-4.0
tags:
- mobile-o
- multimodal
- unified-model
- ios
- coreml
- mlx
- on-device
- mobile
- edge-ai
pipeline_tag: image-text-to-text
---

<div align="center">

<h1>
<img src="https://github.com/Amshaker/Mobile-O/blob/main/assets/mobile-o-logo.png?raw=true" width="30" /> Mobile-O-0.5B-iOS
</h1>

**Optimized MLX & CoreML Components for On-Device Deployment**

<p>
<a href="https://arxiv.org/abs/XXXX.XXXXX"><img src="https://img.shields.io/badge/arXiv-XXXX.XXXXX-b31b1b.svg" alt="arXiv"></a>
<a href="https://github.com/Amshaker/Mobile-O"><img src="https://img.shields.io/badge/GitHub-Code-black.svg" alt="Code"></a>
<a href="https://amshaker.github.io/Mobile-O/"><img src="https://img.shields.io/badge/-Project_Page-2563eb.svg" alt="Project Page"></a>
<a href="https://mobileo.cvmbzuai.com/"><img src="https://img.shields.io/badge/-Live_Demo-10b981.svg" alt="Demo"></a>
<a href="https://apps.apple.com/app/XXXXXXXXXX"><img src="https://img.shields.io/badge/-App_Store-black.svg" alt="App Store"></a>
</p>

</div>

## Overview

This repository contains the optimized **MLX** and **CoreML** model components of [Mobile-O-0.5B](https://huggingface.co/Amshaker/Mobile-O-0.5B) for native iOS deployment. These components power the [Mobile-O iOS app](https://github.com/Amshaker/Mobile-O/tree/main/Mobile-O-App), enabling fully on-device multimodal understanding and image generation with no cloud dependency.

## 📱 On-Device Performance

| Spec | Detail |
|------|--------|
| ⚡ Image Generation | ~3 seconds |
| 👁️ Visual Understanding | ~0.4 seconds |
| 💾 Memory Footprint | < 2 GB |
| 📱 Compatible Devices | iPhone (A17+ / M-series) |
| Cloud Dependency | None (fully on-device) |

## 📦 Contents

This repo includes optimized model components in both **MLX** and **CoreML** formats:

| Component | Format | Description |
|-----------|--------|-------------|
| **VLM** | MLX / CoreML | FastVLM-0.5B (FastViT + Qwen2-0.5B) |
| **Diffusion Decoder** | MLX / CoreML | SANA-600M-512 (Linear DiT + VAE) |
| **MCP** | MLX / CoreML | Mobile Conditioning Projector (~2.4M params) |

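For a rough sense of how the component sizes in the table relate to the < 2 GB memory footprint, the sketch below multiplies nominal parameter counts by bytes per parameter at a few storage precisions. It is an assumption-laden estimate: the counts are taken from the component names only (0.5B, 600M, ~2.4M, so the FastViT encoder and VAE are not counted separately), activations and runtime overhead are ignored, and `fp16`/`int8`/`int4` are illustrative options, not a statement of what the shipped components use. `weight_memory_gib` is a hypothetical helper.

```python
# Rough weight-only memory estimate for the components listed above.
# ASSUMPTIONS: parameter counts are the nominal sizes in the component names
# (FastViT encoder and VAE are not counted separately), activations and
# runtime overhead are ignored, and the precisions are illustrative options.

NOMINAL_PARAMS = {
    "VLM (FastVLM-0.5B)": 0.5e9,
    "Diffusion Decoder (SANA-600M-512)": 0.6e9,
    "MCP": 2.4e6,
}

# Bytes needed to store one parameter at each (assumed) precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(precision: str) -> float:
    """Hypothetical helper: total weight size in GiB at the given precision."""
    total_params = sum(NOMINAL_PARAMS.values())
    return total_params * BYTES_PER_PARAM[precision] / GIB


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: ~{weight_memory_gib(precision):.2f} GiB")
```

Under these assumptions the script prints about 2.05 GiB for fp16, 1.03 GiB for int8 and 0.51 GiB for int4. Substituting the real per-component precisions and the encoder/VAE sizes would tighten the estimate.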
## Usage

### With the iOS App

1. Clone the [Mobile-O repo](https://github.com/Amshaker/Mobile-O).
2. Navigate to the `Mobile-O-App/` directory.
3. Download this model repo into the app's model directory.
4. Build and run the app in Xcode.

```bash
git clone https://github.com/Amshaker/Mobile-O.git
cd Mobile-O/Mobile-O-App
```

Refer to the [Mobile-O-App README](https://github.com/Amshaker/Mobile-O/tree/main/Mobile-O-App) for detailed setup instructions.

### Download Models

```python
from huggingface_hub import snapshot_download

# Download all MLX and CoreML components of this repo into ./ios_models
local_path = snapshot_download(
    repo_id="Amshaker/Mobile-O-0.5B-iOS",
    repo_type="model",
    local_dir="ios_models",
)
print(f"Models downloaded to: {local_path}")
```

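After downloading, a quick way to confirm which formats landed locally is to walk the download directory and group what it finds. This is a sketch under stated assumptions: it assumes CoreML components ship as `.mlpackage` or compiled `.mlmodelc` bundles and MLX components as `.safetensors` or `.npz` weight files, which has not been checked against this repo's actual file layout. `summarize_components` is a hypothetical helper, and `ios_models` is the `local_dir` from the download snippet.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import Path

# ASSUMED extension conventions (not confirmed against this repo's file list):
COREML_SUFFIXES = {".mlpackage", ".mlmodelc"}  # CoreML bundles (directories)
MLX_SUFFIXES = {".safetensors", ".npz"}        # MLX weight files


def summarize_components(root: str) -> dict[str, list[str]]:
    """Hypothetical helper: group artifacts under `root` by assumed format."""
    root_path = Path(root)
    found: dict[str, list[str]] = defaultdict(list)
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        # Skip hidden entries, e.g. a .cache/ folder huggingface_hub may create.
        if any(part.startswith(".") for part in rel.parts):
            continue
        # Skip files nested inside a CoreML bundle; the bundle is reported once.
        if any(parent.suffix in COREML_SUFFIXES for parent in rel.parents):
            continue
        if path.suffix in COREML_SUFFIXES:
            found["CoreML"].append(str(rel))
        elif path.is_file() and path.suffix in MLX_SUFFIXES:
            found["MLX"].append(str(rel))
    return dict(found)


if __name__ == "__main__":
    models_dir = Path("ios_models")  # the local_dir used in the download snippet
    if models_dir.is_dir():
        for fmt, files in summarize_components(str(models_dir)).items():
            print(f"{fmt}: {len(files)} item(s)")
            for name in files:
                print(f"  {name}")
```

Treating each `.mlpackage`/`.mlmodelc` directory as a single unit keeps the summary readable, since those bundles contain many internal files that are not useful to list individually.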
## Related Resources

| Resource | Link |
|----------|------|
| 🤗 Mobile-O-0.5B | [PyTorch Model](https://huggingface.co/Amshaker/Mobile-O-0.5B) |
| 🤗 Mobile-O-1.5B | [PyTorch Model](https://huggingface.co/Amshaker/Mobile-O-1.5B) |
| 📱 iOS App Source Code | [Mobile-O-App](https://github.com/Amshaker/Mobile-O/tree/main/Mobile-O-App) |
| 🤗 Training Datasets | [Collection](https://huggingface.co/collections/Amshaker/mobile-o-datasets) |

## Citation

```bibtex
@article{shaker2026mobileo,
  title={Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device},
  author={Shaker, Abdelrahman and Heakl, Ahmed and Muhammad, Jaseel and Thawkar, Ritesh and Thawakar, Omkar and Li, Senmao and Cholakkal, Hisham and Reid, Ian and Xing, Eric P. and Khan, Salman and Khan, Fahad Shahbaz},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2026}
}
```

## ⚖️ License

Released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). For research purposes only.