ZETIC | On-device AI

company

https://zetic.ai

zeticai_

zetic-ai

Activity Feed

AI & ML interests

On-device AI for Everything

sj-zetic

updated 4 collections 4 months ago

posted an update 8 months ago

⚡ MobileCLIP2 Complete On-device Study: TMLR 2025 Featured Model on Mobile
Major Release: Comprehensive mobile deployment study of Apple's MobileCLIP2 (TMLR August 2025 Featured) with detailed performance benchmarks across 52+ mobile devices!

🎯 Model Overview:
- Architecture: Multi-modal reinforced training (vision + language)
- Research: TMLR 2025 Featured Certification
- Innovation: Improved efficiency-accuracy trade-offs vs SigLIP/OpenAI CLIP
- Specialty: Zero-shot image classification and retrieval
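
To make "zero-shot classification" concrete, here is a minimal sketch of how a CLIP-style model typically scores one image against a set of text prompts: cosine similarity between the image and text embeddings, scaled and passed through a softmax. The embedding values, the prompts, and the `logit_scale` of 100 are illustrative assumptions, not MobileCLIP2 outputs.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and N prompts."""
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-d embeddings; real models use hundreds of dimensions.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image_emb = [0.9, 0.1, 0.2]
text_embs = [[1.0, 0.0, 0.1], [0.1, 1.0, 0.0], [0.0, 0.2, 1.0]]

probs = zero_shot_probs(image_emb, text_embs)
best = labels[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

No classifier is trained here: adding a new class only means adding a new text prompt, which is what makes the approach "zero-shot".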

📊 Mobile Performance Results:

Latency Metrics:
- NPU (Best): 9.74ms average inference
- GPU: 39.00ms average
- CPU: 494.89ms average
- NPU Advantage: 115.94x speedup over CPU baseline!
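
A quick arithmetic check on the figures above: dividing the average CPU latency by the average NPU latency gives roughly 50.8x, not 115.94x. The post does not say how the headline speedup was aggregated, so the sketch below only illustrates one possible explanation (averaging per-device speedups instead of dividing the averages). The per-device numbers are invented for illustration.

```python
from statistics import mean

# Average latencies in milliseconds, as reported in the post.
avg_ms = {"NPU": 9.74, "GPU": 39.00, "CPU": 494.89}

def speedup(baseline_ms, target_ms):
    """How many times faster the target is than the baseline."""
    return baseline_ms / target_ms

ratio_of_averages = speedup(avg_ms["CPU"], avg_ms["NPU"])
print(f"CPU avg / NPU avg = {ratio_of_averages:.2f}x")
print(f"CPU avg / GPU avg = {speedup(avg_ms['CPU'], avg_ms['GPU']):.2f}x")

# Hypothetical per-device latencies (ms), NOT from the study. They show
# that two aggregations of the same measurements can disagree.
devices = [
    {"cpu": 900.0, "npu": 5.0},
    {"cpu": 400.0, "npu": 10.0},
    {"cpu": 200.0, "npu": 15.0},
]
mean_of_ratios = mean(speedup(d["cpu"], d["npu"]) for d in devices)
ratio_of_means = mean(d["cpu"] for d in devices) / mean(d["npu"] for d in devices)
print(f"mean of per-device speedups: {mean_of_ratios:.2f}x")
print(f"ratio of mean latencies:     {ratio_of_means:.2f}x")
```

When comparing speedup claims across studies, it helps to check which of these two aggregations was used.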

Memory Efficiency:
- Model Size: 1.66 GB (production optimized)
- Runtime Memory: 466.18 MB peak consumption
- Load Range: 0-1,884 MB across device categories
- Inference Range: 431-1,616 MB

Accuracy Preservation:
- FP16 Precision: 39.78 dB maintained
- Quantized Mode: 15.07 dB (INT optimization available)
- Zero-shot Quality: Production-grade vision-language matching

🏆 Research Highlights:

MobileCLIP2-S4 Performance:
- Matches SigLIP-SO400M/14 accuracy
- 2x fewer parameters
- 2.5x lower latency than DFN ViT-L/14

MobileCLIP-S0 Efficiency:
- Similar zero-shot performance to OpenAI ViT-B/16
- 4.8x faster inference
- 2.8x smaller model size

MobileCLIP-S2 Advantages:
- Better avg zero-shot than SigLIP ViT-B/16
- 2.3x faster, 2.1x smaller
- Trained with 3x fewer seen samples

MobileCLIP-B (LT) Accuracy:
- 77.2% ImageNet zero-shot
- Surpasses OpenAI ViT-L/14@336
- Outperforms DFN and SigLIP models with similar architectures

🔗 Resources:
- Complete Study: https://mlange.zetic.ai/p/Steve/MobileCLIP2-image

Ready to build vision-language applications that run entirely on-device?
The future of multi-modal AI runs locally in everyone's pocket! 🚀

yeonseok-zeticai

posted an update 8 months ago

⚡ ColBERT-ko-v1.0 Complete On-device Study: SOTA Korean Retrieval on Mobile
Major Release: Comprehensive mobile deployment study of yoonjong's ColBERT-ko-v1.0 with detailed performance benchmarks across 50+ mobile devices!

🎯 Model Overview:
- Architecture: Korean-optimized ColBERT (late interaction)
- Parameters: 0.1B (compact and efficient)
- Specialty: Korean document retrieval and semantic search
- Performance: 1.0 recall@10, 0.966 nDCG@10 on AutoRAGRetrieval
- Benchmark: Outperforms Jina-ColBERT-v2 on Korean MTEB tasks
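
For readers new to "late interaction": ColBERT-style models keep one embedding per token and score a document with the MaxSim operator, summing, for each query token, its best match among the document tokens. A minimal sketch follows; the 2-d token embeddings are toy values, not outputs of ColBERT-ko-v1.0.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token take the highest
    similarity to any document token, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Toy per-token embeddings (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.8], [0.5, 0.5]]
doc_b = [[0.7, 0.0], [0.6, 0.2]]

scores = {
    "doc_a": maxsim_score(query, doc_a),
    "doc_b": maxsim_score(query, doc_b),
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Because document token embeddings can be computed ahead of time, only the query has to be encoded at search time, which is the part the latency figures in this kind of study typically cover.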

📊 Mobile Performance Results:

Latency Metrics:
- NPU (Best): 3.17ms average inference
- GPU: 11.67ms average
- CPU: 21.36ms average
- NPU Advantage: 18.46x speedup over CPU

Memory Efficiency:
- Model Size: 567.89 MB (production optimized)
- Runtime Memory: 170.87 MB peak consumption
- Load Range: 4-614 MB across device categories
- FP32 Memory: 642.65 MB (optional high precision)

Accuracy Preservation:
- FP16 Precision: 53.51 dB maintained
- Quantized Mode: 32.77 dB (available for memory constraints)
- Retrieval Quality: Production-grade Korean semantic matching

🔬 Research Implications:

This study demonstrates:
- Late interaction models are viable for mobile deployment
- Korean language models achieve SOTA performance on consumer hardware
- Sub-5ms retrieval enables real-time RAG applications
- Privacy-first Korean AI is now technically feasible

Benchmark Methodology:
- Standardized Korean query inputs
- Multiple inference cycles with statistical averaging
- Real-world Korean document corpus testing
- Thermal and battery impact assessment
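
As an illustration of "multiple inference cycles with statistical averaging", here is a minimal timing harness. It is a sketch of the general technique, not the study's actual tooling; `fake_inference`, the warm-up count, and the iteration count are assumptions.

```python
import statistics
import time

def benchmark(run_once, warmup=3, iterations=20, clock=time.perf_counter):
    """Time repeated calls to run_once and summarize latency in milliseconds."""
    # Warm-up runs absorb one-time costs (caches, lazy init) and are not timed.
    for _ in range(warmup):
        run_once()
    samples_ms = []
    for _ in range(iterations):
        start = clock()
        run_once()
        samples_ms.append((clock() - start) * 1000.0)
    return {
        "runs": len(samples_ms),
        "mean_ms": statistics.mean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "stdev_ms": statistics.stdev(samples_ms),
    }

def fake_inference():
    """Hypothetical stand-in workload; replace with a real model call."""
    sum(i * i for i in range(10_000))

stats = benchmark(fake_inference)
print(stats)
```

Reporting the median and standard deviation next to the mean makes it easier to spot outliers such as thermal throttling.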

Deployment Recommendations:
- Use NPU acceleration for 18x speedup on Android
- Implement MUVERA for real-time requirements
- Cache frequent queries for instant responses
- Progressive loading for large document collections
- Hybrid approach for extremely large corpora (local + cloud)
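
The "cache frequent queries" recommendation can be sketched with a small in-memory LRU cache in front of the query encoder. `encode_query` is a hypothetical placeholder for the on-device encoder, and the cache size of 256 is an arbitrary assumption.

```python
from functools import lru_cache

calls = {"encode": 0}  # counts how often the (expensive) encoder really runs

def encode_query(text):
    """Hypothetical stand-in for the on-device query encoder."""
    calls["encode"] += 1
    return tuple(float(ord(ch) % 7) for ch in text)

@lru_cache(maxsize=256)
def cached_encode(normalized_text):
    """Memoize encoder output per normalized query string."""
    return encode_query(normalized_text)

def encode(text):
    # Normalize case and whitespace so near-identical queries share an entry.
    return cached_encode(" ".join(text.lower().split()))

encode("서울 맛집 추천")      # first time: encoder runs
encode("서울  맛집   추천")   # same query with extra spaces: cache hit
encode("부산 여행")           # new query: encoder runs
info = cached_encode.cache_info()
print(calls["encode"], info.hits, info.misses)
```

On a phone the cache size would need to be balanced against the memory budget listed above.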

🔗 Resources:
- Complete Study: https://mlange.zetic.ai/p/Steve/colbert_kor
- Original Model: https://huggingface.co/yoonjong0505/ColBERT-ko-v1.0

yeonseok-zeticai

posted an update 8 months ago

⚡ RexBERT Complete On-device Study: Comprehensive Performance Analysis Across Mobile Devices
(Check details at https://mlange.zetic.ai/p/Steve/RexBERT)

TL;DR: Transformer models are now practical for real-time mobile applications. The cloud-to-edge AI migration is complete.

- Original model from @thebajajra

🎯 Study Overview:
- Model: RexBERT (ModernBERT for E-commerce)
- Focus: Real-world deployment viability and performance analysis

📊 Key Performance Metrics:

Latency Results:
- NPU (Best): 4.74ms average
- GPU: 12.56ms average
- CPU: 35.16ms average

NPU Advantage: 16.98x speedup over CPU

Memory Efficiency:
- Model Size: 568.96 MB (compressed for mobile)
- Runtime Memory: 299.01 MB peak consumption
- Load Memory Range: 285 MB - 1,072 MB across devices

Accuracy Preservation:
- FP16 Precision: 63.72 dB
- Quantized Mode: Available with minimal accuracy loss
- Inference Quality: Production-grade maintained

🛠 Technical Implementation:
(Runnable with Copy & Paste at the ZETIC.MLange link: https://mlange.zetic.ai/p/Steve/RexBERT)

This study demonstrates that:

- Transformer models are viable for real-time mobile applications
- NPU acceleration provides the breakthrough needed for practical deployment
- Mobile-first AI architecture is now technically feasible
- The performance gap between cloud and edge inference is rapidly closing

🚀 Real-World Applications Enabled:

E-commerce Intelligence:
- Instant product search and discovery
- Real-time semantic matching
- Context-aware recommendations
- Natural language query processing

Conversational Commerce:
- Voice-to-product search
- Chatbot-style shopping assistance
- Intent recognition and classification
- Multi-turn conversation handling

Privacy-First AI:
- On-device processing (no data transmission)
- GDPR/privacy regulation compliant
- Reduced server infrastructure costs
- Offline capability maintenance

Are you ready to integrate BERT-level language understanding into your mobile applications?

yeonseok-zeticai

posted an update 8 months ago

🎯 RetinaFace On-Device Deployment Study: NPU Acceleration Breakthrough!
(Check details at https://mlange.zetic.ai/p/Steve/RetinaFace)

TL;DR: Successfully deployed RetinaFace with ZETIC.MLange achieving 1.43ms inference on mobile NPU!

🔍 Complete Performance Analysis:
Latency Comparison:
- NPU: 1.43ms (Winner! 🏆)
- GPU: 3.75ms
- CPU: 21.42ms

Accuracy Metrics - SNR:
- FP16: 56.98 dB
- Integer Quantized: 48.03 dB
(Precision-Performance: Excellent trade-off maintained)
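
The post does not spell out how SNR is computed. A common convention, assumed here, compares the on-device output against a full-precision reference output and reports the ratio of signal power to error power in decibels. The sketch below implements that definition and converts the reported dB values into an approximate relative RMS error. The sample vectors are invented.

```python
import math

def snr_db(reference, test):
    """Signal-to-noise ratio in dB between a reference and a test output."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - t) ** 2 for r, t in zip(reference, test))
    if noise_power == 0:
        return math.inf  # outputs are identical
    return 10.0 * math.log10(signal_power / noise_power)

def relative_rms_error(snr_decibels):
    """RMS error relative to RMS signal implied by a given SNR in dB."""
    return 10.0 ** (-snr_decibels / 20.0)

# Hypothetical reference (e.g. FP32) and on-device outputs.
reference = [1.0, -2.0, 3.0, -4.0]
test = [1.01, -1.98, 3.02, -4.01]
print(f"toy example: {snr_db(reference, test):.2f} dB")

# Values reported in the post, interpreted under the assumed definition.
for label, value in [("FP16", 56.98), ("Integer quantized", 48.03)]:
    print(f"{label}: {value} dB -> ~{relative_rms_error(value):.2%} relative RMS error")
```

Under this assumption, every 20 dB corresponds to a 10x change in relative error, so the gap between the FP16 and integer-quantized figures is a little under 3x in error magnitude.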

Memory Footprint:
- Model Size: 2.00 MB (highly compressed)
- Runtime Memory: 14.58 MB peak
- Deployment Ready: ✅ Production optimized

🛠 Technical Implementation:
(Runnable with Copy & Paste at the MLange link!)

📊 Device Compatibility Matrix:
Tested on 50+ devices, including the Samsung Galaxy series, the Google Pixel lineup, Xiaomi devices, iPhones, and iPads.
Consistent sub-5ms performance across the board!

🚀 Applications Unlocked:
- Real-time AR/VR face tracking
- Privacy-preserving edge authentication
- Live video processing pipelines
- Mobile security applications
- Interactive camera filters

The democratization of high-performance computer vision on mobile devices is happening NOW! This study proves that complex CV models can run efficiently on consumer hardware without compromising accuracy.
Want to reproduce these results? Check out the benchmark methodology and implementation guide!

yeonseok-zeticai

posted an update 8 months ago

YOLOv11 Complete On-device Study: NPU vs GPU vs CPU Across All Model Variants

We've just completed comprehensive benchmarking of the entire YOLOv11 family on ZETIC.MLange.
Here's what every ML engineer needs to know.

📊 Key Findings Across 5 Model Variants (XL to Nano):

1. NPU Dominance in Efficiency:
- YOLOv11n: 1.72ms on NPU vs 53.60ms on CPU (31x faster)
- Memory footprint: 0-65MB across all variants
- Consistent sub-10ms inference even on XL models

2. The Sweet Spot - YOLOv11s:
- NPU: 3.23ms @ 95.57% mAP
- Perfect balance: 36MB model, production-ready speed
- 10x faster than GPU, 30x faster than CPU

3. Surprising Discovery:
Medium models (YOLOv11m) show unusual GPU performance patterns - NPU outperforms GPU by 4x (9.55ms vs 35.82ms), suggesting current GPU kernels aren't optimized for mid-size architectures.

4. Production Insights:
- XL/Large: GPU still competitive for batch processing
- Small/Nano: NPU absolutely crushes everything else
- Memory scaling: Linear from 10MB (Nano) to 217MB (XL)
- Accuracy plateau: 95.5-95.7% mAP across S/M/L variants

Real-world Impact:
For edge deployment, YOLOv11s on NPU delivers server-level accuracy at embedded speeds. This changes everything for real-time applications.

🔗 Test these benchmarks yourself: https://mlange.zetic.ai/p/Steve/YOLOv11_comparison?tab=versions&version=5

📈 Full benchmark suite available now

The data speaks for itself.
NPUs aren't the future - they're the present for efficient inference.
Which variant fits your use case? Let's discuss in the comments.

yeonseok-zeticai

posted an update 11 months ago

🚀 Real-Time On-Device AI Agent with Polaris-4B — Run It Yourself, No Cloud, No Cost

We just deployed a real-time on-device AI agent using the Polaris-4B-Preview model — one of the top-performing <6B open LLMs on Hugging Face.

📱 What’s remarkable?
This model runs entirely on a mobile device, without cloud, and without any manual optimization. It was built using ZETIC.MLange, and the best part?

➡️ It’s totally automated, free to use, and anyone can do it.
You don’t need to write deployment code, tweak backends, or touch device-specific SDKs. Just upload your model — and ZETIC.MLange handles the rest.

🧠 About the Model
- Model: Polaris-4B-Preview
- Size: ~4B parameters
- Ranking: Top 3 on Hugging Face LLM Leaderboard (<6B)
- Tokenizer: Token-incremental inference supported
- Modifications: None — stock weights, just optimized for mobile

⚙️ What ZETIC.MLange Does
ZETIC.MLange is a fully automated deployment framework for On-Device AI, built for AI engineers who want to focus on models — not infrastructure.

Here’s what it does in minutes:
- 📊 Analyzes model structure
- ⚙️ Converts to mobile-optimized format (e.g., GGUF, ONNX)
- 📦 Generates a runnable runtime environment with pre/post-processing
- 📱 Targets real mobile hardware (CPU, GPU, NPU — including Qualcomm, MediaTek, Apple)
- 🎯 Gives you a downloadable SDK or mobile app component — ready to run
And yes — this is available now, for free, at https://mlange.zetic.ai

🧪 For AI engineers like you. If you want to:
- Test LLMs directly on-device
- Run models offline with no network latency
- Avoid cloud GPU costs
- Deploy to mobile without writing app-side inference code

Then this is your moment. You can do exactly what we did, using your own models — all in a few clicks.

🎯 Start here → https://mlange.zetic.ai

📬 Want to try Polaris-4B in your own app? Email contact@zetic.ai, or just visit https://mlange.zetic.ai. It's free to use!

Great work @Chancy, @Zhihui, @tobiaslee!

yeonseok-zeticai

posted an update 11 months ago

Hi everyone,

I’ve been running small language models (SLLMs) directly on smartphones — completely offline, with no cloud backend or server API calls.

I wanted to share:
1. ⚡ Tokens/sec performance across several SLLMs
2. 🤖 Observations on hardware utilization (where the workload actually runs)
3. 📏 Trade-offs between model size, latency, and feasibility for mobile apps
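
For anyone who wants to reproduce tokens/sec numbers, here is a minimal sketch of one common way to measure them: record the time to first token separately, then divide the remaining tokens by the remaining time. The `fake_model` generator and its timings are simulated stand-ins, not measurements from these reports.

```python
import time

def measure_decode(token_stream, clock=time.perf_counter):
    """Consume a token iterator; return time-to-first-token and decode speed."""
    start = clock()
    first_token_at = None
    count = 0
    for _ in token_stream:
        count += 1
        if first_token_at is None:
            first_token_at = clock()
    end = clock()
    if count == 0:
        return {"tokens": 0, "ttft_s": None, "tokens_per_s": 0.0}
    decode_time = end - first_token_at
    # Exclude the first token so prompt processing does not skew decode speed.
    tps = (count - 1) / decode_time if count > 1 and decode_time > 0 else 0.0
    return {"tokens": count, "ttft_s": first_token_at - start, "tokens_per_s": tps}

class FakeClock:
    """Simulated clock so the example is deterministic."""
    def __init__(self):
        self.now = 0.0
    def __call__(self):
        return self.now

clock = FakeClock()

def fake_model(n_tokens, prefill_s=0.5, per_token_s=0.05):
    """Hypothetical generator: 0.5 s prompt processing, then 50 ms per token."""
    clock.now += prefill_s
    for i in range(n_tokens):
        if i > 0:
            clock.now += per_token_s
        yield f"tok{i}"

result = measure_decode(fake_model(21), clock=clock)
print(result)
```

With a real model, drop the `clock` argument and pass the model's streaming token iterator instead.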

Reports are available for the following models:
- QWEN3 0.6B
- NVIDIA/Nemotron QWEN 1.5B
- SimpleScaling S1
- TinyLlama
- Unsloth tuned Llama 3.2 1B
- Naver HyperClova 0.5B

📜 Comparable benchmark reports (no cloud, all on-device):

I’d really value your thoughts on:
- Creative ideas to further optimize inference under these hardware constraints
- Other compact LLMs worth testing on-device
- Experiences you’ve had trying to deploy LLMs at the edge

If there’s interest, I’m happy to share more details on the test setup, hardware specs, or the tooling we used for these comparisons.

Thanks for taking a look. You can build your own at https://mlange.zetic.ai!

yeonseok-zeticai

posted an update 11 months ago

💫 Next-Level On-Device AI Showdown

🪽 See it to believe it: how does Qwen 4B run on-device, without an expensive cloud GPU server?
We’ve crafted a side-by-side demo video showcasing both Jan-Nano and QWEN 4B in action—no more wondering which model reigns supreme. Click play, compare their speeds, accuracy, and memory footprints, and decide which one fits your needs best!

👋 Why You Can’t Miss This
We are actively creating runnable sLLM environments for on-device AI. You can build on-device AI apps within a few hours.
Several sLLMs, including Jan-Nano and Qwen 4B, are ready to use in your AI application!

🤑 Feel free to try them: they are free to use!

Ready to Compare?

Watch now, draw your own conclusions, and let us know which model you’d deploy in your next edge-AI project! 🌍💡

#OnDeviceAI #EdgeAI #AIShowdown #MLOptimization #DemoVideo #AIComparison

yeonseok-zeticai

posted an update 12 months ago

🚀 Try deep research with no network connection and no data leaks: Jan-Nano as production-ready on-device AI in 6 hours!

Jan-Nano has been making waves as one of the top-trending 4B-parameter models on Hugging Face, outperforming even 671B models on SimpleQA benchmarks. But here's what changes everything: ZETIC.MLange just transformed Jan-Nano into a blazing-fast on-device AI solution.

✨ 6-hour deployment from Hugging Face to a production-ready library!

Zero cloud dependency - complete privacy and offline capability
While others struggle with complex on-device deployments taking weeks or months, ZETIC.MLange's automated pipeline makes it effortless. No manual optimization, no vendor-specific coding, no compromise on performance.

📱 Ready to transform your AI models? Try ZETIC.MLange. It is completely free right now!

The future of AI is on-device. Make it happen in hours, not months.

#OnDeviceAI #EdgeAI #MLOptimization #NPU #PrivacyFirst

yeonseok-zeticai

posted an update 12 months ago

🚀 NEW DROP: run your own on-device LLM—in minutes, on any phone
Today we’re open-sourcing everything you need to put Qwen3-0.6B straight into a production-ready mobile app:

🎥 Watch Qwen3-0.6B chat in real time on any smartphone!

📊 TPS benchmarks – slides comparing tokens per second across heterogeneous mobile devices

💻 Plug-and-play source – copy the source into your Android (Kotlin & Java) or iOS (Swift) project and run it.

🤞 Cross-platform, one pipeline – ZETIC.MLange auto-tunes kernels for every device we’ve tested.

👨‍💻 Ready for production – swap in your own model, re-benchmark with one command, publish.

Get started
Just sign up and check out the playground project, Qwen3-0.6B:
- https://mlange.zetic.ai/p/zetic-example/Qwen3-0.6B

We built this to show that cloud-free LLMs are ready today. Dive in, fork it, and tag ZETIC.ai when you launch your own on-device assistant, game NPC, or offline content generator—we’ll spotlight the best projects.
