Create README.md
README.md
ADDED
@@ -0,0 +1,100 @@
---
license: cc
tags:
- multimodal
---
# **OmniNeural**: World’s First NPU-Optimized Multimodal Model

## **Overview**
**OmniNeural** is the first multimodal model designed specifically for Neural Processing Units (NPUs). It natively understands **text, images, and audio**, and runs across PCs, mobile devices, vehicles, IoT, and robotics.

By co-designing the software and model architecture with NPU hardware, OmniNeural achieves:
- **Up to 1.5× faster than CPU and 4× faster than GPU** for inference on consumer devices (e.g., Samsung S25 Ultra).
- **2–4× better efficiency than CPU and 4–8× better than GPU** in battery usage.
- **Smooth multitasking**, running large generative AI models without slowing other applications.

This combination of speed, efficiency, and NPU support makes OmniNeural the most practical multimodal foundation for edge intelligence.

---

## **Key Features**
- **Multimodal Intelligence** – Processes **text, images, and audio** in a unified model for richer reasoning and perception.
- **NPU-Optimized Architecture** – Uses ReLU ops, sparse tensors, convolutional layers, and static graph execution for maximum throughput: **20% faster than non-NPU-aware models**.
- **Hardware-Aware Attention** – Attention patterns tuned for NPUs, lowering compute and memory demand.
- **Native Static Graph** – Supports variable-length multimodal inputs with stable, predictable latency.
- **Performance Gains** – **9× faster audio processing** and **3.5× faster image processing** on NPUs compared to baseline encoders (Whisper for audio, SigLIP for images).
- **Privacy-First Inference** – All computation stays local: private, offline-capable, and cost-efficient.
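
The "Native Static Graph" feature is described here only at a high level. One common way for a static-shape runtime to accept variable-length inputs is to pad each input up to one of a few precompiled lengths and pass a mask alongside it. The sketch below illustrates that general idea only; the bucket sizes, the pad id, and the `pad_to_bucket` helper are hypothetical and are not taken from OmniNeural or the Nexa SDK.

```python
from bisect import bisect_left

# Hypothetical bucket lengths; a real deployment would choose these per model.
BUCKETS = (8, 16, 32, 64)
PAD_ID = 0  # hypothetical padding token id


def pad_to_bucket(tokens, buckets=BUCKETS, pad_id=PAD_ID):
    """Pad `tokens` to the smallest bucket that fits; return (padded, mask)."""
    idx = bisect_left(buckets, len(tokens))
    if idx == len(buckets):
        raise ValueError(
            f"input of length {len(tokens)} exceeds largest bucket {buckets[-1]}"
        )
    n_pad = buckets[idx] - len(tokens)
    padded = list(tokens) + [pad_id] * n_pad
    mask = [1] * len(tokens) + [0] * n_pad  # 1 = real token, 0 = padding
    return padded, mask


padded, mask = pad_to_bucket([5, 9, 2, 7, 7, 1, 3, 8, 4, 6])
print(len(padded), sum(mask))  # 16 10
```

With only a handful of fixed shapes, every request runs through one of a few precompiled graphs, which is one plausible route to predictable latency; whether OmniNeural works this way is not stated in this README.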

---

## **Use Cases**

- **PC & Mobile** – On-device AI agents combine **voice, vision, and text** for natural, accurate responses.
  - Examples: *Summarize slides into an email (PC)*, *extract action items from chat (mobile)*.
  - Benefits: Private, offline, battery-efficient.

- **Automotive** – In-car assistants handle **voice control, cabin safety, and environment awareness**.
  - Detects risks (child unbuckled, pet left behind, loose objects) and road conditions (fog, construction).
  - Decisions run locally in milliseconds.

- **IoT & Robotics** – Multimodal sensing for **factories, AR/VR, drones, and robots**.
  - Examples: Defect detection, technician overlays, hazard spotting mid-flight, natural robot interaction.
  - Works without network connectivity.

---

## **Performance / Benchmarks**
### Human Evaluation (vs baselines)
- **Vision**: Wins or ties on ~75% of prompts against Apple Foundation, Gemma-3n-E4B, and Qwen2.5-Omni-3B.
- **Audio**: Clear lead over baselines, especially in Whisper-encoder-style tasks.
- **Text**: Matches or outperforms leading multimodal baselines.







### Nexa Attention Speedups
- **9× faster** audio encoding (vs Whisper).
- **3.5× faster** image encoding (vs SigLIP).
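
These figures are encoder-level speedups. How much they shorten a whole request depends on what share of the baseline latency the encoder accounted for, which this README does not report. As a rough aid for reading the numbers, the sketch below applies Amdahl's law; the encoder fractions in the loop are hypothetical placeholders, not measurements.

```python
def end_to_end_speedup(encoder_fraction, encoder_speedup):
    """Amdahl's law: overall speedup when only the encoder stage gets faster.

    encoder_fraction: share of baseline latency spent in the encoder (0..1).
    encoder_speedup:  how many times faster the encoder becomes (> 0).
    """
    if not 0.0 <= encoder_fraction <= 1.0:
        raise ValueError("encoder_fraction must be in [0, 1]")
    if encoder_speedup <= 0:
        raise ValueError("encoder_speedup must be positive")
    return 1.0 / ((1.0 - encoder_fraction) + encoder_fraction / encoder_speedup)


# Hypothetical encoder shares of total latency, for illustration only.
for fraction in (0.25, 0.5, 0.75):
    audio = end_to_end_speedup(fraction, 9.0)   # 9x encoder speedup (audio)
    image = end_to_end_speedup(fraction, 3.5)   # 3.5x encoder speedup (image)
    print(f"encoder share {fraction:.2f}: audio {audio:.2f}x, image {image:.2f}x")
```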



---

## **Architecture Overview**
OmniNeural’s design is tightly coupled with NPU hardware:
- **NPU-friendly ops** (ReLU preferred over GELU/SiLU).
- **Sparse + small tensor multiplications** for efficiency.
- **Convolutional layers** favored over linear layers for better NPU parallelization.
- **Hardware-aware attention** patterns to cut compute cost.
- **Static graph execution** for predictable latency.
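
Two of the points above can be made concrete with a small, dependency-free sketch. It is not OmniNeural's implementation; it only shows (1) that ReLU is a single comparison while exact GELU needs an `erf` evaluation, and (2) that a per-token linear layer computes the same values as a kernel-size-1 convolution, which is why the two are interchangeable at the math level. All function names and the toy weights are illustrative assumptions.

```python
import math


def relu(x):
    # ReLU needs only a comparison.
    return x if x > 0.0 else 0.0


def gelu(x):
    # Exact GELU, x * Phi(x), requires evaluating erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(tokens, weight):
    """Apply y = W x per token; tokens is [T][C_in], weight is [C_out][C_in]."""
    return [[sum(w * x for w, x in zip(row, tok)) for row in weight] for tok in tokens]


def conv1x1(channels, weight):
    """Kernel-size-1 1D convolution; channels is [C_in][T], output is [C_out][T]."""
    length = len(channels[0])
    return [
        [sum(row[c] * channels[c][t] for c in range(len(channels))) for t in range(length)]
        for row in weight
    ]


tokens = [[1.0, -2.0], [0.5, 3.0], [-1.0, 0.0]]  # toy input: 3 tokens, 2 channels
weight = [[2.0, 1.0], [0.0, -1.0]]               # toy weights: 2 outputs, 2 inputs

as_linear = linear(tokens, weight)
channels_first = [list(col) for col in zip(*tokens)]  # transpose to [C_in][T]
as_conv = [list(col) for col in zip(*conv1x1(channels_first, weight))]  # back to [T][C_out]

print(as_linear == as_conv)  # True
```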



---

## **How to use** //TODO

> ⚠️ Note: OmniNeural currently runs on Qualcomm NPUs (Snapdragon devices).
> Apple NPU support is planned for the next release.

**Install via Nexa-SDK:**

## **License & Citation**

@misc{omninneural2025,
title={OmniNeural: NPU-Optimized Multimodal Model for On-Device AI},
author={Nexa AI},
year={2025},
howpublished={\url{https://huggingface.co/NexaAI/OmniNeural-4B}}
}

## **Links & Community** //TODO