+---
+license: cc
+tags:
+- multimodal
+---
+# **OmniNeural** — World’s First NPU-Optimized Multimodal Model
+## **Overview**
+**OmniNeural** is the first multimodal model designed specifically for Neural Processing Units (NPUs). It natively understands **text, images, and audio**, and runs across PCs, mobile devices, vehicles, IoT, and robotics.
+By co-designing the software and model architecture with NPU hardware, OmniNeural achieves:
+- **Up to 1.5× faster than CPU and 4× faster than GPU** for inference on consumer devices (e.g., Samsung S25 Ultra).
+- **2–4× better efficiency than CPU and 4–8× better than GPU** in battery usage.
+- **Smooth multitasking**, running large generative AI models without slowing other applications.
+This combination of speed, efficiency, and NPU support makes OmniNeural the most practical multimodal foundation for edge intelligence.
+---
+## **Key Features**
+- **Multimodal Intelligence** – Processes **text, image, and audio** in a unified model for richer reasoning and perception.
+- **NPU-Optimized Architecture** – Uses ReLU ops, sparse tensors, convolutional layers, and static graph execution for maximum throughput, running **20% faster than non-NPU-aware models**.
+- **Hardware-Aware Attention** – Attention patterns tuned for NPUs, lowering compute and memory demand.
+- **Native Static Graph** – Supports variable-length multimodal inputs with stable, predictable latency.
+- **Performance Gains** – **9× faster audio processing** and **3.5× faster image processing** on NPUs compared to baseline encoders (Whisper for audio, SigLIP for images).
+- **Privacy-First Inference** – All computation stays local: private, offline-capable, and cost-efficient.
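
The model card does not describe how OmniNeural fits variable-length inputs into a static graph. One common approach in static-shape runtimes is to pad each input to a fixed length and carry a mask that marks the real positions. The following is a minimal sketch of that general technique, not OmniNeural's actual mechanism; `STATIC_LEN`, `PAD_ID`, and `pad_to_static` are hypothetical names chosen for illustration.

```python
# Hypothetical values: a real deployment would take these from the compiled graph.
PAD_ID = 0       # assumed padding token id
STATIC_LEN = 8   # assumed fixed sequence length the graph was compiled for


def pad_to_static(token_ids, static_len=STATIC_LEN, pad_id=PAD_ID):
    """Pad a variable-length sequence to a fixed length and build a validity mask."""
    if len(token_ids) > static_len:
        raise ValueError(
            f"input length {len(token_ids)} exceeds static length {static_len}"
        )
    n_pad = static_len - len(token_ids)
    padded = list(token_ids) + [pad_id] * n_pad
    # 1 marks a real token, 0 marks padding that downstream ops should ignore.
    mask = [1] * len(token_ids) + [0] * n_pad
    return padded, mask


padded, mask = pad_to_static([5, 6, 7])
print(padded)
print(mask)
```

Because every input reaches the graph with the same shape, the amount of work per call is constant, which is one way a runtime can keep latency predictable.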
+---
+## **Use Cases**
+- **PC & Mobile** – On-device AI agents combine **voice, vision, and text** for natural, accurate responses.
+   - Examples: *Summarize slides into an email (PC)*, *extract action items from chat (mobile)*.
+   - Benefits: Private, offline, battery-efficient.
+- **Automotive** – In-car assistants handle **voice control, cabin safety, and environment awareness**.
+   - Detects risks (child unbuckled, pet left behind, loose objects) and road conditions (fog, construction).
+   - Decisions run locally in milliseconds.
+- **IoT & Robotics** – Multimodal sensing for **factories, AR/VR, drones, and robots**.
+   - Examples: Defect detection, technician overlays, hazard spotting mid-flight, natural robot interaction.
+   - Works without network connectivity.
+---
+## **Performance / Benchmarks**
+### Human Evaluation (vs baselines)
+- **Vision**: Wins or ties on ~75% of prompts against Apple Foundation, Gemma-3n-E4B, and Qwen2.5-Omni-3B.
+- **Audio**: Clear lead over baselines, especially in Whisper-encoder style tasks.
+- **Text**: Matches or outperforms leading multimodal baselines.
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/nplpd2RWyL_cYj0t-xvhq.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/0BK09jUWUnDqYjsKpdFcR.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/LJNCxyU0OKp7Z4ecLSD9C.png)
+### Nexa Attention Speedups
+- **9× faster** audio encoding (vs Whisper).
+- **3.5× faster** image encoding (vs SigLIP).
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/tKZ9zPjjZtVdGW2N3yBHp.png)
+---
+## **Architecture Overview**
+OmniNeural’s design is tightly coupled with NPU hardware:
+- **NPU-friendly ops** (ReLU preferred over GELU/SiLU).
+- **Sparse + small tensor multiplications** for efficiency.
+- **Convolutional layers** favored over linear layers for better NPU parallelization.
+- **Hardware-aware attention** patterns to cut compute cost.
+- **Static graph execution** for predictable latency.
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/oINYbgXILJgTuKxKc1aO_.png)
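
To make two of the design choices above concrete, here is a small pure-Python sketch. It shows that a 1×1 convolution computes the same result as a linear layer applied at each position, which is why one can stand in for the other, and it contrasts ReLU with exact GELU, which needs an `erf` evaluation. This is a generic illustration, not OmniNeural's actual layers; the function names and toy values are made up for the example.

```python
import math


def relu(x):
    # A single comparison per element; no transcendental functions.
    return x if x > 0.0 else 0.0


def gelu(x):
    # Exact GELU requires erf, which is typically more expensive than a comparison.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(tokens, weight):
    # tokens: [seq_len][in_dim], weight: [out_dim][in_dim]
    return [[sum(w * x for w, x in zip(row, tok)) for row in weight] for tok in tokens]


def conv1x1(feature_map, kernel):
    # feature_map: [in_ch][length] (channels-first), kernel: [out_ch][in_ch]
    in_ch = len(feature_map)
    length = len(feature_map[0])
    return [
        [sum(kernel[o][c] * feature_map[c][p] for c in range(in_ch)) for p in range(length)]
        for o in range(len(kernel))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


tokens = [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]]   # 2 positions, 3 input features
weight = [[0.5, 1.0, -1.0], [2.0, 0.0, 1.0]]    # 2 output features

linear_out = linear(tokens, weight)
# Same weights used as a 1x1 kernel over a channels-first layout.
conv_out = transpose(conv1x1(transpose(tokens), weight))

assert linear_out == conv_out
print(linear_out)
```

The equivalence only says the two forms compute the same numbers; which one maps better onto a given NPU depends on that hardware and its compiler.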
+---
+## **How to use**   //TODO
+> ⚠️ Note: OmniNeural currently runs on Qualcomm NPUs (Snapdragon devices).
+> Apple NPU support is planned for the next release.
+**Install via Nexa-SDK:**
+## **License & Citation**
+@misc{omnineural2025,
+  title={OmniNeural: NPU-Optimized Multimodal Model for On-Device AI},
+  author={Nexa AI},
+  year={2025},
+  howpublished={\url{https://huggingface.co/NexaAI/OmniNeural-4B}}
+}
+## **Links & Community** //TODO