Update README.md

README.md CHANGED

@@ -3,17 +3,10 @@ license: cc
 tags:
 - multimodal
 ---
-# **OmniNeural** — World’s First Multimodal Model
+# **OmniNeural** — World’s First NPU-aware Multimodal Model
 
 ## **Overview**
-**OmniNeural** is the first multimodal model designed specifically for Neural Processing Units (NPUs). It natively understands **text, images, and audio**, and runs across PCs, mobile devices,
-
-By co-designing the software and model architecture with NPU hardware, OmniNeural achieves:
-- **Up to 1.5× faster than CPU and 4× faster than GPU** for inference on consumer devices (e.g., Samsung S25 Ultra).
-- **2–4× better efficiency than CPU and 4–8× better than GPU** in battery usage.
-- **Smooth multitasking**, running large generative AI models without slowing other applications.
-
-This combination of speed, efficiency, and NPU support makes OmniNeural the most practical multimodal foundation for edge intelligence.
+**OmniNeural** is the first fully multimodal model designed specifically for Neural Processing Units (NPUs). It natively understands **text, images, and audio**, and runs across PCs, mobile devices, automobiles, IoT, and robotics.
 
 ---
 
@@ -46,15 +39,15 @@ This combination of speed, efficiency, and NPU support makes OmniNeural the most practical multimodal foundation for edge intelligence.
 ## **Performance / Benchmarks**
 ### Human Evaluation (vs baselines)
 - **Vision**: Wins/ties in ~75% of prompts against Apple Foundation, Gemma-3n-E4B, and Qwen2.5-Omni-3B.
-- **Audio**: Clear lead over baselines,
+- **Audio**: Clear lead over baselines; much better than Gemma3n and the Apple foundation model.
 - **Text**: Matches or outperforms leading multimodal baselines.
 
 (figure: human-evaluation benchmark results)
 
 ### Nexa Attention Speedups
-- **9× faster** audio encoding (vs Whisper).
-- **3.5× faster** image encoding (vs SigLIP).
+- **9× faster** audio encoding (vs Whisper encoder).
+- **3.5× faster** image encoding (vs SigLIP encoder).
 
 (figure: Nexa Attention speedup benchmarks)