# **Overview**
**AutoNeural** is a next-generation, **NPU-native multimodal vision–language model** co-designed from the ground up for real-time, on-device inference. Instead of adapting GPU-first architectures, AutoNeural redesigns both **vision encoding** and **language modeling** around the constraints and capabilities of NPUs, achieving up to **14× lower latency**, **7× lower quantization error**, and **real-time automotive performance** even under aggressive low-precision settings.
AutoNeural integrates:
* A **MobileNetV5-based vision encoder** with depthwise separable convolutions.
* A **Liquid AI hybrid Transformer-SSM language backbone** that dramatically reduces KV-cache overhead.
* A **normalization-free MLP connector** tailored for quantization stability.
* Mixed-precision **W8A16 (vision)** and **W4A16 (language)** inference validated on real Qualcomm NPUs.
AutoNeural powers real-time cockpit intelligence including **in-cabin safety**, **out-of-cabin awareness**, **HMI understanding**, and **visual + conversational function calls**, as demonstrated in the on-device results.
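A minimal sketch of how these pieces compose at inference time. The module interfaces and embedding widths here are illustrative assumptions, not the released implementation; only the 16×16×2048 vision feature shape comes from the feature list below.
```python
import torch
import torch.nn as nn

class AutoNeuralSketch(nn.Module):
    """Composition sketch only: vision encoder -> connector -> language model.
    Internals and dimensions are assumed, not the released implementation."""
    def __init__(self, vision_encoder: nn.Module, connector: nn.Module,
                 language_model: nn.Module):
        super().__init__()
        self.vision_encoder = vision_encoder  # MobileNetV5-style CNN (W8A16)
        self.connector = connector            # normalization-free MLP
        self.language_model = language_model  # hybrid Transformer-SSM (W4A16)

    def forward(self, image: torch.Tensor, text_embeds: torch.Tensor):
        feats = self.vision_encoder(image)         # (B, 2048, 16, 16)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, 256, 2048) image tokens
        vis_embeds = self.connector(tokens)        # project into the LM space
        # Prepend image tokens to the text embeddings, then decode as usual.
        return self.language_model(torch.cat([vis_embeds, text_embeds], dim=1))
```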
---
# **Key Features**
### 🔍 **MobileNetV5 Vision Encoder (300M)**
Optimized for edge hardware, with:
* **Depthwise separable convolutions** for low compute and bounded activations.
* **Local attention bottlenecks** only in late stages for efficient long-range reasoning.
* **Multi-Scale Fusion Adapter (MSFA)** producing a compact **16×16×2048** feature map.
* Stable **INT8/16** behavior with minimal post-quantization degradation.
Yields **5.8×–14× speedups** over ViT baselines across 256–768 px inputs.
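The workhorse here is the depthwise separable convolution: a per-channel 3×3 filter followed by a 1×1 pointwise mix, costing a fraction of a dense 3×3 convolution. A minimal PyTorch sketch (channel counts and the bounded ReLU6 activation are illustrative choices, not the released configuration):
```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 followed by pointwise 1x1: the low-cost primitive
    that MobileNet-family encoders are built from."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # groups=in_ch: each 3x3 filter sees only its own input channel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6()  # bounded activations keep INT8/16 ranges tame

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Costs roughly (1/out_ch + 1/9) of a dense 3x3 convolution's multiply-adds.
x = torch.randn(1, 64, 56, 56)
print(DepthwiseSeparableConv(64, 128)(x).shape)  # torch.Size([1, 128, 56, 56])
```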
---
### 🧠 **Hybrid Transformer-SSM Language Backbone (1.2B)**
Designed for NPU memory hierarchies:
* **5:1 ratio of SSM layers to Transformer attention layers**
* **Linear-time gated convolution layers** for most steps
* **Tiny rolling state** instead of a KV cache → up to **60% lower memory bandwidth**
* **W4A16 stable quantization** across layers
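A toy illustration of what a 5:1 schedule buys in memory terms; all constants below (depth, head counts, state sizes) are made-up examples, not the model's actual dimensions:
```python
# Illustrative 5:1 hybrid layer schedule: every 6th layer is full
# attention, the rest are linear-time SSM/gated-convolution layers.
N_LAYERS = 24  # assumed depth for illustration

def layer_type(i: int) -> str:
    return "attention" if (i + 1) % 6 == 0 else "ssm"

schedule = [layer_type(i) for i in range(N_LAYERS)]
print(schedule.count("ssm"), schedule.count("attention"))  # 20 4 -> 5:1

# Memory intuition: an attention layer's KV cache grows with context
# length T; an SSM layer carries a fixed-size rolling state instead.
def kv_cache_bytes(T: int, n_heads=16, head_dim=64, bytes_per=2) -> int:
    return 2 * T * n_heads * head_dim * bytes_per  # K and V tensors

def ssm_state_bytes(d_state=64, d_model=1024, bytes_per=2) -> int:
    return d_state * d_model * bytes_per           # constant in T

print(kv_cache_bytes(4096))  # 16_777_216 bytes, linear in context length
print(ssm_state_bytes())     # 131_072 bytes, fixed
```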
---
### 🔗 **Normalization-Free Vision–Language Connector**
A compact 2-layer MLP using **SiLU**, deliberately **removing RMSNorm** to avoid unstable activation ranges during static quantization.
Ensures reliable deployment on W8A16/W4A16 pipelines.
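A minimal sketch of such a connector. The 2048-dim input matches the 16×16×2048 feature map above; the hidden and output widths are assumed for illustration:
```python
import torch
import torch.nn as nn

class NormFreeConnector(nn.Module):
    """Two-layer SiLU MLP with no RMSNorm. Static quantization calibrates
    one fixed range per activation tensor; dropping the norm's
    data-dependent rescaling keeps those ranges predictable."""
    def __init__(self, vision_dim=2048, hidden_dim=4096, lm_dim=1536):
        super().__init__()
        self.fc1 = nn.Linear(vision_dim, hidden_dim)
        self.act = nn.SiLU()
        self.fc2 = nn.Linear(hidden_dim, lm_dim)

    def forward(self, vision_tokens):  # (B, N, vision_dim)
        return self.fc2(self.act(self.fc1(vision_tokens)))

tokens = torch.randn(1, 256, 2048)        # 16x16 feature map, flattened
print(NormFreeConnector()(tokens).shape)  # torch.Size([1, 256, 1536])
```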
---
### 🚗 **Automotive-Grade Multimodal Intelligence**
Trained on **10M Infinity-MM samples** plus **200k automotive cockpit samples**, covering:
* AI Sentinel (vehicle security)
* AI Greeter (identity recognition)
* Car Finder (parking localization)
* Passenger safety monitoring
Ensures robust performance across lighting, demographics, weather, and motion scenarios.
---
### ⚡ **Real NPU Benchmarks**
Validated on **Qualcomm SA8295P NPU**:
| Metric | Baseline (InternVL 2B) | **AutoNeural-VL** |
| --- | --- | --- |
| **TTFT** | ~1.4 s | **~100 ms** |
| **Max Vision Resolution** | 448×448 px | **768×768 px** |
| **RMS Quantization Error** | 3.98% | **0.56%** |
| **Decode Throughput** | 15 tok/s | **44 tok/s** |
| **Context Length** | 1024 tokens | **4096 tokens** |
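A quick back-of-envelope reading of the table, under the simple assumption that total generation time ≈ TTFT + tokens ÷ decode throughput (sampling and I/O overhead ignored):
```python
# Rough end-to-end latency model from the benchmark table above.
def gen_time_s(ttft_s: float, tok_per_s: float, n_tokens: int = 100) -> float:
    return ttft_s + n_tokens / tok_per_s

print(f"baseline:   {gen_time_s(1.4, 15):.1f} s")  # ~8.1 s for 100 tokens
print(f"AutoNeural: {gen_time_s(0.1, 44):.1f} s")  # ~2.4 s for 100 tokens
```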
---
# **How to Use**
> ⚠️ **Hardware requirement:** AutoNeural is optimized for **Qualcomm NPUs**.
### 1) Install Nexa-SDK
Download the SDK and follow the installation steps provided on the model page.
---
### 2) Configure authentication
Create an access token in the Model Hub, then run:
```bash
nexa config set license '<access_token>'
```
---
### 3) Run the model
```bash
nexa infer NexaAI/AutoNeural
```
### Image input
Drag and drop one or more image files into the terminal window.
Multiple images can be processed with a single query.
---
# **License**
The AutoNeural model is released under the **Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0)** license.
You may:
* Use the model for **non-commercial** purposes
* Modify and redistribute it with attribution
For **commercial licensing**, please contact:
**[dev@nexa.ai](mailto:dev@nexa.ai)**