# **Overview**

**AutoNeural** is a next-generation, **NPU-native multimodal vision–language model** co-designed from the ground up for real-time, on-device inference. Instead of adapting GPU-first architectures, AutoNeural redesigns both **vision encoding** and **language modeling** for the constraints and capabilities of NPUs, achieving **14× lower latency**, **7× lower quantization error**, and **real-time automotive performance** even under aggressive low-precision settings.

AutoNeural integrates:

* A **MobileNetV5-based vision encoder** with depthwise separable convolutions.
* A **Liquid AI hybrid Transformer-SSM language backbone** that dramatically reduces KV-cache overhead.
* A **normalization-free MLP connector** tailored for quantization stability.
* Mixed-precision **W8A16 (vision)** and **W4A16 (language)** inference validated on real Qualcomm NPUs.

AutoNeural powers real-time cockpit intelligence, including **in-cabin safety**, **out-of-cabin awareness**, **HMI understanding**, and **visual + conversational function calls**, as demonstrated in the on-device results.

---

# **Key Features**

### 🔍 **MobileNetV5 Vision Encoder (300M)**

Optimized for edge hardware, with:

* **Depthwise separable convolutions** for low compute and bounded activations (see the sketch in the appendix below).
* **Local attention bottlenecks** only in late stages for efficient long-range reasoning.
* A **Multi-Scale Fusion Adapter (MSFA)** producing a compact **16×16×2048** feature map.
* Stable **INT8/INT16** behavior with minimal post-quantization degradation.

Yields **5.8×–14× speedups** over ViT baselines across 256–768 px inputs.

---

### 🧠 **Hybrid Transformer-SSM Language Backbone (1.2B)**

Designed for NPU memory hierarchies:

* **5:1 ratio of SSM layers to Transformer attention layers**
* **Linear-time gated convolution layers** for most steps
* A **tiny rolling state** instead of a KV-cache → up to **60% lower memory bandwidth** (see the decode-step sketch in the appendix below)
* **Stable W4A16 quantization** across layers

---

### 🔗 **Normalization-Free Vision–Language Connector**

A compact 2-layer MLP using **SiLU**, deliberately **removing RMSNorm** to avoid unstable activation ranges during static quantization. This ensures reliable deployment in W8A16/W4A16 pipelines (see the connector sketch in the appendix below).

---

### 🚗 **Automotive-Grade Multimodal Intelligence**

Trained on **10M Infinity-MM samples** plus **200k automotive cockpit samples**, covering:

* AI Sentinel (vehicle security)
* AI Greeter (identity recognition)
* Car Finder (parking localization)
* Passenger safety monitoring

This ensures robust performance across lighting, demographics, weather, and motion scenarios.

---

### ⚡ **Real NPU Benchmarks**

Validated on the **Qualcomm SA8295P NPU**:

| Metric                         | Baseline (InternVL 2B) | **AutoNeural-VL** |
| ------------------------------ | ---------------------- | ----------------- |
| **TTFT (time to first token)** | ~1.4 s                 | **~100 ms**       |
| **Max Vision Resolution**      | 448×448                | **768×768**       |
| **RMS Quant Error**            | 3.98%                  | **0.56%**         |
| **Decode Throughput**          | 15 tok/s               | **44 tok/s**      |
| **Context Length**             | 1024 tokens            | **4096 tokens**   |

(See the appendix below for a sketch of how an RMS quantization-error metric of this kind can be computed.)

---

# **How to Use**

> ⚠️ **Hardware requirement:** AutoNeural is optimized for **Qualcomm NPUs**.

### 1) Install Nexa-SDK

Download the SDK and follow the installation steps provided on the model page.

---

### 2) Configure authentication

Create an access token in the Model Hub, then run:

```bash
nexa config set license '<your-access-token>'
```

---

### 3) Run the model

```bash
nexa infer NexaAI/AutoNeural
```

### Image input

Drag and drop one or more image files into the terminal window. Multiple images can be processed in a single query.
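---

# **Appendix: Illustrative Sketches**

The sketches below approximate, in minimal PyTorch, the ideas described above. All layer sizes, names, and hyperparameters are assumptions chosen for illustration, not AutoNeural's actual implementation.

### **Depthwise Separable Convolution**

The vision encoder's core primitive factors a standard convolution into a per-channel (depthwise) 3×3 convolution followed by a 1×1 pointwise convolution, cutting compute roughly by the kernel area, while a bounded activation keeps INT8/INT16 ranges well behaved:

```python
# Minimal sketch of a depthwise separable convolution block, the core
# primitive of MobileNet-style encoders. Channel counts are illustrative.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch),
        # so compute scales with channels rather than channels squared.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6()  # bounded activation: friendly to INT8/INT16 ranges

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

x = torch.randn(1, 64, 56, 56)
y = DepthwiseSeparableConv(64, 128, stride=2)(x)
print(y.shape)  # torch.Size([1, 128, 28, 28])
```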
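### **Rolling State vs. KV-Cache**

Where attention must read a KV-cache that grows with every generated token, an SSM layer carries a fixed-size state that is read and rewritten each step, so per-token memory traffic stays flat regardless of context length. The following is a generic gated diagonal-SSM decode step, not Liquid AI's actual layer:

```python
# Minimal sketch contrasting a growing KV-cache with a fixed-size rolling
# state. The recurrence is a generic gated diagonal SSM; all dimensions
# (d_model, d_state) are illustrative assumptions.
import torch

d_model, d_state = 1024, 64

A = torch.rand(d_state) * 0.99           # per-channel decay, |A| < 1 for stability
B = torch.randn(d_state, d_model) * 0.02  # input projection into the state
C = torch.randn(d_model, d_state) * 0.02  # readout from the state
W_gate = torch.randn(d_model, d_model) * 0.02

def ssm_decode_step(x_t: torch.Tensor, h: torch.Tensor):
    """One decode step: update the rolling state, emit a gated output."""
    h = A * h + B @ x_t              # state update: constant work per token
    y = C @ h                        # readout
    g = torch.sigmoid(W_gate @ x_t)  # input-dependent gate
    return g * y, h

# The entire "cache" is one small vector; memory stays flat as t grows,
# unlike attention, whose KV-cache adds one (key, value) pair per token.
h = torch.zeros(d_state)
for t in range(4096):
    x_t = torch.randn(d_model)
    y_t, h = ssm_decode_step(x_t, h)
```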
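### **Normalization-Free Connector**

A 2-layer SiLU MLP maps the 16×16×2048 MSFA feature map (flattened to 256 vision tokens) into the language model's embedding space. Omitting RMSNorm keeps activation ranges input-independent, which is what calibration-based static quantization needs. The text dimension here is an assumed value:

```python
# Minimal sketch of a normalization-free vision-language connector:
# a 2-layer MLP with SiLU and no RMSNorm. The hidden and text dimensions
# are illustrative assumptions.
import torch
import torch.nn as nn

class Connector(nn.Module):
    def __init__(self, vision_dim: int = 2048, hidden: int = 4096,
                 text_dim: int = 1536):
        super().__init__()
        self.fc1 = nn.Linear(vision_dim, hidden)
        self.act = nn.SiLU()
        self.fc2 = nn.Linear(hidden, text_dim)
        # Note: no RMSNorm/LayerNorm here. Per-sample rescaling would make
        # activation ranges input-dependent and destabilize static quantization.

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.act(self.fc1(vision_feats)))

# 16x16 MSFA feature map flattened to 256 tokens of width 2048.
tokens = torch.randn(1, 256, 2048)
print(Connector()(tokens).shape)  # torch.Size([1, 256, 1536])
```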
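### **Measuring RMS Quantization Error**

The "RMS Quant Error" row in the benchmark table is a relative error between a full-precision tensor and its quantize-dequantize approximation. Below is a per-tensor symmetric version of such a metric; the figures reported in the table come from Nexa's own evaluation, not from this snippet:

```python
# Minimal sketch of a relative RMS quantization-error metric: quantize a
# tensor to symmetric INTn, dequantize, and compare against the FP reference.
import torch

def rms_quant_error(x: torch.Tensor, bits: int = 8) -> float:
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for INT8
    scale = x.abs().max() / qmax                # per-tensor symmetric scale
    x_q = torch.clamp(torch.round(x / scale), -qmax, qmax)
    x_dq = x_q * scale                          # dequantized approximation
    rel = torch.sqrt(torch.mean((x - x_dq) ** 2)) / torch.sqrt(torch.mean(x ** 2))
    return rel.item() * 100                     # relative RMS error, in percent

w = torch.randn(4096, 4096)
print(f"INT8 RMS error: {rms_quant_error(w, 8):.2f}%")
print(f"INT4 RMS error: {rms_quant_error(w, 4):.2f}%")
```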
---

# **License**

The AutoNeural model is released under the **Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0)** license. You may:

* Use the model for **non-commercial** purposes
* Modify and redistribute it with attribution

For **commercial licensing**, please contact **[dev@nexa.ai](mailto:dev@nexa.ai)**.