
SmolVLA

Run SmolVLA, optimized for the Qualcomm Dragonwing IQ9 NPU, with NexaSDK.

Quickstart

  1. Install NexaSDK and create a free account at sdk.nexa.ai

  2. Activate your device with your access token:

    nexa config set license '<access_token>'
    
  3. Run the model on the Qualcomm NPU in one line:

    nexa infer NexaAI/smolVLA-npu
    
  • Input: enter the input folder path when prompted
  • Output: returns results in a .npy file, or reports an error if any required input is missing (see the sketch below)
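
Once a run completes, the saved .npy file can be inspected with NumPy. A minimal sketch, assuming the result was written to output.npy (the actual file name and array shape depend on your run):

    import numpy as np

    # Hypothetical output file name; check the path reported by nexa infer.
    actions = np.load("output.npy")

    print("shape:", actions.shape)      # e.g. (T, action_dim) for an action chunk
    print("first action:", actions[0])  # one control command / action vector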

Model Description

SmolVLA is a lightweight Vision-Language-Action (VLA) model built for efficient multimodal understanding and real-time control.
Developed by the Hugging Face Smol team, it unifies vision, language, and action into one coherent model that can perceive, reason, and act — enabling autonomous agents and robotics to run entirely on local hardware.
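
As a mental model of that perception-to-action loop, here is a deliberately tiny Python sketch. The function body is a stub standing in for the real model, and ACTION_DIM, the frame size, and the instruction are illustrative assumptions, not values from this repo:

    import numpy as np

    ACTION_DIM = 7  # assumption: e.g. a 7-DoF arm; the real value depends on the robot

    def vla_step(frame: np.ndarray, instruction: str) -> np.ndarray:
        """Stand-in for one SmolVLA forward pass: (image, text) -> action vector.
        A real deployment would invoke the NPU-compiled model here."""
        return np.zeros(ACTION_DIM, dtype=np.float32)  # placeholder policy

    # One perceive-reason-act step on a dummy camera frame.
    frame = np.zeros((256, 256, 3), dtype=np.uint8)   # stand-in camera image
    action = vla_step(frame, "pick up the red cube")  # language-conditioned control
    print(action.shape)  # (7,)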

Features

  • 🧠 Unified Perception-to-Action — Combines visual understanding, natural language reasoning, and control generation.
  • ⚡ Lightweight & Fast — Designed for real-time inference on laptops, edge boards, and NPUs.
  • 👁️ Grounded Visual Reasoning — Links language instructions with specific visual elements and spatial context.
  • 🧩 Zero-Shot Multimodal Tasks — Performs visual question answering, task planning, and grounding without retraining.
  • 🔧 Extensible & Open — Compatible with robotics frameworks and multimodal datasets for custom fine-tuning.

Use Cases

  • Embodied AI: End-to-end perception-action loops for robotics and simulation.
  • On-Device Agents: Multimodal assistants that process camera feeds locally.
  • Autonomous Systems: Real-time visual reasoning in automotive or IoT devices.
  • Research: Alignment studies and grounded reasoning experiments.
  • Simulation Control: Vision-driven policy generation for digital twins or VR.

Inputs and Outputs

Input

  • Image(s) or video frames
  • Optional text instruction or query

Output

  • Action vector or control command
  • Optional textual reasoning or visual grounding map
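
To connect these inputs to the quickstart flow, the sketch below prepares a candidate input folder. The file names and formats are assumptions for illustration; consult the NexaSDK documentation for the exact layout nexa infer expects:

    from pathlib import Path

    import numpy as np
    from PIL import Image

    # Hypothetical input layout: one camera frame plus a text instruction.
    folder = Path("smolvla_inputs")
    folder.mkdir(exist_ok=True)

    frame = np.zeros((256, 256, 3), dtype=np.uint8)  # stand-in camera frame
    Image.fromarray(frame).save(folder / "frame.png")
    (folder / "instruction.txt").write_text("pick up the red cube")

You would then enter smolvla_inputs as the input folder path when prompted by nexa infer.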

License

This repo is licensed under the Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0) license, which allows use, sharing, and modification only for non-commercial purposes with proper attribution.
All NPU-related models, runtimes, and code in this project are protected under this non-commercial license and cannot be used in any commercial or revenue-generating applications.
Commercial licensing or enterprise usage requires a separate agreement.
For inquiries, please contact dev@nexa.ai.
