Llama-3.2-3B
Run Llama-3.2-3B optimized for Intel NPUs with nexaSDK.
Quickstart
Install nexaSDK and create a free account at sdk.nexa.ai
Activate your device with your access token:
nexa config set license '<access_token>'
Run the model on the Intel NPU in one line:
nexa infer NexaAI/llama3.2-3B-intel-npu
Model Description
Llama-3.2-3B is a compact member of the Llama 3.2 family, designed to provide strong general-purpose language modeling in a lightweight 3B parameter footprint.
It balances efficiency with capability, making it well-suited for edge devices, prototyping, and applications where latency and resource constraints are critical.
Features
- Lightweight architecture: 3B parameters optimized for fast inference and low memory usage.
- Instruction-following: Handles instructions, Q&A, and step-by-step reasoning.
- Multilingual capabilities: Covers a wide range of languages at a smaller scale.
- Deployment flexibility: Runs efficiently on consumer hardware and server environments.
Use Cases
- Conversational assistants and chatbots.
- Educational tools and lightweight tutoring systems.
- Prototyping and experimentation with large language models on limited resources.
- Applications where cost or latency is a priority over sheer scale.
Inputs and Outputs
Input: Text prompts such as questions, commands, or code snippets.
Output: Natural language responses including answers, explanations, or structured outputs.
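For instruction-style prompting, the Llama 3 family uses a specific chat format with special header tokens. A minimal sketch of building such a prompt string is below; note this is an assumption borrowed from Meta's documented Llama 3 instruct format, and the base Llama-3.2-3B checkpoint is a plain text-completion model, so this format applies to instruction-tuned variants.

```python
def format_llama3_prompt(system: str, user: str) -> str:
    """Build a Llama 3-style chat prompt string.

    Special tokens follow Meta's published Llama 3 chat format; whether
    a given runtime expects this exact template is an assumption here.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # Leave the assistant header open so the model completes the reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_llama3_prompt(
    "You are a helpful assistant.",
    "What is an NPU?",
)
```

In practice, most runtimes (including tokenizer chat templates in common inference libraries) apply this formatting for you, so hand-building the string is mainly useful for debugging or custom pipelines.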
License
- Licensed under the Meta Llama 3.2 Community License.
References
- Model card: https://huggingface.co/meta-llama/Llama-3.2-3B