Llama-3.2-3B
Run Llama-3.2-3B optimized for Intel NPUs with nexaSDK.
Quickstart
Install nexaSDK and create a free account at sdk.nexa.ai
Activate your device with your access token:
nexa config set license '<access_token>'
Run the model on the Intel NPU in one line:
nexa infer NexaAI/llama3.2-3B-intel-npu
Model Description
Llama-3.2-3B is a compact member of the Llama 3.2 family, designed to provide strong general-purpose language modeling in a lightweight 3B parameter footprint.
It balances efficiency with capability, making it well-suited for edge devices, prototyping, and applications where latency and resource constraints are critical.
Features
- Lightweight architecture: 3B parameters optimized for fast inference and low memory usage.
- Instruction-following: Handles instructions, Q&A, and step-by-step reasoning.
- Multilingual capabilities: Covers a wide range of languages at a smaller scale.
- Deployment flexibility: Runs efficiently on consumer hardware and server environments.
Use Cases
- Conversational assistants and chatbots.
- Educational tools and lightweight tutoring systems.
- Prototyping and experimentation with large language models on limited resources.
- Applications where cost or latency is a priority over sheer scale.
Inputs and Outputs
Input: Text prompts such as questions, commands, or code snippets.
Output: Natural language responses including answers, explanations, or structured outputs.
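For instruction-style prompting, the Llama 3 family uses a specific chat format with special header tokens. A minimal sketch of building such a prompt string is below; note this is an assumption borrowed from Meta's documented Llama 3 instruct format, and the base Llama-3.2-3B checkpoint is a plain text-completion model, so this format applies to instruction-tuned variants.

```python
def format_llama3_prompt(system: str, user: str) -> str:
    """Build a Llama 3-style chat prompt string.

    Special tokens follow Meta's published Llama 3 chat format; whether
    a given runtime expects this exact template is an assumption here.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # Leave the assistant header open so the model completes the reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_llama3_prompt(
    "You are a helpful assistant.",
    "What is an NPU?",
)
```

In practice, most runtimes (including tokenizer chat templates in common inference libraries) apply this formatting for you, so hand-building the string is mainly useful for debugging or custom pipelines.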
License
- Licensed under the Meta Llama 3.2 Community License.
References
- Model card: https://huggingface.co/meta-llama/Llama-3.2-3B