# Ministral-3-3B-Instruct-2512

Run Ministral-3-3B on the Apple Neural Engine (ANE) with NexaSDK.

## Quickstart

Install NexaSDK and create a free account at [sdk.nexa.ai](https://sdk.nexa.ai).

Activate your device with your access token:

```bash
nexa config set license '<access_token>'
```

Run the model locally in one line:

```bash
nexa infer NexaAI/Ministral-3-3B-ANE
```

## Model Description
**Ministral-3-3B-Instruct-2512** is the instruction-tuned variant of Mistral AI’s smallest Ministral 3 model: a compact multimodal language model that combines a ~3.4B-parameter language core with a 0.4B-parameter vision encoder.
It is post-trained in FP8 for instruction following, making it well suited to chat-style agents, tool use, and grounded reasoning over both text and images.
With a 256k-token context window and an edge-oriented design, it targets real-time use on GPUs and other resource-constrained hardware.

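A back-of-the-envelope calculation shows why the FP8 checkpoint matters on constrained hardware: at roughly one byte per parameter, the weights fit in a few gigabytes. The figures below are illustrative arithmetic based only on the parameter counts quoted above, not measured numbers:

```python
# Rough weight-memory estimate: bytes per parameter times parameter count.
# Illustrative only; real footprints also include activations and KV cache.
def weight_gb(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param / 1e9

lm_core = weight_gb(3.4e9, 1.0)   # FP8 language core (~1 byte/param)
vision  = weight_gb(0.4e9, 1.0)   # FP8 vision encoder
fp16    = weight_gb(3.8e9, 2.0)   # same parameter count in FP16, for comparison

print(f"FP8 total:  {lm_core + vision:.1f} GB")   # ~3.8 GB
print(f"FP16 total: {fp16:.1f} GB")               # ~7.6 GB
```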
## Features
- **Multimodal (vision + text)**: Understands and reasons over images alongside text in a single conversation.
- **Instruction-tuned**: Optimized for following natural-language instructions, chat, and assistant-style workflows.
- **Agentic capabilities**: Native support for function calling and structured JSON-style outputs for tool and API orchestration.
- **Large context window**: Up to **256k tokens** for long documents, multi-step workflows, and complex sessions.
- **Edge-optimized FP8 weights**: FP8 checkpoint designed for efficient deployment and serving, including on a single modern GPU.
- **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.
- **Part of the Ministral 3 family**: Aligned with the 3B/8B/14B base, instruct, and reasoning variants for scalable deployments.

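Function calling follows the now-common pattern: the client advertises tool schemas, the model replies with a structured call object, and the client executes it and feeds the result back. A minimal dispatch sketch; the schema shape follows the widespread OpenAI-style convention, and the `get_weather` tool is a made-up example, not part of the Nexa API:

```python
import json

# Illustrative tool schema in the common JSON-schema style.
TOOLS = [{
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub implementation

# Suppose the model returned this JSON tool-call object:
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

call = json.loads(model_output)
registry = {"get_weather": get_weather}
result = registry[call["name"]](**call["arguments"])
print(result)  # Sunny in Paris
```

The result string would then be appended to the conversation as a tool message so the model can compose its final answer.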
## Use Cases
- **Vision + language assistants**
  - Image captioning and explanation (UI screenshots, photos, diagrams)
  - Multimodal Q&A (e.g., “describe this chart and summarize its implications”)
- **Lightweight agents and tools**
  - Function-calling workflows (retrieval, calculators, external APIs)
  - JSON-structured responses for downstream automation
- **Text understanding & generation**
  - Classification, tagging, routing, and extraction from long documents
  - Short-form copywriting, drafting, and rewriting across multiple languages
- **Edge & low-resource deployments**
  - On-device or near-edge assistants where latency, context length, and cost matter
  - Local and private workloads that benefit from a small yet capable multimodal model

## Inputs and Outputs

**Inputs**
- **Text-only prompts**
  - Single-turn or multi-turn chat-style conversations (`system`, `user`, `assistant` roles).
  - Long-context inputs up to the model’s context limit (e.g., documents, logs, transcripts).
- **Multimodal prompts**
  - One or more images (e.g., URLs or image tensors) combined with text.
- **Structured tool schemas**
  - Function/tool definitions for agentic workflows (JSON schemas describing functions and parameters).

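Multi-turn multimodal prompts are typically expressed as a role-tagged message list, with image parts interleaved with text inside a user turn. The field names below follow common chat-API conventions and are illustrative, not a documented Nexa schema:

```python
# A chat history mixing an image part and a text part in one user turn.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": [
        {"type": "image_url", "image_url": "https://example.com/chart.png"},
        {"type": "text", "text": "Describe this chart and summarize its implications."},
    ]},
]

# The model's reply is appended under the "assistant" role, preserving
# the full turn history for the next request.
messages.append({"role": "assistant", "content": "The chart shows ..."})
print(len(messages))  # 3
```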
**Outputs**
- **Generated text**
  - Answers, explanations, step-by-step reasoning, summaries, and creative content.
- **Multimodal-aware responses**
  - Text grounded in the provided images (descriptions, comparisons, localized details).
- **Structured tool calls**
  - JSON-like tool-call objects for function execution and programmatic integration.
- **Logits / probabilities (advanced)**
  - For users accessing the raw model via low-level APIs, token-level scores for custom decoding or research.

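For low-level use, a single decoding step turns the next-token logits into a choice; greedy decoding simply takes the argmax after softmax normalization. A dependency-free sketch over a toy vocabulary:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    # Greedy decoding: the token with the highest probability wins.
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)

# Toy 4-token vocabulary: index 2 carries the highest score.
logits = [1.0, 0.5, 3.2, -1.0]
print(greedy_pick(logits))  # 2
```

Sampling-based decoders (temperature, top-k, top-p) replace the final argmax with a draw from the normalized distribution.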
## License
This repository is licensed under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license, which permits use, sharing, and modification only for non-commercial purposes with proper attribution. All NPU-related models, runtimes, and code in this project are covered by this non-commercial license and may not be used in commercial or revenue-generating applications. Commercial or enterprise usage requires a separate agreement; for inquiries, contact `dev@nexa.ai`.