nexaml committed · Commit 4a5e4ad · verified · Parent: 2278e12

Create README.md

Files changed (1): README.md (+71 lines, new file)
# Ministral-3-3B-Instruct-2512

Run Ministral-3-3B on Apple ANE with NexaSDK.

## Quickstart

Install NexaSDK and create a free account at [sdk.nexa.ai](https://sdk.nexa.ai).

Activate your device with your access token:

```bash
nexa config set license '<access_token>'
```

Run the model locally in one line:

```bash
nexa infer NexaAI/Ministral-3-3B-ANE
```

## Model Description
**Ministral-3-3B-Instruct-2512** is the instruction-tuned variant of Mistral AI’s smallest Ministral 3 model: a compact multimodal language model combining a ~3.4B-parameter language core with a 0.4B-parameter vision encoder.
It is post-trained in FP8 for instruction following, making it well suited to chat-style agents, tool use, and grounded reasoning over both text and images.
With a 256k-token context window and an efficient, edge-oriented design, it targets real-time use on GPUs and other resource-constrained hardware.
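
As a rough back-of-the-envelope check on the edge-friendly sizing (an estimate only, ignoring activations, KV cache, and runtime overhead): FP8 stores one byte per parameter, so the weight footprint is approximately the parameter count in bytes.

```python
# Rough weight-memory estimate for FP8 (1 byte per parameter).
# Ignores activations, KV cache, and runtime overhead.
lang_params = 3.4e9    # ~3.4B-parameter language core (from the model card)
vision_params = 0.4e9  # 0.4B-parameter vision encoder
bytes_per_param = 1    # FP8

total_gb = (lang_params + vision_params) * bytes_per_param / 1e9
print(f"~{total_gb:.1f} GB of weights in FP8")  # ~3.8 GB
```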

## Features
- **Multimodal (vision + text)**: Understands and reasons over images alongside text in a single conversation.
- **Instruction-tuned**: Optimized for following natural-language instructions, chat, and assistant-style workflows.
- **Agentic capabilities**: Native support for function calling and structured JSON-style outputs for tool and API orchestration.
- **Large context window**: Up to **256k tokens** for long documents, multi-step workflows, and complex sessions.
- **Edge-optimized FP8 weights**: An FP8 checkpoint designed for efficient deployment and serving, including on a single modern GPU.
- **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.
- **Part of the Ministral 3 family**: Aligned with the 3B/8B/14B base, instruct, and reasoning variants for scalable deployments.

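To illustrate the function-calling pattern described above, the sketch below builds a tool schema and a chat request as plain JSON. The field names follow the common OpenAI-style convention and are an assumption for illustration, not the documented NexaSDK wire format.

```python
import json

# Hypothetical tool definition (JSON-Schema style parameters); the layout
# follows the widespread OpenAI-style convention and is an assumption here.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# A chat-style request combining a system prompt, a user turn, and the tool.
request = {
    "model": "NexaAI/Ministral-3-3B-ANE",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    "tools": [get_weather_tool],
}

print(json.dumps(request, indent=2))
```

Given a request like this, an instruction-tuned model with agentic support can respond with a structured call to `get_weather` instead of free-form text, which downstream code can execute and feed back into the conversation.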
## Use Cases
- **Vision + language assistants**
  - Image captioning and explanation (UI screenshots, photos, diagrams)
  - Multimodal Q&A (e.g., “describe this chart and summarize its implications”)
- **Lightweight agents and tools**
  - Function-calling workflows (retrieval, calculators, external APIs)
  - JSON-structured responses for downstream automation
- **Text understanding & generation**
  - Classification, tagging, routing, and extraction from long documents
  - Short-form copywriting, drafting, and rewriting across multiple languages
- **Edge & low-resource deployments**
  - On-device or near-edge assistants where latency, context length, and cost matter
  - Local/private workloads that benefit from a small yet capable multimodal model

## Inputs and Outputs

**Inputs**
- **Text-only prompts**
  - Single-turn or multi-turn chat-style conversations (`system`, `user`, `assistant` roles).
  - Long-context inputs up to the model’s context limit (e.g., documents, logs, transcripts).
- **Multimodal prompts**
  - One or more images (e.g., URLs or image tensors) combined with text.
- **Structured tool schemas**
  - Function/tool definitions for agentic workflows (JSON schemas describing functions and parameters).

**Outputs**
- **Generated text**
  - Answers, explanations, step-by-step reasoning, summaries, and creative content.
- **Multimodal-aware responses**
  - Text grounded in the provided images (descriptions, comparisons, localized details).
- **Structured tool calls**
  - JSON-like tool-call objects for function execution and programmatic integration.
- **Logits / probabilities (advanced)**
  - For users accessing the raw model via low-level APIs, token-level scores for custom decoding or research.

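To make these input and output shapes concrete, here is a minimal sketch of a multimodal user turn and of parsing a structured tool call out of a model response. The message layout follows the common OpenAI-style convention and is an assumption about the serving layer, not the documented NexaSDK format; the tool-call JSON is likewise a hypothetical example.

```python
import json

# Hypothetical multimodal user turn: text plus an image reference.
multimodal_message = {
    "role": "user",
    "content": [
        {"type": "text",
         "text": "Describe this chart and summarize its implications."},
        {"type": "image_url",
         "image_url": {"url": "https://example.com/chart.png"}},
    ],
}

# Hypothetical raw tool-call output, as a model might emit it.
raw_response = json.dumps({
    "tool_calls": [
        {"name": "get_weather", "arguments": {"city": "Paris"}}
    ]
})

# Parse the structured call so it can be dispatched programmatically.
call = json.loads(raw_response)["tool_calls"][0]
print(call["name"], call["arguments"])
```

The same parse-and-dispatch step works for any JSON-structured output the model produces, which is what makes the structured-tool-call output mode suitable for automation pipelines.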
## License
This repository is licensed under the Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0) license, which permits use, sharing, and modification for non-commercial purposes only, with proper attribution. All NPU-related models, runtimes, and code in this project fall under this non-commercial license and may not be used in commercial or revenue-generating applications. Commercial or enterprise usage requires a separate licensing agreement. For inquiries, please contact `dev@nexa.ai`.