---
license: mit
---
## Model Card: Dolphin3.0-Llama3.2-3B (Core ML)

### Model summary

This workflow produces **Core ML model packages (`.mlpackage`)** converted from the Hugging Face model **`cognitivecomputations/Dolphin3.0-Llama3.2-3B`**, outputting three variants:

* **FP16**: `Dolphin3.0-Llama3.2-3B-fp16.mlpackage`
* **INT8**: `Dolphin3.0-Llama3.2-3B-int8.mlpackage`
* **INT4-LUT**: `Dolphin3.0-Llama3.2-3B-int4-lut.mlpackage` (palettized / lookup-table compressed weights) ([Hugging Face][1])
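As a rough back-of-envelope guide to the size tradeoff between the three variants, weight storage scales linearly with bits per weight. The sketch below is illustrative only: it assumes a round 3B parameter count and ignores package metadata, per-channel scales, and the (small) LUT tables themselves.

```python
# Approximate weight-storage footprint of a ~3B-parameter model at
# each bit width. Illustrative arithmetic only; real package sizes
# will differ somewhat.

PARAMS = 3_000_000_000  # assumed approximate parameter count

def weight_gib(bits_per_weight: float) -> float:
    """Approximate weight storage in GiB at a given bit width."""
    return PARAMS * bits_per_weight / 8 / 2**30

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4-LUT", 4)]:
    print(f"{name:9s} ~{weight_gib(bits):.1f} GiB")
```

This is why the INT4-LUT package is attractive for memory-constrained devices: weights shrink to roughly a quarter of the FP16 footprint, at some cost in quality.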

The upstream model is a **Dolphin instruction-tuned** variant built on **Meta Llama 3.2 3B**. ([Hugging Face][1])

---

### Model details

* **Model family / architecture:** decoder-only Transformer LLM (Llama family), ~3B parameters (as implied by the model name and base). ([Hugging Face][1])
* **Primary use mode:** chat / instruction-following using a **ChatML-style** formatting template. ([Hugging Face][1])
* **Core ML format:** converted as an **`mlprogram`** and therefore saved as a **model package (`.mlpackage`)** rather than `.mlmodel`. ([apple.github.io][2])

---

### What’s in the artifacts

* `*.mlpackage`: Core ML “ML Program” packages (weights + program) suitable for on-device inference. ML Programs target **iOS 15 / macOS 12+** by default (unless the conversion explicitly sets a different minimum deployment target). ([apple.github.io][2])
* `coreml_artifacts.json`: conversion metadata emitted by the conversion script (contents depend on `scripts/convert_to_coreml.py`, but commonly includes conversion settings and model/tokenizer info).

---

### Intended use

**Intended:** on-device text generation (assistant/chat, summarization, brainstorming, general Q&A) inside Apple ecosystem apps, with the speed/size tradeoffs offered by FP16 / INT8 / INT4-LUT variants. ([apple.github.io][2])

**Not intended / high-risk:** medical/legal/financial decision-making, safety-critical control, or uses restricted by the Llama 3.2 Acceptable Use Policy (see “License & use policy”). ([Oracle Docs][3])

---

### Prompting / chat template

The upstream Dolphin model card indicates a **ChatML** template and provides an example “system/user/assistant” structure. Use the same formatting (or an equivalent wrapper in your app) to match expected behaviour. ([Hugging Face][1])
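A minimal sketch of the general ChatML shape is shown below. Note this is an assumption about the template structure, not the authoritative template: for production use, prefer the chat template shipped with the upstream tokenizer (e.g. via `tokenizer.apply_chat_template`) so the formatting matches training exactly.

```python
# Minimal ChatML-style prompt builder (a sketch of the general shape,
# not the upstream model's exact template). Each turn is wrapped in
# <|im_start|>{role} ... <|im_end|>, and the prompt ends with an open
# assistant turn so the model continues from there.

def build_chatml_prompt(messages: list[dict[str, str]]) -> str:
    """Render [{'role': ..., 'content': ...}] into ChatML text."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are Dolphin, a helpful assistant."},
    {"role": "user", "content": "Summarize Core ML in one sentence."},
])
print(prompt)
```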

---

### Training / data provenance (upstream)

This Core ML model is a **format conversion** of the upstream weights; it does **not** introduce new training data by itself.

The upstream Dolphin model card lists a mixture of instruction/chat datasets and related sources used in the fine-tuning pipeline (e.g., FLAN, OASST, Capybara, etc.). ([Hugging Face][1])

---

### Quantization / compression notes (Core ML variants)

* **FP16 (`-fp16`)**: float16 weights and execution (Core ML Tools defaults ML Programs to float16 precision unless overridden). ([apple.github.io][2])
* **INT8 (`-int8`)**: linear quantization of weights to reduce size; Core ML supports INT8 weight quantization as a compression technique. ([apple.github.io][4])
* **INT4-LUT (`-int4-lut`)**: **palettization (weight clustering)** where weights are represented via indices into a **lookup table (LUT)** of centroids; this can achieve very aggressive compression. ([apple.github.io][5])

**Deployment caution:** palettized weight representation for `mlprogram` is available for **iOS 16 / macOS 13+** (per Core ML Tools docs). Plan your app’s minimum OS accordingly if you ship the INT4-LUT package. ([apple.github.io][5])
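To make the LUT idea concrete, the toy sketch below stores each weight as a 4-bit index into a 16-entry lookup table of centroids. This illustrates only the *representation*; Core ML Tools selects the LUT via clustering (e.g. k-means), which this sketch does not attempt, and the evenly spaced LUT here is a placeholder, not a learned palette.

```python
# Toy illustration of INT4 palettization: each weight becomes a 4-bit
# index into a 16-entry lookup table (LUT), and reconstruction is a
# lossy table lookup. Not Core ML Tools' actual algorithm.

def palettize(weights: list[float], lut: list[float]) -> list[int]:
    """Map each weight to the index of its nearest LUT centroid."""
    return [min(range(len(lut)), key=lambda i: abs(w - lut[i]))
            for w in weights]

def depalettize(indices: list[int], lut: list[float]) -> list[float]:
    """Reconstruct (lossy) weights from indices + LUT."""
    return [lut[i] for i in indices]

# 16 evenly spaced centroids in [-1, 1] stand in for a learned LUT.
lut = [-1 + 2 * i / 15 for i in range(16)]
weights = [0.03, -0.51, 0.97, -0.02]
idx = palettize(weights, lut)
approx = depalettize(idx, lut)
print(idx, [round(a, 2) for a in approx])
```

With only 16 representable values, every index fits in 4 bits, which is where the 4x compression over FP16 comes from; the reconstruction error is bounded by half the centroid spacing.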

---

### Limitations

Like other LLMs, this model can:

* **Hallucinate** facts and citations.
* Reflect **biases** present in training data.
* Produce unsafe or policy-violating content if prompted.

Additionally, the upstream Dolphin card explicitly positions the model as having **reduced built-in “ethical guardrails”** relative to many assistant-tuned models, meaning **application-level safety controls** (filters, refusal policies, logging, rate limits) are strongly recommended. ([Hugging Face][1])

---

### License & use policy (important)

This model inherits licensing obligations from **Meta’s Llama 3.2 Community License** (and any additional terms from the Dolphin distribution, if present).

Key requirements highlighted in the Llama 3.2 license text include:

* If you redistribute the model (or a derivative), you must **provide a copy of the license** and prominently display **“Built with Llama”** in relevant product/docs. ([Hugging Face][6])
* Use must comply with the **Llama 3.2 Acceptable Use Policy**, which prohibits (among other things) illegal activity, harassment, the unlicensed practice of regulated professions, malware creation, and other harmful uses. ([Oracle Docs][3])

---

### Evaluation

The upstream Dolphin model card lists **evaluations as TBD**. Treat real-world performance (especially after quantization) as **application-specific** and validate on your target device(s). ([Hugging Face][1])

---

### Responsible deployment recommendations

* Use the **FP16** model as your baseline for quality testing; measure deltas for **INT8** and **INT4-LUT** on your real prompts.
* Add **safety and policy enforcement** in the app layer (particularly given Dolphin’s stated stance on guardrails). ([Hugging Face][1])
* Document OS requirements clearly: **ML Program ⇒ iOS 15+**, **INT4-LUT palettization ⇒ iOS 16+**. ([apple.github.io][2])

[1]: https://huggingface.co/dphn/Dolphin3.0-Llama3.2-3B/blob/ebfdc372541d6f699d05e83a2b0e0d4e1fdda828/README.md "README.md · dphn/Dolphin3.0-Llama3.2-3B at ebfdc372541d6f699d05e83a2b0e0d4e1fdda828"
[2]: https://apple.github.io/coremltools/docs-guides/source/convert-to-ml-program.html "Convert Models to ML Programs — Guide to Core ML Tools"
[3]: https://docs.oracle.com/cd/E17952_01/mysql-ai-9.5-license-com-en/license-llama-3-2-3b-instruct.html "2.19 Llama-3.2-3B-Instruct"
[4]: https://apple.github.io/coremltools/docs-guides/source/opt-overview.html?utm_source=chatgpt.com "Overview — Guide to Core ML Tools - Apple"
[5]: https://apple.github.io/coremltools/docs-guides/source/opt-palettization-overview.html "Palettization Overview — Guide to Core ML Tools"
[6]: https://huggingface.co/meta-llama/Llama-3.2-3B "meta-llama/Llama-3.2-3B · Hugging Face"