---
license: mit
---

## Model Card: Dolphin3.0-Llama3.2-3B (Core ML)

### Model summary

This workflow produces **Core ML model packages (`.mlpackage`)** converted from the Hugging Face model **`cognitivecomputations/Dolphin3.0-Llama3.2-3B`**, outputting three variants:

* **FP16**: `Dolphin3.0-Llama3.2-3B-fp16.mlpackage`
* **INT8**: `Dolphin3.0-Llama3.2-3B-int8.mlpackage`
* **INT4-LUT**: `Dolphin3.0-Llama3.2-3B-int4-lut.mlpackage` (palettized / lookup-table compressed weights) ([Hugging Face][1])

The upstream model is a **Dolphin instruction-tuned** variant built on **Meta Llama 3.2 3B**. ([Hugging Face][1])

---

### Model details

* **Model family / architecture:** decoder-only Transformer LLM (Llama family), ~3B parameters (as implied by the model name and base). ([Hugging Face][1])
* **Primary use mode:** chat / instruction-following using a **ChatML-style** formatting template. ([Hugging Face][1])
* **Core ML format:** converted as an **`mlprogram`** and therefore saved as a **model package (`.mlpackage`)** rather than `.mlmodel`. ([apple.github.io][2])
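For orientation, a minimal conversion sketch with Core ML Tools is shown below. This is a hedged sketch only, not the repository's `scripts/convert_to_coreml.py`; the helper names (`artifact_name`, `convert_fp16`) and the input shape are hypothetical.

```python
# Illustrative sketch -- the repo's actual scripts/convert_to_coreml.py is
# authoritative; helper names and input shapes here are hypothetical.

def artifact_name(base: str, variant: str) -> str:
    # e.g. artifact_name("Dolphin3.0-Llama3.2-3B", "fp16")
    #  -> "Dolphin3.0-Llama3.2-3B-fp16.mlpackage"
    return f"{base}-{variant}.mlpackage"

def convert_fp16(traced_model, seq_len=128):
    import coremltools as ct  # imported lazily; heavy optional dependency

    # convert_to="mlprogram" produces an ML Program, saved as .mlpackage.
    mlmodel = ct.convert(
        traced_model,
        convert_to="mlprogram",
        inputs=[ct.TensorType(name="input_ids", shape=(1, seq_len))],
        minimum_deployment_target=ct.target.iOS15,
    )
    mlmodel.save(artifact_name("Dolphin3.0-Llama3.2-3B", "fp16"))
    return mlmodel
```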

---

### What’s in the artifacts

* `*.mlpackage`: Core ML “ML Program” packages (weights + program) suitable for on-device inference. ML Programs target **iOS 15 / macOS 12+** by default (unless the converter explicitly overrides). ([apple.github.io][2])
* `coreml_artifacts.json`: conversion metadata emitted by the conversion script (contents depend on `scripts/convert_to_coreml.py`, but commonly include conversion settings and model/tokenizer info).

---

### Intended use

**Intended:** on-device text generation (assistant/chat, summarization, brainstorming, general Q&A) inside Apple ecosystem apps, with the speed/size tradeoffs offered by the FP16 / INT8 / INT4-LUT variants. ([apple.github.io][2])

**Not intended / high-risk:** medical/legal/financial decision-making, safety-critical control, or uses restricted by the Llama 3.2 Acceptable Use Policy (see “License & use policy”). ([Oracle Docs][3])

---

### Prompting / chat template

The upstream Dolphin model card indicates a **ChatML** template and provides an example “system/user/assistant” structure. Use the same formatting (or an equivalent wrapper in your app) to match expected behaviour. ([Hugging Face][1])
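A minimal sketch of that ChatML structure follows; the chat template bundled with the upstream tokenizer, if present, is authoritative.

```python
# Sketch of the ChatML-style structure described above.
def chatml_prompt(system: str, user: str) -> str:
    # The trailing "<|im_start|>assistant\n" cues the model to generate
    # the assistant turn.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(chatml_prompt("You are Dolphin, a helpful assistant.", "Hello!"))
```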

---

### Training / data provenance (upstream)

This Core ML model is a **format conversion** of the upstream weights; it does **not** introduce new training data by itself.

The upstream Dolphin model card lists a mixture of instruction/chat datasets and related sources used in the fine-tuning pipeline (e.g., FLAN, OASST, Capybara). ([Hugging Face][1])

---

### Quantization / compression notes (Core ML variants)

* **FP16 (`-fp16`)**: float16 weights and execution (Core ML Tools defaults ML Programs to float16 precision unless overridden). ([apple.github.io][2])
* **INT8 (`-int8`)**: linear quantization of weights to reduce size; Core ML supports INT8 weight quantization as a compression technique. ([apple.github.io][4])
* **INT4-LUT (`-int4-lut`)**: **palettization (weight clustering)**, in which weights are represented by indices into a **lookup table (LUT)** of centroids; this can achieve very aggressive compression. ([apple.github.io][5])

**Deployment caution:** palettized weight representation for `mlprogram` requires **iOS 16 / macOS 13+** (per the Core ML Tools docs). Plan your app’s minimum OS accordingly if you ship the INT4-LUT package. ([apple.github.io][5])
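As a toy illustration of the LUT idea (not the Core ML on-disk format): each weight is stored as a small index into a table of centroid values. Real INT4 palettization uses 16 centroids and `coremltools.optimize`; the sketch below uses 4 centroids for brevity.

```python
# Toy palettization: weights -> indices into a lookup table of centroids.
def palettize(weights, lut):
    # Map each weight to the index of its nearest LUT centroid.
    return [min(range(len(lut)), key=lambda i: abs(lut[i] - w)) for w in weights]

def depalettize(indices, lut):
    # Reconstruct (approximate) weights from indices.
    return [lut[i] for i in indices]

lut = [-0.3, -0.1, 0.0, 0.2]            # 4 centroids => 2-bit indices
w = [-0.28, 0.05, 0.19, -0.12]
idx = palettize(w, lut)                  # [0, 2, 3, 1]
approx = depalettize(idx, lut)           # [-0.3, 0.0, 0.2, -0.1]
```

Only the small table plus the low-bit indices need to be stored, which is where the aggressive compression comes from.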

---

### Limitations

Like other LLMs, this model can:

* **Hallucinate** facts and citations.
* Reflect **biases** present in its training data.
* Produce unsafe or policy-violating content if prompted.

Additionally, the upstream Dolphin card explicitly positions the model as having **reduced built-in “ethical guardrails”** relative to many assistant-tuned models, so **application-level safety controls** (filters, refusal policies, logging, rate limits) are strongly recommended. ([Hugging Face][1])

---

### License & use policy (important)

This model inherits licensing obligations from **Meta’s Llama 3.2 Community License** (and any additional terms from the Dolphin distribution, if present).

Key requirements highlighted in the Llama 3.2 license text include:

* If you redistribute the model (or a derivative), you must **provide a copy of the license** and prominently display **“Built with Llama”** in the relevant product/docs. ([Hugging Face][6])
* Use must comply with the **Llama 3.2 Acceptable Use Policy**, which prohibits (among other things) illegal activity, harassment, unlicensed professional practice, malware creation, and other harmful uses. ([Oracle Docs][3])

---

### Evaluation

The upstream Dolphin model card lists **evaluations as TBD**. Treat real-world performance (especially after quantization) as **application-specific** and validate on your target device(s). ([Hugging Face][1])

---

### Responsible deployment recommendations

* Use the **FP16** model as your baseline for quality testing; measure deltas for **INT8** and **INT4-LUT** on your real prompts.
* Add **safety and policy enforcement** in the app layer (particularly given Dolphin’s stated stance on guardrails). ([Hugging Face][1])
* Document OS requirements clearly: **ML Program ⇒ iOS 15+**, **INT4-LUT palettization ⇒ iOS 16+**. ([apple.github.io][2])
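The OS floors above can be encoded as a small gating helper when deciding which package(s) to ship. `min_ios_for` is a hypothetical helper, not part of the repo, and only reflects the two requirements stated in this card.

```python
# Hypothetical helper encoding the OS floors stated in this card
# (ML Program => iOS 15+; palettized INT4-LUT => iOS 16+). INT8's true
# floor depends on the compression ops used -- verify against your build.
def min_ios_for(variant: str) -> int:
    return 16 if variant == "int4-lut" else 15

# e.g. select packages compatible with a given deployment target:
target = 15
ok = [v for v in ("fp16", "int8", "int4-lut") if min_ios_for(v) <= target]
# ok == ["fp16", "int8"]
```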

[1]: https://huggingface.co/dphn/Dolphin3.0-Llama3.2-3B/blob/ebfdc372541d6f699d05e83a2b0e0d4e1fdda828/README.md "README.md · dphn/Dolphin3.0-Llama3.2-3B at ebfdc372541d6f699d05e83a2b0e0d4e1fdda828"
[2]: https://apple.github.io/coremltools/docs-guides/source/convert-to-ml-program.html "Convert Models to ML Programs — Guide to Core ML Tools"
[3]: https://docs.oracle.com/cd/E17952_01/mysql-ai-9.5-license-com-en/license-llama-3-2-3b-instruct.html "2.19 Llama-3.2-3B-Instruct"
[4]: https://apple.github.io/coremltools/docs-guides/source/opt-overview.html "Overview — Guide to Core ML Tools"
[5]: https://apple.github.io/coremltools/docs-guides/source/opt-palettization-overview.html "Palettization Overview — Guide to Core ML Tools"
[6]: https://huggingface.co/meta-llama/Llama-3.2-3B "meta-llama/Llama-3.2-3B · Hugging Face"