---
license: mit
---
# Model Card: Dolphin3.0-Llama3.2-3B (Core ML)

## Model summary
This workflow produces Core ML model packages (`.mlpackage`) converted from the Hugging Face model cognitivecomputations/Dolphin3.0-Llama3.2-3B, outputting three variants:

- FP16: `Dolphin3.0-Llama3.2-3B-fp16.mlpackage`
- INT8: `Dolphin3.0-Llama3.2-3B-int8.mlpackage`
- INT4-LUT: `Dolphin3.0-Llama3.2-3B-int4-lut.mlpackage` (palettized / lookup-table compressed weights) (Hugging Face)
The upstream model is a Dolphin instruction-tuned variant built on Meta Llama 3.2 3B. (Hugging Face)
## Model details
- Model family / architecture: decoder-only Transformer LLM (Llama family), ~3B parameters (as implied by the model name and base). (Hugging Face)
- Primary use mode: chat / instruction-following using a ChatML-style formatting template. (Hugging Face)
- Core ML format: converted as an `mlprogram` and therefore saved as a model package (`.mlpackage`) rather than `.mlmodel`. (apple.github.io)
## What’s in the artifacts
- `*.mlpackage`: Core ML “ML Program” packages (weights + program) suitable for on-device inference. ML Programs target iOS 15 / macOS 12+ by default (unless the converter explicitly overrides). (apple.github.io)
- `coreml_artifacts.json`: conversion metadata emitted by the conversion script (contents depend on `scripts/convert_to_coreml.py`, but commonly include conversion settings and model/tokenizer info).
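For example, a build script might read that metadata to decide which package to bundle. The keys below are purely hypothetical, since the real schema depends on `scripts/convert_to_coreml.py`:

```python
import json

# Hypothetical metadata shape -- the actual keys depend on the conversion script.
sample = {
    "source_model": "cognitivecomputations/Dolphin3.0-Llama3.2-3B",
    "variants": {
        "fp16": "Dolphin3.0-Llama3.2-3B-fp16.mlpackage",
        "int8": "Dolphin3.0-Llama3.2-3B-int8.mlpackage",
        "int4-lut": "Dolphin3.0-Llama3.2-3B-int4-lut.mlpackage",
    },
}

def list_variants(metadata: dict) -> list:
    """Return the variant names recorded in the conversion metadata."""
    return sorted(metadata.get("variants", {}))

# Round-trip through JSON, as the file would be read from disk.
metadata = json.loads(json.dumps(sample))
print(list_variants(metadata))  # ['fp16', 'int4-lut', 'int8']
```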
## Intended use
Intended: on-device text generation (assistant/chat, summarization, brainstorming, general Q&A) inside Apple ecosystem apps, with the speed/size tradeoffs offered by FP16 / INT8 / INT4-LUT variants. (apple.github.io)
Not intended / high-risk: medical/legal/financial decision-making, safety-critical control, or uses restricted by the Llama 3.2 Acceptable Use Policy (see “License & use policy”). (Oracle Docs)
## Prompting / chat template
The upstream Dolphin model card indicates a ChatML template and provides an example “system/user/assistant” structure. Use the same formatting (or an equivalent wrapper in your app) to match expected behaviour. (Hugging Face)
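A minimal sketch of that ChatML structure is below; in practice, prefer the upstream tokenizer's own chat template (e.g. `tokenizer.apply_chat_template` in `transformers`) over hand-built strings so the special tokens match exactly.

```python
def chatml_prompt(system: str, user: str) -> str:
    """Format one system/user exchange in ChatML, leaving the assistant
    turn open for the model to complete."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = chatml_prompt("You are Dolphin, a helpful assistant.", "Hello!")
print(prompt)
```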
## Training / data provenance (upstream)
This Core ML model is a format conversion of the upstream weights; it does not introduce new training data by itself.
The upstream Dolphin model card lists a mixture of instruction/chat datasets and related sources used in the fine-tuning pipeline (e.g., FLAN, OASST, Capybara, etc.). (Hugging Face)
## Quantization / compression notes (Core ML variants)
- FP16 (`-fp16`): float16 weights and execution (Core ML Tools defaults ML Programs to float16 precision unless overridden). (apple.github.io)
- INT8 (`-int8`): linear quantization of weights to reduce size; Core ML supports INT8 weight quantization as a compression technique. (apple.github.io)
- INT4-LUT (`-int4-lut`): palettization (weight clustering) where weights are represented via indices into a lookup table (LUT) of centroids; this can achieve very aggressive compression. (apple.github.io)
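As a toy, pure-Python illustration of the two compression ideas above (real conversions use Core ML Tools' optimization APIs, not code like this):

```python
def quantize_int8(weights):
    """Symmetric linear quantization: w is approximated as scale * q,
    with q an integer in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    if scale == 0.0:  # all-zero row: any scale reconstructs it exactly
        scale = 1.0
    return scale, [round(w / scale) for w in weights]

def palettize(weights, lut):
    """Map each weight to the index of its nearest LUT centroid. A real
    4-bit palettizer also learns the 16-entry LUT (e.g. via k-means)."""
    return [min(range(len(lut)), key=lambda i: abs(w - lut[i]))
            for w in weights]

row = [0.12, -0.5, 0.49, 0.0]
scale, q = quantize_int8(row)
reconstructed = [scale * v for v in q]  # per-weight error is at most scale / 2
indices = palettize(row, lut=[-0.5, 0.0, 0.5])
print(indices)  # [1, 0, 2, 1]
```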
Deployment caution: palettized weight representation for `mlprogram` models is available on iOS 16 / macOS 13+ (per the Core ML Tools docs). Plan your app’s minimum OS accordingly if you ship the INT4-LUT package. (apple.github.io)
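For planning download and install sizes, a back-of-envelope estimate of the weight storage at each bit width (assuming roughly 3.2B parameters, and that weights dominate the package):

```python
# Weight-only estimates; actual .mlpackage sizes also include the compiled
# program, LUT tables, quantization scales, and any bundled assets.
PARAMS = 3.2e9  # approximate parameter count for a Llama 3.2 3B-class model

def weight_gib(bits_per_weight: float) -> float:
    """On-disk size of the raw weights in GiB at a given bit width."""
    return PARAMS * bits_per_weight / 8 / 2**30

for name, bits in [("fp16", 16), ("int8", 8), ("int4-lut", 4)]:
    print(f"{name}: ~{weight_gib(bits):.1f} GiB")
# fp16: ~6.0 GiB, int8: ~3.0 GiB, int4-lut: ~1.5 GiB
```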
## Limitations
Like other LLMs, this model can:
- Hallucinate facts and citations.
- Reflect biases present in training data.
- Produce unsafe or policy-violating content if prompted.
Additionally, the upstream Dolphin card explicitly positions the model as having reduced built-in “ethical guardrails” relative to many assistant-tuned models, meaning application-level safety controls (filters, refusal policies, logging, rate limits) are strongly recommended. (Hugging Face)
## License & use policy (important)
This model inherits licensing obligations from Meta’s Llama 3.2 Community License (and any additional terms from the Dolphin distribution, if present).
Key requirements highlighted in the Llama 3.2 license text include:
- If you redistribute the model (or a derivative), you must provide a copy of the license and prominently display “Built with Llama” in relevant product/docs. (Hugging Face)
- Use must comply with the Llama 3.2 Acceptable Use Policy, which prohibits (among other things) illegal activity, harassment, unlicensed professional practice, malware creation, and other harmful uses. (Oracle Docs)
## Evaluation
The upstream Dolphin model card lists evaluations as TBD. Treat real-world performance (especially after quantization) as application-specific and validate on your target device(s). (Hugging Face)
## Responsible deployment recommendations
- Use the FP16 model as your baseline for quality testing; measure deltas for INT8 and INT4-LUT on your real prompts.
- Add safety and policy enforcement in the app layer (particularly given Dolphin’s stated stance on guardrails). (Hugging Face)
- Document OS requirements clearly: ML Program ⇒ iOS 15+, INT4-LUT palettization ⇒ iOS 16+. (apple.github.io)