---
license: other
license_name: lfm1.0
license_link: https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE
base_model: LiquidAI/LFM2.5-350M
tags:
- coreml
- ane
- lfm2
- on-device
- iphone
language:
- en
- ja
library_name: coreml
pipeline_tag: text-generation
---
## Use it from Swift
<!-- swift-usage-begin -->
### Add the package
`Package.swift`:
```swift
.package(url: "https://github.com/john-rocky/CoreML-LLM", branch: "main"),
// In your target:
.product(name: "CoreMLLLM", package: "CoreML-LLM"),
```
Platforms: iOS 18+ / macOS 15+.
### Download + chat (one call)
```swift
import CoreMLLLM
// First call pulls the bundle from this repo to Documents/Models/.
// Subsequent calls reuse the on-disk copy.
let llm = try await CoreMLLLM.load(repo: "mlboydaisuke/lfm2.5-350m-coreml")
let stream = try await llm.generate(
    [CoreMLLLM.Message(role: .user, content: "Hello!")],
    maxTokens: 256
)
for await chunk in stream {
    print(chunk, terminator: "")
}
```
Multi-turn: keep a `[CoreMLLLM.Message]` array, append each
user and assistant turn, and pass the whole history to
`generate(_:maxTokens:)` again. Call `llm.reset()` to start a new
conversation (this clears the KV cache and conv state).
<!-- swift-usage-end -->
# LFM2.5 350M — CoreML build for Apple Neural Engine
CoreML port of [LiquidAI/LFM2.5-350M](https://huggingface.co/LiquidAI/LFM2.5-350M)
for the [CoreML-LLM](https://github.com/john-rocky/CoreML-LLM) iOS / macOS
runtime. fp16, 97.8 % ANE-resident, **52 tok/s decode on iPhone 17 Pro**.
## Use it from Swift
### 1. Add the package
`Package.swift`:
```swift
.package(url: "https://github.com/john-rocky/CoreML-LLM", branch: "main"),
// In your target:
.product(name: "CoreMLLLM", package: "CoreML-LLM"),
```
Platforms: iOS 18+ / macOS 15+.
### 2. Download + chat (one-turn streaming)
```swift
import CoreMLLLM
let info = ModelDownloader.ModelInfo.lfm2_5_350m
let downloader = ModelDownloader.shared
// First launch: pulls ~810 MB from this repo to
// Documents/Models/lfm2.5-350m/. Subsequent launches no-op.
if !downloader.isDownloaded(info) {
    _ = try await downloader.download(info)
}
let modelDir = downloader.localModelURL(for: info)!
    .deletingLastPathComponent()  // bundle root (parent of model.mlmodelc)
let llm = try await CoreMLLLM.load(from: modelDir)
let stream = try await llm.generate(
    [CoreMLLLM.Message(role: .user, content: "Hello!")],
    maxTokens: 256
)
for await chunk in stream {
    print(chunk, terminator: "")
}
```
### 3. Multi-turn chat
```swift
var history: [CoreMLLLM.Message] = [
    .init(role: .system, content: "You are a concise assistant."),
]
func reply(to user: String) async throws -> String {
    history.append(.init(role: .user, content: user))
    var out = ""
    let stream = try await llm.generate(history, maxTokens: 512)
    for await chunk in stream {
        out += chunk
        print(chunk, terminator: "")
    }
    history.append(.init(role: .assistant, content: out))
    return out
}
llm.reset() // start a fresh conversation (clears KV + conv state)
```
`CoreMLLLM.load()` honours the model's ChatML template, the
`<|im_end|>` / `<|endoftext|>` EOS tokens, and the conv-state I/O
contract automatically — you don't pass any of that yourself.
### 4. Compute units
Defaults to `.cpuAndNeuralEngine` (the 52 tok/s number above).
Override at load time:
```swift
let llm = try await CoreMLLLM.load(
    from: modelDir,
    computeUnits: .cpuOnly  // or .all / .cpuAndGPU
)
```
Or via an environment variable (only affects LFM2 models):
```swift
setenv("LLM_LFM2_USE_CPU", "1", 1)
```
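The option names above match the cases of Core ML's `MLComputeUnits`. As a rough sketch of what the setting means at the Core ML level (not a description of how CoreML-LLM is implemented internally), the compiled model can be opened directly with a chosen compute unit. `makeConfiguration` and `loadRawModel` are hypothetical helper names, and `modelDir` is assumed to be the bundle root that contains `model.mlmodelc`.

```swift
import CoreML
import Foundation

/// Builds a Core ML configuration pinned to the requested compute units.
func makeConfiguration(computeUnits: MLComputeUnits) -> MLModelConfiguration {
    let config = MLModelConfiguration()
    config.computeUnits = computeUnits
    return config
}

/// Hypothetical helper: opens model.mlmodelc from the bundle root.
/// Assumption: `modelDir` is the directory that contains model.mlmodelc.
func loadRawModel(
    in modelDir: URL,
    computeUnits: MLComputeUnits = .cpuAndNeuralEngine
) throws -> MLModel {
    let url = modelDir.appendingPathComponent("model.mlmodelc")
    return try MLModel(
        contentsOf: url,
        configuration: makeConfiguration(computeUnits: computeUnits)
    )
}

/// Lists the input feature names the compiled model expects.
func inputNames(of model: MLModel) -> [String] {
    model.modelDescription.inputDescriptionsByName.keys.sorted()
}
```

Driving generation this way would also mean handling tokenization, the chat template, and the KV / conv state yourself, which is what `CoreMLLLM.load()` otherwise takes care of.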
## App: CoreMLLLMChat
If you just want to try it without writing code, the example app
([Examples/CoreMLLLMChat](https://github.com/john-rocky/CoreML-LLM/tree/main/Examples/CoreMLLLMChat))
ships an LFM2.5 350M (ANE) entry in its model picker — open the
project in Xcode, run on a device, tap **Download**.
## Sideload (Mac → iPhone, no in-app download)
For development / offline use:
```bash
DEVICE=$(xcrun devicectl list devices | awk '/connected/{print $3}' | head -1)
xcrun devicectl device copy to --device "$DEVICE" \
  --domain-type appDataContainer \
  --domain-identifier com.example.CoreMLLLMChat \
  --source ./lfm2.5-350m-coreml \
  --destination Documents/Models/lfm2.5-350m \
  --remove-existing-content true
```
Note: `xcrun devicectl` writes files as UID 0 / 0755, which the app
sandbox can't unlink later, so the picker's trash button will fail with
a permission error. To clear a sideloaded copy, either run
[scripts/uninstall_sideloaded_model.sh](https://github.com/john-rocky/CoreML-LLM/blob/main/scripts/uninstall_sideloaded_model.sh)
from the host or uninstall the app to wipe the container.
## Files in this repo
```
model.mlmodelc/      compiled model (load via MLModel(contentsOf:))
model_config.json    context_length, num_hidden_layers, lfm2_conv_l_pad …
hf_model/            tokenizer (ChatML, sanitised for swift-transformers)
```
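If you want to inspect `model_config.json` yourself, a minimal sketch with `Codable` could look like the following. It decodes only the three keys named in the listing; the `ModelConfig` type and `loadConfig` function are hypothetical, the integer types are an assumption, and the sample values are placeholders rather than the shipped configuration.

```swift
import Foundation

/// Subset of model_config.json: only the keys named in the file listing.
/// Assumption: all three values are JSON integers.
struct ModelConfig: Decodable {
    let contextLength: Int
    let numHiddenLayers: Int
    let lfm2ConvLPad: Int

    enum CodingKeys: String, CodingKey {
        case contextLength = "context_length"
        case numHiddenLayers = "num_hidden_layers"
        case lfm2ConvLPad = "lfm2_conv_l_pad"
    }
}

/// Decodes the config; keys not declared above are ignored by JSONDecoder.
func loadConfig(from data: Data) throws -> ModelConfig {
    try JSONDecoder().decode(ModelConfig.self, from: data)
}

// Placeholder JSON for illustration only, not the real file contents.
let sampleJSON = """
{ "context_length": 2048, "num_hidden_layers": 16, "lfm2_conv_l_pad": 3, "other_key": true }
"""
let sampleConfig = try loadConfig(from: Data(sampleJSON.utf8))
print(sampleConfig.contextLength, sampleConfig.numHiddenLayers, sampleConfig.lfm2ConvLPad)
```

In an app you would read the bytes with `Data(contentsOf:)` from the bundle root instead of the inline sample.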
## Architecture notes
* Hybrid: 6 attention layers (GQA + RoPE + QK-norm) + 10 short-conv
layers (depthwise causal Conv1d, kernel = 3).
* The conv-state rolling window is a regular **input/output tensor**,
not an MLState — the M-series ANE planner rejects the dual-state
combination (`kv_cache_0` + `conv_cache_0`) at predict-time
(`status=0x1d`).
* `L_pad = conv_L_cache = 3`. An earlier 16-wide padding fed enough
fp16 noise into the depthwise reduction that autoregressive output
collapsed to "kingkingking…" within a few tokens. Dropping the
padding fixed both correctness and ANE compatibility.
* Compute precision is the default fp16 — no fp32 fallback needed
once the padding is fixed.
* Chat template: ChatML (`<|im_start|>role\n…<|im_end|>\n`) wrapped
in `<|startoftext|>`. EOS = `<|im_end|>` (id 7) and `<|endoftext|>`
(id 2).
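
To make the chat template bullet concrete, here is a small sketch that renders a message list into a ChatML prompt string. The runtime does this for you; `ChatTurn` and `renderChatML` are hypothetical names. The sketch assumes `<|startoftext|>` is emitted once as a prefix and that the prompt ends with an open assistant turn, which may differ from the tokenizer's actual template.

```swift
/// One chat message; role is "system", "user" or "assistant".
struct ChatTurn {
    let role: String
    let content: String
}

/// Renders turns in the ChatML layout `<|im_start|>role\n…<|im_end|>\n`.
/// Assumptions: `<|startoftext|>` appears once at the start, and the prompt
/// ends with an open assistant turn for the model to continue.
func renderChatML(_ turns: [ChatTurn]) -> String {
    var prompt = "<|startoftext|>"
    for turn in turns {
        prompt += "<|im_start|>\(turn.role)\n\(turn.content)<|im_end|>\n"
    }
    prompt += "<|im_start|>assistant\n"
    return prompt
}

let chatPrompt = renderChatML([ChatTurn(role: "user", content: "Hello!")])
print(chatPrompt)
```

Generation would then stop when the model emits `<|im_end|>` or `<|endoftext|>`.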
Full conversion + drift writeup:
[docs/LFM2_CONVERSION_FINDINGS.md](https://github.com/john-rocky/CoreML-LLM/blob/main/docs/LFM2_CONVERSION_FINDINGS.md).
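
The conv-state contract from the notes above can be illustrated with a toy decode step. This is a simplified sketch in plain Swift, not the model's actual code: it shows only a depthwise causal convolution with kernel = 3 whose rolling window is passed in and returned explicitly, and it omits the projections and gating around the convolution. `ConvStep` is a hypothetical name.

```swift
/// One decode step of a depthwise causal Conv1d with an explicit rolling state.
/// `state[c]` holds the last `kernel` inputs of channel `c`, oldest first.
struct ConvStep {
    let weights: [[Float]]  // [channels][kernel], one filter per channel

    /// Returns the per-channel output and the state to feed into the next step.
    func step(input: [Float], state: [[Float]]) -> (output: [Float], state: [[Float]]) {
        precondition(input.count == weights.count && state.count == weights.count)
        var newState = state
        var output = [Float](repeating: 0, count: input.count)
        for c in input.indices {
            precondition(newState[c].count == weights[c].count && !newState[c].isEmpty)
            // Roll the window: drop the oldest sample, append the newest.
            newState[c].removeFirst()
            newState[c].append(input[c])
            // Causal: the output only sees the current and previous inputs.
            output[c] = zip(weights[c], newState[c])
                .reduce(Float(0)) { acc, pair in acc + pair.0 * pair.1 }
        }
        return (output, newState)
    }
}

let conv = ConvStep(weights: [[0.25, 0.5, 1.0]])  // 1 channel, kernel = 3
var convState: [[Float]] = [[0, 0, 0]]            // zero history before the first token
var convOutputs: [Float] = []
let convInputs: [Float] = [1, 2, 3]
for x in convInputs {
    let result = conv.step(input: [x], state: convState)
    convOutputs.append(result.output[0])
    convState = result.state                      // the caller carries the state forward
}
print(convOutputs)
```

Because the window is an ordinary value passed in and out, the caller owns it between steps, which mirrors the input/output tensor approach described above rather than an `MLState`.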
## License
This CoreML port inherits **[LFM Open License v1.0](https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE)** from the [base model](https://huggingface.co/LiquidAI/LFM2.5-350M).
**Important — commercial use limit**: the LFM Open License grants free
commercial use only up to a **revenue threshold of US $10M / year**.
Above that threshold (and for non-501(c)(3) entities) you need a
separate commercial license from Liquid AI. See the upstream
[LICENSE](https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE)
and [Liquid AI commercial licensing](https://www.liquid.ai/) for
details.
The CoreML conversion code in this repo (the model class, conversion
scripts, runtime glue) is Apache 2.0 (parent project
[CoreML-LLM](https://github.com/john-rocky/CoreML-LLM)).