---
license: other
license_name: lfm1.0
license_link: https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE
base_model: LiquidAI/LFM2.5-350M
tags:
- coreml
- ane
- lfm2
- on-device
- iphone
language:
- en
- ja
library_name: coreml
pipeline_tag: text-generation
---
## Use it from Swift
<!-- swift-usage-begin -->
### Add the package
`Package.swift`:
```swift
.package(url: "https://github.com/john-rocky/CoreML-LLM", branch: "main"),
// In your target:
.product(name: "CoreMLLLM", package: "CoreML-LLM"),
```
Platforms: iOS 18+ / macOS 15+.
### Download + chat (one call)
```swift
import CoreMLLLM
// First call pulls the bundle from this repo to Documents/Models/.
// Subsequent calls reuse the on-disk copy.
let llm = try await CoreMLLLM.load(repo: "mlboydaisuke/lfm2.5-350m-coreml")
let stream = try await llm.generate(
    [CoreMLLLM.Message(role: .user, content: "Hello!")],
    maxTokens: 256
)
for await chunk in stream {
    print(chunk, terminator: "")
}
```
Multi-turn: keep a `[CoreMLLLM.Message]` array, append each
user and assistant turn, and pass the whole history to
`generate(_:)` again. Call `llm.reset()` to start a new
conversation (clears the KV cache).
<!-- swift-usage-end -->
# LFM2.5 350M — CoreML build for Apple Neural Engine
CoreML port of [LiquidAI/LFM2.5-350M](https://huggingface.co/LiquidAI/LFM2.5-350M)
for the [CoreML-LLM](https://github.com/john-rocky/CoreML-LLM) iOS / macOS
runtime. The model runs in fp16, is 97.8 % ANE-resident, and decodes at
**52 tok/s on iPhone 17 Pro**.
## Use it from Swift
### 1. Add the package
`Package.swift`:
```swift
.package(url: "https://github.com/john-rocky/CoreML-LLM", branch: "main"),
// In your target:
.product(name: "CoreMLLLM", package: "CoreML-LLM"),
```
Platforms: iOS 18+ / macOS 15+.
### 2. Download + chat (one-turn streaming)
```swift
import CoreMLLLM
let info = ModelDownloader.ModelInfo.lfm2_5_350m
let downloader = ModelDownloader.shared
// First launch: pulls ~810 MB from this repo to
// Documents/Models/lfm2.5-350m/. Subsequent launches no-op.
if !downloader.isDownloaded(info) {
    _ = try await downloader.download(info)
}
let modelDir = downloader.localModelURL(for: info)!
    .deletingLastPathComponent()  // bundle root (parent of model.mlmodelc)
let llm = try await CoreMLLLM.load(from: modelDir)
let stream = try await llm.generate(
    [CoreMLLLM.Message(role: .user, content: "Hello!")],
    maxTokens: 256
)
for await chunk in stream {
    print(chunk, terminator: "")
}
```
### 3. Multi-turn chat
```swift
var history: [CoreMLLLM.Message] = [
    .init(role: .system, content: "You are a concise assistant."),
]
func reply(to user: String) async throws -> String {
    history.append(.init(role: .user, content: user))
    var out = ""
    let stream = try await llm.generate(history, maxTokens: 512)
    for await chunk in stream {
        out += chunk
        print(chunk, terminator: "")
    }
    history.append(.init(role: .assistant, content: out))
    return out
}
llm.reset() // start a fresh conversation (clears KV + conv state)
```
`CoreMLLLM.load()` honours the model's ChatML template, the
`<|im_end|>` / `<|endoftext|>` EOS tokens, and the conv-state I/O
contract automatically — you don't pass any of that yourself.
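For orientation, here is a minimal sketch of the prompt layout and stop condition the runtime is described as handling, based on the ChatML format and EOS ids listed under Architecture notes. `ChatTurn`, `renderChatML`, and `truncateAtEOS` are hypothetical names used only for illustration and are not part of the CoreMLLLM API. The trailing open assistant turn is an assumption about how generation is prompted.

```swift
import Foundation

// Hypothetical stand-in for CoreMLLLM.Message (illustration only).
struct ChatTurn {
    let role: String      // "system", "user", or "assistant"
    let content: String
}

/// Renders turns as ChatML wrapped in <|startoftext|>.
/// Assumption: the prompt ends with an open assistant turn.
func renderChatML(_ turns: [ChatTurn]) -> String {
    var prompt = "<|startoftext|>"
    for turn in turns {
        prompt += "<|im_start|>\(turn.role)\n\(turn.content)<|im_end|>\n"
    }
    return prompt + "<|im_start|>assistant\n"
}

/// Keeps generated token ids up to, but not including, the first EOS id.
/// Default ids follow the Architecture notes: <|im_end|> = 7, <|endoftext|> = 2.
func truncateAtEOS(_ tokenIDs: [Int], eos: Set<Int> = [7, 2]) -> [Int] {
    Array(tokenIDs.prefix(while: { !eos.contains($0) }))
}

let prompt = renderChatML([
    ChatTurn(role: "system", content: "You are a concise assistant."),
    ChatTurn(role: "user", content: "Hello!"),
])
print(prompt)
print(truncateAtEOS([101, 102, 7, 103]))  // [101, 102]
```

You should not need any of this in application code; it only shows what "handled automatically" covers.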
### 4. Compute units
The default is `.cpuAndNeuralEngine`, the configuration behind the
52 tok/s figure above. Override it at load time:
```swift
let llm = try await CoreMLLLM.load(
    from: modelDir,
    computeUnits: .cpuOnly  // or .all / .cpuAndGPU
)
```
Or via an environment variable (only affects LFM2):
```swift
import Foundation

setenv("LLM_LFM2_USE_CPU", "1", 1)
```
## App: CoreMLLLMChat
If you just want to try it without writing code, the example app
([Examples/CoreMLLLMChat](https://github.com/john-rocky/CoreML-LLM/tree/main/Examples/CoreMLLLMChat))
ships an LFM2.5 350M (ANE) entry in its model picker — open the
project in Xcode, run on a device, tap **Download**.
## Sideload (Mac → iPhone, no in-app download)
For development / offline use:
```bash
DEVICE=$(xcrun devicectl list devices | awk '/connected/{print $3}' | head -1)
xcrun devicectl device copy to --device "$DEVICE" \
  --domain-type appDataContainer \
  --domain-identifier com.example.CoreMLLLMChat \
  --source ./lfm2.5-350m-coreml \
  --destination Documents/Models/lfm2.5-350m \
  --remove-existing-content true
```
Note: `xcrun devicectl` writes files as UID 0 / 0755, which the app
sandbox can't unlink later, so the picker's trash button will fail with
a permission error. To clear a sideloaded copy, run
[scripts/uninstall_sideloaded_model.sh](https://github.com/john-rocky/CoreML-LLM/blob/main/scripts/uninstall_sideloaded_model.sh)
from the host, or uninstall the app to wipe the container.
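If you build your own picker, a failed delete like this can be caught and turned into a hint. The sketch below assumes the app removes the bundle with `FileManager.removeItem(at:)`; `removeModelBundle` and its message text are hypothetical and not taken from the CoreMLLLMChat source.

```swift
import Foundation

/// Tries to delete a model bundle directory.
/// Returns nil on success, or a user-facing message on failure.
func removeModelBundle(at url: URL) -> String? {
    do {
        try FileManager.default.removeItem(at: url)
        return nil
    } catch {
        // Sideloaded files owned by UID 0 are expected to land here.
        return "Could not delete \(url.lastPathComponent): \(error.localizedDescription). "
            + "If the model was sideloaded, remove it from the host or reinstall the app."
    }
}

// Usage: a bundle created by the process itself can be removed.
let bundle = FileManager.default.temporaryDirectory
    .appendingPathComponent("lfm2.5-350m-demo-\(UUID().uuidString)")
try FileManager.default.createDirectory(at: bundle, withIntermediateDirectories: true)
if let message = removeModelBundle(at: bundle) {
    print(message)
} else {
    print("Removed")
}
```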
## Files in this repo
```
model.mlmodelc/      compiled model; load via MLModel(contentsOf:)
model_config.json    context_length, num_hidden_layers, lfm2_conv_l_pad …
hf_model/            tokenizer (ChatML, sanitised for swift-transformers)
```
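If you want to inspect `model_config.json` yourself, a `Decodable` struct covering the three keys named above is enough. This is a sketch under assumptions: the values are assumed to be integers, the sample JSON is made up for the example (16 layers and a pad of 3 follow the Architecture notes; the `context_length` value is a placeholder), and `ModelConfig` is a hypothetical type, not one exported by CoreMLLLM.

```swift
import Foundation

// Hypothetical type; only the keys named in the file listing are modelled.
struct ModelConfig: Decodable {
    let contextLength: Int
    let numHiddenLayers: Int
    let lfm2ConvLPad: Int

    enum CodingKeys: String, CodingKey {
        case contextLength = "context_length"
        case numHiddenLayers = "num_hidden_layers"
        case lfm2ConvLPad = "lfm2_conv_l_pad"
    }
}

// Sample data for illustration; context_length here is a placeholder value.
let sampleJSON = """
{"context_length": 2048, "num_hidden_layers": 16, "lfm2_conv_l_pad": 3}
"""

let config = try JSONDecoder().decode(ModelConfig.self, from: Data(sampleJSON.utf8))
print("layers: \(config.numHiddenLayers), conv pad: \(config.lfm2ConvLPad)")
```

Keys not declared in the struct are ignored by `JSONDecoder`, so the same type should keep working against the full file as long as these three keys are present with integer values.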
## Architecture notes
* Hybrid: 6 attention layers (GQA + RoPE + QK-norm) + 10 short-conv
layers (depthwise causal Conv1d, kernel = 3).
* The conv-state rolling window is a regular **input/output tensor**,
not an MLState — the M-series ANE planner rejects the dual-state
combination (`kv_cache_0` + `conv_cache_0`) at predict-time
(`status=0x1d`).
* `L_pad = conv_L_cache = 3`. An earlier 16-wide padding fed enough
fp16 noise into the depthwise reduction that autoregressive output
collapsed to "kingkingking…" within a few tokens. Dropping the
padding fixed both correctness and ANE compatibility.
* Compute precision is the default fp16 — no fp32 fallback needed
once the padding is fixed.
* Chat template: ChatML (`<|im_start|>role\n…<|im_end|>\n`) wrapped
in `<|startoftext|>`. EOS = `<|im_end|>` (id 7) and `<|endoftext|>`
(id 2).
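The conv-state contract described above (state passed in and returned on every step, rather than held in an MLState) can be sketched in plain Swift. This is a simplified illustration, not the converted model's graph: it covers only the depthwise causal convolution with kernel = 3 and omits everything else in the short-conv block. `shortConvStep` and the weights are hypothetical.

```swift
import Foundation

/// One decode step of a depthwise causal conv with explicit state passing.
/// `state[c]` holds the last `kernel` inputs for channel c, oldest first.
func shortConvStep(
    input: [Float],        // [channels], the new token's activations
    state: [[Float]],      // [channels][kernel], rolling window from the previous step
    weights: [[Float]]     // [channels][kernel], one filter per channel (depthwise)
) -> (output: [Float], newState: [[Float]]) {
    var output = [Float](repeating: 0, count: input.count)
    var newState = state
    for c in input.indices {
        // Slide the window: drop the oldest sample, append the newest.
        newState[c] = Array(state[c].dropFirst()) + [input[c]]
        // Depthwise reduction: each channel only sees its own window.
        var acc: Float = 0
        for k in newState[c].indices {
            acc += weights[c][k] * newState[c][k]
        }
        output[c] = acc
    }
    return (output, newState)
}

// Single channel, kernel = 3, zero initial state.
let weights: [[Float]] = [[0.5, 0.25, 1.0]]
var state: [[Float]] = [[0, 0, 0]]
var outputs: [Float] = []
for x in [Float(1), 2, 4] {
    let step = shortConvStep(input: [x], state: state, weights: weights)
    outputs.append(step.output[0])
    state = step.newState   // the caller carries the state to the next call
}
print(outputs)  // [1.0, 2.25, 5.0]
```

Because the window is exactly as wide as the kernel, every element of the state contributes to the output and there is no padded region to accumulate noise.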
Full conversion + drift writeup:
[docs/LFM2_CONVERSION_FINDINGS.md](https://github.com/john-rocky/CoreML-LLM/blob/main/docs/LFM2_CONVERSION_FINDINGS.md).
## License
This CoreML port inherits **[LFM Open License v1.0](https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE)** from the [base model](https://huggingface.co/LiquidAI/LFM2.5-350M).
**Important: commercial use limit.** The LFM Open License grants free
commercial use only up to a **revenue threshold of US $10M / year**.
Above that threshold, entities other than 501(c)(3) non-profits need a
separate commercial license from Liquid AI. See the upstream
[LICENSE](https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE)
and [Liquid AI commercial licensing](https://www.liquid.ai/) for
details.
The CoreML conversion code in this repo (the model class, conversion
scripts, runtime glue) is Apache 2.0 (parent project
[CoreML-LLM](https://github.com/john-rocky/CoreML-LLM)).