---
license: other
license_name: lfm1.0
license_link: https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE
base_model: LiquidAI/LFM2.5-350M
tags:
- coreml
- ane
- lfm2
- on-device
- iphone
language:
- en
- ja
library_name: coreml
pipeline_tag: text-generation
---

## Use it from Swift

<!-- swift-usage-begin -->
### Add the package

`Package.swift`:

```swift
.package(url: "https://github.com/john-rocky/CoreML-LLM", branch: "main"),

// In your target:
.product(name: "CoreMLLLM", package: "CoreML-LLM"),
```

Platforms: iOS 18+ / macOS 15+.

### Download + chat (one call)

```swift
import CoreMLLLM

// First call pulls the bundle from this repo to Documents/Models/.
// Subsequent calls reuse the on-disk copy.
let llm = try await CoreMLLLM.load(repo: "mlboydaisuke/lfm2.5-350m-coreml")

let stream = try await llm.generate(
    [CoreMLLLM.Message(role: .user, content: "Hello!")],
    maxTokens: 256
)
for await chunk in stream {
    print(chunk, terminator: "")
}
```

Multi-turn: keep a `[CoreMLLLM.Message]` array, append each user and
assistant turn, and pass the whole history to `generate(_:)` again.
Call `llm.reset()` to start a new conversation (this clears the KV
cache).
<!-- swift-usage-end -->



# LFM2.5 350M — CoreML build for Apple Neural Engine

CoreML port of [LiquidAI/LFM2.5-350M](https://huggingface.co/LiquidAI/LFM2.5-350M)
for the [CoreML-LLM](https://github.com/john-rocky/CoreML-LLM) iOS / macOS
runtime. fp16, 97.8 % ANE-resident, **52 tok/s decode on iPhone 17 Pro**.

## Use it from Swift

### 1. Add the package

`Package.swift`:

```swift
.package(url: "https://github.com/john-rocky/CoreML-LLM", branch: "main"),

// In your target:
.product(name: "CoreMLLLM", package: "CoreML-LLM"),
```

Platforms: iOS 18+ / macOS 15+.

### 2. Download + chat (one-turn streaming)

```swift
import CoreMLLLM

let info = ModelDownloader.ModelInfo.lfm2_5_350m
let downloader = ModelDownloader.shared

// First launch: downloads ~810 MB from this repo to
// Documents/Models/lfm2.5-350m/.  Later launches skip the download.
if !downloader.isDownloaded(info) {
    _ = try await downloader.download(info)
}

let modelDir = downloader.localModelURL(for: info)!
    .deletingLastPathComponent()  // bundle root (parent of model.mlmodelc)

let llm = try await CoreMLLLM.load(from: modelDir)

let stream = try await llm.generate(
    [CoreMLLLM.Message(role: .user, content: "Hello!")],
    maxTokens: 256
)
for await chunk in stream {
    print(chunk, terminator: "")
}
```

### 3. Multi-turn chat

```swift
var history: [CoreMLLLM.Message] = [
    .init(role: .system, content: "You are a concise assistant."),
]

func reply(to user: String) async throws -> String {
    history.append(.init(role: .user, content: user))
    var out = ""
    let stream = try await llm.generate(history, maxTokens: 512)
    for await chunk in stream {
        out += chunk
        print(chunk, terminator: "")
    }
    history.append(.init(role: .assistant, content: out))
    return out
}

llm.reset()  // start a fresh conversation (clears KV + conv state)
history.removeSubrange(1...)  // drop earlier turns, keep the system prompt
```

`CoreMLLLM.load()` honours the model's ChatML template, the
`<|im_end|>` / `<|endoftext|>` EOS tokens, and the conv-state I/O
contract automatically — you don't pass any of that yourself.
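`load()` builds the prompt for you, but when debugging odd output it can help to see roughly what the model receives. Below is a minimal pure-Swift sketch. It assumes the layout given in the architecture notes below (ChatML turns behind a single leading `<|startoftext|>`, EOS ids 7 and 2) and also assumes that generation continues from an opened `<|im_start|>assistant\n` turn. `ChatTurn`, `renderChatML` and `shouldStop` are hypothetical names, not CoreMLLLM API, and the real runtime works on token ids rather than this string.

```swift
/// Hypothetical message type for this sketch (CoreMLLLM has its own Message).
struct ChatTurn {
    let role: String     // "system", "user" or "assistant"
    let content: String
}

/// EOS ids from the architecture notes: <|im_end|> = 7, <|endoftext|> = 2.
let eosTokenIDs: Set<Int> = [7, 2]

/// Renders turns as ChatML behind one leading <|startoftext|>, then opens an
/// assistant turn for the model to complete (assumed generation prompt).
func renderChatML(_ turns: [ChatTurn]) -> String {
    var prompt = "<|startoftext|>"
    for turn in turns {
        prompt += "<|im_start|>\(turn.role)\n\(turn.content)<|im_end|>\n"
    }
    prompt += "<|im_start|>assistant\n"
    return prompt
}

/// Generation stops at the first sampled id that is an EOS token.
func shouldStop(afterSampling tokenID: Int) -> Bool {
    return eosTokenIDs.contains(tokenID)
}

let prompt = renderChatML([
    ChatTurn(role: "system", content: "You are a concise assistant."),
    ChatTurn(role: "user", content: "Hello!"),
])
print(prompt)
```

Both EOS ids need to be checked. A stop condition that only looks for `<|im_end|>` would run past an `<|endoftext|>`.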

### 4. Compute units

The default is `.cpuAndNeuralEngine`, the configuration behind the 52 tok/s
figure above. Override it at load time:

```swift
let llm = try await CoreMLLLM.load(
    from: modelDir,
    computeUnits: .cpuOnly  // or .all / .cpuAndGPU
)
```

Or via an environment variable (only affects LFM2):

```swift
import Foundation

setenv("LLM_LFM2_USE_CPU", "1", 1)  // last argument 1 = overwrite any existing value
```
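The compute-unit values above (`.cpuOnly`, `.all`, `.cpuAndGPU`, `.cpuAndNeuralEngine`) are spelled like Core ML's `MLComputeUnits` cases, and we assume they are that type. You may want to inspect the compiled model with plain Core ML, bypassing CoreMLLLM. A sketch along these lines should work. `loadRawModel` and `printFeatureNames` are hypothetical helpers. Running inference this way means driving the KV-cache and conv-state I/O yourself, which `CoreMLLLM.load()` otherwise handles.

```swift
import CoreML
import Foundation

/// Hypothetical helper: loads model.mlmodelc with plain Core ML.
/// `modelDir` is the bundle root (the parent of model.mlmodelc).
func loadRawModel(modelDir: URL,
                  computeUnits: MLComputeUnits = .cpuAndNeuralEngine) throws -> MLModel {
    let config = MLModelConfiguration()
    config.computeUnits = computeUnits  // .cpuAndNeuralEngine mirrors CoreMLLLM's default
    let url = modelDir.appendingPathComponent("model.mlmodelc")
    return try MLModel(contentsOf: url, configuration: config)
}

/// Prints the declared feature names of the loaded model.
func printFeatureNames(of model: MLModel) {
    let desc = model.modelDescription
    print("inputs: ", desc.inputDescriptionsByName.keys.sorted())
    print("outputs:", desc.outputDescriptionsByName.keys.sorted())
    if #available(iOS 18.0, macOS 15.0, *) {
        // MLState-backed features are listed separately from inputs/outputs.
        print("states: ", desc.stateDescriptionsByName.keys.sorted())
    }
}
```

For example, call `printFeatureNames(of: try loadRawModel(modelDir: modelDir))` with `modelDir` from step 2. Per the architecture notes, the conv-state window should appear among the ordinary inputs and outputs rather than under `states`.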

## App: CoreMLLLMChat

If you just want to try it without writing code, the example app
([Examples/CoreMLLLMChat](https://github.com/john-rocky/CoreML-LLM/tree/main/Examples/CoreMLLLMChat))
ships an LFM2.5 350M (ANE) entry in its model picker — open the
project in Xcode, run on a device, tap **Download**.

## Sideload (Mac → iPhone, no in-app download)

For development or offline use (replace `com.example.CoreMLLLMChat` with your app's bundle identifier):

```bash
DEVICE=$(xcrun devicectl list devices | awk '/connected/{print $3}' | head -1)
xcrun devicectl device copy to --device "$DEVICE" \
    --domain-type appDataContainer \
    --domain-identifier com.example.CoreMLLLMChat \
    --source ./lfm2.5-350m-coreml \
    --destination Documents/Models/lfm2.5-350m \
    --remove-existing-content true
```

Note: `xcrun devicectl` writes the files as UID 0 with mode 0755, so the
app sandbox can't delete them later and the picker's trash button fails
with a permission error. To clear a sideloaded copy, run
[scripts/uninstall_sideloaded_model.sh](https://github.com/john-rocky/CoreML-LLM/blob/main/scripts/uninstall_sideloaded_model.sh)
from the host, or uninstall the app to wipe its container.

## Files in this repo

```
model.mlmodelc/      compiled model — load via MLModel(contentsOf:)
model_config.json    context_length, num_hidden_layers, lfm2_conv_l_pad …
hf_model/            tokenizer (ChatML, sanitised for swift-transformers)
```
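The runtime reads these files itself. If your own code needs a value from `model_config.json` (say, `context_length` to cap prompt length), here is a minimal Foundation-only sketch. It assumes the three keys listed above hold integers. `LFM2ModelConfig` and `loadModelConfig` are hypothetical names.

```swift
import Foundation

/// Subset of model_config.json; assumes these three keys hold integers.
struct LFM2ModelConfig: Decodable {
    let contextLength: Int
    let numHiddenLayers: Int
    let lfm2ConvLPad: Int

    // Map the file's snake_case keys onto Swift property names.
    enum CodingKeys: String, CodingKey {
        case contextLength = "context_length"
        case numHiddenLayers = "num_hidden_layers"
        case lfm2ConvLPad = "lfm2_conv_l_pad"
    }
}

/// Hypothetical helper: reads model_config.json from the bundle root,
/// i.e. the directory that also contains model.mlmodelc and hf_model/.
func loadModelConfig(bundleRoot: URL) throws -> LFM2ModelConfig {
    let url = bundleRoot.appendingPathComponent("model_config.json")
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode(LFM2ModelConfig.self, from: data)
}
```

With `modelDir` from step 2, call `let config = try loadModelConfig(bundleRoot: modelDir)`. `JSONDecoder` ignores keys the struct does not declare, so the struct only needs the fields you actually use.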

## Architecture notes

* Hybrid: 6 attention layers (GQA + RoPE + QK-norm) + 10 short-conv
  layers (depthwise causal Conv1d, kernel = 3).
* The conv-state rolling window is a regular **input/output tensor**,
  not an MLState — the M-series ANE planner rejects the dual-state
  combination (`kv_cache_0` + `conv_cache_0`) at predict-time
  (`status=0x1d`).
* `L_pad = conv_L_cache = 3`. An earlier 16-wide padding fed enough
  fp16 noise into the depthwise reduction that autoregressive output
  collapsed to "kingkingking…" within a few tokens. Cutting the padding
  back to 3 fixed both correctness and ANE compatibility.
* Compute precision is the default fp16 — no fp32 fallback needed
  once the padding is fixed.
* Chat template: ChatML (`<|im_start|>role\n…<|im_end|>\n`) wrapped
  in `<|startoftext|>`. EOS = `<|im_end|>` (id 7) and `<|endoftext|>`
  (id 2).
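The conv state is an ordinary input/output tensor, so the caller carries it from one decode step to the next. Below is a minimal pure-Swift sketch of that contract for a depthwise causal Conv1d with kernel 3. It assumes the window holds the last `L_pad = 3` inputs, including the current one: roll left, append, then take the dot product with the per-channel kernel. `convStep` and `causalConv` are hypothetical names, and the numbers are illustrative, not the model's weights.

```swift
/// Window width per channel: L_pad = conv_L_cache = kernel size = 3.
let convWindow = 3

/// One decode step of a depthwise causal Conv1d with an explicit rolling state.
/// `state[c]` holds the last `convWindow` inputs of channel c; the updated
/// window is returned so the caller can feed it back in on the next step.
/// Assumes x, state and w all have one entry per channel.
func convStep(state: [[Float]], input x: [Float],
              weights w: [[Float]]) -> (output: [Float], newState: [[Float]]) {
    var newState = state
    var output = [Float](repeating: 0, count: x.count)
    for c in x.indices {
        // Roll the window left by one slot and append the newest input.
        newState[c] = Array(state[c].dropFirst()) + [x[c]]
        // Depthwise: each channel is convolved only with its own kernel.
        var acc: Float = 0
        for k in 0..<convWindow { acc += newState[c][k] * w[c][k] }
        output[c] = acc
    }
    return (output, newState)
}

/// Reference: causal conv over a whole sequence, zero-padded on the left.
func causalConv(sequence xs: [[Float]], weights w: [[Float]]) -> [[Float]] {
    return xs.indices.map { (t: Int) -> [Float] in
        w.indices.map { (c: Int) -> Float in
            var acc: Float = 0
            for k in 0..<convWindow {
                let src = t - (convWindow - 1) + k  // w[c][2] lines up with x[t]
                if src >= 0 { acc += xs[src][c] * w[c][k] }
            }
            return acc
        }
    }
}

// Usage: decode 4 timesteps of a 2-channel signal, carrying the state by hand.
let weights: [[Float]] = [[0.5, -1, 2], [1, 1, 1]]
let inputs: [[Float]] = [[1, 2], [3, 4], [5, 6], [7, 8]]
var state = [[Float]](repeating: [Float](repeating: 0, count: convWindow),
                      count: weights.count)  // zero window before the first token
var stepped: [[Float]] = []
for x in inputs {
    let (y, next) = convStep(state: state, input: x, weights: weights)
    stepped.append(y)
    state = next  // this step's output state is the next step's input state
}
print(stepped == causalConv(sequence: inputs, weights: weights))  // true
```

Stepping through the sequence with the carried window reproduces the full-sequence causal convolution, so the window only needs as many slots as the kernel has taps.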

Full conversion + drift writeup:
[docs/LFM2_CONVERSION_FINDINGS.md](https://github.com/john-rocky/CoreML-LLM/blob/main/docs/LFM2_CONVERSION_FINDINGS.md).

## License

This CoreML port inherits **[LFM Open License v1.0](https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE)** from the [base model](https://huggingface.co/LiquidAI/LFM2.5-350M).

**Important — commercial use limit**: the LFM Open License grants free
commercial use only up to a **revenue threshold of US $10M / year**.
Above that threshold, entities that are not 501(c)(3) non-profits need a
separate commercial license from Liquid AI. See the upstream
[LICENSE](https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE)
and [Liquid AI commercial licensing](https://www.liquid.ai/) for
details.

The CoreML conversion code in this repo (the model class, conversion
scripts, runtime glue) is Apache 2.0 (parent project
[CoreML-LLM](https://github.com/john-rocky/CoreML-LLM)).