---
license: other
license_name: lfm1.0
license_link: https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE
base_model: LiquidAI/LFM2.5-350M
tags:
  - coreml
  - ane
  - lfm2
  - on-device
  - iphone
language:
  - en
  - ja
library_name: coreml
pipeline_tag: text-generation
---

# LFM2.5 350M: CoreML build for Apple Neural Engine

CoreML port of [LiquidAI/LFM2.5-350M](https://huggingface.co/LiquidAI/LFM2.5-350M) for the [CoreML-LLM](https://github.com/john-rocky/CoreML-LLM) iOS / macOS runtime. fp16, 97.8 % ANE-resident, **52 tok/s decode on iPhone 17 Pro**.

## Use it from Swift

### 1. Add the package

`Package.swift`:

```swift
.package(url: "https://github.com/john-rocky/CoreML-LLM", branch: "main"),

// In your target:
.product(name: "CoreMLLLM", package: "CoreML-LLM"),
```

Platforms: iOS 18+ / macOS 15+.

### 2. Download + chat (one-turn streaming)

The shortest path is a single call that downloads the bundle on first use and loads it:

```swift
import CoreMLLLM

// First call pulls the bundle from this repo to Documents/Models/.
// Subsequent calls reuse the on-disk copy.
let llm = try await CoreMLLLM.load(repo: "mlboydaisuke/lfm2.5-350m-coreml")

let stream = try await llm.generate(
    [CoreMLLLM.Message(role: .user, content: "Hello!")],
    maxTokens: 256
)
for await chunk in stream { print(chunk, terminator: "") }
```

To manage the download yourself, use `ModelDownloader` and load from the local bundle directory:

```swift
import CoreMLLLM

let info = ModelDownloader.ModelInfo.lfm2_5_350m
let downloader = ModelDownloader.shared

// First launch: pulls ~810 MB from this repo to
// Documents/Models/lfm2.5-350m/. Subsequent launches are a no-op.
if !downloader.isDownloaded(info) {
    _ = try await downloader.download(info)
}
let modelDir = downloader.localModelURL(for: info)!
    .deletingLastPathComponent()  // bundle root (parent of model.mlmodelc)

let llm = try await CoreMLLLM.load(from: modelDir)

let stream = try await llm.generate(
    [CoreMLLLM.Message(role: .user, content: "Hello!")],
    maxTokens: 256
)
for await chunk in stream { print(chunk, terminator: "") }
```

### 3. Multi-turn chat

Keep the conversation in a `[CoreMLLLM.Message]` array, append each user and assistant turn, and pass the whole history to `generate(_:)` on every call:

```swift
var history: [CoreMLLLM.Message] = [
    .init(role: .system, content: "You are a concise assistant."),
]

func reply(to user: String) async throws -> String {
    history.append(.init(role: .user, content: user))
    var out = ""
    let stream = try await llm.generate(history, maxTokens: 512)
    for await chunk in stream {
        out += chunk
        print(chunk, terminator: "")
    }
    // Record the assistant turn so the next call sees the full conversation.
    history.append(.init(role: .assistant, content: out))
    return out
}

// Start a fresh conversation: clear the model's KV + conv state,
// and keep only the system prompt in the local history.
llm.reset()
history = Array(history.prefix(1))
```

`CoreMLLLM.load()` honours the model's ChatML template, the `<|im_end|>` / `<|endoftext|>` EOS tokens, and the conv-state I/O contract automatically; you don't pass any of that yourself.

### 4. Compute units

Defaults to `.cpuAndNeuralEngine` (the setting behind the 52 tok/s figure above). Override at load time:

```swift
let llm = try await CoreMLLLM.load(
    from: modelDir,
    computeUnits: .cpuOnly  // or .all / .cpuAndGPU
)
```

Or via an environment variable (only affects LFM2):

```swift
import Foundation

setenv("LLM_LFM2_USE_CPU", "1", 1)
```

## App: CoreMLLLMChat

If you just want to try it without writing code, the example app ([Examples/CoreMLLLMChat](https://github.com/john-rocky/CoreML-LLM/tree/main/Examples/CoreMLLLMChat)) ships an LFM2.5 350M (ANE) entry in its model picker: open the project in Xcode, run it on a device, and tap **Download**.
## Sideload (Mac → iPhone, no in-app download)

For development or offline use:

```bash
# Pick the first connected device.
DEVICE=$(xcrun devicectl list devices | awk '/connected/{print $3}' | head -1)

# Replace com.example.CoreMLLLMChat with your app's bundle identifier.
xcrun devicectl device copy to --device "$DEVICE" \
  --domain-type appDataContainer \
  --domain-identifier com.example.CoreMLLLMChat \
  --source ./lfm2.5-350m-coreml \
  --destination Documents/Models/lfm2.5-350m \
  --remove-existing-content true
```

Note: `xcrun devicectl` writes files as UID 0 with mode 0755, which the app sandbox cannot unlink later, so the picker's trash button fails with a permission error. To remove a sideloaded copy, run [scripts/uninstall_sideloaded_model.sh](https://github.com/john-rocky/CoreML-LLM/blob/main/scripts/uninstall_sideloaded_model.sh) from the host, or uninstall the app to wipe its container.

## Files in this repo

```
model.mlmodelc/      compiled model; load via MLModel(contentsOf:)
model_config.json    context_length, num_hidden_layers, lfm2_conv_l_pad …
hf_model/            tokenizer (ChatML, sanitised for swift-transformers)
```

## Architecture notes

* Hybrid: 6 attention layers (GQA + RoPE + QK-norm) + 10 short-conv layers (depthwise causal Conv1d, kernel = 3).
* The conv-state rolling window is a regular **input/output tensor**, not an MLState: the M-series ANE planner rejects the dual-state combination (`kv_cache_0` + `conv_cache_0`) at prediction time (`status=0x1d`).
* `L_pad = conv_L_cache = 3`. An earlier 16-wide padding fed enough fp16 noise into the depthwise reduction that autoregressive output collapsed to "kingkingking…" within a few tokens. Dropping the extra padding fixed both correctness and ANE compatibility.
* Compute precision is the default fp16; no fp32 fallback is needed once the padding is fixed.
* Chat template: ChatML (`<|im_start|>role\n…<|im_end|>\n`) prefixed with `<|startoftext|>`. EOS = `<|im_end|>` (id 7) and `<|endoftext|>` (id 2).
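The conv-state notes describe a per-layer rolling window that is passed in and returned as ordinary tensors rather than held in an MLState. The following is a minimal, hypothetical sketch of one decode step of a depthwise causal Conv1d in plain Swift, assuming a window length equal to the kernel size (3, matching `conv_L_cache = 3`), a `[channel][tap]` Float layout, and no bias or surrounding gating/projections. The names `convStep` and `causalConv` are illustrative only, not part of CoreML-LLM. The caller owns the state: each step shifts it left by one, appends the new input, takes a per-channel dot product with the kernel, and hands back the updated state for the next step.

```swift
// Hypothetical sketch of the short-conv decode step; not the CoreML-LLM implementation.
// Layout assumption: state[c] holds the last `kernel` inputs of channel c, oldest first.

/// One decode step of a depthwise causal Conv1d with an explicit rolling state.
/// The state is taken as input and returned as output (like the model's conv-state
/// I/O tensors) instead of being mutated in place like an MLState.
func convStep(x: [Float], state: [[Float]], weight: [[Float]]) -> (y: [Float], newState: [[Float]]) {
    var y = [Float](repeating: 0, count: x.count)
    var newState = state
    for c in 0..<x.count {
        // Shift the window left by one and append the current token's input.
        newState[c].removeFirst()
        newState[c].append(x[c])
        // Depthwise: each channel is convolved only with its own kernel taps.
        var acc: Float = 0
        for k in 0..<weight[c].count {
            acc += weight[c][k] * newState[c][k]
        }
        y[c] = acc
    }
    return (y, newState)
}

/// Reference: the same causal conv over a whole sequence, implicitly left-padded
/// with (kernel - 1) zeros so output t only sees inputs 0...t.
func causalConv(xs: [[Float]], weight: [[Float]]) -> [[Float]] {
    let kernel = weight[0].count
    let channels = weight.count
    var out: [[Float]] = []
    for t in 0..<xs.count {
        var y = [Float](repeating: 0, count: channels)
        for c in 0..<channels {
            for k in 0..<kernel {
                let src = t - (kernel - 1) + k  // index into the unpadded input
                if src >= 0 { y[c] += weight[c][k] * xs[src][c] }
            }
        }
        out.append(y)
    }
    return out
}

// Usage: one channel, kernel = 3, zero-initialised state of length 3.
let weight: [[Float]] = [[0.5, 0.25, 1.0]]
var state: [[Float]] = [[0, 0, 0]]
let inputs: [[Float]] = [[2], [4], [8], [0]]
var stepwise: [[Float]] = []
for x in inputs {
    let (y, next) = convStep(x: x, state: state, weight: weight)
    stepwise.append(y)
    state = next  // feed the returned state back in on the next step
}
print(stepwise)                                           // [[2.0], [4.5], [10.0], [4.0]]
print(causalConv(xs: inputs, weight: weight) == stepwise) // true
```

The stepwise outputs match the full-sequence reference, which is the property the rolling window has to preserve. Because the state is just a value returned from each call, the same pattern maps onto a model whose conv cache is an ordinary input/output pair rather than a second MLState.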
Full conversion and drift write-up: [docs/LFM2_CONVERSION_FINDINGS.md](https://github.com/john-rocky/CoreML-LLM/blob/main/docs/LFM2_CONVERSION_FINDINGS.md).

## License

This CoreML port inherits the **[LFM Open License v1.0](https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE)** from the [base model](https://huggingface.co/LiquidAI/LFM2.5-350M).

**Important: commercial use limit.** The LFM Open License grants free commercial use only up to a **revenue threshold of US $10M / year**. Above that threshold (and for non-501(c)(3) entities), you need a separate commercial license from Liquid AI. See the upstream [LICENSE](https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE) and [Liquid AI commercial licensing](https://www.liquid.ai/) for details.

The CoreML conversion code in this repo (the model class, conversion scripts, runtime glue) is licensed under Apache 2.0, matching the parent project [CoreML-LLM](https://github.com/john-rocky/CoreML-LLM).