Upload README.md with huggingface_hub

d2a7fcd verified 28 days ago

6.76 kB

	---
	license: other
	license_name: lfm1.0
	license_link: https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE
	base_model: LiquidAI/LFM2.5-350M
	tags:
	- coreml
	- ane
	- lfm2
	- on-device
	- iphone
	language:
	- en
	- ja
	library_name: coreml
	pipeline_tag: text-generation
	---

	## Use it from Swift

	<!-- swift-usage-begin -->
	### Add the package

	`Package.swift`:

	```swift
	.package(url: "https://github.com/john-rocky/CoreML-LLM", branch: "main"),

	// In your target:
	.product(name: "CoreMLLLM", package: "CoreML-LLM"),
	```

	Platforms: iOS 18+ / macOS 15+.

	### Download + chat (one call)

	```swift
	import CoreMLLLM

	// First call pulls the bundle from this repo to Documents/Models/.
	// Subsequent calls reuse the on-disk copy.
	let llm = try await CoreMLLLM.load(repo: "mlboydaisuke/lfm2.5-350m-coreml")

	let stream = try await llm.generate(
	[CoreMLLLM.Message(role: .user, content: "Hello!")],
	maxTokens: 256
	)
	for await chunk in stream {
	print(chunk, terminator: "")
	}
	```

	Multi-turn: keep an `[CoreMLLLM.Message]` array, append the
	user/assistant turns, and pass the whole history to
	`generate(_:)` again. Call `llm.reset()` to start a new
	conversation (clears the KV cache).
	<!-- swift-usage-end -->



	# LFM2.5 350M — CoreML build for Apple Neural Engine

	CoreML port of [LiquidAI/LFM2.5-350M](https://huggingface.co/LiquidAI/LFM2.5-350M)
	for the [CoreML-LLM](https://github.com/john-rocky/CoreML-LLM) iOS / macOS
	runtime. fp16, 97.8 % ANE-resident, 52 tok/s decode on iPhone 17 Pro.

	## Use it from Swift

	### 1. Add the package

	`Package.swift`:

	```swift
	.package(url: "https://github.com/john-rocky/CoreML-LLM", branch: "main"),

	// In your target:
	.product(name: "CoreMLLLM", package: "CoreML-LLM"),
	```

	Platforms: iOS 18+ / macOS 15+.

	### 2. Download + chat (one-turn streaming)

	```swift
	import CoreMLLLM

	let info = ModelDownloader.ModelInfo.lfm2_5_350m
	let downloader = ModelDownloader.shared

	// First launch: pulls ~810 MB from this repo to
	// Documents/Models/lfm2.5-350m/. Subsequent launches no-op.
	if !downloader.isDownloaded(info) {
	_ = try await downloader.download(info)
	}

	let modelDir = downloader.localModelURL(for: info)!
	.deletingLastPathComponent() // bundle root (parent of model.mlmodelc)

	let llm = try await CoreMLLLM.load(from: modelDir)

	let stream = try await llm.generate(
	[CoreMLLLM.Message(role: .user, content: "Hello!")],
	maxTokens: 256
	)
	for await chunk in stream {
	print(chunk, terminator: "")
	}
	```

	### 3. Multi-turn chat

	```swift
	var history: [CoreMLLLM.Message] = [
	.init(role: .system, content: "You are a concise assistant."),
	]

	func reply(to user: String) async throws -> String {
	history.append(.init(role: .user, content: user))
	var out = ""
	let stream = try await llm.generate(history, maxTokens: 512)
	for await chunk in stream {
	out += chunk
	print(chunk, terminator: "")
	}
	history.append(.init(role: .assistant, content: out))
	return out
	}

	llm.reset() // start a fresh conversation (clears KV + conv state)
	```

	`CoreMLLLM.load()` honours the model's ChatML template, the
	`<\|im_end\|>` / `<\|endoftext\|>` EOS tokens, and the conv-state I/O
	contract automatically — you don't pass any of that yourself.

	### 4. Compute units

	Defaults to `.cpuAndNeuralEngine` (the 52 tok/s number above).
	Override at load time:

	```swift
	let llm = try await CoreMLLLM.load(
	from: modelDir,
	computeUnits: .cpuOnly, // or .all / .cpuAndGPU
	)
	```

	Or via env (only affects LFM2):

	```swift
	setenv("LLM_LFM2_USE_CPU", "1", 1)
	```

	## App: CoreMLLLMChat

	If you just want to try it without writing code, the example app
	([Examples/CoreMLLLMChat](https://github.com/john-rocky/CoreML-LLM/tree/main/Examples/CoreMLLLMChat))
	ships an LFM2.5 350M (ANE) entry in its model picker — open the
	project in Xcode, run on a device, tap Download.

	## Sideload (Mac → iPhone, no in-app download)

	For development / offline use:

	```bash
	DEVICE=$(xcrun devicectl list devices \| awk '/connected/{print $3}' \| head -1)
	xcrun devicectl device copy to --device "$DEVICE" \
	--domain-type appDataContainer \
	--domain-identifier com.example.CoreMLLLMChat \
	--source ./lfm2.5-350m-coreml \
	--destination Documents/Models/lfm2.5-350m \
	--remove-existing-content true
	```

	Note: `xcrun devicectl` writes files as UID 0 / 0755, which the app
	sandbox can't unlink later — the picker's trash button will fail with
	a permission error. To clear a sideloaded copy run
	[scripts/uninstall_sideloaded_model.sh](https://github.com/john-rocky/CoreML-LLM/blob/main/scripts/uninstall_sideloaded_model.sh)
	from the host or uninstall the app to wipe the container.

	## Files in this repo

	```
	model.mlmodelc/ compiled model — load via MLModel(contentsOf:)
	model_config.json context_length, num_hidden_layers, lfm2_conv_l_pad …
	hf_model/ tokenizer (ChatML, sanitised for swift-transformers)
	```

	## Architecture notes

	* Hybrid: 6 attention layers (GQA + RoPE + QK-norm) + 10 short-conv
	layers (depthwise causal Conv1d, kernel = 3).
	* The conv-state rolling window is a regular input/output tensor,
	not an MLState — the M-series ANE planner rejects the dual-state
	combination (`kv_cache_0` + `conv_cache_0`) at predict-time
	(`status=0x1d`).
	* `L_pad = conv_L_cache = 3`. An earlier 16-wide padding fed enough
	fp16 noise into the depthwise reduction that autoregressive output
	collapsed to "kingkingking…" within a few tokens. Dropping the
	padding fixed both correctness and ANE compatibility.
	* Compute precision is the default fp16 — no fp32 fallback needed
	once the padding is fixed.
	* Chat template: ChatML (`<\|im_start\|>role\n…<\|im_end\|>\n`) wrapped
	in `<\|startoftext\|>`. EOS = `<\|im_end\|>` (id 7) and `<\|endoftext\|>`
	(id 2).

	Full conversion + drift writeup:
	[docs/LFM2_CONVERSION_FINDINGS.md](https://github.com/john-rocky/CoreML-LLM/blob/main/docs/LFM2_CONVERSION_FINDINGS.md).

	## License

	This CoreML port inherits [LFM Open License v1.0](https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE) from the [base model](https://huggingface.co/LiquidAI/LFM2.5-350M).

	Important — commercial use limit: the LFM Open License grants free
	commercial use only up to a revenue threshold of US $10M / year.
	Above that threshold (and for non-501(c)(3) entities) you need a
	separate commercial license from Liquid AI. See the upstream
	[LICENSE](https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE)
	and [Liquid AI commercial licensing](https://www.liquid.ai/) for
	details.

	The CoreML conversion code in this repo (the model class, conversion
	scripts, runtime glue) is Apache 2.0 (parent project
	[CoreML-LLM](https://github.com/john-rocky/CoreML-LLM)).