Upload folder using huggingface_hub

Files changed (11)

README.md +86 -0
hf_model/config.json +60 -0
hf_model/generation_config.json +7 -0
hf_model/tokenizer.json +0 -0
hf_model/tokenizer_config.json +21 -0
model.mlmodelc/analytics/coremldata.bin +3 -0
model.mlmodelc/coremldata.bin +3 -0
model.mlmodelc/metadata.json +154 -0
model.mlmodelc/model.mil +0 -0
model.mlmodelc/weights/weight.bin +3 -0
model_config.json +21 -0

README.md ADDED

	@@ -0,0 +1,86 @@

+---
+license: apache-2.0
+base_model: LiquidAI/LFM2.5-350M
+tags:
+- coreml
+- ane
+- lfm2
+- on-device
+- iphone
+language:
+- en
+- ja
+library_name: coreml
+pipeline_tag: text-generation
+---
+# LFM2.5 350M — CoreML build for Apple Neural Engine
+CoreML port of [LiquidAI/LFM2.5-350M](https://huggingface.co/LiquidAI/LFM2.5-350M)
+for the [CoreML-LLM](https://github.com/john-rocky/CoreML-LLM) iOS / macOS
+runtime. The build uses fp16 weights, is 97.8 % ANE-resident on iPhone 17 Pro,
+and decodes at **52 tok/s** in CoreMLLLMChat.
+## Files
+```
+model.mlmodelc/      # compiled; load directly with MLModel(contentsOf:)
+model_config.json    # context_length, num_hidden_layers, lfm2_conv_l_pad …
+hf_model/            # tokenizer (ChatML, sanitised for swift-transformers)
+  ├── tokenizer.json
+  ├── tokenizer_config.json
+  ├── config.json
+  └── generation_config.json
+```
+## How to use
+### Sideload to CoreMLLLMChat
+```bash
+DEVICE=$(xcrun devicectl list devices | awk '/connected/{print $3}' | head -1)
+xcrun devicectl device copy to --device "$DEVICE" \
+    --domain-type appDataContainer \
+    --domain-identifier com.example.CoreMLLLMChat \
+    --source ./lfm2.5-350m-coreml \
+    --destination Documents/Models/lfm2.5-350m \
+    --remove-existing-content true
+```
+The chat app's model picker will surface "LFM2.5 350M (ANE)" once the
+folder is in place.
+### Direct CoreML load
+```python
+import coremltools as ct, numpy as np
+m = ct.models.CompiledMLModel("model.mlmodelc",
+                               compute_units=ct.ComputeUnit.CPU_AND_NE)
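+state = m.make_state()  # holds the kv_cache_0 state only; conv state is a plain tensor
+ctx = 2048              # fixed context length; sizes causal_mask and update_mask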
+conv = np.zeros((10, 1024, 3), dtype=np.float16)  # n_conv × hidden × L_cache
+# … feed input_ids / position_ids / causal_mask / update_mask / conv_state_in,
+# carry conv_state_out forward as the next conv_state_in.
+```
+## Architecture notes
+* Hybrid: 6 attention layers (GQA + RoPE + QK-norm) + 10 short-conv
+  layers (depthwise causal Conv1d, kernel = 3).
+* The conv-state rolling window is passed as an **input/output tensor**,
+  not via MLState — the M-series ANE planner rejects the dual-state
+  combination (kv_cache + conv_cache) at predict-time.
+* `L_pad = conv_L_cache = 3`.  An earlier 16-wide padding fed enough
+  fp16 noise into the depthwise reduction that autoregressive output
+  collapsed to "kingkingking…" within a few tokens.
+* Compute precision is the default fp16; no fp32 fallback needed once
+  the padding is fixed.
+* Chat template: ChatML (`<|im_start|>role\n…<|im_end|>\n`) wrapped in
+  `<|startoftext|>`.  EOS = `<|im_end|>` (id 7) and `<|endoftext|>` (id 2).
+Full writeup covering conversion, ANE residency, and drift:
+[docs/LFM2_CONVERSION_FINDINGS.md](https://github.com/john-rocky/CoreML-LLM/blob/main/docs/LFM2_CONVERSION_FINDINGS.md).
+## License
+Apache 2.0 (inherited from the base model).
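
The README's "Direct CoreML load" snippet stops before the first prediction. The sketch below shows one way the per-token inputs and the conv-state hand-off could be arranged, using the tensor shapes from `model.mlmodelc/metadata.json`. The mask semantics are an assumption that the README does not confirm: `causal_mask` is treated as additive (0 for visible positions, a large negative value elsewhere) and `update_mask` as one-hot at the slot being written. `build_step_inputs`, `decode`, and the `predict` callable are hypothetical names; with the real model, `predict` would wrap the CoreML prediction call together with the state returned by `make_state()`.

```python
import numpy as np

CTX = 2048                  # causal_mask / update_mask length in metadata.json
CONV_SHAPE = (10, 1024, 3)  # conv_state_in / conv_state_out shape


def build_step_inputs(token_id, pos, conv_state):
    """Build one decode step's inputs with the shapes listed in metadata.json.

    Assumption: causal_mask is additive (0 = visible, large negative = hidden)
    and update_mask is one-hot at the cache slot being written.
    """
    causal = np.full((1, 1, 1, CTX), -65504.0, dtype=np.float16)  # fp16 lowest
    causal[..., : pos + 1] = 0.0
    update = np.zeros((1, 1, CTX, 1), dtype=np.float16)
    update[0, 0, pos, 0] = 1.0
    return {
        "input_ids": np.array([[token_id]], dtype=np.int32),
        "position_ids": np.array([pos], dtype=np.int32),
        "causal_mask": causal,
        "update_mask": update,
        "conv_state_in": conv_state,
    }


def decode(predict, prompt_ids, max_new_tokens, eos_ids=(7, 2)):
    """Feed tokens one at a time, carrying conv_state_out into the next step."""
    if not prompt_ids:
        raise ValueError("prompt_ids must contain at least one token")
    conv = np.zeros(CONV_SHAPE, dtype=np.float16)
    pending = list(prompt_ids)
    generated, token, pos = [], None, 0
    while pos < CTX and len(generated) < max_new_tokens:
        current = pending.pop(0) if pending else token
        out = predict(build_step_inputs(current, pos, conv))
        conv = out["conv_state_out"]      # becomes the next conv_state_in
        token = int(out["token_id"][0])   # the model returns the argmax token
        pos += 1
        if not pending:                   # prompt consumed: keep the prediction
            if token in eos_ids:
                break
            generated.append(token)
    return generated
```

The default `eos_ids` are the two ids named in the README (7 and 2). Predictions made while the prompt is still being fed are discarded, since only the token that follows the last prompt token is a real continuation.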

hf_model/config.json ADDED

	@@ -0,0 +1,60 @@

+{
+  "architectures": [
+    "Lfm2ForCausalLM"
+  ],
+  "block_auto_adjust_ff_dim": true,
+  "block_dim": 1024,
+  "block_ff_dim": 6656,
+  "block_ffn_dim_multiplier": 1.0,
+  "block_mlp_init_scale": 1.0,
+  "block_multiple_of": 256,
+  "block_norm_eps": 1e-05,
+  "block_out_init_scale": 1.0,
+  "block_use_swiglu": true,
+  "block_use_xavier_init": true,
+  "bos_token_id": 1,
+  "conv_L_cache": 3,
+  "conv_bias": false,
+  "conv_dim": 1024,
+  "conv_use_xavier_init": true,
+  "dtype": "bfloat16",
+  "eos_token_id": 7,
+  "hidden_size": 1024,
+  "initializer_range": 0.02,
+  "intermediate_size": 6656,
+  "layer_types": [
+    "conv",
+    "conv",
+    "full_attention",
+    "conv",
+    "conv",
+    "full_attention",
+    "conv",
+    "conv",
+    "full_attention",
+    "conv",
+    "full_attention",
+    "conv",
+    "full_attention",
+    "conv",
+    "full_attention",
+    "conv"
+  ],
+  "max_position_embeddings": 128000,
+  "model_type": "lfm2",
+  "norm_eps": 1e-05,
+  "num_attention_heads": 16,
+  "num_heads": 16,
+  "num_hidden_layers": 16,
+  "num_key_value_heads": 8,
+  "pad_token_id": 0,
+  "rope_parameters": {
+    "rope_theta": 1000000.0,
+    "rope_type": "default"
+  },
+  "tie_embedding": true,
+  "transformers_version": "5.0.0rc1",
+  "use_cache": true,
+  "use_pos_enc": true,
+  "vocab_size": 65536
+}
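
The `layer_types` list above determines the tensor shapes that appear later in `metadata.json`. As a rough cross-check, the sketch below counts the layer kinds and derives the cache shapes. It assumes K and V are stacked along the first axis of the KV cache (two slices per attention layer), which the README does not state explicitly. `derive_shapes` is a hypothetical helper, and the context length of 2048 is taken from `model_config.json` rather than from this file.

```python
from collections import Counter

# layer_types copied from hf_model/config.json
LAYER_TYPES = [
    "conv", "conv", "full_attention",
    "conv", "conv", "full_attention",
    "conv", "conv", "full_attention",
    "conv", "full_attention",
    "conv", "full_attention",
    "conv", "full_attention",
    "conv",
]


def derive_shapes(layer_types, hidden_size, num_heads, num_kv_heads,
                  conv_dim, conv_l_cache, context_length):
    """Return (n_attn, n_conv, kv_cache_shape, conv_state_shape)."""
    counts = Counter(layer_types)
    n_attn, n_conv = counts["full_attention"], counts["conv"]
    head_dim = hidden_size // num_heads
    # Assumption: K and V share one tensor, two slices per attention layer.
    kv_cache = (2 * n_attn, num_kv_heads, context_length, head_dim)
    conv_state = (n_conv, conv_dim, conv_l_cache)
    return n_attn, n_conv, kv_cache, conv_state


n_attn, n_conv, kv_cache, conv_state = derive_shapes(
    LAYER_TYPES, hidden_size=1024, num_heads=16, num_kv_heads=8,
    conv_dim=1024, conv_l_cache=3, context_length=2048)
print(n_attn, n_conv)  # 6 10
print(kv_cache)        # (12, 8, 2048, 64)
print(conv_state)      # (10, 1024, 3)
```

Under that assumption the derived shapes match the `kv_cache_0` state and the `conv_state_in` / `conv_state_out` tensors in the compiled model's metadata.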

hf_model/generation_config.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": 7,
+  "pad_token_id": 0,
+  "transformers_version": "5.0.0rc1"
+}

hf_model/tokenizer.json ADDED

The diff for this file is too large to render.

hf_model/tokenizer_config.json ADDED

	@@ -0,0 +1,21 @@

+{
+  "additional_special_tokens": null,
+  "bos_token": "<|startoftext|>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "extra_special_tokens": [],
+  "is_local": true,
+  "legacy": false,
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 1000000000000000019884624838656,
+  "model_specific_special_tokens": {},
+  "pad_token": "<|pad|>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "PreTrainedTokenizerFast",
+  "use_default_system_prompt": false,
+  "use_fast": true
+}
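
The README describes the chat format as ChatML wrapped in `<|startoftext|>`, and this tokenizer config names the matching `bos_token` and `eos_token`. A minimal sketch of assembling such a prompt by hand follows. `build_prompt` is a hypothetical helper, and the trailing `<|im_start|>assistant\n` generation prompt is the usual ChatML convention rather than something shown in this commit. The tokenizer's own chat template, if present, should take precedence.

```python
BOS = "<|startoftext|>"    # bos_token in tokenizer_config.json
IM_START = "<|im_start|>"  # turn opener named in the README
IM_END = "<|im_end|>"      # eos_token in tokenizer_config.json


def build_prompt(messages, add_generation_prompt=True):
    """Render [{'role': ..., 'content': ...}, ...] as a ChatML string."""
    parts = [BOS]
    for message in messages:
        parts.append(
            f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n"
        )
    if add_generation_prompt:
        # Assumption: the model expects an open assistant turn to continue.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


prompt = build_prompt([{"role": "user", "content": "Hello"}])
print(repr(prompt))
```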

model.mlmodelc/analytics/coremldata.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:09f06c44cf914efb67d7e6a53bc758708cab426ecd9ebdaa14974dea1086e02d
+size 243

model.mlmodelc/coremldata.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ad46897b7e9d632356a272d0ef2c3d3ca1c63242b09c640838944cc638f618e7
+size 581

model.mlmodelc/metadata.json ADDED

	@@ -0,0 +1,154 @@

+[
+  {
+    "metadataOutputVersion" : "3.0",
+    "storagePrecision" : "Float16",
+    "outputSchema" : [
+      {
+        "hasShapeFlexibility" : "0",
+        "isOptional" : "0",
+        "dataType" : "Int32",
+        "formattedType" : "MultiArray (Int32 1)",
+        "shortDescription" : "",
+        "shape" : "[1]",
+        "name" : "token_id",
+        "type" : "MultiArray"
+      },
+      {
+        "hasShapeFlexibility" : "0",
+        "isOptional" : "0",
+        "dataType" : "Float16",
+        "formattedType" : "MultiArray (Float16 1)",
+        "shortDescription" : "",
+        "shape" : "[1]",
+        "name" : "token_logit",
+        "type" : "MultiArray"
+      },
+      {
+        "hasShapeFlexibility" : "0",
+        "isOptional" : "0",
+        "dataType" : "Float16",
+        "formattedType" : "MultiArray (Float16 10 × 1024 × 3)",
+        "shortDescription" : "",
+        "shape" : "[10, 1024, 3]",
+        "name" : "conv_state_out",
+        "type" : "MultiArray"
+      }
+    ],
+    "modelParameters" : [
+    ],
+    "specificationVersion" : 10,
+    "mlProgramOperationTypeHistogram" : {
+      "Ios18.softmax" : 6,
+      "Ios19.mul" : 192,
+      "Ios18.matmul" : 12,
+      "Ios19.sliceUpdate" : 12,
+      "Ios19.stack" : 1,
+      "Ios18.gatherAlongAxis" : 1,
+      "Ios19.squeeze" : 61,
+      "Ios18.readState" : 12,
+      "Tile" : 24,
+      "Ios18.gather" : 3,
+      "Ios19.add" : 63,
+      "Ios18.layerNorm" : 45,
+      "Ios18.writeState" : 12,
+      "Ios19.concat" : 67,
+      "Ios19.transpose" : 132,
+      "Ios18.reduceArgmax" : 1,
+      "Ios19.expandDims" : 66,
+      "Ios18.conv" : 103,
+      "Ios18.silu" : 16,
+      "Ios18.cast" : 1,
+      "Ios19.split" : 67,
+      "Ios19.sliceByIndex" : 32,
+      "Ios19.sub" : 1,
+      "Ios19.select" : 1,
+      "Ios19.greaterEqual" : 1,
+      "Ios19.reshape" : 50
+    },
+    "computePrecision" : "Mixed (Float16, Int32, UInt16)",
+    "isUpdatable" : "0",
+    "stateSchema" : [
+      {
+        "dataType" : "Float16",
+        "isOptional" : "0",
+        "formattedType" : "State (Float16 12 × 8 × 2048 × 64)",
+        "shortDescription" : "",
+        "shape" : "[12, 8, 2048, 64]",
+        "name" : "kv_cache_0",
+        "type" : "State"
+      }
+    ],
+    "availability" : {
+      "macOS" : "16.0",
+      "tvOS" : "19.0",
+      "visionOS" : "3.0",
+      "watchOS" : "12.0",
+      "iOS" : "19.0",
+      "macCatalyst" : "19.0"
+    },
+    "modelType" : {
+      "name" : "MLModelType_mlProgram"
+    },
+    "userDefinedMetadata" : {
+      "com.github.apple.coremltools.conversion_date" : "2026-04-28",
+      "com.github.apple.coremltools.source" : "torch==2.11.0",
+      "com.github.apple.coremltools.version" : "9.0",
+      "com.github.apple.coremltools.source_dialect" : "TorchScript"
+    },
+    "inputSchema" : [
+      {
+        "hasShapeFlexibility" : "0",
+        "isOptional" : "0",
+        "dataType" : "Int32",
+        "formattedType" : "MultiArray (Int32 1 × 1)",
+        "shortDescription" : "",
+        "shape" : "[1, 1]",
+        "name" : "input_ids",
+        "type" : "MultiArray"
+      },
+      {
+        "hasShapeFlexibility" : "0",
+        "isOptional" : "0",
+        "dataType" : "Int32",
+        "formattedType" : "MultiArray (Int32 1)",
+        "shortDescription" : "",
+        "shape" : "[1]",
+        "name" : "position_ids",
+        "type" : "MultiArray"
+      },
+      {
+        "hasShapeFlexibility" : "0",
+        "isOptional" : "0",
+        "dataType" : "Float16",
+        "formattedType" : "MultiArray (Float16 1 × 1 × 1 × 2048)",
+        "shortDescription" : "",
+        "shape" : "[1, 1, 1, 2048]",
+        "name" : "causal_mask",
+        "type" : "MultiArray"
+      },
+      {
+        "hasShapeFlexibility" : "0",
+        "isOptional" : "0",
+        "dataType" : "Float16",
+        "formattedType" : "MultiArray (Float16 1 × 1 × 2048 × 1)",
+        "shortDescription" : "",
+        "shape" : "[1, 1, 2048, 1]",
+        "name" : "update_mask",
+        "type" : "MultiArray"
+      },
+      {
+        "hasShapeFlexibility" : "0",
+        "isOptional" : "0",
+        "dataType" : "Float16",
+        "formattedType" : "MultiArray (Float16 10 × 1024 × 3)",
+        "shortDescription" : "",
+        "shape" : "[10, 1024, 3]",
+        "name" : "conv_state_in",
+        "type" : "MultiArray"
+      }
+    ],
+    "generatedClassName" : "model",
+    "method" : "predict"
+  }
+]
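
The schema above fixes every tensor shape, so the memory held by the KV-cache state and by the conv-state tensor can be worked out directly. The sketch below does that arithmetic from the `shape` and `dataType` strings as they appear in the metadata. `tensor_bytes` is a hypothetical helper, and the figures cover raw tensor data only, not any runtime overhead.

```python
import json
from math import prod

BYTES_PER_ELEMENT = {"Float16": 2, "Int32": 4}


def tensor_bytes(shape_str, data_type):
    """Raw size of a schema entry, e.g. shape "[10, 1024, 3]", type "Float16"."""
    return prod(json.loads(shape_str)) * BYTES_PER_ELEMENT[data_type]


kv_bytes = tensor_bytes("[12, 8, 2048, 64]", "Float16")  # kv_cache_0 state
conv_bytes = tensor_bytes("[10, 1024, 3]", "Float16")    # conv_state_in/out
print(kv_bytes, kv_bytes / 2**20)      # 25165824 24.0  (MiB)
print(conv_bytes, conv_bytes / 2**10)  # 61440 60.0     (KiB)
```

The conv state is about 400 times smaller than the KV cache, so passing it in and out of every call as a plain tensor moves only 60 KiB each way per step.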

model.mlmodelc/model.mil ADDED

The diff for this file is too large to render.

model.mlmodelc/weights/weight.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7c8fdbdae27808c26bb067e2c267aa8aee85cf153bb903ece94175623ee3650e
+size 844243968
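
The three lines above are a Git LFS pointer, not the weights themselves. A small sketch of reading such a pointer and relating its `size` to the README's "fp16 weights" claim follows. `parse_lfs_pointer` is a hypothetical helper, and the element count is only an upper bound because the file may also contain headers or alignment padding.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer text copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7c8fdbdae27808c26bb067e2c267aa8aee85cf153bb903ece94175623ee3650e
size 844243968
"""

pointer = parse_lfs_pointer(POINTER)
max_fp16_values = pointer["size"] // 2  # 2 bytes per fp16 value
print(pointer["algo"], len(pointer["digest"]))    # sha256 64
print(f"{pointer['size'] / 2**20:.1f} MiB")
print(f"at most {max_fp16_values:,} fp16 values")  # at most 422,121,984 fp16 values
```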

model_config.json ADDED

	@@ -0,0 +1,21 @@

+{
+  "model_name": "hf_model",
+  "architecture": "lfm2",
+  "hidden_size": 1024,
+  "num_hidden_layers": 16,
+  "num_attention_heads": 16,
+  "num_key_value_heads": 8,
+  "head_dim": 64,
+  "vocab_size": 65536,
+  "context_length": 2048,
+  "rms_norm_eps": 1e-05,
+  "bos_token_id": 1,
+  "eos_token_id": 7,
+  "quantization": "int4",
+  "compute_units": "ALL",
+  "parts": {
+    "model": "model.mlpackage"
+  },
+  "tokenizer_repo": "hf_model",
+  "lfm2_conv_l_pad": 3
+}
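
Two fields in this file look inconsistent with the rest of the commit: `"quantization": "int4"` sits alongside a README and `metadata.json` that both report fp16 storage, and `parts.model` names `model.mlpackage` while the upload contains `model.mlmodelc`. Whether the runtime actually reads these fields is not shown here. The sketch below is one way such a check could be automated before upload. `find_mismatches` is a hypothetical helper, and it assumes a quantized build would not report plain `Float16` as its storage precision.

```python
def find_mismatches(model_config, metadata, shipped_paths):
    """List places where model_config.json disagrees with the uploaded bundle."""
    notes = []
    declared = str(model_config.get("quantization", ""))
    storage = metadata.get("storagePrecision", "")
    # Assumption: a quantized build would not report plain "Float16" storage.
    if declared.lower().startswith("int") and storage == "Float16":
        notes.append(f"quantization={declared!r} but storagePrecision={storage!r}")
    for name, path in model_config.get("parts", {}).items():
        present = any(p == path or p.startswith(path + "/") for p in shipped_paths)
        if not present:
            notes.append(f"parts[{name!r}]={path!r} is not among the uploaded files")
    return notes


# Values copied from the diff above.
MODEL_CONFIG = {"quantization": "int4", "parts": {"model": "model.mlpackage"}}
METADATA = {"storagePrecision": "Float16"}
SHIPPED = [
    "model.mlmodelc/analytics/coremldata.bin",
    "model.mlmodelc/coremldata.bin",
    "model.mlmodelc/metadata.json",
    "model.mlmodelc/model.mil",
    "model.mlmodelc/weights/weight.bin",
]

mismatches = find_mismatches(MODEL_CONFIG, METADATA, SHIPPED)
for note in mismatches:
    print(note)
```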