jonasneves committed · Commit 86ad25c · verified · 1 Parent(s): 23ba854

Upload README.md with huggingface_hub
---
license: apache-2.0
base_model: LiquidAI/LFM2-350M
tags:
- gguf
- spatial-reasoning
- lora
- fine-tuned
- lfm2
- stepgame
- llama-cpp
datasets:
- ZhengyanShi/StepGame
language:
- en
pipeline_tag: text-generation
library_name: llama-cpp
model-index:
- name: LFM2-350M-StepGame
  results:
  - task:
      type: text-generation
      name: Spatial Reasoning (StepGame)
    dataset:
      name: StepGame (validation split)
      type: ZhengyanShi/StepGame
      split: validation
    metrics:
    - type: accuracy
      value: 74.4
      name: Overall Accuracy
    - type: accuracy
      value: 94.0
      name: 1-hop Accuracy
    - type: accuracy
      value: 90.0
      name: 2-hop Accuracy
    - type: accuracy
      value: 76.0
      name: 3-hop Accuracy
    - type: accuracy
      value: 54.0
      name: 4-hop Accuracy
    - type: accuracy
      value: 58.0
      name: 5-hop Accuracy
---

# LFM2-350M-StepGame (GGUF)

A fine-tuned version of [LiquidAI/LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M), trained on the [StepGame](https://huggingface.co/datasets/ZhengyanShi/StepGame) spatial reasoning benchmark and exported to GGUF. The model answers directional relationship questions (left, right, above, below, upper-left, upper-right, lower-left, lower-right) given a sequence of positional statements.

## Results

| Metric | Baseline | Fine-tuned | Delta (pp) |
|--------|----------|------------|------------|
| Overall | 16.0% | **74.4%** | +58.4 |
| 1-hop | 24.0% | **94.0%** | +70.0 |
| 2-hop | 14.0% | **90.0%** | +76.0 |
| 3-hop | 14.0% | **76.0%** | +62.0 |
| 4-hop | 18.0% | **54.0%** | +36.0 |
| 5-hop | 10.0% | **58.0%** | +48.0 |

Evaluated on 250 held-out examples (50 per hop level) from the StepGame validation split.
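
The per-hop breakdown above can be computed with a small scoring helper. This is a sketch, not the project's actual evaluation harness; the `(k_hop, gold, pred)` record format is an assumption for illustration.

```python
from collections import defaultdict

def per_hop_accuracy(examples):
    """Compute overall and per-hop accuracy (in %) from
    (k_hop, gold_answer, predicted_answer) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for k, gold, pred in examples:
        total[k] += 1
        # Compare case-insensitively, ignoring surrounding whitespace
        if pred.strip().lower() == gold.strip().lower():
            correct[k] += 1
    per_hop = {k: 100.0 * correct[k] / total[k] for k in total}
    overall = 100.0 * sum(correct.values()) / sum(total.values())
    return overall, per_hop
```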

## How to use

### llama.cpp / llama-server

```bash
llama-server \
  --model LFM2-350M-StepGame-f16.gguf \
  --ctx-size 8192 \
  --host 0.0.0.0 --port 8080
```

### llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama(model_path="LFM2-350M-StepGame-f16.gguf", n_ctx=8192)
output = llm.create_chat_completion(messages=[
    {"role": "system", "content": (
        "You are a spatial reasoning assistant. "
        "Given a sequence of positional relationships between objects, "
        "determine the spatial relationship between two specified objects. "
        "Answer with a single direction from: "
        "left, right, above, below, upper-left, upper-right, lower-left, lower-right."
    )},
    {"role": "user", "content": (
        "J and A are in a vertical line with A below J.\n\n"
        "What is the relation of the agent A to the agent J?"
    )},
])
print(output["choices"][0]["message"]["content"])
# => "below"
```

## Training details

| Parameter | Value |
|-----------|-------|
| Base model | [LiquidAI/LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M) |
| Method | LoRA (PEFT) |
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, w1, w2, w3, in_proj, out_proj |
| Training examples | 10,000 (2,000 per hop level, stratified) |
| Epochs | 3 |
| Learning rate | 2e-4 |
| Batch size | 2 (×8 gradient accumulation) |
| Optimizer | paged_adamw_8bit |
| Quantization | QLoRA (NF4, double quant) |
| Final loss | 0.2033 |
| Training time | ~77 min (Colab T4) |

## GGUF details

| File | Quant | Size |
|------|-------|------|
| `LFM2-350M-StepGame-f16.gguf` | F16 | 679 MB |

Produced by merging the LoRA adapter into the base model, then converting with llama.cpp's `convert_hf_to_gguf.py`.

## Dataset

Training and evaluation data come from different splits of [ZhengyanShi/StepGame](https://huggingface.co/datasets/ZhengyanShi/StepGame):

- **Training**: 10,000 examples from the `train` split (stratified, 2,000 per k-hop level)
- **Evaluation**: 250 examples from the `validation` split (stratified, 50 per k-hop level)

Examples with the "overlap" label were filtered out. Only the 8 cardinal/intercardinal directions are used.
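
The filtering and stratified sampling described above can be sketched as follows. The `k_hop` and `label` field names are assumptions for illustration; the actual dataset schema may differ.

```python
import random
from collections import defaultdict

# The 8 cardinal/intercardinal directions kept after filtering
DIRECTIONS = {"left", "right", "above", "below",
              "upper-left", "upper-right", "lower-left", "lower-right"}

def stratified_sample(records, per_hop, seed=0):
    """Drop non-directional labels (e.g. 'overlap'), then draw an
    equal number of examples from each k-hop level."""
    rng = random.Random(seed)
    by_hop = defaultdict(list)
    for rec in records:
        if rec["label"] in DIRECTIONS:
            by_hop[rec["k_hop"]].append(rec)
    sample = []
    for k in sorted(by_hop):
        sample.extend(rng.sample(by_hop[k], min(per_hop, len(by_hop[k]))))
    return sample
```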

## Prompt format

The model uses ChatML-style prompts (`<|im_start|>`/`<|im_end|>` tokens):

```
<|im_start|>system
You are a spatial reasoning assistant. Given a sequence of positional relationships between objects, determine the spatial relationship between two specified objects. Answer with a single direction from: left, right, above, below, upper-left, upper-right, lower-left, lower-right.<|im_end|>
<|im_start|>user
{story}

{question}<|im_end|>
<|im_start|>assistant
```
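
For callers that bypass the chat API and feed raw text, the template can be assembled with a small helper. This is a sketch matching the format shown above, not code shipped with the model.

```python
SYSTEM_PROMPT = (
    "You are a spatial reasoning assistant. "
    "Given a sequence of positional relationships between objects, "
    "determine the spatial relationship between two specified objects. "
    "Answer with a single direction from: "
    "left, right, above, below, upper-left, upper-right, lower-left, lower-right."
)

def build_prompt(story, question):
    """Assemble the ChatML-style prompt, ending at the assistant turn
    so the model generates the answer."""
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{story}\n\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```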

## Source

Project repository: [spatialft/spatialft.github.io](https://github.com/spatialft/spatialft.github.io)

Built for AIPI 590.03 Intelligent Agents (Duke University).