LFM2-350M-StepGame (GGUF)

Fine-tuned LiquidAI/LFM2-350M on the StepGame spatial reasoning benchmark. The model answers directional relationship questions (left, right, above, below, upper-left, upper-right, lower-left, lower-right) given a sequence of positional statements.

Results

Metric Baseline Fine-tuned Delta (pp)
Overall 16.0% 74.4% +58.4
1-hop 24.0% 94.0% +70.0
2-hop 14.0% 90.0% +76.0
3-hop 14.0% 76.0% +62.0
4-hop 18.0% 54.0% +36.0
5-hop 10.0% 58.0% +48.0

Evaluated on 250 held-out examples (50 per hop level) from the StepGame validation split.
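The per-hop breakdown above is a straightforward aggregation over (hop level, correct?) pairs. A minimal stdlib sketch — the record format here is a hypothetical illustration, not the project's actual eval harness:

```python
from collections import defaultdict

def accuracy_by_hop(records):
    """Compute overall and per-hop accuracy from (hop, is_correct) pairs.

    `records` is an iterable of (hop_level, is_correct) tuples -- an
    assumed format for illustration only.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for hop, correct in records:
        totals[hop] += 1
        hits[hop] += int(correct)
    per_hop = {h: hits[h] / totals[h] for h in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return overall, per_hop

# Toy data: 50 examples per hop level, as in the evaluation split.
demo = [(1, i < 47) for i in range(50)] + [(2, i < 45) for i in range(50)]
overall, per_hop = accuracy_by_hop(demo)
```

The overall figure is the mean over all 250 examples, which equals the mean of the five per-hop accuracies because each hop level contributes exactly 50 examples.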

How to use

llama.cpp / llama-server

llama-server \
  --model LFM2-350M-StepGame-f16.gguf \
  --ctx-size 8192 \
  --host 0.0.0.0 --port 8080
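llama-server exposes an OpenAI-compatible chat endpoint, so the server started above can be queried with curl; a sketch of such a request (host/port follow the command above):

```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a spatial reasoning assistant. Given a sequence of positional relationships between objects, determine the spatial relationship between two specified objects. Answer with a single direction from: left, right, above, below, upper-left, upper-right, lower-left, lower-right."},
      {"role": "user", "content": "J and A are in a vertical line with A below J.\n\nWhat is the relation of the agent A to the agent J?"}
    ]
  }'
```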

llama-cpp-python

from llama_cpp import Llama

llm = Llama(model_path="LFM2-350M-StepGame-f16.gguf", n_ctx=8192)
output = llm.create_chat_completion(messages=[
    {"role": "system", "content": (
        "You are a spatial reasoning assistant. "
        "Given a sequence of positional relationships between objects, "
        "determine the spatial relationship between two specified objects. "
        "Answer with a single direction from: "
        "left, right, above, below, upper-left, upper-right, lower-left, lower-right."
    )},
    {"role": "user", "content": (
        "J and A are in a vertical line with A below J.\n\n"
        "What is the relation of the agent A to the agent J?"
    )},
])
print(output["choices"][0]["message"]["content"])
# => "below"

Training details

Parameter Value
Base model LiquidAI/LFM2-350M
Method LoRA (PEFT)
Rank (r) 16
Alpha 32
Dropout 0.05
Target modules q_proj, k_proj, v_proj, w1, w2, w3, in_proj, out_proj
Training examples 10,000 (2,000 per hop level, stratified)
Epochs 3
Learning rate 2e-4
Batch size 2 (x8 gradient accumulation)
Optimizer paged_adamw_8bit
Quantization QLoRA (NF4, double quant)
Final loss 0.2033
Training time ~77 min (Colab T4)

GGUF details

File Quant Size
LFM2-350M-StepGame-f16.gguf F16 679 MB

Produced by merging the LoRA adapter into the base model, then converting with llama.cpp's convert_hf_to_gguf.py.
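The merge-and-convert step can be sketched roughly as follows. Paths and the adapter location are placeholders; this assumes peft, transformers, and a local llama.cpp checkout, and is not the project's exact script:

```shell
# Merge the LoRA adapter into the base model (hypothetical paths).
python - <<'PY'
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2-350M")
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter").merge_and_unload()
merged.save_pretrained("merged-model")
AutoTokenizer.from_pretrained("LiquidAI/LFM2-350M").save_pretrained("merged-model")
PY

# Convert the merged checkpoint to GGUF at F16.
python llama.cpp/convert_hf_to_gguf.py merged-model \
  --outtype f16 --outfile LFM2-350M-StepGame-f16.gguf
```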

Dataset

Training and evaluation data come from different splits of ZhengyanShi/StepGame:

  • Training: 10,000 examples from the train split (stratified, 2,000 per k-hop level)
  • Evaluation: 250 examples from the validation split (stratified, 50 per k-hop level)

Examples with the "overlap" label were filtered out. Only the 8 cardinal/intercardinal directions are used.
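The filtering described above amounts to dropping any example whose label falls outside the eight directions. A stdlib sketch — the `label` field name is an assumption about the dataset schema, not verified against ZhengyanShi/StepGame:

```python
# The eight directions kept for training/eval (from this model card).
DIRECTIONS = {
    "left", "right", "above", "below",
    "upper-left", "upper-right", "lower-left", "lower-right",
}

def keep_example(example):
    """Return True if the example's label is one of the 8 directions.

    `example["label"]` is an assumed field name for illustration.
    """
    return example["label"] in DIRECTIONS

# Toy demonstration: "overlap" examples are dropped.
samples = [{"label": "upper-left"}, {"label": "overlap"}, {"label": "below"}]
kept = [s for s in samples if keep_example(s)]
```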

Prompt format

The model uses ChatML-style prompts (<|im_start|>/<|im_end|> tokens):

<|im_start|>system
You are a spatial reasoning assistant. Given a sequence of positional relationships between objects, determine the spatial relationship between two specified objects. Answer with a single direction from: left, right, above, below, upper-left, upper-right, lower-left, lower-right.<|im_end|>
<|im_start|>user
{story}

{question}<|im_end|>
<|im_start|>assistant
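For raw (non-chat-API) completion calls, the template above can be assembled with a small helper; a sketch (the helper name is my own, the template text is taken verbatim from above):

```python
SYSTEM_PROMPT = (
    "You are a spatial reasoning assistant. "
    "Given a sequence of positional relationships between objects, "
    "determine the spatial relationship between two specified objects. "
    "Answer with a single direction from: "
    "left, right, above, below, upper-left, upper-right, lower-left, lower-right."
)

def build_prompt(story, question):
    """Render the ChatML-style prompt shown above, ending at the
    assistant turn so the model generates the answer."""
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{story}\n\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_prompt(
    "J and A are in a vertical line with A below J.",
    "What is the relation of the agent A to the agent J?",
)
```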

Source

Project repository: spatialft/spatialft.github.io

Built for AIPI 590.03 Intelligent Agents (Duke University).
