LFM2.5-VL-1.6B UCF Crime — LoRA Adapters

Base model: LiquidAI/LFM2.5-VL-1.6B, fine-tuned on the UCF Crime dataset for crime detection in surveillance imagery.

Fine-tuned 2× faster with Unsloth on ~26k surveillance images — entirely on a free Google Colab T4 GPU.

🔓 Training notebook (free Colab): Open in Colab — reproduce this fine-tune yourself, no paid GPU needed.


About this model

This repo contains LoRA adapter weights for LFM2.5-VL-1.6B, trained to analyze CCTV/surveillance images and detect harmful or criminal activity across the 14 UCF Crime categories (13 crime classes plus Normal):

Abuse · Arrest · Arson · Assault · Burglary · Explosion · Fighting · Robbery · Shooting · Shoplifting · Stealing · Vandalism · Road Accident · Normal

The model targets a structured JSON output:

{
  "isHarm": true,
  "descriptionIfHarm": "The image depicts fighting."
}

When no harmful activity is detected:

{
  "isHarm": false
}

⚠️ Output format note: The model was trained toward JSON output but does not produce it by default. It works best when you explicitly instruct it to respond in JSON via the system prompt. The real gain is improved domain understanding of UCF Crime CCTV imagery — see evaluation below.
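Because the model may wrap the JSON in extra text or skip it entirely, downstream code will likely want a tolerant parser. The following is a minimal sketch and not part of this repo: `parse_harm_response` is a hypothetical helper name. It takes the span from the first `{` to the last `}`, accepts either `descriptionIfHarm` (the training target shown above) or `description` (the key requested by the usage prompt in this card), and returns `None` when no usable JSON object is found.

```python
import json
import re
from typing import Optional


def parse_harm_response(raw: str) -> Optional[dict]:
    """Normalize raw model text to {"isHarm": bool, "descriptionIfHarm": str | None}.

    Hypothetical helper: returns None when no usable JSON object is found.
    """
    # Take everything from the first "{" to the last "}" so that surrounding
    # chatter or code fences are ignored.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not isinstance(data.get("isHarm"), bool):
        return None
    # Accept either key name and normalize to the training-target schema.
    description = data.get("descriptionIfHarm", data.get("description"))
    if not isinstance(description, str):
        description = None
    return {
        "isHarm": data["isHarm"],
        "descriptionIfHarm": description if data["isHarm"] else None,
    }
```

Treating an unparseable reply as `None` rather than as "no harm" leaves the fallback policy (retry, flag for review, and so on) to the caller.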


Training Details

| Parameter | Value |
|---|---|
| Hardware | Google Colab T4 (free tier) |
| Training time | ~10 hours |
| Eval time | ~3 hours |
| Dataset split | 80/20 |
| Training samples | ~26,000 images |
| Eval samples | ~5,600 images |
| Epochs | 1 |
| LoRA rank (r) | 16 |
| LoRA alpha | 16 |
| Learning rate | 2e-4 |
| Batch size | 2 (grad accum: 4) |
| Optimizer | adamw_8bit |
| Max seq length | 2048 |
| Vision layers | Frozen |
| Language layers | Fine-tuned |
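
The hyperparameters above imply an effective batch size and a rough step count. The sketch below works them out, taking the listed ~26,000 training samples and ~10 hours at face value and assuming time is spread evenly across optimizer steps; the variable names are illustrative only.

```python
import math

per_device_batch_size = 2      # "Batch size" listed above
gradient_accumulation = 4      # "grad accum" listed above
train_samples = 26_000         # approximate, as listed above
epochs = 1
training_hours = 10            # approximate, as listed above

# Gradients from 4 micro-batches of 2 are accumulated before each update.
effective_batch_size = per_device_batch_size * gradient_accumulation
optimizer_steps = math.ceil(train_samples / effective_batch_size) * epochs
seconds_per_step = training_hours * 3600 / optimizer_steps

print(effective_batch_size)        # 8
print(optimizer_steps)             # 3250
print(round(seconds_per_step, 1))  # 11.1
```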

The full UCF Crime dataset has ~600k images (1,900+ CCTV videos). Training on the full set would take weeks on a free T4, so a balanced subset of 26k images (1,000 per crime class + equal normal samples) was used.
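
The 26k figure follows from the category list: 13 crime classes at 1,000 images each, plus an equal number of Normal images. The sketch below reproduces that count and also shows what an 80/20 split would give if it were applied to this subset. That last step is an assumption, since the source does not say which total the split is taken over.

```python
crime_classes = [
    "Abuse", "Arrest", "Arson", "Assault", "Burglary", "Explosion",
    "Fighting", "Robbery", "Shooting", "Shoplifting", "Stealing",
    "Vandalism", "Road Accident",
]
images_per_crime_class = 1_000

crime_images = len(crime_classes) * images_per_crime_class
normal_images = crime_images          # "equal normal samples"
subset_size = crime_images + normal_images

# Assumption: the 80/20 split is taken over this balanced subset.
train_size = subset_size * 80 // 100
eval_size = subset_size - train_size

print(subset_size, train_size, eval_size)  # 26000 20800 5200
```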


Evaluation (5,200 samples)

Evaluated against the base model using an LLM judge on a held-out test set.

[Figure: eval chart]

| Model | Accuracy |
|---|---|
| Base model (LFM2.5-VL-1.6B, not fine-tuned) | 35.2% |
| LoRA fine-tuned (this model) | 44.8% |

The fine-tuned model achieves a +9.6 percentage point improvement over the base model on UCF Crime CCTV imagery, demonstrating that even a small 26k subset meaningfully improves domain understanding — all from a free Colab session.
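
A percentage-point gain is easy to confuse with a relative gain, so here is a quick arithmetic check using only the two accuracies reported above (variable names are illustrative):

```python
base_accuracy = 35.2        # percent, base model
finetuned_accuracy = 44.8   # percent, LoRA fine-tuned

# Absolute gain is a difference of percentages (percentage points);
# relative gain expresses that difference as a fraction of the baseline.
absolute_gain_pp = finetuned_accuracy - base_accuracy
relative_gain_pct = absolute_gain_pp / base_accuracy * 100

print(round(absolute_gain_pp, 1))   # 9.6
print(round(relative_gain_pct, 1))  # 27.3
```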


Usage

Load with Unsloth (recommended)

from unsloth import FastVisionModel
from PIL import Image

model, tokenizer = FastVisionModel.from_pretrained(
    model_name="rajofearth/lfm-ucf-unsloth",
    max_seq_length=2048,
    load_in_4bit=False,
    attn_implementation="eager",
)
FastVisionModel.for_inference(model)

image = Image.open("your_surveillance_image.jpg").convert("RGB")

system_prompt = """Analyze this frame with extreme caution. Detect ANY potential harm.
If ANY doubt, flag as harmful. Reply ONLY in strict JSON:
{"isHarm": true/false, "description": "brief exact reason only if true, else null"}. No explanations outside JSON."""

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": system_prompt},
            {"type": "image", "image": image}
        ]
    }
]

prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(text=prompt, images=image, return_tensors="pt").to("cuda")

# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.2)
input_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][input_len:], skip_special_tokens=True))

Tip: Always include the system prompt instructing JSON output — the model is trained for it but won't default to it without being told.

Load with Transformers + PEFT

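from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel

# Load the base vision-language model, then attach the LoRA adapter weights.
base = AutoModelForImageTextToText.from_pretrained("LiquidAI/LFM2.5-VL-1.6B")
model = PeftModel.from_pretrained(base, "rajofearth/lfm-ucf-unsloth")
# The processor (tokenizer + image preprocessing) comes from the base model.
processor = AutoProcessor.from_pretrained("LiquidAI/LFM2.5-VL-1.6B")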

Reproduce This Fine-Tune

Want to train this yourself or adapt it to your own surveillance dataset? The full training notebook is free and public:

Open In Colab

The notebook covers:

  • Setting up Unsloth + LFM2.5-VL-1.6B on a free T4
  • Loading and preprocessing the UCF Crime dataset
  • LoRA fine-tuning with vision layers frozen
  • Evaluation using an LLM judge
  • Exporting to GGUF for llama.cpp / Ollama

Related

| Resource | Link |
|---|---|
| Base model | LiquidAI/LFM2.5-VL-1.6B |
| GGUF version | rajofearth/lfm-ucf-gguf |
| Training notebook | Google Colab |
| Dataset | tanzzpatil/ucf-crime-small |

Developed by: rajofearth · Created with Unsloth + Google Colab (free tier).
