LFM2.5-VL-1.6B UCF Crime — LoRA Adapters

Base model: LiquidAI/LFM2.5-VL-1.6B, fine-tuned with LoRA on the UCF Crime dataset to detect crimes in surveillance footage.

Fine-tuned 2× faster with Unsloth on ~26k surveillance images — entirely on a free Google Colab T4 GPU.

🔓 Training notebook (free Colab): Open in Colab — reproduce this fine-tune yourself, no paid GPU needed.

About this model

This repo contains LoRA adapter weights for LFM2.5-VL-1.6B, trained to analyze CCTV/surveillance images and detect harmful or criminal activity across 14 UCF Crime categories (13 crime classes plus Normal):

Abuse · Arrest · Arson · Assault · Burglary · Explosion · Fighting · Robbery · Shooting · Shoplifting · Stealing · Vandalism · Road Accident · Normal

The model targets a structured JSON output:

{
  "isHarm": true,
  "descriptionIfHarm": "The image depicts fighting."
}

When no harmful activity is detected:

{
  "isHarm": false
}
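The two examples above imply a simple contract: "isHarm" is always a boolean, and "descriptionIfHarm" is a non-empty string that appears only when "isHarm" is true. Below is a minimal sketch of a checker for that contract. It assumes exactly these two keys, and the function name is_valid_output is hypothetical, not part of this repo.

```python
import json


def is_valid_output(raw: str) -> bool:
    """Return True if `raw` is JSON matching the harm / no-harm schema shown above."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    # The top level must be an object with a real boolean flag (not "yes", 1, etc.)
    if not isinstance(data, dict) or not isinstance(data.get("isHarm"), bool):
        return False
    if data["isHarm"]:
        # Harmful frames must carry a non-empty description and nothing else
        desc = data.get("descriptionIfHarm")
        has_desc = isinstance(desc, str) and desc.strip() != ""
        return has_desc and set(data) == {"isHarm", "descriptionIfHarm"}
    # Non-harmful frames carry only the flag
    return set(data) == {"isHarm"}
```

The strictness here is a design choice. The usage prompt later in this card asks for a "description" key that is null when false, so a checker aimed at that prompt's output would need to accept that variant as well.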

⚠️ Output format note: The model was trained on JSON targets but does not produce JSON by default; it works best when the prompt explicitly instructs it to respond in JSON. The main gain is improved domain understanding of UCF Crime CCTV imagery (see the evaluation below).

Training Details

Parameter	Value
Hardware	Google Colab T4 (free tier)
Training time	~10 hours
Eval time	~3 hours
Dataset split	80/20 (train/eval)
Training samples	~26,000 images
Eval samples	~5,600 images
Epochs	1
LoRA rank (r)	16
LoRA alpha	16
Learning rate	2e-4
Batch size	2 per device (gradient accumulation: 4, effective batch size 8)
Optimizer	adamw_8bit
Max seq length	2048
Vision layers	Frozen
Language layers	Fine-tuned
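The batch settings in the table combine multiplicatively: each optimizer step sees 2 × 4 = 8 images. The small sketch below computes the resulting number of optimizer steps for one epoch. It assumes the ~26,000-image training figure from the table and that a final partial batch still counts as a step; the helper name is illustrative only.

```python
import math


def optimizer_steps(num_samples: int, per_device_batch: int, grad_accum: int, epochs: int = 1) -> int:
    """Approximate number of optimizer updates for `epochs` passes over `num_samples` examples."""
    # Gradients from `grad_accum` micro-batches are summed before each update
    effective_batch = per_device_batch * grad_accum
    return math.ceil(num_samples / effective_batch) * epochs


steps = optimizer_steps(26_000, per_device_batch=2, grad_accum=4)
print(steps)  # 3250
```

At the reported ~10 hours of training, that works out to roughly 11 seconds per optimizer step on the T4.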

The full UCF Crime dataset has ~600k images (1,900+ CCTV videos). Training on the full set would take weeks on a free T4, so a balanced subset of 26k images (1,000 per crime class + equal normal samples) was used.
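The balanced sampling described above can be sketched as follows. The sketch assumes each frame is a (path, label) pair. It also assumes that "equal normal samples" means as many Normal frames as all crime frames combined, which is the reading that gives 26k: 13 crime classes × 1,000 = 13,000, plus 13,000 Normal. The record format, the function name balanced_split and the seed are hypothetical, and the actual notebook may sample differently.

```python
import random
from collections import defaultdict


def balanced_split(records, per_class=1000, normal_label="Normal", train_frac=0.8, seed=42):
    """Sample `per_class` frames from each crime class plus an equal total of
    Normal frames, then shuffle and split into (train, eval) lists."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in records:
        by_label[label].append((path, label))

    # Up to `per_class` frames from every non-Normal class
    subset = []
    for lab in sorted(k for k in by_label if k != normal_label):
        frames = by_label[lab]
        subset.extend(rng.sample(frames, min(per_class, len(frames))))

    # Match the number of Normal frames to the total number of crime frames
    n_crime = len(subset)
    normals = by_label[normal_label]
    subset.extend(rng.sample(normals, min(n_crime, len(normals))))

    # Shuffle so both splits mix classes, then cut at the train fraction
    rng.shuffle(subset)
    cut = round(len(subset) * train_frac)
    return subset[:cut], subset[cut:]
```

A fixed seed keeps the subset reproducible across Colab sessions, which matters when training is split over several runs.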

Evaluation (5,200 samples)

Evaluated against the base model using an LLM judge on a held-out test set.

Model	Accuracy
Base model (LFM2.5-VL-1.6B, untrained)	35.2%
LoRA fine-tuned (this model)	44.8%

The fine-tuned model improves accuracy by 9.6 percentage points over the base model on UCF Crime CCTV imagery (35.2% → 44.8%, roughly a 27% relative gain). This suggests that even a small 26k-image subset, trained entirely on a free Colab T4, meaningfully improves domain understanding.
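An LLM-judge evaluation reduces to a per-sample verdict (correct or not), so the headline numbers are plain accuracies. Below is a sketch of the aggregation and the comparison. It assumes a hypothetical judge callable that returns True when a model's answer matches the ground-truth label; the card does not specify the judge model or its prompt.

```python
from typing import Callable, Sequence


def judged_accuracy(predictions: Sequence[str], labels: Sequence[str],
                    judge: Callable[[str, str], bool]) -> float:
    """Percentage of samples the judge marks as correct."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty predictions and labels")
    correct = sum(judge(pred, label) for pred, label in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def compare(base_acc: float, tuned_acc: float) -> tuple[float, float]:
    """Absolute gain in percentage points and relative gain in percent."""
    gain = tuned_acc - base_acc
    return gain, 100.0 * gain / base_acc


pp, rel = compare(35.2, 44.8)
print(f"+{pp:.1f} pp, +{rel:.1f}% relative")  # +9.6 pp, +27.3% relative
```

Reporting both figures avoids ambiguity: "+9.6" is an absolute difference in percentage points, not a 9.6% relative improvement.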

Usage

Load with Unsloth (recommended)

from unsloth import FastVisionModel
from PIL import Image

model, tokenizer = FastVisionModel.from_pretrained(
    model_name="rajofearth/lfm-ucf-unsloth",
    max_seq_length=2048,
    load_in_4bit=False,
    attn_implementation="eager",
)
FastVisionModel.for_inference(model)

image = Image.open("your_surveillance_image.jpg").convert("RGB")

system_prompt = """Analyze this frame with extreme caution. Detect ANY potential harm.
If ANY doubt, flag as harmful. Reply ONLY in strict JSON:
{"isHarm": true/false, "description": "brief exact reason only if true, else null"}. No explanations outside JSON."""

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": system_prompt},
            {"type": "image", "image": image}
        ]
    }
]

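# The rendered template already contains special tokens, so don't add a second BOS
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(text=prompt, images=image, add_special_tokens=False, return_tensors="pt").to("cuda")

# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.2)
# Decode only the newly generated tokens, not the echoed prompt
input_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][input_len:], skip_special_tokens=True))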

Tip: Always include the system prompt instructing JSON output — the model is trained for it but won't default to it without being told.
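Even with the JSON instruction, a small model may wrap its answer in a Markdown code fence or add stray text. In addition, the prompt above asks for a "description" key while the training targets use "descriptionIfHarm". The defensive parsing sketch below assumes the first decodable JSON object containing a boolean "isHarm" is the answer. parse_harm_output is a hypothetical helper, not part of Unsloth or this repo.

```python
import json
from typing import Optional


def parse_harm_output(text: str) -> Optional[dict]:
    """Extract the first JSON object with a boolean "isHarm" from raw model output
    and normalize it to the trained schema. Returns None if none is found."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`, ignoring trailing text
            data, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
            continue
        if isinstance(data.get("isHarm"), bool):
            # Accept the prompt's "description" key or the trained "descriptionIfHarm" key
            desc = data.get("descriptionIfHarm") or data.get("description")
            result = {"isHarm": data["isHarm"]}
            if data["isHarm"] and isinstance(desc, str) and desc.strip():
                result["descriptionIfHarm"] = desc.strip()
            return result
        start = text.find("{", start + 1)
    return None
```

Returning None rather than guessing leaves the decision to the caller: retry, or, following the prompt's "if any doubt, flag as harmful" policy, treat unparseable output as harmful.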

Load with Transformers + PEFT

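from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel

# Load the base vision-language model, then attach this repo's LoRA adapter on top of it
base = AutoModelForImageTextToText.from_pretrained("LiquidAI/LFM2.5-VL-1.6B", device_map="auto")
model = PeftModel.from_pretrained(base, "rajofearth/lfm-ucf-unsloth")
processor = AutoProcessor.from_pretrained("LiquidAI/LFM2.5-VL-1.6B")

The processor plays the role of tokenizer in the Unsloth example above, so inference follows the same steps: apply the chat template, tokenize text and image, generate, and decode.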

Reproduce This Fine-Tune

Want to train this yourself or adapt it to your own surveillance dataset? The full training notebook is free and public on Google Colab (linked in the resource table below).

The notebook covers:

Setting up Unsloth + LFM2.5-VL-1.6B on a free T4
Loading and preprocessing the UCF Crime dataset
LoRA fine-tuning with vision layers frozen
Evaluation using an LLM judge
Exporting to GGUF for llama.cpp / Ollama
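For the preprocessing step, Unsloth's vision notebooks format each training example as a chat conversation: the image sits in the user turn and the target text in the assistant turn. The sketch below turns one labelled frame into such a conversation using this card's JSON target format. It assumes the description template "The image depicts <class>." generalizes from the single example shown earlier, and that image is a PIL image. The instruction text and the function name to_conversation are hypothetical.

```python
import json
from typing import Any

# Hypothetical instruction text; the notebook's exact wording may differ
INSTRUCTION = (
    'Analyze this surveillance frame. Reply ONLY in JSON: '
    '{"isHarm": true/false, "descriptionIfHarm": "..."}'
)


def to_conversation(image: Any, label: str, normal_label: str = "Normal") -> dict:
    """Build one chat-style training example for a labelled frame."""
    if label == normal_label:
        target = {"isHarm": False}
    else:
        # Description wording follows the card's "The image depicts fighting." example
        target = {"isHarm": True, "descriptionIfHarm": f"The image depicts {label.lower()}."}
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "text", "text": INSTRUCTION},
                {"type": "image", "image": image},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": json.dumps(target)},
            ]},
        ]
    }
```

Mapping this over the subset, e.g. [to_conversation(img, lab) for img, lab in samples], yields a list in the shape that Unsloth's UnslothVisionDataCollator is used with in those notebooks.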

Resource	Link
Base model	LiquidAI/LFM2.5-VL-1.6B
GGUF version	rajofearth/lfm-ucf-gguf
Training notebook	Google Colab
Dataset	tanzzpatil/ucf-crime-small

Developed by: rajofearth · Created with Unsloth + Google Colab (free tier).


Model tree: LiquidAI/LFM2.5-1.2B-Base (base model) → LiquidAI/LFM2.5-VL-1.6B (fine-tuned) → rajofearth/lfm-ucf-unsloth (this adapter)