SmolVLM-256M-Instruct-Agri: Local Evaluation Pack

This folder contains a runnable local benchmark for:

  • Latency
  • Throughput (tokens/s)
  • Memory (process RSS peak)
  • Basic model/runtime metadata
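
The contents of benchmark.py are not shown here, so the following is only a minimal sketch of how the first three metrics could be collected for a single run. The names `measure`, `peak_rss_mb`, and the `generate` callable are hypothetical and stand in for whatever the real runner uses.

```python
import resource
import sys
import time


def peak_rss_mb() -> float:
    """Peak resident set size of this process in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure(generate, prompt):
    """Time one call to `generate` and derive tokens/s.

    `generate` is a hypothetical callable that returns the list of
    generated tokens for `prompt`; it is not part of benchmark.py.
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    latency = time.perf_counter() - start
    return {
        "generated_tokens": len(tokens),
        "latency_s": latency,
        "tokens_per_s": len(tokens) / latency if latency > 0 else float("nan"),
        "peak_rss_mb": peak_rss_mb(),
    }
```

Peak RSS obtained this way is a high-water mark for the whole process, so a value read after several cases reflects the largest case seen so far rather than the most recent one.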

Model evaluated: Dharunkumar9/SmolVLM-256M-Instruct-Agri

Files

  • benchmark.py — benchmark runner
  • benchmark_results.json — latest measured results
  • HF_MODEL_CARD_README.md — ready-to-upload model card text for Hugging Face

Latest measured results (this machine)

Environment:

  • Device: mps (Apple Metal)
  • Runtime dtype: torch.float16
  • transformers==5.3.0, torch==2.10.0

Model metadata:

  • Parameters: 256,484,928
  • Load time: 64.912 s (includes first download)

Benchmark cases:

Case             Input tokens  Generated tokens  Latency (s)  Tokens/s  Peak RSS (MB)
text_only_short            25                64        3.274    19.547         451.31
image_short               887                46        5.285     8.704         546.58
image_long                913                16        3.382     4.731         595.73
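
The Tokens/s column is consistent with generated tokens divided by latency. The short check below recomputes it from the rounded values in the table; it assumes that definition rather than reading it from benchmark.py, and small differences in the last digit are expected because the latencies shown are already rounded.

```python
# (case, generated tokens, latency in seconds, reported tokens/s) from the table
rows = [
    ("text_only_short", 64, 3.274, 19.547),
    ("image_short", 46, 5.285, 8.704),
    ("image_long", 16, 3.382, 4.731),
]


def tokens_per_second(generated_tokens: int, latency_s: float) -> float:
    """Throughput as output tokens per second of wall-clock latency."""
    return generated_tokens / latency_s


for name, generated, latency, reported in rows:
    computed = tokens_per_second(generated, latency)
    print(f"{name}: computed {computed:.3f}, reported {reported:.3f}")
```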

Notes

  • First run includes model download time and can be much slower.
  • Peak RSS is process memory, not full system memory, and MPS GPU memory is not fully represented in RSS.
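  • Throughput is lower for the image prompts. They carry roughly 860 more input tokens than the text-only case (887 and 913 versus 25), so image preprocessing and the longer context add cost before any output token is produced. image_long also generated only 16 tokens, which likely spreads that fixed cost over fewer output tokens.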
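
Because the first run can be much slower, one way to keep it from skewing results is to time it separately from later runs. The sketch below illustrates that idea only; `time_cold_and_warm` is a hypothetical helper, and benchmark.py may handle warm-up differently or not at all.

```python
import statistics
import time


def time_cold_and_warm(fn, warm_runs=3):
    """Time the first call to `fn` on its own, then report the median
    of `warm_runs` further calls."""
    start = time.perf_counter()
    fn()
    cold_s = time.perf_counter() - start

    warm = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        fn()
        warm.append(time.perf_counter() - start)

    return {"cold_s": cold_s, "warm_median_s": statistics.median(warm)}
```

Reporting the cold time next to the warm median keeps the one-off download and initialization cost visible without mixing it into steady-state latency.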