mlx-community/LocateAnything-3B-4bit

MLX mixed 4/8-bit (mixed_4_8, ~6.7 bits/weight) conversion of nvidia/LocateAnything-3B, a vision-language model for fast, high-quality visual grounding (object detection, referring-expression grounding, pointing, GUI/text localization). Converted with mlx-vlm for Apple Silicon.

Box coordinates stay accurate (within ~1-2 quant levels of the bf16 reference), but semantic labels may become more generic (e.g. object instead of remote). A pure 4-bit build was not released because quantizing the tied embed_tokens/lm_head weights destroys coordinate-token precision.
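To give a feel for what a 1-2 level deviation means in pixels, here is a rough sketch. It assumes that "quant levels" refers to the coordinate bins described under Usage (1000 steps spanning the full image dimension); the function name `levels_to_pixels` is made up for this illustration and is not part of mlx-vlm.

```python
def levels_to_pixels(levels, image_size, scale=1000):
    """Convert a deviation in coordinate bins to pixels.

    Assumption: the coordinate range spans the whole image dimension,
    so one bin corresponds to image_size / scale pixels.
    """
    return levels * image_size / scale


# Example: deviation range for a 640-pixel-wide image.
low = levels_to_pixels(1, 640)
high = levels_to_pixels(2, 640)
print(f"1-2 levels on 640 px is roughly {low:.2f} to {high:.2f} px")
```

Under that assumption the deviation stays around a pixel for typical image sizes and grows linearly with resolution.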

Requirements

Note: LocateAnything support in mlx-vlm currently lives in a pull request and is not yet part of a released mlx-vlm version. Until it merges, install from the branch that adds the locateanything model:

pip install "git+https://github.com/beshkenadze/mlx-vlm@feat/locateanything-3b"

Usage

python -m mlx_vlm.generate --model mlx-community/LocateAnything-3B-4bit \
  --image http://images.cocodataset.org/val2017/000000039769.jpg \
  --prompt "Detect all objects in the image." --max-tokens 128 --temperature 0.0

Output consists of structured coordinate tokens, e.g. <ref>remote</ref><box><64><152><273><244></box>, with coordinates quantized to the normalized range <0>..<1000>. Two decoding modes are available via generation_mode: autoregressive (slow, default) and Parallel Box Decoding (fast/hybrid, ~2x faster).
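The generated string can be turned into pixel boxes with a small parser. The sketch below is not part of mlx-vlm. It assumes the four box values are ordered x1, y1, x2, y2 (the card does not state the order) and that they are normalized to a 0..1000 range. The names `parse_detections` and `DETECTION_RE` are hypothetical.

```python
import re

# Assumed layout: <ref>label</ref><box><x1><y1><x2><y2></box>
DETECTION_RE = re.compile(
    r"<ref>(.*?)</ref><box><(\d+)><(\d+)><(\d+)><(\d+)></box>"
)


def parse_detections(text, image_width, image_height, scale=1000):
    """Extract labels and pixel-space boxes from model output.

    Assumption: coordinates are (x1, y1, x2, y2), normalized to 0..scale.
    """
    detections = []
    for match in DETECTION_RE.finditer(text):
        label = match.group(1)
        x1, y1, x2, y2 = (int(v) for v in match.groups()[1:])
        detections.append(
            {
                "label": label,
                "box": (
                    x1 * image_width / scale,
                    y1 * image_height / scale,
                    x2 * image_width / scale,
                    y2 * image_height / scale,
                ),
            }
        )
    return detections


output = "<ref>remote</ref><box><64><152><273><244></box>"
for det in parse_detections(output, image_width=640, image_height=480):
    print(det["label"], det["box"])
```

Pass the real width and height of the input image. If the boxes come out transposed, the coordinate order assumed above is wrong for this model and should be adjusted.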

Attribution & license

  • Derived from nvidia/LocateAnything-3B — released under the NVIDIA License: non-commercial, research/academic use only (commercial use not permitted except by NVIDIA). Redistribution must retain this license and attribution.
  • Vision encoder: MoonViT-SO-400M (MIT). Language model: Qwen2.5-3B-Instruct (Qwen Research License). Part of the Eagle VLM family.

The LICENSE file from the source model is included in this repo.
