---
license: apache-2.0
language:
- en
- es
- zh
tags:
- mlx
- vision
- vlm
- osirisbrain
- apple-silicon
- qwen2.5-vl
base_model: Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-text-to-text
library_name: mlx
---

# OsirisHippocampus-Vision-v7-MLX

**The Hippocampus** is Osiris's visual cortex: a lightweight 3B vision-language model (VLM) that processes screenshots, images, and other visual input. It runs natively on Apple Silicon through MLX on Metal.

## Architecture

- **Base model:** Qwen2.5-VL-3B-Instruct (3B parameters, vision-language)
- **Format:** MLX, 4-bit quantized (Apple Silicon native)
- **Size:** ~2.9 GB
- **Speed:** 150+ tokens/sec on an M2 Pro
- **Capabilities:** OCR, screenshot analysis, image understanding, visual question answering

## Usage

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

model, processor = load("osirisbrain/OsirisHippocampus-Vision-v7-MLX")
# Wrap the question in the model's chat template, with one image placeholder
prompt = apply_chat_template(processor, model.config, "What do you see in this image?", num_images=1)
output = generate(model, processor, prompt, ["screenshot.png"], verbose=False)
print(output)
```

## Credits

MLX conversion by [mlx-community](https://huggingface.co/mlx-community/Qwen2.5-VL-3B-Instruct-4bit).
Original model: [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) by Alibaba.
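
## Estimating the quantized footprint

To see roughly what "4-bit quantized" means for memory, the sketch below estimates the storage of affine-quantized weights. It assumes MLX-style group quantization: every `group_size` consecutive weights (default 64) share one scale and one bias, each stored at `scale_bits` bits (16 for fp16/bf16). The function names are hypothetical helpers for illustration, not part of `mlx` or `mlx_vlm`, and the sketch ignores any tensors a checkpoint keeps unquantized.

```python
def effective_bits_per_weight(bits: int = 4, group_size: int = 64, scale_bits: int = 16) -> float:
    """Bits per weight once the per-group scale and bias are amortized."""
    return bits + 2 * scale_bits / group_size


def quantized_weight_bytes(num_params: int, bits: int = 4, group_size: int = 64,
                           scale_bits: int = 16) -> float:
    """Approximate bytes for `num_params` weights under group-wise affine quantization.

    Assumption: each group of `group_size` weights stores one scale and one bias,
    each `scale_bits` wide. Unquantized tensors are NOT counted.
    """
    if num_params < 0 or bits <= 0 or group_size <= 0:
        raise ValueError("num_params must be >= 0; bits and group_size must be > 0")
    # Ceiling division: a trailing partial group still needs its own scale and bias
    groups = -(-num_params // group_size)
    total_bits = num_params * bits + groups * 2 * scale_bits
    return total_bits / 8


if __name__ == "__main__":
    print(effective_bits_per_weight())  # 4 + 32/64 = 4.5 bits per weight
    print(f"{quantized_weight_bytes(3_000_000_000) / 1e9:.2f} GB")  # 1.69 GB
```

For a nominal 3 billion quantized weights this gives about 1.69 GB, below the ~2.9 GB listed above. That gap is what you would expect if the true parameter count exceeds the nominal 3B or if some components are stored at higher precision; this sketch does not attempt to determine which applies to this checkpoint, so treat the result as a lower bound rather than a prediction of the file size.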