---
license: apache-2.0
language:
- en
- es
- zh
tags:
- mlx
- vision
- vlm
- osirisbrain
- apple-silicon
- qwen2.5-vl
base_model: Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-text-to-text
library_name: mlx
---
# OsirisHippocampus-Vision-v7-MLX
**The Hippocampus**: Osiris's visual cortex. A lightweight 3B-parameter vision-language model (VLM) that processes screenshots, images, and other visual input, running natively on Apple Silicon through MLX and its Metal backend.
## Architecture
- **Base Model:** Qwen2.5-VL-3B-Instruct (3B parameters, vision-language)
- **Format:** MLX 4-bit quantized (Apple Silicon native)
- **Size:** ~2.9 GB
- **Speed:** 150+ tokens/sec on an M2 Pro
- **Capabilities:** OCR, screenshot analysis, image understanding, visual QA
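The size figure can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes MLX's default affine quantization (groups of 64 weights, each group storing one scale and one bias in a 16-bit float), which adds 0.5 bits of overhead per weight on top of the 4-bit codes. The helper name `estimate_quantized_size_gb` is hypothetical and not part of MLX or mlx-vlm.

```python
def estimate_quantized_size_gb(n_params: float, bits: int = 4,
                               group_size: int = 64, scale_bits: int = 16) -> float:
    """Rough size in decimal GB if every weight is affine-quantized.

    Assumption: each group of `group_size` weights stores one scale and one
    bias of `scale_bits` bits each (MLX's default affine scheme).
    """
    # 4 code bits + (2 * 16) / 64 overhead bits = 4.5 bits per weight by default
    bits_per_weight = bits + 2 * scale_bits / group_size
    # bits -> bytes -> decimal gigabytes
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Nominal parameter count from the card (3B)
    print(f"{estimate_quantized_size_gb(3e9):.1f} GB")
```

For the nominal 3B parameters this gives about 1.7 GB, below the listed ~2.9 GB. The gap could come from the nominal "3B" being a rounded count or from some tensors being kept at higher precision; this card does not say which.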
## Usage
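```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
model, processor = load("osirisbrain/OsirisHippocampus-Vision-v7-MLX")
config = load_config("osirisbrain/OsirisHippocampus-Vision-v7-MLX")
prompt = apply_chat_template(processor, config, "What do you see in this image?", num_images=1)  # insert the image placeholder
print(generate(model, processor, prompt, ["screenshot.png"]))
```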
## Credits
MLX conversion by [mlx-community](https://huggingface.co/mlx-community/Qwen2.5-VL-3B-Instruct-4bit).
Original model: [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) by Alibaba.