---
library_name: transformers
pipeline_tag: image-text-to-text
license: other
base_model:
- arcee-ai/Trinity-Nano-Preview
- moondream/moondream3-preview
tags:
- vision-language-model
- multimodal
- custom_code
- trinity
- moondream
---

# TrinityVLM

Trinity VLM is a vision-language model built on top of [arcee-ai/Trinity-Nano-Preview](https://huggingface.co/arcee-ai/Trinity-Nano-Preview), using the vision encoder extracted from [moondream/moondream3-preview](https://huggingface.co/moondream/moondream3-preview).

This is not intended to be a good model; it is only an experiment in adding vision capabilities to a text-only model from scratch.

The model was trained on the following data mix:
- 20% anthracite-org/pixmo-cap-images
- 30% anthracite-org/pixmo-cap-qa-images
- 25% anthracite-org/pixmo-point-explanations-images
- 25% nvidia/Llama-Nemotron-Post-Training-Dataset chat examples with irrelevant PixMo images attached, to avoid overfitting on image explanation when the prompt does not require image context.
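
As a rough illustration of what these percentages mean in practice, the sketch below draws a source dataset for each training example in proportion to the weights listed above. It is not the actual training pipeline, which is not described here; the `DATA_MIX` dictionary and the `sample_sources` helper are hypothetical names used only for this example, and per-example sampling is an assumption.

```python
import random
from collections import Counter

# Mixture weights copied from the list above. The dict itself is a
# hypothetical structure for this sketch, not part of the released code.
DATA_MIX = {
    "anthracite-org/pixmo-cap-images": 0.20,
    "anthracite-org/pixmo-cap-qa-images": 0.30,
    "anthracite-org/pixmo-point-explanations-images": 0.25,
    "nvidia/Llama-Nemotron-Post-Training-Dataset": 0.25,
}


def sample_sources(n, seed=0):
    """Pick a source dataset for each of n examples, weighted by DATA_MIX."""
    rng = random.Random(seed)  # local generator so the draw is reproducible
    names = list(DATA_MIX)
    weights = [DATA_MIX[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


if __name__ == "__main__":
    n = 10_000
    counts = Counter(sample_sources(n))
    # Empirical shares should land close to the nominal mix.
    for name in DATA_MIX:
        print(f"{name}: {counts[name] / n:.1%}")
```

With a large enough sample, the printed shares should sit close to the 20/30/25/25 split. A real pipeline might instead interleave pre-shuffled datasets or fix exact counts per epoch.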

The model is licensed under the BSL 1.1 terms of Moondream 3.