How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="kanashi6/UFO")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("kanashi6/UFO", dtype="auto")

This repository contains the model presented in the paper "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface".

UFO unifies object-level detection, pixel-level segmentation, and image-level vision-language tasks into a single model by transforming all perception targets into the language space. It introduces a novel embedding retrieval approach that relies solely on the language interface to support segmentation tasks.
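
To make the idea of embedding retrieval more concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation, and this model card does not spell out the details. It assumes that the language model emits an embedding for a special mask token, and that a mask is obtained by comparing that embedding against per-location image features with a dot product and keeping the locations that score above a threshold. The function name `retrieve_mask`, the `threshold` parameter, and the toy values are all hypothetical.

```python
from typing import List


def retrieve_mask(
    mask_embedding: List[float],
    image_features: List[List[List[float]]],
    threshold: float = 0.0,
) -> List[List[int]]:
    """Return a binary H x W mask (hypothetical helper, not part of UFO).

    A location is set to 1 when the dot product between the mask-token
    embedding and the image feature at that location exceeds `threshold`.
    """
    mask = []
    for row in image_features:
        mask_row = []
        for feature in row:
            # Dot-product similarity between the query and this location.
            score = sum(m * f for m, f in zip(mask_embedding, feature))
            mask_row.append(1 if score > threshold else 0)
        mask.append(mask_row)
    return mask


# Toy 2 x 3 feature grid with 2-dimensional features (made-up values).
features = [
    [[1.0, 0.0], [0.5, 0.5], [-1.0, 0.0]],
    [[0.0, 1.0], [-0.5, -0.5], [2.0, -1.0]],
]
query = [1.0, -0.5]  # stands in for the mask-token embedding

print(retrieve_mask(query, features))  # [[1, 1, 0], [0, 0, 1]]
```

In this sketch the segmentation output needs no dedicated mask decoder: the only model-side signal is an embedding tied to a token in the generated sequence, which is one way to read "relies solely on the language interface".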

For more details, please refer to the original paper and the GitHub repository.
