---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
license: mit
datasets:
- carolineec/CyclePrefDB-I2T
- carolineec/CyclePrefDB-T2I
language:
- en
---

# Model Card for CycleReward-Combo

[Project page](https://cyclereward.github.io) | [Paper](https://huggingface.co/papers/2506.02095) | [Code](https://github.com/hjbahng/cyclereward)

CycleReward-Combo is a reward model for image-text alignment, trained on both image-to-text and text-to-image comparison pairs from the [CyclePrefDB-I2T](https://huggingface.co/datasets/carolineec/CyclePrefDB-I2T) and [CyclePrefDB-T2I](https://huggingface.co/datasets/carolineec/CyclePrefDB-T2I) datasets.

This model has been pushed to the Hub using the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration.

## Loading the model

Download the `model.py` and `med_config.json` files and the `blip` folder from this repository, and keep them next to your script so that `from model import CycleReward` resolves. You can then load the pretrained model using the code below:

```python
import torch
from PIL import Image
from model import CycleReward  # model.py downloaded from this repository

# Fall back to CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained weights from the Hub and switch to inference mode
model = CycleReward.from_pretrained("carolineec/CycleReward-Combo")
model.to(device)
model.eval()

# Preprocess the image into a batched tensor on the target device
preprocess = model.preprocess
image_path = "cat.jpg"
caption = "a photo of a cat"
image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
print("prepared data")

# Score the image-caption pair
score = model.score(image, caption)
print("my score:", score.item())
```

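A typical use of a reward model is choosing the best of several candidate captions for one image. The sketch below is not part of this repository: `rank_captions` is a hypothetical helper that accepts any scoring callable, and it assumes that a higher score indicates better image-text alignment.

```python
from typing import Callable, List, Sequence, Tuple


def rank_captions(
    score_fn: Callable[[str], float],
    captions: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score every caption and return (caption, score) pairs, best first.

    `score_fn` is a hypothetical callable mapping a caption to a number,
    for example a thin wrapper around `model.score` for a fixed image.
    """
    scored = [(caption, float(score_fn(caption))) for caption in captions]
    # Assumption: a higher reward means better image-text alignment.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With the model and image tensor from the loading example, this could be called as `rank_captions(lambda c: model.score(image, c).item(), ["a photo of a cat", "a photo of a dog"])`. Passing the scorer as a callable keeps the helper independent of the model, so it can be tried out with a stand-in function first.
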

## Citation

```
@article{bahng2025cyclereward,
  title={Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences},
  author={Bahng, Hyojin and Chan, Caroline and Durand, Fredo and Isola, Phillip},
  journal={arXiv preprint arXiv:2506.02095},
  year={2025}
}
```