---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
license: mit
datasets:
- carolineec/CyclePrefDB-I2T
- carolineec/CyclePrefDB-T2I
language:
- en
---

# Model Card for CycleReward-Combo

[Project page](https://cyclereward.github.io) | [Paper](https://huggingface.co/papers/2506.02095) | [Code](https://github.com/hjbahng/cyclereward)

Reward model for image-text alignment, trained on both image-to-text and text-to-image comparison pairs from the [CyclePrefDB-I2T](https://huggingface.co/datasets/carolineec/CyclePrefDB-I2T) and [CyclePrefDB-T2I](https://huggingface.co/datasets/carolineec/CyclePrefDB-T2I) datasets.

This model has been pushed to the Hub using the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration.

## Loading the model

Download `model.py`, `med_config.json`, and the `blip` folder from this repository, then load the pretrained model with the code below:

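```python
import torch
from PIL import Image
from model import CycleReward  # model.py downloaded from this repository

# Fall back to CPU if no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = CycleReward.from_pretrained("carolineec/CycleReward-Combo")
model.to(device)
model.eval()

preprocess = model.preprocess
image_path = "cat.jpg"
caption = "a photo of a cat"
# Convert to RGB so grayscale or RGBA files become 3-channel input
image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)

# Inference only: disable gradient tracking
with torch.no_grad():
    score = model.score(image, caption)
print("my score:", score.item())
```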
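
If you prefer not to download the files by hand, the sketch below fetches only the code and config files named above using `huggingface_hub.snapshot_download` with `allow_patterns`. It assumes the repository layout described in this card (`model.py`, `med_config.json`, and a `blip/` folder at the repo root); `fetch_cyclereward_code` and `CODE_PATTERNS` are hypothetical names introduced for illustration.

```python
from typing import List

# Hypothetical helper: files this card says are needed alongside the weights.
# "blip/*" is an fnmatch-style glob for the contents of the blip folder
# (assumes the folder sits at the repository root).
CODE_PATTERNS: List[str] = ["model.py", "med_config.json", "blip/*"]


def fetch_cyclereward_code(repo_id: str = "carolineec/CycleReward-Combo",
                           local_dir: str = ".") -> str:
    """Download only the code/config files into local_dir and return its path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=CODE_PATTERNS,
        local_dir=local_dir,
    )
```

Calling `fetch_cyclereward_code()` before running the loading example places the files in the current working directory, which is the layout the `from model import CycleReward` line expects. The weights themselves are still fetched by `CycleReward.from_pretrained`.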

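Because the model returns a scalar reward per image-caption pair, a common use is ranking several candidate captions for one image. The helper below is a minimal sketch, not part of the released code: `rank_captions` is a hypothetical name, and it assumes `model.score(image, caption)` returns a single-element value as in the loading example and that a higher reward means better image-text alignment.

```python
from typing import Any, Callable, List, Sequence, Tuple


def rank_captions(score_fn: Callable[[Any, str], Any], image: Any,
                  captions: Sequence[str]) -> List[Tuple[str, float]]:
    """Score each caption against one image and return (caption, score) best-first.

    score_fn is assumed to behave like model.score(image, caption), returning
    a Python number or a single-element tensor that float() can convert.
    """
    scored = [(caption, float(score_fn(image, caption))) for caption in captions]
    # Assumed convention: higher reward = better alignment, so sort descending.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With the model loaded as above, a call might look like `with torch.no_grad(): ranked = rank_captions(model.score, image, ["a photo of a cat", "a photo of a dog"])`, where `ranked[0][0]` is the caption the reward model prefers.
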
## Citation

```bibtex
@article{bahng2025cyclereward,
  title={Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences},
  author={Bahng, Hyojin and Chan, Caroline and Durand, Fredo and Isola, Phillip},
  journal={arXiv preprint arXiv:2506.02095},
  year={2025}
}
```