---
title: README
emoji: 😻
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
---

## About

**SynthFairCLIP** is a research initiative focused on fair vision–language models. We study how to reduce bias in CLIP-style models by combining:

- **Real data** from large-scale datasets such as DataComp/CommonPool.
- **Synthetic data** generated with state-of-the-art diffusion models.
- **Curation and balancing** of demographic attributes across professions, activities and contexts.

---

### What we release

- **CLIP models** trained on hybrid real–synthetic data.
- **Large-scale WebDataset shards** of synthetic / hybrid image–text data (see the loading sketch at the end of this page).
- **Evaluation tools and benchmarks** for analysing bias and fairness in CLIP-like models.

[![GitHub – evaluation tools](https://img.shields.io/badge/GitHub-Evaluation%20Tools-black?logo=github)](https://github.com/lluisgomez/SynthFairCLIP/tree/main/evals)

If you use our resources, please consider citing the SynthFairCLIP project.

---

### Acknowledgement

We acknowledge EuroHPC JU for awarding project EHPC-AI-2024A02-040 access to MareNostrum 5, hosted at BSC-CNS.
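
---

### Loading the released data (example)

The WebDataset shards use the standard tar-based WebDataset layout, so they can be streamed with the [`webdataset`](https://github.com/webdataset/webdataset) library. The sketch below is illustrative, not an official quick start: the shard URL is a placeholder, and the `jpg`/`txt` key convention is an assumption, so check the dataset card of each release for the actual paths and keys.

```python
# Minimal sketch: stream (image, caption) pairs from a WebDataset shard.
# The URL below is a placeholder, not a real SynthFairCLIP release path.
import webdataset as wds

shard_url = "https://example.org/synthfairclip/shard-{000000..000009}.tar"  # placeholder

dataset = (
    wds.WebDataset(shard_url)
    .decode("pil")               # decode image bytes to PIL images
    .to_tuple("jpg;png", "txt")  # assumed image/caption keys; adjust per dataset card
)

# Inspect the first sample.
for image, caption in dataset:
    print(image.size, caption[:80])
    break
```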