---
license: mit
language:
- en
---
# 🪜 LADDER: Language-Driven Slice Discovery and Error Rectification in Vision Classifiers

[![Project](https://img.shields.io/badge/Project-%23dfb317)](https://shantanu-ai.github.io/projects/ACL-2025-Ladder/index.html)
[![Paper](https://img.shields.io/badge/Paper-ACL%202025-%23dfb317)](https://aclanthology.org/2025.findings-acl.1177/)
[![Code](https://img.shields.io/badge/GitHub-batmanlab%2FLADDER-%2312100e)](https://github.com/batmanlab/Ladder)
[![Model](https://img.shields.io/badge/HuggingFace-Pretrained--Checkpoints-blue)](https://huggingface.co/shawn24/Ladder/tree/main)

---

## 📌 Summary

**LADDER** is a general framework that enables vision classifiers to automatically discover subpopulations (or "slices") of data where the model is underperforming, without requiring group annotations. It leverages **vision-language representations** and the **reasoning capabilities of large language models (LLMs)** to detect and rectify bias-inducing features in both natural and medical imaging domains.

---

## 🧠 Architecture & Components

- πŸ” **Slice Discovery** using:
  - CLIP, Mammo-CLIP, and CXR-CLIP features
  - BLIP and GPT-4o-generated captions
- 🧠 **Hypothesis Generation** using:
  - GPT-4o, Claude, Gemini, LLaMA
- ✅ **Bias Mitigation** via reweighting & pseudo-labeling
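
The reweighting idea in the mitigation step can be sketched as group-balanced loss weights: once the discovery step has assigned each training example a slice, small (error-prone) slices are upweighted so they contribute as much to training as large ones. This is an illustrative sketch, not the released implementation; the helper name `slice_balanced_weights` is hypothetical.

```python
from collections import Counter

def slice_balanced_weights(slice_ids):
    """Per-example loss weights inversely proportional to slice size.

    Illustrative sketch only (not the released LADDER code): an example
    in slice s gets weight n / (k * |s|), where n is the number of
    examples and k the number of slices, so the weights average to 1
    and every slice contributes equally to the total loss.
    """
    counts = Counter(slice_ids)
    n, k = len(slice_ids), len(counts)
    return [n / (k * counts[s]) for s in slice_ids]
```

Examples in a rare slice receive a proportionally larger weight, which is what makes the classifier attend to the underperforming subpopulation during retraining.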

---

## 📊 Datasets Used

- **Natural Images**: Waterbirds, CelebA, MetaShift
- **Medical Images**: NIH ChestX-ray, RSNA Mammograms, VinDr Mammograms

---

## 📦 Files Included

| File | Description |
|------|-------------|
| `model.pt` | Pretrained model checkpoint |
| `feature_cache.pkl` | Cached representations (CLIP/Mammo-CLIP/CXR-CLIP) |
| `metadata.csv` | Metadata with discovered slice labels |
| `caption_blip.json` | BLIP-generated captions |
| `caption_gpt4o.json` | GPT-4o-generated captions |
| `predictions.json` | Model predictions on test set |

---


## 🧪 Benchmarks

LADDER outperforms prior slice-discovery methods such as Domino and FACTS across six datasets and more than 200 classifiers. It is especially effective at:

- Discovering hidden biases without explicit attribute labels
- Reasoning about non-visual factors (e.g., preprocessing artifacts)
- Operating without human-written captions

---

## 📜 Citation

```bibtex
@article{ghosh2024ladder,
  title={LADDER: Language Driven Slice Discovery and Error Rectification},
  author={Ghosh, Shantanu and Syed, Rayan and Wang, Chenyu and Poynton, Clare B and Visweswaran, Shyam and Batmanghelich, Kayhan},
  journal={arXiv preprint arXiv:2408.07832},
  year={2024}
}
```

---

## 🤝 Acknowledgements

Boston University, Stanford University, BUMC, and the University of Pittsburgh.