---
datasets:
- ogutsevda/graph-tcga-brca
library_name: pytorch
license: apache-2.0
pipeline_tag: graph-ml
tags:
- graph-neural-networks
- histopathology
- self-supervised-learning
- pytorch-geometric
- graph-representation-learning
---

# GrapHist: Graph Self-Supervised Learning for Histopathology

This repository contains the pre-trained model from the paper [GrapHist: Graph Self-Supervised Learning for Histopathology](https://huggingface.co/papers/2603.00143). Pre-trained on the [graph-tcga-brca](https://huggingface.co/datasets/ogutsevda/graph-tcga-brca) dataset, it employs an **ACM-GIN** (Adaptive Channel Mixing Graph Isomorphism Network) encoder-decoder architecture with a masked node attribute prediction objective.

- **Paper:** [arXiv:2603.00143](https://arxiv.org/abs/2603.00143)
- **Code:** [GitHub Repository](https://github.com/ogutsevda/graphist)

![GrapHist architecture](graphist.png)

## Repository Structure

```
graphist/
├── graphist.pt       # Pre-trained model checkpoint
├── graphist.png      # Architecture overview
├── models/
│   ├── __init__.py   # build_model(args) factory
│   ├── edcoder.py    # PreModel encoder-decoder wrapper
│   ├── acm_gin.py    # ACM-GIN backbone (encoder/decoder)
│   └── utils.py      # Activation and normalization helpers
└── README.md
```

## Requirements

```bash
pip install torch torch-geometric huggingface_hub
```

The model expects graphs in PyTorch Geometric format with `x`, `edge_index`, `edge_attr`, and `batch` attributes.

## Usage

### 1. Download the model files

```python
from huggingface_hub import snapshot_download

repo_path = snapshot_download(repo_id="ogutsevda/graphist")
```

### 2. Build and load the model

```python
import sys, torch

sys.path.insert(0, repo_path)
from models import build_model

class Args:
    encoder = "acm_gin"
    decoder = "acm_gin"
    drop_edge_rate = 0.0
    mask_rate = 0.5
    replace_rate = 0.1
    num_hidden = 512
    num_layers = 5
    num_heads = 4
    num_out_heads = 1
    residual = None
    attn_drop = 0.1
    in_drop = 0.2
    norm = None
    negative_slope = 0.2
    batchnorm = False
    activation = "prelu"
    loss_fn = "sce"
    alpha_l = 3
    concat_hidden = True
    num_features = 46
    num_edge_features = 1

args = Args()
model = build_model(args)

checkpoint = torch.load(f"{repo_path}/graphist.pt", weights_only=False)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```

### 3. Generate embeddings

```python
with torch.no_grad():
    embeddings = model.embed(
        batch.x, batch.edge_index, batch.edge_attr, batch.batch
    )
```

## Acknowledgements

The model architecture adapts code from [GraphMAE](https://github.com/THUDM/GraphMAE) and [ACM-GNN](https://github.com/SitaoLuan/ACM-GNN).
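Before wiring up real data, the `embed` call's input tensors can be smoke-tested with random values whose shapes match the checkpoint's expected dimensions (46 node features and 1 edge feature, per `num_features` and `num_edge_features` in `Args`). This is an illustrative sketch; the graph sizes here are arbitrary.

```python
import torch

# Toy graph: 8 nodes, 20 directed edges (sizes are illustrative)
num_nodes, num_edges = 8, 20

x = torch.randn(num_nodes, 46)                            # node feature matrix [N, 46]
edge_index = torch.randint(0, num_nodes, (2, num_edges))  # COO connectivity [2, E]
edge_attr = torch.randn(num_edges, 1)                     # edge feature matrix [E, 1]
batch = torch.zeros(num_nodes, dtype=torch.long)          # all nodes belong to graph 0

# With the model loaded as above:
# with torch.no_grad():
#     embeddings = model.embed(x, edge_index, edge_attr, batch)
```

The same tensors are what a `torch_geometric.data.Batch` exposes as its `x`, `edge_index`, `edge_attr`, and `batch` attributes, so real data can be dropped in without code changes.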
## Citation

```bibtex
@misc{ogut2026graphist,
  title={GrapHist: Graph Self-Supervised Learning for Histopathology},
  author={Sevda Öğüt and Cédric Vincent-Cuaz and Natalia Dubljevic and Carlos Hurtado and Vaishnavi Subramanian and Pascal Frossard and Dorina Thanou},
  year={2026},
  eprint={2603.00143},
  url={https://arxiv.org/abs/2603.00143},
}
```