---
datasets:
- EchoDynamic
- RVENet
- EchoNet-Pediatric-LVH
language: en
library_name: pytorch
license: mit
tags:
- self-supervised-learning
- echocardiography
- medical-imaging
- video-representation
model_index: deep-learning
paper: https://arxiv.org/pdf/2506.11777
pipeline_tag: video-feature-extraction
---

# 🫀 DISCOVR: Self-Supervised Echocardiography Representations

**Paper:** *Self-Supervised Learning of Echocardiographic Video Representations via Online Cluster Distillation* (NeurIPS 2025)  
📄 [arXiv:2506.11777](https://arxiv.org/pdf/2506.11777)  
**Code:** [https://github.com/mdivyanshu97/DISCOVR](https://github.com/mdivyanshu97/DISCOVR)

---

## 📦 Available Checkpoints

| Epochs | Filename | Description |
|:-------:|:-----------|:-------------|
| 200 | `checkpoint-199.pth` | Model trained for ~200 epochs |
| 300 | `checkpoint-299.pth` | Model trained for ~300 epochs |
| 400 | `checkpoint-399.pth` | Model trained for ~400 epochs |
| 600 | `checkpoint-599.pth` | Model trained for ~600 epochs |
| 800 | `checkpoint-799.pth` | Model trained for ~800 epochs |

> Each checkpoint corresponds to a model trained for the indicated number of epochs on **adult and pediatric echocardiography datasets** (EchoDynamic, RVENet, EchoNet-Pediatric LVH).
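
Fetching one of the files listed above could look like the following minimal sketch. It assumes the `huggingface_hub` and `torch` packages are installed; the helper names `checkpoint_filename` and `load_checkpoint` are hypothetical, and the internal layout of the `.pth` files is not documented here, so the returned object should be inspected before use.

```python
REPO_ID = "Div97/DISCOVR_ADULT_PEDIATRIC_MODEL"  # repo name as given in Quick Facts
AVAILABLE_EPOCHS = (200, 300, 400, 600, 800)     # rows of the checkpoint table


def checkpoint_filename(epochs: int) -> str:
    """Map an epoch count from the table to its file name (file names are zero-indexed)."""
    if epochs not in AVAILABLE_EPOCHS:
        raise ValueError(f"no checkpoint for {epochs} epochs; choose from {AVAILABLE_EPOCHS}")
    return f"checkpoint-{epochs - 1}.pth"


def load_checkpoint(epochs: int = 800):
    """Download one checkpoint and return whatever object torch.load yields."""
    # Imported lazily so the filename helper works without these packages installed.
    import torch
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=REPO_ID, filename=checkpoint_filename(epochs))
    # weights_only=False unpickles arbitrary Python objects; use it only for files you trust.
    return torch.load(path, map_location="cpu", weights_only=False)
```

Calling `load_checkpoint(800)` downloads `checkpoint-799.pth` into the local Hugging Face cache. Many training scripts store the weights under a key such as `"model"`, but that is an assumption here; printing the top-level keys of the returned object is a safe first step.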

---

## 🧠 Model Overview

DISCOVR is a self-supervised framework for learning spatio-temporal echocardiographic video representations via **online cluster distillation**.  
It learns both fine-grained anatomical semantics and global temporal dynamics, supporting downstream tasks such as:
- Cardiac view classification  
- Functional abnormality detection  
- Video segmentation  
- Representation learning for medical imaging  

**Not for clinical or diagnostic use.**

---

## Sample Usage

To pretrain the model on echocardiographic videos:

```bash
python -m torch.distributed.launch --nproc_per_node=NUM_GPUS \
    scripts/run_mae_pretraining.py \
    --data_path /path/to/echo_videos \
    --data_path_csv /path/to/train.csv \
    --data_path_val /path/to/val.csv \
    --data_path_test /path/to/test.csv \
    --mask_type multi_local \
    --loss_func SIGMA \
    --model pretrain_videomae_base_patch16_224 \
    --batch_size 48 \
    --num_frames 64 \
    --opt adamw \
    --opt_betas 0.9 0.95 \
    --warmup_epochs 40 \
    --epochs 400
```

---

## 🔖 Quick Facts
- **Repo:** `Div97/DISCOVR_ADULT_PEDIATRIC_MODEL`  
- **Model family:** DISCOVR checkpoints (199 → 799)  
- **Architecture:** ViT-Base backbone, 64-frame clips (stride 3)  
- **Datasets used:** EchoDynamic, RVENet, EchoNet-Pediatric LVH  
- **Training objective:** Self-supervised online cluster distillation  
- **Intended use:** Research & education  
- **Not intended for:** Clinical decision-making or real-world patient care  
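
To make the clip geometry above concrete, the sketch below works out which frames a 64-frame, stride-3 clip touches and how many tokens a ViT would see per clip. It reads "stride 3" as a temporal sampling stride, takes the 16-pixel patches and 224-pixel input from the model name `pretrain_videomae_base_patch16_224`, and assumes a tubelet size of 2 (the VideoMAE default). These are assumptions, and the function names are illustrative, not part of the DISCOVR code.

```python
def clip_frame_indices(start: int, num_frames: int = 64, stride: int = 3) -> list[int]:
    """Indices of the frames in one clip: every `stride`-th frame from `start`."""
    return [start + i * stride for i in range(num_frames)]


def clip_span(num_frames: int = 64, stride: int = 3) -> int:
    """Number of source frames covered from the first to the last sampled frame."""
    return (num_frames - 1) * stride + 1


def token_count(num_frames: int = 64, image_size: int = 224,
                patch_size: int = 16, tubelet_size: int = 2) -> int:
    """Tokens per clip for a ViT with tubelet patch embedding.

    tubelet_size=2 is an assumption (the VideoMAE default), not stated in this card.
    """
    patches_per_frame = (image_size // patch_size) ** 2
    return (num_frames // tubelet_size) * patches_per_frame


indices = clip_frame_indices(start=0)
print(indices[:4], indices[-1])  # [0, 3, 6, 9] 189
print(clip_span())               # 190
print(token_count())             # 6272
```

Under these assumptions a single clip spans 190 consecutive source frames, so shorter videos would need padding, looping, or a smaller stride.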

---

## 🧩 Citation

If you use DISCOVR in your work, please cite:

```bibtex
@article{mishra2025self,
  title={Self-supervised Learning of Echocardiographic Video Representations via Online Cluster Distillation},
  author={Mishra, Divyanshu and Salehi, Mohammadreza and Saha, Pramit and Patey, Olga and Papageorghiou, Aris T and Asano, Yuki M and Noble, J Alison},
  journal={arXiv preprint arXiv:2506.11777},
  year={2025}
}
```

---

## License
This project is licensed under the MIT License.