---
license: mit
pipeline_tag: image-feature-extraction
---
# Masked Autoencoder (MAE) for Medical Imaging

A PyTorch implementation of a Masked Autoencoder (MAE) for self-supervised learning on chest X-ray images, designed specifically for the CheXpert dataset.

## πŸ“‹ Overview

This project implements a Vision Transformer-based Masked Autoencoder that learns representations from chest X-ray images through self-supervised reconstruction. The model randomly masks 75% of image patches and learns to reconstruct the original image, enabling it to learn powerful visual representations without requiring labeled data.

### Key Features

- **Vision Transformer Architecture**: Encoder-decoder transformer with positional encodings
- **Self-Supervised Learning**: Pre-training through masked image reconstruction
- **Optimized for Medical Imaging**: Designed specifically for chest X-ray analysis
- **Production-Ready Training Pipeline**:
  - Mixed precision training (FP16) with gradient scaling
  - Gradient accumulation support
  - Learning rate warmup and cosine annealing
  - Automatic checkpointing and resumption
- **Efficient Data Loading**:
  - Optimized ZIP file reader with LRU caching
  - Class-balanced sampling with a weighted random sampler
  - Multi-worker data loading with persistent workers
- **Comprehensive Logging**: Training/validation metrics tracking and visualization

## πŸ—οΈ Architecture

### Masked Autoencoder Structure

```
Input Image (384Γ—384)
        ↓
Patchify (16Γ—16 patches β†’ 576 patches)
        ↓
Random Masking (75% masked, 25% visible)
        ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚             MAE ENCODER              β”‚
β”‚  - Linear patch embedding            β”‚
β”‚  - Positional encoding (visible)     β”‚
β”‚  - 12 Transformer blocks             β”‚
β”‚  - 8 attention heads, 768 hidden     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚             MAE DECODER              β”‚
β”‚  - Learnable mask tokens             β”‚
β”‚  - Positional encoding (all)         β”‚
β”‚  - 8 Transformer blocks              β”‚
β”‚  - 8 attention heads, 512 hidden     β”‚
β”‚  - Pixel reconstruction head         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        ↓
Reconstructed Image
        ↓
MSE Loss (on masked patches only)
```
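
For reference, the patchify step above just slices the image into non-overlapping 16Γ—16 tiles and flattens each one. Below is a minimal sketch of that operation (an illustration, not the repository's `models/mae.py` implementation, whose exact signature may differ), assuming a 3-channel input:

```python
import torch

def patchify(imgs: torch.Tensor, patch_size: int = 16) -> torch.Tensor:
    """(B, C, H, W) -> (B, num_patches, patch_size * patch_size * C)."""
    B, C, H, W = imgs.shape
    h, w = H // patch_size, W // patch_size
    x = imgs.reshape(B, C, h, patch_size, w, patch_size)
    x = x.permute(0, 2, 4, 3, 5, 1)  # (B, h, w, patch, patch, C)
    return x.reshape(B, h * w, patch_size * patch_size * C)

patches = patchify(torch.randn(2, 3, 384, 384))
print(patches.shape)  # torch.Size([2, 576, 768]): 24Γ—24 patches, 16Γ—16Γ—3 values each
```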

### Model Configuration

| Parameter | Default Value | Description |
|-----------|---------------|-------------|
| Image Size | 384Γ—384 | Input image resolution |
| Patch Size | 16Γ—16 | Size of each patch |
| Mask Ratio | 0.75 | Fraction of patches to mask |
| Encoder Depth | 12 layers | Number of transformer blocks |
| Encoder Dim | 768 | Hidden dimension |
| Encoder Heads | 8 | Number of attention heads |
| Decoder Depth | 8 layers | Number of transformer blocks |
| Decoder Dim | 512 | Hidden dimension |
| Decoder Heads | 8 | Number of attention heads |
| MLP Ratio | 4Γ— | MLP expansion ratio (768 β†’ 3072) |
| Dropout | 0.25 | Dropout rate |

## πŸš€ Getting Started

### Prerequisites

- Python >= 3.8
- CUDA-capable GPU (recommended)
- 16GB+ RAM

### Installation

1. Clone the repository:
```bash
git clone https://github.com/adelelsayed/mae.git
cd mae
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

### Dataset Preparation

This project is configured for the **CheXpert dataset**. To use it:

1. Download CheXpert-v1.0-small from the [Stanford ML Group](https://stanfordmlgroup.github.io/competitions/chexpert/)
2. Update the paths in `configs/configs.py`:
   - `root`: Base directory for your data
   - `zip_path`: Path to the zipped dataset (optional, for faster loading)
   - `csv`: Path to the training CSV
   - `train_csv`, `val_csv`, `test_csv`: Split CSV files

## πŸ“Š Usage

### Training

Start training from scratch:
```bash
python trainer/trainer.py
```

The trainer will:
- Automatically create checkpoint and log directories
- Resume from the last checkpoint if available
- Log training/validation metrics to text files
- Save plots every 10 epochs
- Save the best model based on validation loss

### Training Configuration

Edit `configs/configs.py` to customize training:

```python
mae_config = {
    # Training hyperparameters
    "lr": 1e-4,              # Learning rate
    "warmup": 5,             # Warmup epochs
    "weight_decay": 5e-4,    # AdamW weight decay
    "num_epochs": 200,       # Total training epochs
    "batch_size": 96,        # Batch size
    "accumulation": 1,       # Gradient accumulation steps

    # Model architecture
    "mask_ratio": 0.75,      # Masking ratio
    "encoder_depth": 12,     # Encoder layers
    "decoder_depth": 8,      # Decoder layers

    # Paths
    "checkpoints": "/path/to/checkpoints",
    "logdir": "/path/to/logs",
    ...
}
```
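
The warmup-then-cosine schedule driven by `"warmup"` and `"num_epochs"` can be reproduced with stock PyTorch schedulers. A minimal sketch, assuming per-epoch stepping (the trainer's actual implementation may step at a different granularity):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

model = torch.nn.Linear(768, 768)  # stand-in for the MAE
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=5e-4)

warmup_epochs, num_epochs = 5, 200
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        # Linear ramp from 1% of the base LR up to the full LR
        LinearLR(optimizer, start_factor=0.01, total_iters=warmup_epochs),
        # Cosine decay over the remaining epochs
        CosineAnnealingLR(optimizer, T_max=num_epochs - warmup_epochs),
    ],
    milestones=[warmup_epochs],
)

for epoch in range(num_epochs):
    # ... train one epoch ...
    scheduler.step()
```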

### Monitoring Training

Training logs are saved in three files:
- `training_log.txt`: Training metrics per epoch
- `val_log.txt`: Validation metrics per epoch
- `test_log.txt`: Test set evaluation results

Metrics plots are saved every 10 epochs to `{logdir}/{epoch}/metrics.png`.

### Evaluation

The trainer includes a test method. To evaluate:
```python
from trainer.utils import MAETrainer
from configs.configs import mae_config

trainer = MAETrainer(mae_config)
trainer.test()
```

## πŸ“ Project Structure

```
mae/
β”œβ”€β”€ configs/
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── configs.py            # Training configuration
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ dataset.py            # CheXpert dataset loader
β”‚   └── splitter.py           # Dataset splitting utilities
β”œβ”€β”€ loss/
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── mae_loss.py           # MAE reconstruction loss
β”œβ”€β”€ models/
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── mae.py                # MAE architecture
β”œβ”€β”€ trainer/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ trainer.py            # Main training script
β”‚   └── utils.py              # Training utilities
β”œβ”€β”€ notebooks/
β”‚   └── chexpert_mae.ipynb    # Jupyter notebook for experiments
β”œβ”€β”€ training logs/            # Logged metrics and plots
β”œβ”€β”€ weights/                  # Model checkpoints
β”œβ”€β”€ results/                  # Evaluation results
β”œβ”€β”€ requirements.txt          # Python dependencies
β”œβ”€β”€ LICENSE                   # Project license
└── README.md                 # This file
```

## πŸ”§ Components

### Dataset (`data/dataset.py`)

- **OptimizedZipReader**: Fast ZIP file reading with LRU caching
- **CheXpertDataset**: PyTorch dataset for CheXpert chest X-rays
  - 14 pathology labels: No Finding, Cardiomegaly, Edema, Consolidation, etc.
  - Albumentations-based augmentation pipeline
  - Class-balanced sampling support (see the sampler sketch after this list)
  - Frontal/lateral view filtering
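
Class-balanced sampling of this kind is typically built on PyTorch's `WeightedRandomSampler`. A minimal sketch, with hypothetical labels standing in for the ones parsed from the CheXpert CSV (the dataset's actual weighting scheme may differ):

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical per-sample class indices; in practice these come from the CSV.
labels = torch.randint(0, 14, (10_000,))
class_counts = torch.bincount(labels, minlength=14).float()
weights = 1.0 / class_counts[labels]  # rarer classes are drawn more often

sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
# loader = DataLoader(dataset, batch_size=96, sampler=sampler,
#                     num_workers=4, persistent_workers=True)
```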

### Model (`models/mae.py`)

- **Patchify/Unpatchify**: Image-to-patch conversion utilities
- **Random Masking**: Stochastic patch masking with restore indices (sketched after this list)
- **PositionalEncoding**: Learnable position embeddings
- **TransformerBlock**: Multi-head self-attention + MLP
- **MAEEncoder**: Processes visible patches only
- **MAEDecoder**: Reconstructs the full image with mask tokens
- **MaskedAutoEncoder**: Complete MAE model
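
The restore-index trick is the core of MAE masking: shuffle the patches with random noise, keep the first 25%, and remember the inverse permutation so the decoder can put mask tokens back in the right places. A sketch in the spirit of the original MAE reference implementation (the repository's version may differ in detail):

```python
import torch

def random_masking(x: torch.Tensor, mask_ratio: float = 0.75):
    B, N, D = x.shape
    len_keep = int(N * (1 - mask_ratio))

    noise = torch.rand(B, N, device=x.device)
    ids_shuffle = torch.argsort(noise, dim=1)        # random permutation per image
    ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation

    ids_keep = ids_shuffle[:, :len_keep]
    x_visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

    # Binary mask in original patch order: 0 = visible, 1 = masked
    mask = torch.ones(B, N, device=x.device)
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)
    return x_visible, mask, ids_restore
```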

### Loss (`loss/mae_loss.py`)

Mean Squared Error (MSE) computed only on masked patches:
```python
loss = ((pred - target) ** 2 * mask).sum() / mask.sum()
```
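
Spelled out with shapes, the same idea looks roughly like the sketch below, assuming predictions and targets of shape `(B, N, patch_dim)` and a binary mask of shape `(B, N)` where 1 marks a masked patch:

```python
import torch

def mae_loss(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    per_patch = ((pred - target) ** 2).mean(dim=-1)  # per-patch MSE: (B, N)
    return (per_patch * mask).sum() / mask.sum()     # average over masked patches only
```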

### Trainer (`trainer/utils.py`)

- **MAETrainer**: Complete training pipeline
  - Mixed precision training (AMP)
  - Gradient clipping and accumulation (see the loop sketch after this list)
  - Learning rate scheduling (warmup β†’ cosine)
  - Automatic checkpointing
  - Multi-file logging (train/val/test)
  - Live metric monitoring with tqdm
  - Periodic metric visualization
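
The AMP, accumulation, and clipping pieces combine into a loop like the following sketch. It uses a toy model and random patches so it runs standalone; the actual `MAETrainer` loop differs in detail:

```python
import torch

# Stand-ins for the real MAE, optimizer, and data loader
model = torch.nn.Linear(768, 768).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loader = [torch.randn(8, 576, 768) for _ in range(4)]

scaler = torch.cuda.amp.GradScaler()
accumulation = 2  # effective batch size = batch_size * accumulation

optimizer.zero_grad()
for step, patches in enumerate(loader):
    patches = patches.cuda()
    with torch.cuda.amp.autocast():  # FP16 forward pass and loss
        loss = ((model(patches) - patches) ** 2).mean() / accumulation
    scaler.scale(loss).backward()    # accumulate scaled gradients
    if (step + 1) % accumulation == 0:
        scaler.unscale_(optimizer)   # unscale before clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```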

## 🎯 CheXpert Pathologies

The dataset covers 14 chest X-ray findings:

1. No Finding
2. Enlarged Cardiomediastinum
3. Cardiomegaly
4. Lung Opacity
5. Lung Lesion
6. Edema
7. Consolidation
8. Pneumonia
9. Atelectasis
10. Pneumothorax
11. Pleural Effusion
12. Pleural Other
13. Fracture
14. Support Devices

## πŸ“ˆ Training Tips

1. **Learning Rate**: Start with 1e-4 and use warmup for stability
2. **Batch Size**: Maximize based on GPU memory (96 works well on 40GB GPUs)
3. **Gradient Accumulation**: Use it if batch size is limited by memory
4. **Mixed Precision**: Enabled by default for faster training
5. **Masking Ratio**: 75% is standard; higher ratios increase task difficulty
6. **Resume Training**: The trainer automatically resumes from the last checkpoint

## πŸ”¬ Use Cases

### Pre-training for Downstream Tasks
Use the trained encoder as a feature extractor:
```python
import torch

from models.mae import MaskedAutoEncoder

# Load the pre-trained model
mae = MaskedAutoEncoder()
mae.load_state_dict(torch.load("best_mae.pth")["model"])

# Use the encoder for feature extraction
encoder = mae.encoder
features, _, _, _ = encoder(images)
```

### Fine-tuning on Classification
Add a classification head to the encoder for supervised tasks.
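
One possible shape for this, as a sketch (the output unpacking assumes the encoder interface shown in the feature-extraction example above; for fine-tuning you would typically disable masking so the encoder sees every patch):

```python
import torch
import torch.nn as nn

class MAEClassifier(nn.Module):
    """Hypothetical wrapper: pre-trained MAE encoder plus a linear head."""

    def __init__(self, encoder: nn.Module, num_classes: int = 14, dim: int = 768):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(dim, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        features, *_ = self.encoder(images)  # (B, tokens, dim)
        pooled = features.mean(dim=1)        # mean-pool over tokens
        return self.head(pooled)             # multi-label logits
```

Train with `nn.BCEWithLogitsLoss` for CheXpert's multi-label targets.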

### Anomaly Detection
Reconstruction error can indicate abnormalities in medical images.
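
A sketch of how such a score might be computed, assuming the model returns predicted patches and the binary mask (adapt to the actual forward signature):

```python
import torch

@torch.no_grad()
def reconstruction_score(mae, images, patchify_fn):
    """Mean masked-patch MSE per image; higher suggests more anomalous."""
    pred, mask = mae(images)                          # assumed forward signature
    target = patchify_fn(images)
    err = ((pred - target) ** 2).mean(dim=-1)         # (B, N) per-patch error
    return (err * mask).sum(dim=1) / mask.sum(dim=1)  # (B,) scores
```

Averaging the score over several random masks per image reduces the variance of the estimate.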

## πŸ“Š Performance Optimization

This implementation includes several optimizations:

- **Efficient ZIP Reading**: Avoids extracting files to disk
- **LRU Cache**: Keeps frequently accessed images in memory (the pattern is sketched after this list)
- **Persistent Workers**: Reduces data-loading overhead
- **Mixed Precision**: Roughly 2Γ— faster training with minimal quality loss
- **Gradient Checkpointing**: Reduces memory usage (if enabled)
- **CUDA Memory Management**: Proper cache clearing and synchronization
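
The ZIP-plus-LRU pattern boils down to decoding images straight from the archive and memoizing recent reads. A minimal sketch (the repository's `OptimizedZipReader` may differ in detail):

```python
import io
import zipfile
from functools import lru_cache

from PIL import Image

class ZipImageReader:
    def __init__(self, zip_path: str, cache_size: int = 1024):
        self._zf = zipfile.ZipFile(zip_path, "r")
        # Per-instance LRU cache over the raw bytes of recently read entries
        self._read = lru_cache(maxsize=cache_size)(self._zf.read)

    def load(self, name: str) -> Image.Image:
        # Decode in memory; nothing is extracted to disk
        return Image.open(io.BytesIO(self._read(name))).convert("L")
```

With multiple DataLoader workers, each worker should hold its own reader, since a shared `ZipFile` handle is not safe across processes.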

## 🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## πŸ“„ License

This project is licensed under the terms specified in the LICENSE file.

## πŸ“š References

1. **Masked Autoencoders Are Scalable Vision Learners**
   He, K., Chen, X., Xie, S., Li, Y., DollΓ‘r, P., & Girshick, R. (2022)
   [arXiv:2111.06377](https://arxiv.org/abs/2111.06377)

2. **CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison**
   Irvin, J., et al. (2019)
   [Stanford ML Group](https://stanfordmlgroup.github.io/competitions/chexpert/)

## πŸ™ Acknowledgments

- Original MAE paper by Meta AI Research
- CheXpert dataset by the Stanford ML Group
- PyTorch and Albumentations communities

## πŸ“§ Contact

For questions or issues, please open an issue on GitHub or contact the maintainer.

---

**Note**: This is a research/educational implementation. For clinical applications, please ensure proper validation and regulatory compliance.