Upload folder using huggingface_hub
Browse filesThis view is limited to 50 files because it contains too many changes. See raw diff
- .gitattributes +2 -0
- LICENSE +21 -0
- README.md +353 -3
- configs/__init__.py +0 -0
- configs/__pycache__/__init__.cpython-313.pyc +0 -0
- configs/__pycache__/configs.cpython-313.pyc +0 -0
- configs/configs.py +55 -0
- data/__init__.py +0 -0
- data/__pycache__/__init__.cpython-313.pyc +0 -0
- data/__pycache__/__init__.cpython-314.pyc +0 -0
- data/__pycache__/dataset.cpython-313.pyc +0 -0
- data/__pycache__/dataset.cpython-314.pyc +0 -0
- data/dataset.py +508 -0
- data/splitter.py +347 -0
- gitignore.txt +61 -0
- loss/__init__.py +0 -0
- loss/__pycache__/__init__.cpython-313.pyc +0 -0
- loss/__pycache__/assymetric.cpython-313.pyc +0 -0
- loss/assymetric.py +59 -0
- models/__init__.py +0 -0
- models/__pycache__/__init__.cpython-313.pyc +0 -0
- models/__pycache__/classifier.cpython-313.pyc +0 -0
- models/__pycache__/densenet.cpython-313.pyc +0 -0
- models/__pycache__/mae.cpython-313.pyc +0 -0
- models/classifier.py +323 -0
- models/densenet.py +157 -0
- models/mae.py +177 -0
- notebooks/chexpert_mae.ipynb +0 -0
- notebooks/chexpert_mae_mask_classifier.ipynb +0 -0
- requirements.txt +29 -0
- results/test-results.docx +0 -0
- trainer/__init__.py +0 -0
- trainer/__pycache__/__init__.cpython-313.pyc +0 -0
- trainer/__pycache__/__init__.cpython-314.pyc +0 -0
- trainer/__pycache__/trainer.cpython-313.pyc +0 -0
- trainer/__pycache__/trainer.cpython-314.pyc +0 -0
- trainer/__pycache__/utils.cpython-313.pyc +0 -0
- trainer/test.py +15 -0
- trainer/trainer.py +19 -0
- trainer/utils.py +837 -0
- training logs/classifier/1/metrics.png +0 -0
- training logs/classifier/11/metrics.png +3 -0
- training logs/classifier/Events.docx +3 -0
- training logs/classifier/history.json +1 -0
- training logs/classifier/test_log.txt +0 -0
- training logs/classifier/training_log.txt +0 -0
- training logs/classifier/val_log.txt +0 -0
- training logs/mae/1/metrics.png +0 -0
- training logs/mae/101/metrics.png +0 -0
- training logs/mae/11/metrics.png +0 -0
.gitattributes
CHANGED
|
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
|
|
|
|
|
|
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
training[[:space:]]logs/classifier/11/metrics.png filter=lfs diff=lfs merge=lfs -text
|
| 37 |
+
training[[:space:]]logs/classifier/Events.docx filter=lfs diff=lfs merge=lfs -text
|
LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
MIT License
|
| 2 |
+
|
| 3 |
+
Copyright (c) 2025 Adel Elsayed
|
| 4 |
+
|
| 5 |
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
| 6 |
+
of this software and associated documentation files (the "Software"), to deal
|
| 7 |
+
in the Software without restriction, including without limitation the rights
|
| 8 |
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
| 9 |
+
copies of the Software, and to permit persons to whom the Software is
|
| 10 |
+
furnished to do so, subject to the following conditions:
|
| 11 |
+
|
| 12 |
+
The above copyright notice and this permission notice shall be included in all
|
| 13 |
+
copies or substantial portions of the Software.
|
| 14 |
+
|
| 15 |
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
| 16 |
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
| 17 |
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
| 18 |
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
| 19 |
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
| 20 |
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
| 21 |
+
SOFTWARE.
|
README.md
CHANGED
|
@@ -1,3 +1,353 @@
|
|
| 1 |
-
--
|
| 2 |
-
|
| 3 |
-
---
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# CheXpert MAE-DenseNet-FPN
|
| 2 |
+
|
| 3 |
+
A deep learning framework for multi-label chest X-ray classification using a hybrid architecture combining **Masked Autoencoders (MAE)**, **DenseNet** with CBAM attention, and **Feature Pyramid Networks (FPN)** with bidirectional cross-attention fusion.
|
| 4 |
+
|
| 5 |
+
## 🏗️ Architecture Overview
|
| 6 |
+
|
| 7 |
+
This project implements a novel multi-modal fusion architecture for medical image classification:
|
| 8 |
+
|
| 9 |
+
- **MAE Encoder**: Vision Transformer-based masked autoencoder for self-supervised feature extraction
|
| 10 |
+
- **DenseNet-169**: Dense convolutional network with Channel and Spatial Attention (CBAM)
|
| 11 |
+
- **Feature Pyramid Network**: Multi-scale feature extraction at 4 different resolutions
|
| 12 |
+
- **Bidirectional Cross-Attention**: Fusion mechanism allowing MAE and DenseNet features to attend to each other
|
| 13 |
+
- **Learned Logit Ensemble**: Intelligent combination of 7 prediction heads with learnable temperature scaling
|
| 14 |
+
|
| 15 |
+
### Key Components
|
| 16 |
+
|
| 17 |
+
```
|
| 18 |
+
Input Image (384×384)
|
| 19 |
+
│
|
| 20 |
+
├─────────────────────────────┐
|
| 21 |
+
│ │
|
| 22 |
+
▼ ▼
|
| 23 |
+
MAE Encoder DenseNet-169
|
| 24 |
+
(ViT-based) (with CBAM)
|
| 25 |
+
│ │
|
| 26 |
+
│ ┌───────────────────┤
|
| 27 |
+
│ │ │
|
| 28 |
+
│ FPN Pyramid Dense Features
|
| 29 |
+
│ (P1-P4) (Multi-scale)
|
| 30 |
+
│ │ │
|
| 31 |
+
└─────────┴───────────────────┘
|
| 32 |
+
│
|
| 33 |
+
Bidirectional Cross-Attention
|
| 34 |
+
│
|
| 35 |
+
┌─────────┴──────────┐
|
| 36 |
+
│ │
|
| 37 |
+
MAE Head Dense Head + 4 FPN Heads
|
| 38 |
+
│ │
|
| 39 |
+
└────────┬───────────┘
|
| 40 |
+
│
|
| 41 |
+
Learned Ensemble (7 heads)
|
| 42 |
+
│
|
| 43 |
+
▼
|
| 44 |
+
14-class Predictions
|
| 45 |
+
```
|
| 46 |
+
|
| 47 |
+
## ✨ Features
|
| 48 |
+
|
| 49 |
+
- **Hybrid Architecture**: Combines transformer-based and convolutional approaches
|
| 50 |
+
- **Multi-scale Learning**: FPN extracts features at 4 different resolutions
|
| 51 |
+
- **Advanced Fusion**: Bidirectional cross-attention between MAE and DenseNet features
|
| 52 |
+
- **Optimized Training**:
|
| 53 |
+
- Mixed precision training (FP16)
|
| 54 |
+
- Gradient accumulation
|
| 55 |
+
- Weighted sampling for class imbalance
|
| 56 |
+
- Cosine annealing with linear warmup
|
| 57 |
+
- Gradient checkpointing for memory efficiency
|
| 58 |
+
- **Smart Data Loading**:
|
| 59 |
+
- ZIP file reader with LRU caching
|
| 60 |
+
- On-the-fly augmentation using Albumentations
|
| 61 |
+
- Multi-worker data loading with persistent workers
|
| 62 |
+
- **Comprehensive Evaluation**:
|
| 63 |
+
- Per-class AUC metrics
|
| 64 |
+
- Optimal threshold computation per class
|
| 65 |
+
- Macro and Micro AUC tracking
|
| 66 |
+
|
| 67 |
+
## 📋 Requirements
|
| 68 |
+
|
| 69 |
+
- Python 3.8+
|
| 70 |
+
- CUDA-capable GPU (recommended: 16GB+ VRAM)
|
| 71 |
+
- CheXpert dataset
|
| 72 |
+
|
| 73 |
+
## 🚀 Installation
|
| 74 |
+
|
| 75 |
+
1. **Clone the repository**
|
| 76 |
+
```bash
|
| 77 |
+
git clone https://github.com/adelelsayed/chexpert-mae-densenet-fpn.git
|
| 78 |
+
cd chexpert-mae-densenet-fpn
|
| 79 |
+
```
|
| 80 |
+
|
| 81 |
+
2. **Create a virtual environment**
|
| 82 |
+
```bash
|
| 83 |
+
python -m venv venv
|
| 84 |
+
source venv/bin/activate # On Windows: venv\Scripts\activate
|
| 85 |
+
```
|
| 86 |
+
|
| 87 |
+
3. **Install dependencies**
|
| 88 |
+
```bash
|
| 89 |
+
pip install -r requirements.txt
|
| 90 |
+
```
|
| 91 |
+
|
| 92 |
+
## 📊 Dataset Setup
|
| 93 |
+
|
| 94 |
+
1. **Download CheXpert Dataset**
|
| 95 |
+
- Visit: https://stanfordmlgroup.github.io/competitions/chexpert/
|
| 96 |
+
- Download CheXpert-v1.0-small
|
| 97 |
+
|
| 98 |
+
2. **Prepare the dataset**
|
| 99 |
+
```bash
|
| 100 |
+
# Extract the dataset
|
| 101 |
+
unzip CheXpert-v1.0-small.zip
|
| 102 |
+
|
| 103 |
+
# Optionally, create a ZIP archive for faster loading
|
| 104 |
+
cd CheXpert-v1.0-small
|
| 105 |
+
zip -r chexpert.zip train/ valid/
|
| 106 |
+
```
|
| 107 |
+
|
| 108 |
+
3. **Update configuration**
|
| 109 |
+
- Edit `configs/configs.py`
|
| 110 |
+
- Update `root` variable to point to your dataset location
|
| 111 |
+
- Update all paths accordingly
|
| 112 |
+
|
| 113 |
+
## 🔧 Configuration
|
| 114 |
+
|
| 115 |
+
Edit `configs/configs.py` to customize:
|
| 116 |
+
|
| 117 |
+
```python
|
| 118 |
+
# Example: Update paths
|
| 119 |
+
root = "/path/to/your/data"
|
| 120 |
+
|
| 121 |
+
mae_config = {
|
| 122 |
+
"lr": 1e-4,
|
| 123 |
+
"num_epochs": 200,
|
| 124 |
+
"batch_size": 96,
|
| 125 |
+
# ... other parameters
|
| 126 |
+
}
|
| 127 |
+
|
| 128 |
+
config = {
|
| 129 |
+
"lr": 1e-4,
|
| 130 |
+
"num_epochs": 200,
|
| 131 |
+
"batch_size": 36,
|
| 132 |
+
# ... other parameters
|
| 133 |
+
}
|
| 134 |
+
```
|
| 135 |
+
|
| 136 |
+
## 🎯 Training
|
| 137 |
+
|
| 138 |
+
### Phase 1: Pre-train MAE
|
| 139 |
+
|
| 140 |
+
```bash
|
| 141 |
+
python trainer/trainer.py
|
| 142 |
+
# When prompted, type: mae
|
| 143 |
+
```
|
| 144 |
+
|
| 145 |
+
The MAE pre-training learns robust feature representations through masked image reconstruction.
|
| 146 |
+
|
| 147 |
+
### Phase 2: Train Classifier
|
| 148 |
+
|
| 149 |
+
```bash
|
| 150 |
+
python trainer/trainer.py
|
| 151 |
+
# When prompted, type: classifier
|
| 152 |
+
```
|
| 153 |
+
|
| 154 |
+
This loads the pre-trained MAE encoder and trains the full classification pipeline.
|
| 155 |
+
|
| 156 |
+
### Training Configuration
|
| 157 |
+
|
| 158 |
+
- **MAE Training**:
|
| 159 |
+
- Batch size: 96
|
| 160 |
+
- Mask ratio: 0.75 (masks 75% of patches)
|
| 161 |
+
- Reconstruction loss on masked patches
|
| 162 |
+
|
| 163 |
+
- **Classifier Training**:
|
| 164 |
+
- Batch size: 36 with gradient accumulation (8 steps)
|
| 165 |
+
- Effective batch size: 288
|
| 166 |
+
- Asymmetric loss with class weights
|
| 167 |
+
- Per-class threshold optimization
|
| 168 |
+
|
| 169 |
+
## 🧪 Testing
|
| 170 |
+
|
| 171 |
+
```python
|
| 172 |
+
from trainer.utils import Trainer
|
| 173 |
+
from configs.configs import config
|
| 174 |
+
|
| 175 |
+
# Initialize trainer
|
| 176 |
+
trainer = Trainer(config)
|
| 177 |
+
|
| 178 |
+
# Run evaluation on test set
|
| 179 |
+
macro_auc, micro_auc, per_class = trainer.test(
|
| 180 |
+
model_path="path/to/checkpoint.pth"
|
| 181 |
+
)
|
| 182 |
+
|
| 183 |
+
print(f"Macro AUC: {macro_auc:.4f}")
|
| 184 |
+
print(f"Micro AUC: {micro_auc:.4f}")
|
| 185 |
+
```
|
| 186 |
+
|
| 187 |
+
## 📁 Project Structure
|
| 188 |
+
|
| 189 |
+
```
|
| 190 |
+
chexpert-mae-densenet-fpn/
|
| 191 |
+
├── configs/
|
| 192 |
+
│ ├── __init__.py
|
| 193 |
+
│ └── configs.py # Configuration parameters
|
| 194 |
+
├── data/
|
| 195 |
+
│ ├── __init__.py
|
| 196 |
+
│ ├── dataset.py # CheXpert dataset with ZIP caching
|
| 197 |
+
│ └── splitter.py # Data splitting utilities
|
| 198 |
+
├── loss/
|
| 199 |
+
│ ├── __init__.py
|
| 200 |
+
│ └── assymetric.py # Asymmetric loss for imbalanced data
|
| 201 |
+
├── models/
|
| 202 |
+
│ ├── __init__.py
|
| 203 |
+
│ ├── mae.py # Masked Autoencoder implementation
|
| 204 |
+
│ ├── densenet.py # DenseNet-169 with CBAM
|
| 205 |
+
│ └── classifier.py # Full classification architecture
|
| 206 |
+
├── trainer/
|
| 207 |
+
│ ├── __init__.py
|
| 208 |
+
│ ├── trainer.py # Main training script
|
| 209 |
+
│ ├── utils.py # Training utilities and loops
|
| 210 |
+
│ └── test.py # Testing utilities
|
| 211 |
+
├── notebooks/
|
| 212 |
+
│ ├── chexpert_mae.ipynb # MAE experiments
|
| 213 |
+
│ └── chexpert_mae_mask_classifier.ipynb # Full pipeline experiments
|
| 214 |
+
├── requirements.txt
|
| 215 |
+
└── README.md
|
| 216 |
+
```
|
| 217 |
+
|
| 218 |
+
## 📈 Model Architecture Details
|
| 219 |
+
|
| 220 |
+
### MAE Encoder
|
| 221 |
+
- **Patch size**: 16×16
|
| 222 |
+
- **Embedding dim**: 768
|
| 223 |
+
- **Depth**: 12 transformer blocks
|
| 224 |
+
- **Heads**: 8 attention heads
|
| 225 |
+
- **MLP ratio**: 4×
|
| 226 |
+
|
| 227 |
+
### DenseNet-169
|
| 228 |
+
- **Growth rate (k)**: 64
|
| 229 |
+
- **Layers**: [6, 12, 24, 16]
|
| 230 |
+
- **CBAM**: Channel + Spatial attention at each stage
|
| 231 |
+
- **Dropout**: Progressive (0.05 → 0.1 → 0.1 → 0.1)
|
| 232 |
+
|
| 233 |
+
### Cross-Attention Fusion
|
| 234 |
+
- **12 bidirectional cross-attention layers**
|
| 235 |
+
- **Projection dim**: 512
|
| 236 |
+
- **Attention heads**: 8
|
| 237 |
+
|
| 238 |
+
### FPN
|
| 239 |
+
- **Feature levels**: P1 (192×192), P2 (96×96), P3 (48×48), P4 (24×24)
|
| 240 |
+
- **Channel unification**: 256 channels per level
|
| 241 |
+
|
| 242 |
+
## 🎓 CheXpert Labels
|
| 243 |
+
|
| 244 |
+
The model predicts 14 pathologies:
|
| 245 |
+
|
| 246 |
+
1. No Finding
|
| 247 |
+
2. Enlarged Cardiomediastinum
|
| 248 |
+
3. Cardiomegaly
|
| 249 |
+
4. Lung Opacity
|
| 250 |
+
5. Lung Lesion
|
| 251 |
+
6. Edema
|
| 252 |
+
7. Consolidation
|
| 253 |
+
8. Pneumonia
|
| 254 |
+
9. Atelectasis
|
| 255 |
+
10. Pneumothorax
|
| 256 |
+
11. Pleural Effusion
|
| 257 |
+
12. Pleural Other
|
| 258 |
+
13. Fracture
|
| 259 |
+
14. Support Devices
|
| 260 |
+
|
| 261 |
+
## 🔬 Data Augmentation
|
| 262 |
+
|
| 263 |
+
Training augmentations (conservative for medical images):
|
| 264 |
+
- Horizontal flip (p=0.5)
|
| 265 |
+
- Random affine (translation, scale, rotation ±10°)
|
| 266 |
+
- Random brightness/contrast
|
| 267 |
+
- CLAHE histogram equalization
|
| 268 |
+
- Gaussian blur and noise
|
| 269 |
+
|
| 270 |
+
## 💾 Checkpoints
|
| 271 |
+
|
| 272 |
+
The training automatically saves:
|
| 273 |
+
- **Best MAE checkpoint**: Based on validation reconstruction loss
|
| 274 |
+
- **Best classifier checkpoint**: Based on validation AUC (macro/micro)
|
| 275 |
+
- **Training history**: JSON file with all metrics
|
| 276 |
+
- **Per-epoch metrics plots**: Loss and AUC curves
|
| 277 |
+
|
| 278 |
+
## 📊 Monitoring
|
| 279 |
+
|
| 280 |
+
Training logs are saved to:
|
| 281 |
+
- `training_log.txt`: Training progress with live metrics
|
| 282 |
+
- `val_log.txt`: Validation results
|
| 283 |
+
- `test_log.txt`: Test evaluation results
|
| 284 |
+
- `history.json`: All metrics across epochs
|
| 285 |
+
- `metrics.png`: Visualization plots
|
| 286 |
+
|
| 287 |
+
## ⚡ Performance Tips
|
| 288 |
+
|
| 289 |
+
1. **Memory Optimization**:
|
| 290 |
+
- Use gradient checkpointing (already enabled)
|
| 291 |
+
- Reduce batch size if OOM occurs
|
| 292 |
+
- Increase gradient accumulation steps
|
| 293 |
+
|
| 294 |
+
2. **Speed Optimization**:
|
| 295 |
+
- Use persistent workers (already enabled)
|
| 296 |
+
- Enable cuDNN benchmark (already enabled)
|
| 297 |
+
- Use ZIP caching for faster data loading
|
| 298 |
+
|
| 299 |
+
3. **Training Stability**:
|
| 300 |
+
- Gradient clipping at norm 1.0
|
| 301 |
+
- Mixed precision with dynamic loss scaling
|
| 302 |
+
- Warmup learning rate schedule
|
| 303 |
+
|
| 304 |
+
## 🐛 Troubleshooting
|
| 305 |
+
|
| 306 |
+
**Q: Out of memory errors?**
|
| 307 |
+
- Reduce batch size in configs.py
|
| 308 |
+
- Increase gradient accumulation steps
|
| 309 |
+
- Enable gradient checkpointing
|
| 310 |
+
|
| 311 |
+
**Q: Slow training?**
|
| 312 |
+
- Check if ZIP caching is enabled
|
| 313 |
+
- Verify persistent workers are active
|
| 314 |
+
- Monitor GPU utilization
|
| 315 |
+
|
| 316 |
+
**Q: Poor convergence?**
|
| 317 |
+
- Ensure MAE is properly pre-trained first
|
| 318 |
+
- Check learning rate and warmup settings
|
| 319 |
+
- Verify class weights are computed correctly
|
| 320 |
+
|
| 321 |
+
## 📚 Citation
|
| 322 |
+
|
| 323 |
+
If you use this code in your research, please cite:
|
| 324 |
+
|
| 325 |
+
```bibtex
|
| 326 |
+
@misc{chexpert-mae-densenet-fpn,
|
| 327 |
+
author = {Adel Elsayed},
|
| 328 |
+
title = {CheXpert Classification with MAE-DenseNet-FPN},
|
| 329 |
+
year = {2025},
|
| 330 |
+
publisher = {GitHub},
|
| 331 |
+
url = {https://github.com/adelelsayed/chexpert-mae-densenet-fpn}
|
| 332 |
+
}
|
| 333 |
+
```
|
| 334 |
+
|
| 335 |
+
## 🙏 Acknowledgments
|
| 336 |
+
|
| 337 |
+
- **CheXpert Dataset**: Stanford ML Group
|
| 338 |
+
- **Masked Autoencoders**: Meta AI Research (He et al., 2021)
|
| 339 |
+
- **DenseNet**: Huang et al., 2017
|
| 340 |
+
- **CBAM**: Woo et al., 2018
|
| 341 |
+
- **Feature Pyramid Networks**: Lin et al., 2017
|
| 342 |
+
|
| 343 |
+
## 📄 License
|
| 344 |
+
|
| 345 |
+
|
| 346 |
+
This project is licensed under the MIT License.
|
| 347 |
+
|
| 348 |
+
|
| 349 |
+
## 📧 Contact
|
| 350 |
+
|
| 351 |
+
https://www.linkedin.com/in/adel-elsayed-a5260246/
|
| 352 |
+
|
| 353 |
+
**Note**: This is a research project. For clinical use, please ensure proper validation and regulatory approval.
|
configs/__init__.py
ADDED
|
File without changes
|
configs/__pycache__/__init__.cpython-313.pyc
ADDED
|
Binary file (152 Bytes). View file
|
|
|
configs/__pycache__/configs.cpython-313.pyc
ADDED
|
Binary file (4.44 kB). View file
|
|
|
configs/configs.py
ADDED
|
@@ -0,0 +1,55 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Configuration constants for the CheXpert MAE/classifier pipeline.

``mae_config`` drives Phase 1 (masked-autoencoder pre-training);
``config`` drives Phase 2 (multi-label classifier training).
Update ``root`` to point at the directory containing CheXpert-v1.0-small.
"""
import os

import torch

# Base directory that contains the CheXpert-v1.0-small folder.
root = "/content/drive/MyDrive"

# Built once so every path below stays consistent with ``root``.
_dataset_dir = os.path.join(root, "CheXpert-v1.0-small")

# Single shared device selection for both phases.
_device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# ---------------------------------------------------------------------------
# Phase 1: MAE pre-training
# ---------------------------------------------------------------------------
mae_config = {
    # Optimisation
    "lr": 1e-4,
    "warmup": 5,
    "weight_decay": 5e-4,
    "num_epochs": 200,
    "num_classes": 14,
    "batch_size": 96,
    "accumulation": 1,          # no gradient accumulation during pre-training
    "device": _device,
    # Paths
    "zip_path": os.path.join(_dataset_dir, "chexpert.zip"),
    "resume": os.path.join(_dataset_dir, "maecheckpoints", "best_mae.pth"),
    "logdir": os.path.join(_dataset_dir, "maelogs"),
    "checkpoints": os.path.join(_dataset_dir, "maecheckpoints"),
    "datadir": root,
    "lmdb": os.path.join(_dataset_dir, "lmdb"),
    "csv": os.path.join(_dataset_dir, "train.csv"),
    "dirsToMake": [
        os.path.join(_dataset_dir, "maecheckpoints"),
        os.path.join(_dataset_dir, "maelogs"),
    ],
    "train_csv": os.path.join(_dataset_dir, "train_ready.csv"),
    "val_csv": os.path.join(_dataset_dir, "val_ready.csv"),
    "test_csv": os.path.join(_dataset_dir, "test_ready.csv"),
    # Model hyper-parameters
    "channels": 1,
    "mask_ratio": 0.75,         # MAE masks 75% of patches during pre-training
    "dropout": 0.25,
    "img_size": 384,
    "encoder_dim": 768,
    "mlp_dim": 3072,
    "decoder_dim": 512,
    "encoder_depth": 12,
    "encoder_head": 8,
    "decoder_depth": 8,
    "decoder_head": 8,
    "patch_size": 16,
}

# ---------------------------------------------------------------------------
# Phase 2: classifier training (loads the pre-trained MAE backbone)
# ---------------------------------------------------------------------------
config = {
    # Optimisation
    "lr": 1e-4,
    "warmup": 10,
    "weight_decay": 5e-4,
    "num_epochs": 200,
    "num_classes": 14,
    "batch_size": 36,
    "accumulation": 8,          # effective batch size: 36 * 8 = 288
    "device": _device,
    # Paths
    "zip_path": os.path.join(_dataset_dir, "chexpert.zip"),
    "backbone": os.path.join(_dataset_dir, "maecheckpoints", "best_mae.pth"),
    "densebackbone": os.path.join(
        _dataset_dir, "checkpoints", "No Eca with masking best_dense.pth"
    ),
    "resume": os.path.join(
        _dataset_dir, "maecheckpoints", "fpn", "best_mae_classifier.pth"
    ),
    "logdir": os.path.join(_dataset_dir, "maelogs", "fpn", "classifier"),
    "checkpoints": os.path.join(_dataset_dir, "maecheckpoints"),
    "datadir": root,
    "lmdb": os.path.join(_dataset_dir, "lmdb"),
    "csv": os.path.join(_dataset_dir, "train.csv"),
    "maskdir": os.path.join(_dataset_dir, "fpn", "mask"),
    "dirsToMake": [
        os.path.join(_dataset_dir, "maecheckpoints", "fpn"),
        os.path.join(_dataset_dir, "maelogs", "fpn", "classifier"),
        os.path.join(_dataset_dir, "fpn", "mask"),
    ],
    "train_csv": os.path.join(_dataset_dir, "train_ready.csv"),
    "val_csv": os.path.join(_dataset_dir, "val_ready.csv"),
    "test_csv": os.path.join(_dataset_dir, "test_ready.csv"),
    # Model hyper-parameters
    "channels": 1,
    "mask_ratio": 0,            # no patch masking when training the classifier
    "dropout": 0.25,
    "img_size": 384,
    "encoder_dim": 768,
    "mlp_dim": 3072,
    "decoder_dim": 512,
    "encoder_depth": 12,
    "encoder_head": 8,
    "decoder_depth": 8,
    "decoder_head": 8,
    "patch_size": 16,
}
|
data/__init__.py
ADDED
|
File without changes
|
data/__pycache__/__init__.cpython-313.pyc
ADDED
|
Binary file (149 Bytes). View file
|
|
|
data/__pycache__/__init__.cpython-314.pyc
ADDED
|
Binary file (151 Bytes). View file
|
|
|
data/__pycache__/dataset.cpython-313.pyc
ADDED
|
Binary file (21.6 kB). View file
|
|
|
data/__pycache__/dataset.cpython-314.pyc
ADDED
|
Binary file (22.3 kB). View file
|
|
|
data/dataset.py
ADDED
|
@@ -0,0 +1,508 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Standard library
|
| 2 |
+
import os
|
| 3 |
+
import io
|
| 4 |
+
import zipfile
|
| 5 |
+
import pickle
|
| 6 |
+
from pathlib import Path
|
| 7 |
+
|
| 8 |
+
# Data handling
|
| 9 |
+
import pandas as pd
|
| 10 |
+
import numpy as np
|
| 11 |
+
|
| 12 |
+
# PyTorch
|
| 13 |
+
import torch
|
| 14 |
+
from torch.utils.data import Dataset
|
| 15 |
+
|
| 16 |
+
# Image processing
|
| 17 |
+
from PIL import Image
|
| 18 |
+
import cv2
|
| 19 |
+
|
| 20 |
+
# Augmentations
|
| 21 |
+
import albumentations as A
|
| 22 |
+
from albumentations.pytorch import ToTensorV2
|
| 23 |
+
|
| 24 |
+
# Progress bar (for precompute_all_masks)
|
| 25 |
+
from tqdm import tqdm
|
| 26 |
+
|
| 27 |
+
class OptimizedZipReader:
    """
    Fast ZIP file reader with LRU caching of raw image bytes.

    The ZIP handle is opened lazily on first read, so an instance created in
    the parent process can be used inside DataLoader worker processes: each
    worker opens its own handle after fork/spawn instead of sharing one.
    """
    def __init__(self, zip_path, cache_size=1000):
        """
        Args:
            zip_path: Path to ZIP file
            cache_size: Number of images to cache in RAM
        """
        self.zip_path = zip_path
        self.cache_size = cache_size
        self._zip_file = None  # Will be lazily initialized
        self._name_to_info = None  # filename -> ZipInfo, built on first open

        # LRU cache: _cache maps path -> bytes; _cache_order tracks recency,
        # oldest entry first.
        self._cache = {}
        self._cache_order = []
        self._hits = 0
        self._misses = 0

    @property
    def zip_file(self):
        """Lazy initialization of ZIP file handle (builds the name index once)"""
        if self._zip_file is None:
            print(f"Opening ZIP file: {self.zip_path}")
            self._zip_file = zipfile.ZipFile(self.zip_path, 'r', allowZip64=True)

            # Build index on first access
            print("Building ZIP index...")
            self._name_to_info = {
                info.filename: info
                for info in self._zip_file.infolist()
            }
            print(f"✓ Indexed {len(self._name_to_info)} files")

        return self._zip_file

    def read_image(self, path):
        """
        Read image data with automatic caching

        Returns: bytes (image file data)
        """
        # Cache hit: promote the entry to most-recently-used.
        # BUG FIX: the previous version never refreshed recency on a hit, so
        # eviction was FIFO rather than the LRU the class advertises.
        if path in self._cache:
            self._hits += 1
            self._cache_order.remove(path)
            self._cache_order.append(path)
            return self._cache[path]

        # Cache miss - read from ZIP (this triggers lazy initialization)
        self._misses += 1
        img_data = self.zip_file.read(path)  # Uses property getter

        # Evict the least-recently-used entry if the cache is full
        if len(self._cache) >= self.cache_size:
            oldest = self._cache_order.pop(0)
            del self._cache[oldest]

        self._cache[path] = img_data
        self._cache_order.append(path)

        return img_data

    def get_cache_stats(self):
        """Return cache hit rate statistics"""
        total = self._hits + self._misses
        hit_rate = self._hits / total * 100 if total > 0 else 0
        return {
            'hits': self._hits,
            'misses': self._misses,
            'hit_rate': f"{hit_rate:.2f}%",
            'cache_size': len(self._cache)
        }

    def close(self):
        """Close ZIP file and clear cache"""
        if self._zip_file is not None:
            self._zip_file.close()
            self._zip_file = None
        self._cache.clear()
        self._cache_order.clear()
        self._name_to_info = None
|
| 109 |
+
|
| 110 |
+
class CheXpertDataset(Dataset):
    """
    CheXpert chest X-ray dataset for multi-label pathology classification.

    Each sample is a dict with:
        - 'image':    (1, H, W) float tensor, grayscale, normalized with
                      mean=0.5 / std=0.5 (so values roughly in [-1, 1])
        - 'labels':   (14,) float tensor over ``PATHOLOGIES``
        - 'metadata': dict with patient/study IDs, view, sex, age and path

    NOTE(review): an earlier revision advertised 3-channel output
    (img, img*mask, mask); the current code returns the single-channel
    image only — the mask branches are commented out.

    Args:
        csv_path (str): Path to the CSV file (train.csv or valid.csv).
        root_dir (str): Root directory of the CheXpert dataset.
        image_size (int): Target square image size (default: 384).
        augment (bool): Apply training augmentations (default: False).
        use_frontal_only (bool): Keep only frontal-view rows (default: False).
        fill_uncertain (str): Strategy for uncertain (-1) labels:
            'zeros', 'ones' or 'ignore' (default: 'ignore').
        lmdb_path: Unused; LMDB support is currently disabled.
        zip_path (str): Optional ZIP archive to read images from directly.
        zip_cache_size (int): LRU cache size for the lazy ZIP reader.
        mask_dir (str): Directory in which to store precomputed lung masks.
        domask (bool): If True (and mask_dir given), precompute masks on init.
    """

    # The 14 pathology classes labelled in CheXpert, canonical order.
    PATHOLOGIES = [
        'No Finding',
        'Enlarged Cardiomediastinum',
        'Cardiomegaly',
        'Lung Opacity',
        'Lung Lesion',
        'Edema',
        'Consolidation',
        'Pneumonia',
        'Atelectasis',
        'Pneumothorax',
        'Pleural Effusion',
        'Pleural Other',
        'Fracture',
        'Support Devices'
    ]

    def __init__(
        self,
        csv_path,
        root_dir,
        image_size=384,
        augment=False,
        use_frontal_only=False,
        fill_uncertain='ignore',
        lmdb_path=None,
        zip_path=None,
        zip_cache_size=1000,
        mask_dir=None, domask=False
    ):
        self.root_dir = root_dir
        self.image_size = image_size
        self.augment = augment
        self.fill_uncertain = fill_uncertain
        # LMDB support intentionally disabled for now.
        self.env = None  # lmdb.open(lmdb_path, readonly=True, lock=False) if lmdb_path else None
        self._zip_path = zip_path
        self._zip_cache_size = zip_cache_size
        # Created lazily by the ``zip_reader`` property (multiprocessing-safe).
        self._zip_reader_instance = None

        # Read CSV; coerce label columns to numeric (bad cells become NaN).
        self.df = pd.read_csv(csv_path)
        for pathology in self.PATHOLOGIES:
            if pathology in self.df.columns:
                self.df[pathology] = pd.to_numeric(self.df[pathology], errors='coerce')

        # Filter for frontal views only if specified
        if use_frontal_only:
            self.df = self.df[self.df['Frontal/Lateral'] == 'Frontal'].reset_index(drop=True)

        # Handle uncertain labels (-1 values)
        self._process_uncertain_labels()

        # Setup augmentations
        self.train_transform = self._get_train_transforms()
        self.val_transform = self._get_val_transforms()

        print(f"Loaded {len(self.df)} images from {csv_path}")
        print(f"Image size: {image_size}x{image_size}")
        print(f"Augmentation: {augment}")
        print(f"Uncertain labels filled with: {fill_uncertain}")

        if mask_dir and domask:
            self.precompute_all_masks(mask_dir)

    def precompute_all_masks(self, save_dir):
        """Precompute and save a lung mask for every image.

        Run this ONCE before training; masks are written as ``.pt`` files
        named by flattening the last three path components.
        """
        os.makedirs(save_dir, exist_ok=True)
        for idx in tqdm(range(len(self))):
            rel_path = self.df.iloc[idx]['Path']
            img_path = os.path.join(self.root_dir, rel_path)
            # Path inside the ZIP omits the leading dataset folder.
            part_path = "/".join(rel_path.split("/")[1:])
            if self.zip_reader:
                # Read image bytes straight from the ZIP (no extraction).
                img_data = self.zip_reader.read_image(part_path)
                image = Image.open(io.BytesIO(img_data)).convert('L')
            else:
                image = Image.open(img_path).convert('L')

            image = np.array(image)

            mask = chexpert_medsam_mask(image)
            mask_path = os.path.join(
                save_dir,
                "_".join(rel_path.split("/")[-3:]).replace('.jpg', '_mask.pt')
            )
            os.makedirs(os.path.dirname(mask_path), exist_ok=True)
            torch.save(mask, mask_path)

    @property
    def zip_reader(self):
        """
        Lazy property getter for the ZIP reader.

        The ZIP file is only opened on first access, not during __init__.
        Useful when:
        - Creating multiple dataset objects but only using some
        - Saving memory during dataset setup
        - Working with multiprocessing (each worker creates its own)

        Returns None when no ``zip_path`` was configured.
        """
        if self._zip_reader_instance is None and self._zip_path is not None:
            self._zip_reader_instance = OptimizedZipReader(
                self._zip_path,
                cache_size=self._zip_cache_size
            )
        return self._zip_reader_instance

    def _load_and_cache_image(self, img_path, idx):
        """
        Load an image with automatic resizing and on-disk caching.

        If a resized copy (prefixed ``{image_size}_``) already exists it is
        loaded; otherwise the original is resized, saved, and returned.
        NOTE(review): ``cached_path`` is a relative path, so the cache lands
        relative to the current working directory — confirm that is intended.

        Args:
            img_path (str): Original image path from the CSV.
            idx (int): Sample index (currently unused, kept for tracking).

        Returns:
            np.ndarray: Loaded grayscale image.
        """
        cache_dir = Path(self.root_dir)  # / f"cache_{self.image_size}"

        # Preserve the relative path structure, prefixing the file name
        # with the target size so different sizes can coexist.
        path_parts = list(Path(img_path).parts)
        path_parts[-1] = f"{self.image_size}_{path_parts[-1]}"
        relative_path = Path(*path_parts)
        cached_path = relative_path.with_suffix('.jpg')

        # Fast path: cached version exists and has the right size.
        if cached_path.exists():
            image = Image.open(cached_path).convert('L')
            image = np.array(image)
            if image.shape[0] == self.image_size and image.shape[1] == self.image_size:
                return image

        # Cache missing or wrong size - load the original.
        image = Image.open(img_path).convert('L')

        width, height = image.size
        if width == self.image_size and height == self.image_size:
            # Already correct size, just convert to array.
            return np.array(image)

        # Resize with a high-quality filter and persist to the cache.
        image_resized = image.resize(
            (self.image_size, self.image_size),
            Image.LANCZOS
        )
        cached_path.parent.mkdir(parents=True, exist_ok=True)
        image_resized.save(cached_path, 'JPEG', quality=95, optimize=True)

        return np.array(image_resized)

    def _process_uncertain_labels(self):
        """Process uncertain labels (-1) based on the chosen strategy.

        'zeros' maps -1 -> 0, 'ones' maps -1 -> 1, 'ignore' keeps -1
        (the loss function must then mask those entries). NaNs always
        become 0 (negative).
        """
        for pathology in self.PATHOLOGIES:
            if pathology in self.df.columns:
                if self.fill_uncertain == 'zeros':
                    self.df[pathology] = self.df[pathology].replace(-1, 0)
                elif self.fill_uncertain == 'ones':
                    self.df[pathology] = self.df[pathology].replace(-1, 1)
                elif self.fill_uncertain == 'ignore':
                    pass  # handled downstream in the loss function

                self.df[pathology] = self.df[pathology].fillna(0)

    def _get_train_transforms(self):
        """Get training augmentations suitable for chest X-rays."""
        import cv2
        return A.Compose([
            # Resize longest side, then pad to a centered square.
            A.LongestMaxSize(max_size=self.image_size),
            A.PadIfNeeded(self.image_size, self.image_size, border_mode=cv2.BORDER_CONSTANT, position='center'),

            # Geometric augmentations (conservative for medical images)
            A.HorizontalFlip(p=0.5),
            A.Affine(
                translate_percent={"x": (-0.1, 0.1), "y": (-0.1, 0.1)},
                scale=(0.9, 1.1),
                rotate=(-10, 10),
                fit_output=False,
                p=0.5
            ),

            # Intensity augmentations
            A.OneOf([
                A.RandomBrightnessContrast(
                    brightness_limit=0.2,
                    contrast_limit=0.2,
                    p=1.0
                ),
                A.RandomGamma(gamma_limit=(80, 120), p=1.0),
                A.CLAHE(clip_limit=4.0, tile_grid_size=(8, 8), p=1.0),
            ], p=0.5),

            # Slight blur to simulate different imaging conditions
            A.OneOf([
                A.GaussianBlur(blur_limit=(3, 5), p=1.0),
                A.MedianBlur(blur_limit=3, p=1.0),
            ], p=0.2),

            # Add noise
            A.GaussNoise(p=0.2),

            # Normalize (mean=0.5, std=0.5 on a 0-255 grayscale input)
            A.Normalize(
                mean=[0.5],
                std=[0.5],
                max_pixel_value=255.0
            ),

            ToTensorV2()
        ])

    def _get_val_transforms(self):
        """Get validation/test transforms (no augmentation)."""
        # BUG FIX: the train variant imports cv2 locally but this method
        # referenced cv2 without any import; without a module-level import
        # this raised NameError. Import it here as well.
        import cv2
        return A.Compose([
            A.LongestMaxSize(max_size=self.image_size),
            A.PadIfNeeded(self.image_size, self.image_size, border_mode=cv2.BORDER_CONSTANT, position='center'),
            A.Normalize(
                mean=[0.5],
                std=[0.5],
                max_pixel_value=255.0
            ),
            ToTensorV2()
        ])

    def __len__(self):
        """Number of samples (CSV rows after filtering)."""
        return len(self.df)

    def __del__(self):
        """Close the ZIP reader on garbage collection, if one was ever opened."""
        # BUG FIX: the original called self.zip_reader.close(). The lazy
        # property returns None when no zip_path was configured (raising
        # AttributeError during GC) and would even OPEN the ZIP just to
        # close it. Touch the raw instance attribute instead.
        reader = getattr(self, '_zip_reader_instance', None)
        if reader is not None:
            reader.close()

    def __getitem__(self, idx):
        """Return one sample as {'image', 'labels', 'metadata'}."""
        if self.env:
            # LMDB fast path (currently disabled; self.env is always None).
            with self.env.begin() as txn:
                data = txn.get(str(idx).encode())
                sample = pickle.loads(data)
                return sample
        else:
            row = self.df.iloc[idx]
            img_path = os.path.join(self.root_dir, row['Path'])

            # Path inside the ZIP omits the leading dataset folder.
            part_path = "/".join(row['Path'].split("/")[1:])
            if self.zip_reader:
                # Read image bytes straight from the ZIP (no extraction).
                img_data = self.zip_reader.read_image(part_path)
                image = Image.open(io.BytesIO(img_data)).convert('L')
            else:
                image = Image.open(img_path).convert('L')

            image = np.array(image)

            # Apply transforms; result is a (1, H, W) normalized tensor.
            if self.augment:
                transformed = self.train_transform(image=image)
            else:
                transformed = self.val_transform(image=image)
            image_1ch = transformed['image']

            # Collect labels for all 14 pathologies (missing columns -> 0.0).
            labels = []
            for pathology in self.PATHOLOGIES:
                if pathology in self.df.columns:
                    label = row[pathology]
                    labels.append(float(label) if not pd.isna(label) else 0.0)
                else:
                    labels.append(0.0)

            labels = torch.tensor(labels, dtype=torch.float32)

            # Additional metadata. NOTE(review): the [2]/[3] indices assume
            # paths like "CheXpert-v1.0-small/train/patientX/studyY/view.jpg"
            # — confirm against the CSV layout.
            metadata = {
                'patient_id': row['Path'].split('/')[2],
                'study_id': row['Path'].split('/')[3],
                'view': row['Frontal/Lateral'],
                'sex': row['Sex'] if 'Sex' in self.df.columns else 'Unknown',
                'age': row['Age'] if 'Age' in self.df.columns else -1,
                'path': row['Path']
            }

            return {
                'image': image_1ch,
                'labels': labels,
                'metadata': metadata
            }

    def get_label_names(self):
        """Return list of pathology label names."""
        return self.PATHOLOGIES

    def get_label_distribution(self):
        """Get count and percentage of positive labels for each pathology."""
        distribution = {}
        for pathology in self.PATHOLOGIES:
            if pathology in self.df.columns:
                positive_count = (self.df[pathology] == 1.0).sum()
                distribution[pathology] = {
                    'positive': int(positive_count),
                    'percentage': round(positive_count / len(self.df) * 100, 2)
                }
        return distribution

    def get_class_weights(self):
        """
        Vectorized per-class positive weights (neg/pos ratio).

        NOTE(review): classes missing from the CSV are skipped, so the
        returned length can be < 14 — callers must align indices.
        """
        weights = []
        for pathology in self.PATHOLOGIES:
            if pathology in self.df.columns:
                values = self.df[pathology].values
                pos = np.sum(values == 1.0)
                neg = np.sum(values == 0.0)
                # Classes with no positives fall back to weight 1.0.
                weight = neg / pos if pos > 0 else 1.0
                weights.append(weight)
        return torch.tensor(weights, dtype=torch.float32)

    def get_sample_weights(self):
        """
        Vectorized per-sample weights for a WeightedRandomSampler.

        Each sample gets the maximum class weight among its positive
        labels; samples with no positive label get weight 1.0.
        Roughly 1000x faster than a per-row Python loop.
        """
        class_weights = self.get_class_weights().numpy()

        # All labels as one (n_samples, n_classes) float matrix.
        labels_array = self.df[self.PATHOLOGIES].values.astype(np.float32)

        # Where label==1 use the class weight; -inf elsewhere so the row
        # max only considers positive labels.
        weighted_labels = np.where(
            labels_array == 1.0,
            class_weights,
            -np.inf
        )

        sample_weights = np.max(weighted_labels, axis=1)
        sample_weights = np.where(
            np.isinf(sample_weights),
            1.0,  # samples with no positive labels get weight 1.0
            sample_weights
        )

        return torch.tensor(sample_weights, dtype=torch.float32)
|
data/splitter.py
ADDED
|
@@ -0,0 +1,347 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Standard library
|
| 2 |
+
import os
|
| 3 |
+
from pathlib import Path
|
| 4 |
+
|
| 5 |
+
# Data handling
|
| 6 |
+
import pandas as pd
|
| 7 |
+
import numpy as np
|
| 8 |
+
|
| 9 |
+
# Machine learning
|
| 10 |
+
from sklearn.model_selection import train_test_split
|
| 11 |
+
|
| 12 |
+
class CheXpertDataSplitter:
    """
    Advanced stratified train/val/test splitter for the CheXpert dataset.

    Handles:
    - Patient-level splitting (prevents data leakage)
    - Multi-label stratification via per-patient label signatures
    - Class imbalance reporting
    - Study-level grouping
    """

    # The 14 pathology classes labelled in CheXpert, canonical order.
    PATHOLOGIES = [
        'No Finding',
        'Enlarged Cardiomediastinum',
        'Cardiomegaly',
        'Lung Opacity',
        'Lung Lesion',
        'Edema',
        'Consolidation',
        'Pneumonia',
        'Atelectasis',
        'Pneumothorax',
        'Pleural Effusion',
        'Pleural Other',
        'Fracture',
        'Support Devices'
    ]

    def __init__(self, csv_path, val_size=0.15, test_size=0.15, random_state=42,
                 use_frontal_only=True, fill_uncertain='zeros', root=None):
        """
        Initialize the splitter.

        Args:
            csv_path: Path to train.csv from CheXpert-small.
            val_size: Validation set proportion (default: 0.15).
            test_size: Test set proportion (default: 0.15).
            random_state: Random seed for reproducibility.
            use_frontal_only: Use only frontal view images.
            fill_uncertain: How to handle uncertain labels
                ('zeros', 'ones', 'ignore').
            root: Optional dataset root directory (stored, not used here).
        """
        self.csv_path = csv_path
        self.val_size = val_size
        self.test_size = test_size
        self.random_state = random_state
        self.use_frontal_only = use_frontal_only
        self.fill_uncertain = fill_uncertain
        self.root = root

        print("=" * 80)
        print("CheXpert Data Splitter - Preventing Data Leakage & Class Bias")
        print("=" * 80)

    def load_and_preprocess(self):
        """Load the CSV, optionally filter to frontal views, derive IDs."""
        print("\n[1/5] Loading data...")
        self.df = pd.read_csv(self.csv_path)
        print(f"  Loaded {len(self.df)} images")

        # self.df = self.df[self.df["Path"].apply(os.path.exists)]

        if self.use_frontal_only:
            initial_count = len(self.df)
            self.df = self.df[self.df['Frontal/Lateral'] == 'Frontal'].reset_index(drop=True)
            print(f"  Filtered to frontal views: {len(self.df)} images ({initial_count - len(self.df)} removed)")

        # NOTE(review): the [2]/[3] indices assume paths like
        # "CheXpert-v1.0-small/train/patientX/studyY/view.jpg".
        print("\n[2/5] Extracting patient and study IDs...")
        self.df['patient_id'] = self.df['Path'].apply(lambda x: x.split('/')[2])
        self.df['study_id'] = self.df['Path'].apply(lambda x: x.split('/')[3])

        n_patients = self.df['patient_id'].nunique()
        n_studies = self.df['study_id'].nunique()
        print(f"  Unique patients: {n_patients}")
        print(f"  Unique studies: {n_studies}")
        print(f"  Images per patient (avg): {len(self.df) / n_patients:.2f}")

        print("\n[3/5] Processing uncertain labels...")
        self._process_uncertain_labels()

        return self.df

    def _process_uncertain_labels(self):
        """Map uncertain labels (-1) per the chosen strategy; NaN -> 0."""
        for pathology in self.PATHOLOGIES:
            if pathology in self.df.columns:
                uncertain_count = (self.df[pathology] == -1).sum()

                if self.fill_uncertain == 'zeros':
                    self.df[pathology] = self.df[pathology].replace(-1, 0)
                elif self.fill_uncertain == 'ones':
                    self.df[pathology] = self.df[pathology].replace(-1, 1)
                elif self.fill_uncertain == 'ignore':
                    pass  # Keep -1 as is

                self.df[pathology] = self.df[pathology].fillna(0)

        print(f"  Uncertain labels strategy: {self.fill_uncertain}")

    def create_stratification_groups(self):
        """
        Build patient-level stratification groups from multi-label
        combinations; rare combinations are merged into one bucket so
        stratified splitting does not fail on singleton classes.
        """
        print("\n[4/5] Creating stratification groups (patient-level)...")

        # Aggregate to one row per patient (a pathology is positive if it
        # is positive in ANY of the patient's images).
        patient_groups = self.df.groupby('patient_id').agg({
            **{pathology: 'max' for pathology in self.PATHOLOGIES if pathology in self.df.columns},
            'study_id': 'first',  # keep one study_id for reference
            'Sex': 'first',
            'Age': 'first'
        }).reset_index()

        # Binary-string signature of which conditions are present.
        def create_label_signature(row):
            signature = []
            for pathology in self.PATHOLOGIES:
                if pathology in patient_groups.columns:
                    signature.append(str(int(row[pathology])))
            return ''.join(signature)

        patient_groups['label_signature'] = patient_groups.apply(create_label_signature, axis=1)

        # Merge rare combinations so every stratum has enough members.
        signature_counts = patient_groups['label_signature'].value_counts()
        rare_threshold = max(5, int(len(patient_groups) * 0.001))  # at least 5 or 0.1%

        def get_stratification_group(signature):
            if signature_counts[signature] < rare_threshold:
                return 'RARE_COMBINATION'
            return signature

        patient_groups['stratification_group'] = patient_groups['label_signature'].apply(get_stratification_group)

        print(f"\n  Patient-level label distribution:")
        for pathology in self.PATHOLOGIES:
            if pathology in patient_groups.columns:
                positive_count = (patient_groups[pathology] == 1).sum()
                percentage = positive_count / len(patient_groups) * 100
                print(f"    {pathology:30s}: {positive_count:5d} ({percentage:5.2f}%)")

        unique_groups = patient_groups['stratification_group'].nunique()
        print(f"\n  Unique stratification groups: {unique_groups}")
        print(f"  Rare combinations grouped: {(patient_groups['stratification_group'] == 'RARE_COMBINATION').sum()}")

        return patient_groups

    def perform_split(self, patient_groups):
        """
        Stratified patient-level train/val/test split.

        Raises:
            ValueError: if any patient appears in more than one split.
        """
        print("\n[5/5] Performing stratified patient-level split...")

        stratification_labels = patient_groups['stratification_group'].values

        # ---- train / (val+test) ----
        train_patients, valtest_patients = train_test_split(
            patient_groups['patient_id'].values,
            test_size=self.val_size + self.test_size,
            stratify=stratification_labels,
            random_state=self.random_state
        )

        # ---- val / test from the remaining pool ----
        remaining_labels = patient_groups.set_index('patient_id').loc[valtest_patients]['stratification_group'].values
        val_patients, test_patients = train_test_split(
            valtest_patients,
            # proportion of the val+test pool, not of the whole dataset
            test_size=self.test_size / (self.val_size + self.test_size),
            stratify=remaining_labels,
            random_state=self.random_state
        )

        print(f"  Train patients: {len(train_patients)}")
        print(f"  Val patients: {len(val_patients)}")
        print(f"  Test patients: {len(test_patients)}")

        # Split the full (image-level) dataframe by patient membership.
        train_df = self.df[self.df['patient_id'].isin(train_patients)].copy()
        val_df = self.df[self.df['patient_id'].isin(val_patients)].copy()
        test_df = self.df[self.df['patient_id'].isin(test_patients)].copy()

        # ---- leakage check: no patient may appear in two splits ----
        sets = [('train', train_df), ('val', val_df), ('test', test_df)]
        for i, (name_i, df_i) in enumerate(sets):
            for name_j, df_j in sets[i + 1:]:
                overlap = set(df_i['patient_id']) & set(df_j['patient_id'])
                if overlap:
                    raise ValueError(f"Data leakage between {name_i} and {name_j}: {len(overlap)} patients overlap")
        print("\n  No patient overlap – leakage prevented!")

        return train_df, val_df, test_df

    def run(self, output_dir='.', save_test=True):
        """End-to-end pipeline: load, stratify, split, verify, save."""
        self.load_and_preprocess()
        patient_groups = self.create_stratification_groups()
        train_df, val_df, test_df = self.perform_split(patient_groups)

        self.verify_split_quality(train_df, val_df)
        print("\n--- Train vs Test distribution check ---")
        self.verify_split_quality(train_df, test_df)

        train_path, val_path = self.save_splits(train_df, val_df, output_dir)
        if save_test:
            test_path = self.save_test_split(test_df, output_dir)
        else:
            test_path = None

        print("\n" + "=" * 80)
        print("Split Complete! (train / val / test)")
        print("=" * 80)
        return train_path, val_path, test_path

    def save_test_split(self, test_df, output_dir):
        """Write the test split (minus helper columns) to test_ready.csv."""
        output_dir = Path(output_dir)
        output_dir.mkdir(exist_ok=True)
        test_path = output_dir / 'test_ready.csv'

        cols_to_drop = ['patient_id', 'study_id']
        test_clean = test_df.drop(columns=[c for c in cols_to_drop if c in test_df.columns])
        test_clean.to_csv(test_path, index=False)

        print(f"Test set : {test_path} ({len(test_clean)} images)")
        return test_path

    def verify_split_quality(self, train_df, val_df):
        """
        Compare per-pathology positive rates between two splits and
        report the worst gap plus per-class imbalance ratios.
        """
        print("\n" + "=" * 80)
        print("Split Quality Verification")
        print("=" * 80)

        print(f"\n{'Pathology':<30s} {'Train %':>10s} {'Val %':>10s} {'Difference':>12s}")
        print("-" * 80)

        max_diff = 0
        for pathology in self.PATHOLOGIES:
            if pathology in train_df.columns:
                train_pos = (train_df[pathology] == 1).sum() / len(train_df) * 100
                val_pos = (val_df[pathology] == 1).sum() / len(val_df) * 100
                diff = abs(train_pos - val_pos)
                max_diff = max(max_diff, diff)

                print(f"{pathology:<30s} {train_pos:>9.2f}% {val_pos:>9.2f}% {diff:>11.2f}%")

        print("-" * 80)
        print(f"Maximum distribution difference: {max_diff:.2f}%")

        if max_diff < 2.0:
            print("✓ Excellent stratification (< 2% difference)")
        elif max_diff < 5.0:
            print("✓ Good stratification (< 5% difference)")
        else:
            print("⚠ Warning: Large distribution differences detected")

        print("\n" + "=" * 80)
        print("Class Imbalance Analysis (Train Set)")
        print("=" * 80)

        imbalance_ratios = []
        for pathology in self.PATHOLOGIES:
            if pathology in train_df.columns:
                pos = (train_df[pathology] == 1).sum()
                neg = (train_df[pathology] == 0).sum()
                if pos > 0:
                    ratio = neg / pos
                    imbalance_ratios.append(ratio)
                    severity = "Low" if ratio < 5 else "Medium" if ratio < 20 else "High"
                    print(f"{pathology:<30s} Ratio: {ratio:>6.2f}:1 [{severity:>6s} imbalance]")

        # ROBUSTNESS: np.mean([]) warns and yields NaN; only report when
        # at least one class had positives.
        if imbalance_ratios:
            avg_imbalance = np.mean(imbalance_ratios)
            print(f"\nAverage imbalance ratio: {avg_imbalance:.2f}:1")

    def save_splits(self, train_df, val_df, output_dir='.'):
        """Save train and validation splits to CSV files."""
        output_dir = Path(output_dir)
        output_dir.mkdir(exist_ok=True)

        train_path = output_dir / 'train_ready.csv'
        val_path = output_dir / 'val_ready.csv'

        # Remove temporary columns used for splitting.
        columns_to_drop = ['patient_id', 'study_id']
        train_df_clean = train_df.drop(columns=[col for col in columns_to_drop if col in train_df.columns])
        val_df_clean = val_df.drop(columns=[col for col in columns_to_drop if col in val_df.columns])

        train_df_clean.to_csv(train_path, index=False)
        val_df_clean.to_csv(val_path, index=False)

        print("\n" + "=" * 80)
        print("Files Saved Successfully")
        print("=" * 80)
        print(f"Train set: {train_path} ({len(train_df_clean)} images)")
        print(f"Val set:   {val_path} ({len(val_df_clean)} images)")

        return train_path, val_path
|
| 314 |
+
|
| 315 |
+
# Main execution
|
| 316 |
+
if __name__ == "__main__":
    root = "/content/drive/MyDrive"
    # Configuration
    CHEXPERT_CSV = os.path.join(root, "CheXpert-v1.0-small", "train.csv")  # Adjust path as needed
    OUTPUT_DIR = os.path.join(root, "CheXpert-v1.0-small")
    VAL_SIZE = 0.15
    RANDOM_STATE = 42
    USE_FRONTAL_ONLY = True
    FILL_UNCERTAIN = 'zeros'  # Options: 'zeros', 'ones', 'ignore'

    # Create splitter
    splitter = CheXpertDataSplitter(
        csv_path=CHEXPERT_CSV,
        val_size=VAL_SIZE, test_size=VAL_SIZE,
        random_state=RANDOM_STATE,
        use_frontal_only=USE_FRONTAL_ONLY,
        fill_uncertain=FILL_UNCERTAIN,
        root=OUTPUT_DIR
    )

    # Reuse existing splits only when ALL three files are present.
    # BUG FIX: the original checked only train_ready.csv and val_ready.csv,
    # then pointed test_path at a possibly nonexistent test_ready.csv.
    split_paths = {
        name: os.path.join(OUTPUT_DIR, f"{name}_ready.csv")
        for name in ("train", "val", "test")
    }
    if all(os.path.exists(p) for p in split_paths.values()):
        train_path = split_paths["train"]
        val_path = split_paths["val"]
        test_path = split_paths["test"]
    else:
        train_path, val_path, test_path = splitter.run(output_dir=OUTPUT_DIR)

    print("\nYou can now use these files with your CheXpertDataset class:")
    print(f"  train_dataset = CheXpertDataset('{train_path}', root_dir='...', augment=True)")
    print(f"  val_dataset = CheXpertDataset('{val_path}', root_dir='...', augment=False)")
    print(f"  test_dataset = CheXpertDataset('{test_path}', root_dir='...', augment=False)")
|
gitignore.txt
ADDED
|
@@ -0,0 +1,61 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Git ignore rules for this project
|
| 2 |
+
# Python
|
| 3 |
+
__pycache__/
|
| 4 |
+
*.py[cod]
|
| 5 |
+
*$py.class
|
| 6 |
+
*.so
|
| 7 |
+
.Python
|
| 8 |
+
env/
|
| 9 |
+
venv/
|
| 10 |
+
ENV/
|
| 11 |
+
.venv
|
| 12 |
+
|
| 13 |
+
# Jupyter Notebook
|
| 14 |
+
.ipynb_checkpoints
|
| 15 |
+
*.ipynb_checkpoints/
|
| 16 |
+
|
| 17 |
+
# PyTorch
|
| 18 |
+
*.ckpt
|
| 19 |
+
*.pth
|
| 20 |
+
weights/
|
| 21 |
+
runs/
|
| 22 |
+
lightning_logs/
|
| 23 |
+
|
| 24 |
+
# Data files (usually too large for GitHub)
|
| 25 |
+
*.csv
|
| 26 |
+
*.h5
|
| 27 |
+
*.hdf5
|
| 28 |
+
*.npy
|
| 29 |
+
*.npz
|
| 30 |
+
*.pkl
|
| 31 |
+
*.pickle
|
| 32 |
+
*.dcm
|
| 33 |
+
*.nii
|
| 34 |
+
*.nii.gz
|
| 35 |
+
|
| 36 |
+
# Models (often too large)
|
| 37 |
+
*.h5
|
| 38 |
+
*.pb
|
| 39 |
+
*.onnx
|
| 40 |
+
saved_models/
|
| 41 |
+
|
| 42 |
+
# IDE
|
| 43 |
+
.vscode/
|
| 44 |
+
.idea/
|
| 45 |
+
*.swp
|
| 46 |
+
*.swo
|
| 47 |
+
|
| 48 |
+
# OS
|
| 49 |
+
.DS_Store
|
| 50 |
+
Thumbs.db
|
| 51 |
+
|
| 52 |
+
# Environment variables
|
| 53 |
+
.env
|
| 54 |
+
.env.local
|
| 55 |
+
|
| 56 |
+
# Logs
|
| 57 |
+
*.log
|
| 58 |
+
logs/
|
| 59 |
+
|
| 60 |
+
# Weights & Biases (if you use it)
|
| 61 |
+
wandb/
|
loss/__init__.py
ADDED
|
File without changes
|
loss/__pycache__/__init__.cpython-313.pyc
ADDED
|
Binary file (149 Bytes). View file
|
|
|
loss/__pycache__/assymetric.cpython-313.pyc
ADDED
|
Binary file (2.98 kB). View file
|
|
|
loss/assymetric.py
ADDED
|
@@ -0,0 +1,59 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import torch
|
| 2 |
+
import torch.nn as nn
|
| 3 |
+
|
| 4 |
+
class AsymmetricLoss(nn.Module):
    """Asymmetric focal loss (ASL) for multi-label classification.

    Negatives receive a stronger focusing exponent (gamma_neg) than
    positives (gamma_pos), and the positive term uses probability
    shifting ('clip') to down-weight easy samples.

    Args:
        gamma_neg: focusing exponent applied to negative targets.
        gamma_pos: focusing exponent applied to positive targets.
        clip: probability margin subtracted on the positive term.
        eps: numerical-stability epsilon for clamping and log().
        class_weights: optional (num_classes,) per-class weight tensor;
            registered as a buffer so it follows the module's device.
    """

    def __init__(self, gamma_neg=2, gamma_pos=1, clip=0.05, eps=1e-8, class_weights=None):
        super().__init__()
        self.gamma_neg = gamma_neg
        self.gamma_pos = gamma_pos
        self.clip = clip
        self.eps = eps
        if class_weights is not None:
            self.register_buffer('class_weights', class_weights)
        else:
            self.class_weights = None

    def forward(self, predictions, targets):
        """Compute the scalar ASL loss.

        Args:
            predictions: (B, C) sigmoid outputs (probabilities, already in [0, 1]).
            targets: (B, C) binary labels.

        Returns:
            Scalar loss tensor. On NaN/Inf, logs diagnostics and returns a
            zero loss so training can continue (note: the zero tensor is
            detached from the graph, so that batch contributes no gradient).
        """
        try:
            # Clamp away from exact 0/1 so the logs below stay finite.
            predictions = torch.clamp(predictions, min=self.eps, max=1 - self.eps)

            # ===== POSITIVE SAMPLES =====
            # Probability shifting: p -> max(p - clip, eps).
            predictions_pos = torch.clamp(predictions - self.clip, min=self.eps)
            focal_weight_pos = (1 - predictions_pos) ** self.gamma_pos
            loss_pos = targets * focal_weight_pos * torch.log(predictions_pos + self.eps)

            # ===== NEGATIVE SAMPLES =====
            focal_weight_neg = predictions ** self.gamma_neg
            loss_neg = (1 - targets) * focal_weight_neg * torch.log(1 - predictions + self.eps)

            # ===== COMBINE =====
            loss = -(loss_pos + loss_neg)

            # Optional per-class weighting.
            if self.class_weights is not None:
                loss = loss * self.class_weights

            # Average across batch and classes.
            loss = torch.mean(loss)

            if torch.isnan(loss) or torch.isinf(loss):
                raise ValueError("Loss is NaN or Inf")
            # BUG FIX: the original forward never returned on the success
            # path, so it returned None for every normal batch.
            return loss
        except ValueError as e:
            print("⚠️ WARNING: NaN/Inf detected in loss, returning safe value")
            print(e)
            print("predictions:", predictions)
            print("targets:", targets)
            import traceback
            traceback.print_exc()
            return torch.tensor(0.0, device=predictions.device, requires_grad=True)
|
models/__init__.py
ADDED
|
File without changes
|
models/__pycache__/__init__.cpython-313.pyc
ADDED
|
Binary file (151 Bytes). View file
|
|
|
models/__pycache__/classifier.cpython-313.pyc
ADDED
|
Binary file (17.6 kB). View file
|
|
|
models/__pycache__/densenet.cpython-313.pyc
ADDED
|
Binary file (13.4 kB). View file
|
|
|
models/__pycache__/mae.cpython-313.pyc
ADDED
|
Binary file (13.6 kB). View file
|
|
|
models/classifier.py
ADDED
|
@@ -0,0 +1,323 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import torch
|
| 2 |
+
import torch.nn as nn
|
| 3 |
+
import torch.nn.functional as F
|
| 4 |
+
import math
|
| 5 |
+
|
| 6 |
+
from models.mae import MaskedAutoEncoder
|
| 7 |
+
from models.densenet import DenseNet
|
| 8 |
+
|
| 9 |
+
class AttentionPool(nn.Module):
    """Pool a token sequence into one vector via a single learned query.

    The learnable query attends over all input tokens; the attended
    vector is then projected from `dim` up to `embed_dim`.
    """

    def __init__(self, dim=768, embed_dim=2048, num_heads=8):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=num_heads, batch_first=True)
        self.proj = nn.Linear(dim, embed_dim)

    def forward(self, x):
        """x: (B, N, dim) -> (B, embed_dim)."""
        batch = x.size(0)
        # Broadcast the single learned query across the batch.
        queries = self.query.expand(batch, -1, -1)
        pooled, _ = self.attn(queries, x, x)
        # Drop the singleton query axis, then project up.
        return self.proj(pooled[:, 0])
|
| 21 |
+
|
| 22 |
+
class CrossAttentionBlock(nn.Module):
    """
    Cross-attention block: query tokens attend to key/value tokens from
    another modality, with a residual connection around the attended output.
    """

    def __init__(self, dim_q, dim_kv, num_heads=8, dropout=0.1, proj_dim=None):
        super().__init__()
        self.proj_dim = proj_dim or dim_q
        self.num_heads = num_heads
        self.head_dim = self.proj_dim // num_heads
        self.scale = self.head_dim ** -0.5

        self.q_proj = nn.Linear(dim_q, self.proj_dim)
        self.k_proj = nn.Linear(dim_kv, self.proj_dim)
        self.v_proj = nn.Linear(dim_kv, self.proj_dim)
        self.out_proj = nn.Linear(self.proj_dim, dim_q)

        self.dropout = nn.Dropout(dropout)
        self.norm_q = nn.LayerNorm(dim_q)
        self.norm_kv = nn.LayerNorm(dim_kv)

    def forward(self, query, key_value):
        """query: (B, N_q, dim_q); key_value: (B, N_kv, dim_kv) -> (B, N_q, dim_q)."""
        batch, n_queries, _ = query.shape
        n_keys = key_value.shape[1]

        normed_q = self.norm_q(query)
        normed_kv = self.norm_kv(key_value)

        def split_heads(t, n_tokens):
            # (B, N, proj_dim) -> (B, heads, N, head_dim)
            return t.view(batch, n_tokens, self.num_heads, self.head_dim).transpose(1, 2)

        Q = split_heads(self.q_proj(normed_q), n_queries)
        K = split_heads(self.k_proj(normed_kv), n_keys)
        V = split_heads(self.v_proj(normed_kv), n_keys)

        scores = torch.matmul(Q, K.transpose(-2, -1)) * self.scale
        weights = self.dropout(F.softmax(scores, dim=-1))

        attended = torch.matmul(weights, V).transpose(1, 2).reshape(batch, n_queries, self.proj_dim)
        attended = self.out_proj(attended)

        # Residual: add the (dropped-out) attended output onto the raw query.
        return query + self.dropout(attended)
|
| 61 |
+
|
| 62 |
+
|
| 63 |
+
class BidirectionalCrossAttention(nn.Module):
    """
    Symmetric fusion layer: the MAE token stream attends to the DenseNet
    token stream AND vice versa, each followed by a residual FFN.
    """

    def __init__(self, mae_dim=768, dense_dim=2048, num_heads=8, dropout=0.1, proj_dim=512):
        super().__init__()

        # MAE tokens query DenseNet tokens.
        self.mae_cross = CrossAttentionBlock(mae_dim, dense_dim, num_heads, dropout, proj_dim)
        # DenseNet tokens query MAE tokens.
        self.dense_cross = CrossAttentionBlock(dense_dim, mae_dim, num_heads, dropout, proj_dim)

        def make_ffn(width, expansion):
            # Pre-norm MLP; expansion differs per branch to bound parameter count.
            return nn.Sequential(
                nn.LayerNorm(width),
                nn.Linear(width, width * expansion),
                nn.GELU(),
                nn.Dropout(dropout),
                nn.Linear(width * expansion, width),
                nn.Dropout(dropout),
            )

        self.mae_ffn = make_ffn(mae_dim, 4)
        self.dense_ffn = make_ffn(dense_dim, 2)

    def forward(self, mae_tokens, dense_tokens):
        """Return the two refined token streams (same shapes as the inputs)."""
        refined_mae = self.mae_cross(mae_tokens, dense_tokens)
        refined_dense = self.dense_cross(dense_tokens, mae_tokens)

        # Feed-forward with residual connections.
        refined_mae = refined_mae + self.mae_ffn(refined_mae)
        refined_dense = refined_dense + self.dense_ffn(refined_dense)

        return refined_mae, refined_dense
|
| 103 |
+
class LearnedLogitEnsemble(nn.Module):
    """Fuse per-head logits via learned temperatures and learned head weights.

    Each head's logits are temperature-scaled (one learned temperature per
    head), then combined as a convex combination in logit space. The weights
    come either from a small gating MLP conditioned on the concatenated
    logits (use_gate=True) or from a fixed learned weight vector.

    Args:
        num_heads: number of classifier heads being fused.
        num_classes: classes per head.
        temperature_init: initial temperature for every head.
        use_gate: if True, predict per-sample head weights with a gating MLP.
    """

    def __init__(self, num_heads=7, num_classes=14, temperature_init=1.0, use_gate=False):
        super().__init__()
        self.num_classes = num_classes
        self.num_heads = num_heads
        self.use_gate = use_gate

        # Per-head log-temperature; exp() keeps the effective temperature positive.
        self.log_temps = nn.Parameter(torch.ones(num_heads) * math.log(temperature_init))

        if use_gate:
            # Sample-wise gating: concatenated scaled logits -> soft head weights.
            self.gate = nn.Sequential(
                nn.Linear(num_classes * num_heads, 256),
                nn.GELU(),
                nn.LayerNorm(256),
                nn.Dropout(0.1),
                nn.Linear(256, num_heads),
            )
        else:
            # Fixed learned weights shared across samples.
            self.raw_weights = nn.Parameter(torch.ones(num_heads))

    def forward(self, logits_list):
        """
        Args:
            logits_list: sequence of num_heads tensors, each (B, num_classes).
        Returns:
            Fused logits of shape (B, num_classes).
        """
        # Step 1: per-head temperature scaling.
        scaled_logits = [
            logits / (torch.exp(self.log_temps[i]) + 1e-8)
            for i, logits in enumerate(logits_list)
        ]

        # (B, num_heads, num_classes)
        stacked = torch.stack(scaled_logits, dim=1)

        if self.use_gate:
            # Step 2a: dynamic per-sample gating.
            raw_gate = self.gate(stacked.flatten(1))                 # (B, H)
            weights = torch.softmax(raw_gate, dim=-1).unsqueeze(-1)  # (B, H, 1)
        else:
            # Step 2b: fixed learned weights. As a parameter they already live
            # on the module's device, so no per-forward .to() is needed.
            weights = torch.softmax(self.raw_weights, dim=0).view(1, self.num_heads, 1)

        # Step 3: weighted average in logit space.
        return (stacked * weights).sum(dim=1)
|
| 159 |
+
class XRAYClassifier(nn.Module):
    """Dual-branch chest X-ray multi-label classifier.

    Combines a frozen MAE ViT encoder with a trainable DenseNet CNN:
      * per-branch heads (attention-pooled MAE head, DenseNet global head),
      * a stack of bidirectional cross-attention fusion layers -> fused head,
      * an FPN over the DenseNet feature maps with one head per pyramid level,
      * a LearnedLogitEnsemble that fuses all 7 heads into the final logits.
    """

    def __init__(self, num_classes=14, c=1, mask_ratio=0, dropout=0.25, img_size=384,
                 encoder_dim=768, mlp_dim=3072, decoder_dim=512, encoder_depth=12,
                 encoder_head=8, decoder_depth=8, decoder_head=8, patch_size=8):
        super().__init__()

        # ---- MAE branch (frozen) ----
        # mask_ratio is forced to 0 here: the encoder sees every patch at inference.
        self.mae = MaskedAutoEncoder(
            c=c, mask_ratio=0, dropout=dropout, img_size=img_size,
            encoder_dim=encoder_dim, mlp_dim=mlp_dim, decoder_dim=decoder_dim,
            encoder_depth=encoder_depth, encoder_head=encoder_head,
            decoder_depth=decoder_depth, decoder_head=decoder_head, patch_size=patch_size
        )
        for p in self.mae.parameters():
            p.requires_grad = False

        self.token_ln = nn.LayerNorm(encoder_dim)
        self.attn_selfpool_mae = AttentionPool(encoder_dim, 1024)

        # ---- DenseNet branch ----
        # The DenseNet stem expects 2 channels; the single-channel input is
        # duplicated in forward(). Use c=1 there if the backbone supports it.
        self.dense = DenseNet(c=2, k=64, num_classes=num_classes)
        self.dn_feat_dim = 2048

        # ---- Cross-Attention Fusion ----
        self.cross_attn_layers = nn.ModuleList([
            BidirectionalCrossAttention(
                mae_dim=encoder_dim,          # 768
                dense_dim=self.dn_feat_dim,   # 2048
                num_heads=8,
                dropout=0.1,
                proj_dim=512
            )
            for _ in range(12)
        ])

        self.attn_pool_mae = AttentionPool(encoder_dim, 1024)

        # Head over the attention-pooled MAE tokens.
        self.classifier_mae = nn.Sequential(
            nn.Linear(1024, 512),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(512, num_classes),
        )

        self.attn_pool_dense = AttentionPool(self.dn_feat_dim, 1024)

        # Head over the concatenated cross-attended MAE + DenseNet pools (1024+1024).
        self.classifier_attn = nn.Sequential(
            nn.Linear(2048, 1024),
            nn.GELU(),
            nn.Dropout(0.2),
            nn.Linear(1024, 512),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(512, num_classes),
        )

        # ---- FPN laterals (channel counts match the pre-transition features) ----
        self.lateral5 = nn.Conv2d(2048, 256, kernel_size=1, stride=1, padding=0)  # feat4: 2048
        self.lateral4 = nn.Conv2d(2048, 256, kernel_size=1, stride=1, padding=0)  # feat3: 2048
        self.lateral3 = nn.Conv2d(1024, 256, kernel_size=1, stride=1, padding=0)  # feat2: 1024
        self.lateral2 = nn.Conv2d(512, 256, kernel_size=1, stride=1, padding=0)   # feat1: 512
        self.output5 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
        self.output4 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
        self.output3 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
        self.output2 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
        self.upsample = nn.Upsample(scale_factor=2, mode='nearest')

        # One classification head per pyramid level.
        self._classify_out5 = nn.Linear(256, num_classes)
        self._classify_out4 = nn.Linear(256, num_classes)
        self._classify_out3 = nn.Linear(256, num_classes)
        self._classify_out2 = nn.Linear(256, num_classes)

        self.learned_logit_ensemble = LearnedLogitEnsemble(num_classes=num_classes)

    def forward(self, x):
        """x: (B, 1, H, W) grayscale images -> (B, num_classes) fused logits."""
        # ---- MAE tokens (frozen encoder) ----
        mae_tokens, _, _, _ = self.mae.encoder(x)
        mae_tokens = self.token_ln(mae_tokens)

        # Duplicate the grayscale channel for the 2-channel DenseNet stem.
        doublex = torch.cat([x, x], dim=1)

        # ---- DenseNet path: multi-scale features (ECA/CBAM before transitions) ----
        xdense = self.dense.initialconv(doublex)

        feat1 = self.dense.layer1(xdense)
        feat1 = self.dense.dropout1(feat1)
        feat1 = self.dense.eca1(feat1)
        xdense1 = self.dense.trans1(feat1)

        feat2 = self.dense.layer2(xdense1)
        feat2 = self.dense.dropout2(feat2)
        feat2 = self.dense.eca2(feat2)
        xdense2 = self.dense.trans2(feat2)

        feat3 = self.dense.layer3(xdense2)
        feat3 = self.dense.dropout3(feat3)
        feat3 = self.dense.eca3(feat3)
        xdense3 = self.dense.trans3(feat3)

        feat4 = self.dense.layer4(xdense3)
        feat4 = self.dense.dropout4(feat4)
        feat4 = self.dense.eca4(feat4)
        xdense4 = feat4

        # DenseNet global head.
        xdense_pooled = self.dense.global_average_pool(xdense4)
        xdense_pooled = xdense_pooled.view(xdense_pooled.size(0), -1)
        xdense_pooled = self.dense.dropout(xdense_pooled)
        classifier_xdense = self.dense.classifier(xdense_pooled)

        # Deepest feature map as a token sequence for cross-attention.
        dense_tokens = xdense4.flatten(2).transpose(1, 2)  # (B, H*W, 2048)

        # ---- FPN top-down pathway ----
        c4 = self.lateral5(feat4)
        c3 = self.lateral4(feat3)
        c2 = self.lateral3(feat2)
        c1 = self.lateral2(feat1)

        p4 = self.output5(c4)
        p3 = self.output4(self.upsample(p4) + c3)
        p2 = self.output3(self.upsample(p3) + c2)
        p1 = self.output2(self.upsample(p2) + c1)

        out4 = self._classify_out5(p4.mean([2, 3]))
        out3 = self._classify_out4(p3.mean([2, 3]))
        out2 = self._classify_out3(p2.mean([2, 3]))
        out1 = self._classify_out2(p1.mean([2, 3]))

        # ---- MAE head ----
        mae_tokens_pooled = self.attn_selfpool_mae(mae_tokens)
        classifier_mae = self.classifier_mae(mae_tokens_pooled)

        # ---- Cross attention ----
        # BUG FIX: chain the 12 fusion layers. The original loop passed the
        # unmodified (mae_tokens, dense_tokens) into every layer, so only the
        # last layer's output was used and layers 0-10 were dead weight that
        # never received gradient.
        mae_cross, dense_cross = mae_tokens, dense_tokens
        for cross_layer in self.cross_attn_layers:
            mae_cross, dense_cross = cross_layer(mae_cross, dense_cross)

        mae_cross = self.attn_pool_mae(mae_cross)
        dense_cross = self.attn_pool_dense(dense_cross)
        classifier_attn = self.classifier_attn(torch.cat([mae_cross, dense_cross], dim=1))

        # ---- Ensemble of all 7 heads ----
        return self.learned_logit_ensemble([
            classifier_mae,
            classifier_xdense,
            classifier_attn,
            out4, out3, out2, out1,
        ])
|
models/densenet.py
ADDED
|
@@ -0,0 +1,157 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import torch
|
| 2 |
+
import torch.nn as nn
|
| 3 |
+
from torch.utils.checkpoint import checkpoint
|
| 4 |
+
|
| 5 |
+
class ChannelAttention(nn.Module):
    """Channel attention: gate channels by a shared bottleneck MLP applied
    to both average- and max-pooled spatial descriptors (CBAM style)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False)
        self.sigmoid = nn.Sigmoid()
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.maxpool = nn.AdaptiveMaxPool2d((1, 1))

    def forward(self, x):
        # The same squeeze-excite MLP processes both pooled descriptors.
        def excite(descriptor):
            return self.conv2(self.relu(self.conv1(descriptor)))

        gate = self.sigmoid(excite(self.avgpool(x)) + excite(self.maxpool(x)))
        return x * gate
|
| 24 |
+
|
| 25 |
+
class SpatialAttention(nn.Module):
    """Spatial attention: gate each location by a 7x7 conv over the
    channel-wise max and mean maps (CBAM style)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        channel_max = x.max(dim=1, keepdim=True).values
        channel_avg = x.mean(dim=1, keepdim=True)
        pooled = torch.cat([channel_max, channel_avg], dim=1)  # (B, 2, H, W)
        gate = torch.sigmoid(self.conv(pooled))                # (B, 1, H, W)
        return x * gate
|
| 36 |
+
|
| 37 |
+
class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by
    spatial attention."""

    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        # Sequential refinement: channels first, then spatial locations.
        return self.sa(self.ca(x))
|
| 47 |
+
|
| 48 |
+
class InitialConv(nn.Module):
    """Network stem: 7x7 conv (stride 1) -> BN -> ReLU -> 3x3 max-pool
    (stride 2), halving spatial resolution and producing 2*k channels."""

    def __init__(self, input_channel=1, k=64):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=input_channel, out_channels=2 * k,
                              kernel_size=7, stride=1, padding=3)
        self.bn = nn.BatchNorm2d(num_features=2 * k)
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        out = self.conv(x)
        out = self.bn(out)
        out = self.relu(out)
        return self.pool(out)
|
| 57 |
+
|
| 58 |
+
class DenseLayer(nn.Module):
    """DenseNet bottleneck layer: BN-ReLU-1x1 (to 4k) then BN-ReLU-3x3
    (to k), with the new features concatenated onto the input channels."""

    def __init__(self, c, k=64):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(num_features=c)
        self.relu1 = nn.ReLU(inplace=True)
        self.conv1x1 = nn.Conv2d(c, 4 * k, kernel_size=1)
        self.bn2 = nn.BatchNorm2d(num_features=4 * k)
        self.relu2 = nn.ReLU(inplace=True)
        self.conv3x3 = nn.Conv2d(4 * k, k, kernel_size=3, padding=1)

    def forward(self, x):
        new_features = self.conv1x1(self.relu1(self.bn1(x)))
        new_features = self.conv3x3(self.relu2(self.bn2(new_features)))
        # Dense connectivity: append the k new channels to the input.
        return torch.cat([x, new_features], dim=1)
|
| 72 |
+
|
| 73 |
+
class DenseBlock(nn.Module):
    """A stack of DenseLayers; channel count grows by k per layer.
    Each layer is gradient-checkpointed to trade compute for memory."""

    def __init__(self, c, k=64, layer_len=6):
        super().__init__()
        self.blks = nn.ModuleList(
            DenseLayer(c + i * k, k) for i in range(layer_len)
        )

    def forward(self, x):
        for layer in self.blks:
            # Recompute activations during backward instead of storing them.
            x = checkpoint(layer, x, use_reentrant=False)
        return x
|
| 84 |
+
|
| 85 |
+
class Transition(nn.Module):
    """DenseNet transition: BN-ReLU-1x1 conv channel compression (by
    down_factor), then 2x2 average pooling to halve spatial resolution."""

    def __init__(self, inchannels, down_factor=0.5):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features=inchannels)
        self.relu = nn.ReLU(inplace=True)
        self.conv1x1 = nn.Conv2d(in_channels=inchannels,
                                 out_channels=int(down_factor * inchannels),
                                 kernel_size=1)
        self.avgpool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        out = self.bn(x)
        out = self.relu(out)
        out = self.conv1x1(out)
        return self.avgpool(out)
|
| 94 |
+
class DenseNet(nn.Module):
    """DenseNet-style backbone with CBAM attention after each dense block.

    Pipeline: stem -> (DenseBlock -> Dropout -> CBAM -> Transition) x3 ->
    DenseBlock -> Dropout -> CBAM -> global average pool -> MLP classifier.
    The shape comments below assume the default c=2, k=64, 384x384 input.
    NOTE(review): the per-block channel counts (128/256/512/1024) are
    hard-coded to match k=64; other k values would break — confirm intended.
    """
    def __init__(self,c=2,k=64,num_classes=14):
        super().__init__()
        self.initialconv=InitialConv(input_channel=c,k=k) #output B,128,192,192
        self.layer1=DenseBlock(c=128,k=k,layer_len=6) #output B,inchannels+(layer_len*k),192,192 i.e # B,512,192,192
        self.dropout1 = nn.Dropout(p=0.05)
        self.eca1=CBAM(512)
        self.trans1=Transition(inchannels=512,down_factor=0.5) #output B,256,96,96
        self.layer2=DenseBlock(c=256,k=k,layer_len=12) #output B,inchannels+(layer_len*k),96,96 i.e # B,1024,96,96
        self.dropout2 = nn.Dropout(p=0.1)
        self.eca2=CBAM(1024)
        self.trans2=Transition(inchannels=1024,down_factor=0.5) #output B,512,48,48
        self.layer3=DenseBlock(c=512,k=k,layer_len=24) #output B,inchannels+(layer_len*k),48,48 i.e # B,2048,48,48
        self.dropout3 = nn.Dropout(p=0.1)
        self.eca3=CBAM(2048)
        self.trans3=Transition(inchannels=2048,down_factor=0.5) #output B,1024,24,24
        self.layer4=DenseBlock(c=1024,k=k,layer_len=16) #output B,inchannels+(layer_len*k),24,24 i.e # B,2048,24,24
        self.dropout4 = nn.Dropout(p=0.1)
        self.eca4=CBAM(2048)
        self.global_average_pool= nn.AdaptiveAvgPool2d((1,1)) #output B,2048,1,1
        # MLP head: 2048 -> 1024 -> 512 -> 256 -> num_classes with BN+ReLU+Dropout.
        self.classifier = nn.Sequential(
            nn.Linear(2048, 1024),
            nn.BatchNorm1d(1024),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(1024, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(512, 256),
            nn.BatchNorm1d(256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, num_classes)
        )
        # Dropout applied to the pooled features before the classifier head.
        self.dropout = nn.Dropout(p=0.2)
        # Xavier init for the classifier's Linear layers.
        for lay in self.classifier:
            if isinstance(lay, nn.Linear):
                nn.init.xavier_uniform_(lay.weight, gain=1.0)
                nn.init.constant_(lay.bias, 0.0)
        # Kaiming init for every Conv2d in the network (runs over all submodules).
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def forward(self,x):
        """x: (B, c, H, W) -> logits (B, num_classes)."""
        x=self.initialconv(x)
        x=self.trans1(self.eca1(self.dropout1(self.layer1(x))))
        x=self.trans2(self.eca2(self.dropout2(self.layer2(x))))
        x=self.trans3(self.eca3(self.dropout3(self.layer3(x))))
        x=self.eca4(self.dropout4(self.layer4(x)))
        #x1=self.attn(x)
        x=self.global_average_pool(x)
        x=x.view(x.size(0),-1)
        #x=torch.cat([x1,x2],dim=1)
        x=self.dropout(x)
        x=self.classifier(x)
        return x

    @staticmethod
    def testme():
        # Smoke test: default model on a 2-sample batch; prints the logit shape.
        model=DenseNet()
        sample=torch.randn(2,2,384,384)
        out=model(sample)
        print(out.shape)
|
models/mae.py
ADDED
|
@@ -0,0 +1,177 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import torch
|
| 2 |
+
import torch.nn as nn
|
| 3 |
+
import math
|
| 4 |
+
def patchify(x, patch_size=8):
    """Split an image batch into flattened non-overlapping patches.

    Args:
        x: (B, C, H, W) tensor; H and W must be divisible by patch_size.
    Returns:
        (B, num_patches, C * patch_size**2) tensor, patches in row-major order.
    """
    batch, channels, height, width = x.shape
    rows = height // patch_size
    cols = width // patch_size
    assert height % patch_size == 0 and width % patch_size == 0, "Image size must be divisible by patch_size"

    # (B, C, rows, P, cols, P) -> (B, rows, cols, C, P, P) -> flatten per patch.
    patches = x.reshape(batch, channels, rows, patch_size, cols, patch_size)
    patches = patches.permute(0, 2, 4, 1, 3, 5).contiguous()
    return patches.view(batch, rows * cols, channels * (patch_size ** 2))
| 14 |
+
def unpatchify(x, patch_size=8):
    """Inverse of patchify: reassemble flattened patches into images.

    Args:
        x: (B, N, C * patch_size**2) patch tokens; N must be a perfect square.
    Returns:
        (B, C, H, W) tensor with H = W = sqrt(N) * patch_size.
    """
    batch, num_patches, patch_dim = x.shape
    channels = patch_dim // (patch_size ** 2)
    side = int(math.sqrt(num_patches))  # assumes a square patch grid
    height = side * patch_size
    width = side * patch_size

    grid = x.view(batch, side, side, channels, patch_size, patch_size)
    grid = grid.permute(0, 3, 1, 4, 2, 5).contiguous()
    return grid.view(batch, channels, height, width)
| 25 |
+
def random_mask(x, mask_ratio=0.75):
    """Per-sample random masking of patch tokens (MAE style).

    Args:
        x: (B, N, P) patch tokens.
        mask_ratio: fraction of tokens to drop.
    Returns:
        x_masked: (B, len_keep, P) kept tokens.
        mask: (B, N) binary mask in original token order (1 = masked/dropped).
        ids_restore: (B, N) indices that undo the shuffle.
        ids_keep: (B, len_keep) original indices of the kept tokens.
    """
    batch, num_tokens, token_dim = x.shape
    len_keep = int(num_tokens * (1 - mask_ratio))

    # Random per-token scores; argsort yields a random permutation per sample.
    noise = torch.rand(batch, num_tokens).to(x.device)
    ids_shuffle = torch.argsort(noise, dim=1)
    ids_restore = torch.argsort(ids_shuffle, dim=1)

    ids_keep = ids_shuffle[:, :len_keep]
    gather_index = ids_keep.unsqueeze(-1).expand(-1, -1, token_dim)
    x_masked = torch.gather(x, dim=1, index=gather_index).to(x.device)

    # Build the 0=kept / 1=masked flags in shuffled order, then unshuffle.
    mask = torch.ones(batch, num_tokens).to(x.device)
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, dim=1, index=ids_restore).to(x.device)

    return x_masked, mask, ids_restore, ids_keep
|
| 37 |
+
|
| 38 |
+
def mae_loss(pred, target, mask):
    """MSE reconstruction loss over masked patches only.

    Args:
        pred/target: (B, N, P) patch tensors.
        mask: (B, N) with 1 = masked (counted), 0 = visible (ignored).
    Returns:
        Squared error summed over masked entries, normalized by the number
        of masked tokens (clamped to >= 1 to avoid division by zero).
    """
    mask = mask.unsqueeze(-1).float()  # (B, N, 1), broadcasts over patch dim
    squared_err = (pred - target).pow(2)
    return (squared_err * mask).sum() / mask.sum().clamp_min(1.0)
|
| 45 |
+
|
| 46 |
+
class PositionalEncoding(nn.Module):
    """Learnable per-patch positional embeddings gathered for visible patches.

    Holds a (1, num_patches, hidden_dim) table; ``forward`` adds the rows
    belonging to the visible (unmasked) patches to the input tokens.
    """

    def __init__(self, num_patches, hidden_dim=768):
        super().__init__()
        self.pos_embed = nn.Parameter(torch.empty(1, num_patches, hidden_dim))
        nn.init.trunc_normal_(self.pos_embed, std=0.02)

    def forward(self, x, visible_indices):
        """Add positional embeddings for the kept patches.

        Args:
            x: (B, L, D) tokens of the visible patches.
            visible_indices: (B, L) original indices of those patches.

        Returns:
            (B, L, D) tokens with their positional embeddings added.
        """
        batch, kept, _ = x.shape
        table = self.pos_embed.expand(batch, -1, -1)  # (B, N, D)
        gather_idx = visible_indices.unsqueeze(-1).expand(batch, kept, table.size(-1))
        return x + torch.gather(table, 1, gather_idx)
|
| 60 |
+
|
| 61 |
+
class TransformerBlock(nn.Module):
    """Pre-norm transformer encoder block.

    Self-attention and an MLP sub-layer, each applied as
    ``x = x + sublayer(LayerNorm(x))``.
    """

    def __init__(self, hidden_dim, mlp_dim, num_heads, dropout):
        super().__init__()
        self.layernorm1 = nn.LayerNorm(hidden_dim)
        self.multihead = nn.MultiheadAttention(
            batch_first=True, embed_dim=hidden_dim, num_heads=num_heads, dropout=dropout
        )
        self.layernorm2 = nn.LayerNorm(hidden_dim)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, mlp_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(mlp_dim, hidden_dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # Attention sub-layer (pre-norm + residual).
        normed = self.layernorm1(x)
        attn_out, _ = self.multihead(normed, normed, normed)
        x = x + attn_out
        # Feed-forward sub-layer (pre-norm + residual).
        return x + self.mlp(self.layernorm2(x))
|
| 82 |
+
|
| 83 |
+
class MAEEncoder(nn.Module):
    """MAE encoder: patchify, randomly keep a subset of patches, embed them,
    add learned positions, and run a transformer stack over the visible tokens.

    ``patch_dim`` is the flattened patch length, i.e. C * patch_size**2.
    """

    def __init__(self, patch_dim, num_patches=(384//4)**2, hidden_dim=768,
                 mlp_dim=768*4, num_heads=8, depth=12, dropout=0.25,
                 mask_ratio=0.75, patch_size=8):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.patch_size = patch_size
        self.patch_embed = nn.Linear(patch_dim, hidden_dim)
        self.pos_embed = PositionalEncoding(num_patches=num_patches, hidden_dim=hidden_dim)
        self.transformer = nn.ModuleList(
            [TransformerBlock(hidden_dim=hidden_dim, mlp_dim=mlp_dim,
                              num_heads=num_heads, dropout=dropout)
             for _ in range(depth)]
        )
        self._init_weights()

    def _init_weights(self):
        # Truncated-normal init for every linear layer; zero biases.
        for module in self.modules():
            if isinstance(module, nn.Linear):
                nn.init.trunc_normal_(module.weight, std=0.02)
                if module.bias is not None:
                    nn.init.constant_(module.bias, 0)

    def forward(self, x_in):
        """Encode the visible patches of a batch of images.

        Returns:
            tokens: (B, len_keep, hidden_dim) encoded visible patches.
            mask: (B, N) binary mask (1 = masked) in original patch order.
            ids_keep: (B, len_keep) original indices of kept patches.
            ids_restore: (B, N) permutation undoing the shuffle (for decoder).
        """
        patches = patchify(x_in, self.patch_size)
        visible, mask, ids_restore, ids_keep = random_mask(patches, self.mask_ratio)
        tokens = self.patch_embed(visible)
        tokens = self.pos_embed(tokens, ids_keep)
        for block in self.transformer:
            tokens = block(tokens)
        return tokens, mask, ids_keep, ids_restore
|
| 111 |
+
|
| 112 |
+
class MAEDecoder(nn.Module):
    """MAE decoder: project encoder tokens to the decoder width, re-insert a
    shared mask token at every masked position, restore original patch order,
    add positional embeddings, run a (shallower) transformer stack, and
    predict the pixel values of every patch.
    """
    def __init__(self,c,num_patches,patch_size,encoder_dim,decoder_dim,decoder_depth,mlp_dim,num_heads,dropout):
        super().__init__()
        self.num_patches=num_patches
        self.encoder_dim=encoder_dim
        self.decoder_dim=decoder_dim
        # Single learnable token standing in for every masked patch.
        self.mask_token=nn.Parameter(torch.empty(1,1,decoder_dim))
        # Width adapter from the encoder's hidden size to the decoder's.
        self.enc_to_dec=nn.Linear(encoder_dim,decoder_dim)
        self.pos_embed=nn.Parameter(torch.empty(1,num_patches,decoder_dim))
        self.transformer=nn.ModuleList([TransformerBlock(hidden_dim=decoder_dim,mlp_dim=mlp_dim,num_heads=num_heads,dropout=dropout)
                                        for _ in range(decoder_depth)])
        self.layernorm=nn.LayerNorm(decoder_dim)
        # Per-patch reconstruction head: decoder_dim -> C * patch_size**2.
        self.pred=nn.Linear(decoder_dim,c*(patch_size**2))

        self._init_weights()
    def _init_weights(self):
        # Truncated-normal init for linear layers; zero biases; then the
        # non-Linear parameters (pos_embed, mask_token) explicitly.
        for m in self.modules():
            if isinstance(m, nn.Linear):
                nn.init.trunc_normal_(m.weight, std=0.02)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
        nn.init.trunc_normal_(self.pos_embed, std=0.02)
        nn.init.trunc_normal_(self.mask_token, std=0.02)
    def forward(self,x,ids_keep,ids_restore):
        """Reconstruct all patches from the visible-token encoding.

        Args:
            x: (B, len_keep, encoder_dim) encoder output for visible patches.
            ids_keep: (B, len_keep) indices of the visible patches (not used
                here; kept for interface symmetry with the encoder).
            ids_restore: (B, N) permutation restoring original patch order.

        Returns:
            (B, N, C * patch_size**2) predicted patch pixels.
        """
        b,n,p=x.shape
        xdec=self.enc_to_dec(x)
        len_keep=xdec.size(1)
        num_patches=ids_restore.size(1)
        num_mask=num_patches-len_keep

        # Append mask tokens after the visible tokens (still in shuffled
        # order), then gather with ids_restore so every token ends up at its
        # original patch position before positions are added.
        mask_token=self.mask_token.expand(b,num_mask,-1)
        x_=torch.cat([xdec,mask_token],dim=1)
        x_=torch.gather(x_,dim=1,index=ids_restore.unsqueeze(-1).expand(-1,-1,x_.size(-1)))
        x_=x_+self.pos_embed
        for block in self.transformer:x_=block(x_)
        x_=self.layernorm(x_)
        out=self.pred(x_)
        return out
|
| 150 |
+
|
| 151 |
+
class MaskedAutoEncoder(nn.Module):
    """Full masked autoencoder: an encoder over the visible patches plus a
    decoder that reconstructs every patch of the input image."""

    def __init__(self, c=1, mask_ratio=0.75, dropout=0.25, img_size=384,
                 encoder_dim=768, mlp_dim=3072, decoder_dim=512,
                 encoder_depth=12, encoder_head=8, decoder_depth=8,
                 decoder_head=8, patch_size=8):
        super().__init__()
        self.patch_size = patch_size
        grid_patches = (img_size // patch_size) ** 2
        self.encoder = MAEEncoder(
            patch_dim=c * (patch_size ** 2),
            num_patches=grid_patches,
            hidden_dim=encoder_dim,
            mlp_dim=mlp_dim,
            num_heads=encoder_head,
            depth=encoder_depth,
            dropout=dropout,
            mask_ratio=mask_ratio,
            patch_size=patch_size,
        )
        self.decoder = MAEDecoder(
            c,
            num_patches=grid_patches,
            patch_size=patch_size,
            encoder_dim=encoder_dim,
            decoder_dim=decoder_dim,
            decoder_depth=decoder_depth,
            mlp_dim=mlp_dim,
            num_heads=decoder_head,
            dropout=dropout,
        )

    def forward(self, x):
        """Run one masked-reconstruction pass.

        Returns:
            (target_patches, predicted_patches, mask) — all in original patch
            order, ready for mae_loss.
        """
        encoded, mask, ids_keep, ids_restore = self.encoder(x)
        reconstructed = self.decoder(encoded, ids_keep, ids_restore)
        targets = patchify(x, self.patch_size)
        return targets, reconstructed, mask

    @staticmethod
    def testme():
        """Smoke test: run a random image through a default model, print shapes."""
        img = torch.rand(1, 1, 384, 384)
        mae = MaskedAutoEncoder()
        targets, preds, mask = mae(img)
        print(targets.shape)
        print(preds.shape)
        print(mask.shape)
|
notebooks/chexpert_mae.ipynb
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
notebooks/chexpert_mae_mask_classifier.ipynb
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
requirements.txt
ADDED
|
@@ -0,0 +1,29 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Core Deep Learning
|
| 2 |
+
torch>=2.0.0
|
| 3 |
+
torchvision>=0.15.0
|
| 4 |
+
|
| 5 |
+
# Data Processing
|
| 6 |
+
numpy>=1.24.0
|
| 7 |
+
pandas>=2.0.0
|
| 8 |
+
scikit-learn>=1.3.0
|
| 9 |
+
|
| 10 |
+
# Image Processing
|
| 11 |
+
Pillow>=10.0.0
|
| 12 |
+
opencv-python>=4.8.0
|
| 13 |
+
albumentations>=1.3.1
|
| 14 |
+
|
| 15 |
+
# Visualization
|
| 16 |
+
matplotlib>=3.7.0
|
| 17 |
+
seaborn>=0.12.0
|
| 18 |
+
|
| 19 |
+
# Utilities
|
| 20 |
+
tqdm>=4.65.0
|
| 21 |
+
|
| 22 |
+
# Jupyter (optional - for notebooks)
|
| 23 |
+
jupyter>=1.0.0
|
| 24 |
+
ipykernel>=6.25.0
|
| 25 |
+
ipywidgets>=8.1.0
|
| 26 |
+
|
| 27 |
+
# Additional utilities (if needed)
|
| 28 |
+
# lmdb>=1.4.0 # Uncomment if using LMDB for caching
|
| 29 |
+
# tensorboard>=2.13.0 # Uncomment if using TensorBoard logging
|
results/test-results.docx
ADDED
|
Binary file (7.54 kB). View file
|
|
|
trainer/__init__.py
ADDED
|
File without changes
|
trainer/__pycache__/__init__.cpython-313.pyc
ADDED
|
Binary file (152 Bytes). View file
|
|
|
trainer/__pycache__/__init__.cpython-314.pyc
ADDED
|
Binary file (154 Bytes). View file
|
|
|
trainer/__pycache__/trainer.cpython-313.pyc
ADDED
|
Binary file (1.01 kB). View file
|
|
|
trainer/__pycache__/trainer.cpython-314.pyc
ADDED
|
Binary file (713 Bytes). View file
|
|
|
trainer/__pycache__/utils.cpython-313.pyc
ADDED
|
Binary file (48.4 kB). View file
|
|
|
trainer/test.py
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from .utils import Trainer
|
| 2 |
+
from configs.configs import root,config
|
| 3 |
+
|
| 4 |
+
|
| 5 |
+
def main():
    """Evaluate the classifier using the checkpoint path in config["resume"].

    Errors are printed with a full traceback rather than crashing, so the
    script can be run unattended.
    """
    print("Testing classifier")
    try:
        tester = Trainer(config)
        tester.test(model_path=config["resume"])
    # FIX: was a bare `except:`, which also swallowed SystemExit and
    # KeyboardInterrupt; catch only real errors.
    except Exception:
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    main()
|
trainer/trainer.py
ADDED
|
@@ -0,0 +1,19 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from .utils import *
|
| 2 |
+
from configs.configs import root,config,mae_config
|
| 3 |
+
|
| 4 |
+
def main():
    """Interactively choose and run MAE pretraining or classifier training."""
    try:
        # Normalize the answer so "MAE", " mae " etc. all work.
        decision = input("train mae or classifier? ").strip().lower()
        if decision == "mae":
            print("Training mae")
            trainer = MAETrainer(mae_config)
            trainer.train()
        # FIX: was a second independent `if`; elif makes the two modes
        # explicitly mutually exclusive.
        elif decision == "classifier":
            print("Training classifier")
            trainer = Trainer(config)
            trainer.train()
        else:
            print(f"Unknown option: {decision!r} (expected 'mae' or 'classifier')")
    # FIX: was a bare `except:`, which also swallowed SystemExit and
    # KeyboardInterrupt; catch only real errors.
    except Exception:
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    main()
|
trainer/utils.py
ADDED
|
@@ -0,0 +1,837 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import gc
import io
import json
import os
import sys

from sklearn.metrics import roc_auc_score, confusion_matrix
from torch.utils.data import DataLoader
from tqdm import tqdm

from data.dataset import CheXpertDataset
from loss.assymetric import AsymmetricLoss
from models.mae import *
from models.densenet import *
from models.classifier import *
|
| 12 |
+
|
| 13 |
+
class TeeFile:
    """
    File-like object that writes to multiple streams (e.g., stdout and a file).
    String arguments are treated as paths and opened in line-buffered append
    mode; already-open file objects are used as-is and never closed by us.

    Usage:
        tee = TeeFile(sys.stdout, "/path/to/log.txt")
        print("Hello", file=tee)  # Writes to both stdout and the file
    """

    def __init__(self, *file_objects_or_paths):
        """
        Args:
            *file_objects_or_paths: Mix of file objects (like sys.stdout)
                                    or string paths to log files.
        """
        self.files = []
        self.opened_files = []  # Files we opened ourselves, to close later.

        for item in file_objects_or_paths:
            if isinstance(item, str):
                # It's a path string - open it as a file (append, line buffered).
                f = open(item, 'a', buffering=1)
                self.files.append(f)
                self.opened_files.append(f)
            else:
                # It's already a file-like object (e.g., sys.stdout).
                self.files.append(item)

    def write(self, data):
        """Write data to all streams; a failing stream is warned about, not fatal."""
        for f in self.files:
            try:
                f.write(data)
                f.flush()
            except Exception as e:
                # Handle closed/broken streams gracefully.
                print(f"Warning: Could not write to {f}: {e}", file=sys.stderr)

    def flush(self):
        """Flush all streams, ignoring streams that cannot be flushed."""
        for f in self.files:
            try:
                f.flush()
            # FIX: was a bare `except:`; don't swallow SystemExit/KeyboardInterrupt.
            except Exception:
                pass

    def isatty(self):
        """Check if any stream is a terminal (for tqdm compatibility)."""
        return any(getattr(f, "isatty", lambda: False)() for f in self.files)

    def fileno(self):
        """Get the first available file descriptor among the streams."""
        for f in self.files:
            if hasattr(f, "fileno"):
                try:
                    return f.fileno()
                except Exception:
                    pass
        raise io.UnsupportedOperation("No fileno available")

    def close(self):
        """Close only the files this object opened itself."""
        for f in self.opened_files:
            try:
                f.close()
            # FIX: was a bare `except:`.
            except Exception:
                pass
        self.opened_files.clear()

    def __del__(self):
        """Best-effort cleanup on garbage collection."""
        self.close()

    def __enter__(self):
        """Context manager support."""
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        """Context manager cleanup; never suppresses exceptions."""
        self.close()
        return False
| 95 |
+
|
| 96 |
+
class MAETrainer:
    """Training harness for the MaskedAutoEncoder.

    Wires together the model, AdamW optimizer, linear-warmup + cosine LR
    schedule, AMP grad scaler, and the CheXpert train/val(/test) loaders,
    then drives the epoch loop with best-validation-loss checkpointing.
    """

    def __init__(self, configs={}):
        self.configs = configs

        # Per-phase log files, mirrored to stdout via TeeFile.
        os.makedirs(configs["logdir"], exist_ok=True)
        log_path_train = os.path.join(configs["logdir"], "training_log.txt")
        log_path_val = os.path.join(configs["logdir"], "val_log.txt")
        log_path_test = os.path.join(configs["logdir"], "test_log.txt")
        self.traintee = TeeFile(sys.stdout, log_path_train)
        self.valtee = TeeFile(sys.stdout, log_path_val)
        self.testtee = TeeFile(sys.stdout, log_path_test)

        for dir in self.configs["dirsToMake"]:
            os.makedirs(dir, exist_ok=True)

        self.model = MaskedAutoEncoder(
            c=configs["channels"],
            mask_ratio=configs["mask_ratio"],
            dropout=configs["dropout"],
            img_size=configs["img_size"],
            encoder_dim=configs["encoder_dim"],
            mlp_dim=configs["mlp_dim"],
            decoder_dim=configs["decoder_dim"],
            encoder_depth=configs["encoder_depth"],
            encoder_head=configs["encoder_head"],
            decoder_depth=configs["decoder_depth"],
            decoder_head=configs["decoder_head"],
            patch_size=configs["patch_size"]
        ).to(configs["device"])

        self.criterion = mae_loss

        # Optimizer + warmup->cosine schedule + AMP scaler.
        self.optimizer = torch.optim.AdamW(self.model.parameters(), configs["lr"], weight_decay=configs["weight_decay"])
        self.schedular1 = torch.optim.lr_scheduler.LinearLR(self.optimizer, start_factor=0.1, end_factor=1.0, total_iters=configs["warmup"])
        self.schedular2 = torch.optim.lr_scheduler.CosineAnnealingLR(self.optimizer, T_max=configs["num_epochs"] - configs["warmup"])
        self.schedular = torch.optim.lr_scheduler.SequentialLR(self.optimizer, schedulers=[self.schedular1, self.schedular2], milestones=[configs["warmup"]])
        self.scaler = torch.amp.GradScaler()

        # Data: weighted random sampling on the training split only.
        self.train_dataset = CheXpertDataset(zip_path=configs["zip_path"], csv_path=configs["train_csv"], root_dir=configs["datadir"], augment=True, use_frontal_only=True)
        self.val_dataset = CheXpertDataset(zip_path=configs["zip_path"], csv_path=configs["val_csv"], root_dir=configs["datadir"], augment=False, use_frontal_only=True)
        self.class_Weights = self.train_dataset.get_class_weights().to(self.configs["device"])
        self.sample_Weights = self.train_dataset.get_sample_weights()
        self.sampler = torch.utils.data.WeightedRandomSampler(self.sample_Weights, num_samples=len(self.sample_Weights))
        self.trainloader = DataLoader(self.train_dataset, batch_size=configs["batch_size"], sampler=self.sampler, num_workers=8, pin_memory=True, persistent_workers=True)
        self.valloader = DataLoader(self.val_dataset, batch_size=configs["batch_size"], shuffle=False, num_workers=8, pin_memory=True, persistent_workers=True)

        # FIX: resume the loss history from disk (mirrors the classifier
        # Trainer class). The original started with an empty history every
        # run and then tried to merge with the file at save time but dumped
        # the unmerged dict, silently discarding prior-run history.
        self.history = {"train_loss": [], "val_loss": []}
        historyfile = os.path.join(configs["logdir"], "history.json")
        if os.path.exists(historyfile):
            with open(historyfile, "r") as f:
                self.history = json.load(f)

        self.current_epoch = 0

        if os.path.exists(self.configs["resume"]):
            loadedpickle = torch.load(self.configs["resume"], map_location=self.configs["device"])
            self.model.load_state_dict(loadedpickle["model"], strict=False)
            self.optimizer.load_state_dict(loadedpickle["optimizer"])
            self.schedular.load_state_dict(loadedpickle["schedular"])
            self.schedular1.load_state_dict(loadedpickle["schedular1"])
            self.schedular2.load_state_dict(loadedpickle["schedular2"])
            self.scaler.load_state_dict(loadedpickle["scaler"])
            self.current_epoch = loadedpickle["epoch"] + 1

        # Optional held-out test loader.
        self.test_dataset = None
        self.testloader = None
        if configs.get("test_csv"):
            self.test_dataset = CheXpertDataset(
                zip_path=configs["zip_path"],
                csv_path=configs["test_csv"],
                root_dir=configs["datadir"],
                augment=False,
                use_frontal_only=True
            )
            self.testloader = DataLoader(
                self.test_dataset,
                batch_size=configs["batch_size"],
                shuffle=False,
                num_workers=8,
                pin_memory=True,
                persistent_workers=True
            )
            print(f"Test loader ready – {len(self.test_dataset)} images")

        torch.backends.cudnn.benchmark = True
        torch.backends.cudnn.enabled = True

        # Let the CUDA caching allocator grow segments instead of OOMing.
        # NOTE(review): env var must be set before CUDA initialization to
        # take effect — model was already moved to device above; confirm.
        os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

        # Enable gradient checkpointing if the model supports it.
        if hasattr(self.model, 'enable_gradient_checkpointing'):
            self.model.enable_gradient_checkpointing()

    @staticmethod
    def plot_training_metrics(metrics, epoch, figs_path):
        """Plot train/val loss curves and save them under figs_path/<epoch>/.

        Args:
            metrics (dict): {"train_loss": [...], "val_loss": [...]}.
            epoch (int): current epoch number (names the output subdirectory).
            figs_path (str): root directory for figures.
        """
        import matplotlib.pyplot as plt

        # Align both series to a common length.
        keys = ["train_loss", "val_loss"]
        lengths = [len(metrics[k]) for k in keys if k in metrics]
        if not lengths:
            return
        n = min(lengths)
        m = {k: metrics[k][:n] for k in keys if k in metrics}
        epochs = list(range(1, n + 1))

        plt.figure(figsize=(14, 6))

        # ---- Loss subplot ----
        plt.subplot(1, 2, 1)
        # FIX: plot the length-aligned slices in `m`; the original built `m`
        # but then plotted the raw lists, which raises when lengths differ.
        plt.plot(epochs, m["train_loss"], label="Train Loss", marker='o')
        plt.plot(epochs, m["val_loss"], label="Val Loss", marker='s')
        plt.xlabel("Epoch")
        plt.ylabel("Loss")
        plt.title("Training & Validation Loss")
        plt.legend()
        plt.grid(True, linestyle='--', alpha=0.6)

        plt.tight_layout()
        os.makedirs(os.path.join(figs_path, str(epoch)), exist_ok=True)
        plt.savefig(os.path.join(figs_path, str(epoch), "metrics.png"))
        plt.show()

    def train_epoch(self, epoch, looper):
        """Run one training epoch with AMP and gradient accumulation.

        Args:
            epoch: current epoch index (for the progress display only).
            looper: tqdm-wrapped ``enumerate(self.trainloader)``.

        Returns:
            Mean training loss over the finite-loss batches of the epoch.
        """
        self.model.train()
        running_loss = 0.0
        current_loss = 0
        total_batches = len(self.trainloader)

        for batch_idx, data in looper:
            image = data['image'].to(self.configs["device"], non_blocking=True)

            with torch.autocast(device_type=self.configs["device"].type, dtype=torch.float16):
                img, preds, mask = self.model(image)
                loss = self.criterion(img, preds, mask)

            if torch.isfinite(loss):
                # FIX: accumulate the loss only when it is finite; the
                # original added loss.item() unconditionally, so a single
                # NaN/Inf batch poisoned the running average forever.
                running_loss += loss.item()
                self.scaler.scale(loss / self.configs["accumulation"]).backward()
            else:
                # Skip the bad batch and drop any partially accumulated grads.
                self.optimizer.zero_grad(set_to_none=True)
                continue

            if (batch_idx + 1) % self.configs["accumulation"] == 0 or batch_idx == total_batches - 1:
                self.scaler.unscale_(self.optimizer)
                torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)
                self.scaler.step(self.optimizer)
                self.scaler.update()
                # NOTE(review): scheduler is stepped once per optimizer step,
                # but its milestones/T_max are configured in epochs — confirm
                # which unit is intended.
                self.schedular.step()
                self.optimizer.zero_grad(set_to_none=True)

            # === LIVE METRICS (every batch) ===
            current_loss = running_loss / (batch_idx + 1)
            if (batch_idx + 1) % 10 == 0:
                current_lr = self.optimizer.param_groups[0]['lr']
                looper.set_postfix({
                    "lr": f"{current_lr:.2e}", "batch": f"{batch_idx}/{total_batches}",
                    "epoch": f"{epoch}/{self.configs['num_epochs']}",
                    "loss": f"{current_loss:.3f}",
                })

        return current_loss

    def validate(self, epoch, looper):
        """Run one validation pass and return the mean validation loss."""
        self.model.eval()
        val_loss = 0.0
        lenloader = len(self.valloader)
        current_loss = 0
        with torch.no_grad():
            for batch_idx, data in looper:
                image = data["image"].to(self.configs["device"], non_blocking=True)

                with torch.autocast(device_type=self.configs["device"].type, dtype=torch.float16):
                    img, preds, mask = self.model(image)
                    loss = self.criterion(img, preds, mask)

                val_loss += loss.item()

                # === LIVE METRICS ===
                current_loss = val_loss / (batch_idx + 1)
                if (batch_idx + 1) % 10 == 0:
                    looper.set_postfix({
                        "epoch": f"{epoch}/{self.configs['num_epochs']}",
                        "batch": f"{batch_idx}/{lenloader}",
                        "loss": f"{current_loss:.3f}",
                    })

        return current_loss

    def train(self):
        """Run the epoch loop from self.current_epoch to num_epochs.

        Saves a checkpoint to configs["resume"] whenever validation loss
        improves, and persists/plots the loss history every 10 epochs.
        """
        for epoch in range(self.current_epoch, self.configs["num_epochs"]):
            trainlooper = tqdm(enumerate(self.trainloader), desc="training: ", leave=False, file=self.traintee)
            vallooper = tqdm(enumerate(self.valloader), desc="validating: ", leave=False, file=self.valtee)

            self.model.train()
            self.optimizer.zero_grad(set_to_none=True)

            running_loss = self.train_epoch(epoch, trainlooper)

            torch.cuda.synchronize()
            torch.cuda.empty_cache()

            val_loss = self.validate(epoch, vallooper)

            torch.cuda.synchronize()
            torch.cuda.empty_cache()
            gc.collect()

            # FIX: also checkpoint on the very first epoch. The original
            # required a non-empty history AND an improvement, so with a
            # fresh history no checkpoint was ever written until a later
            # epoch happened to beat an earlier (unsaved) one.
            if (not self.history["val_loss"]) or val_loss < min(self.history["val_loss"]):
                checkpoint = {
                    "model": self.model.state_dict(),
                    "optimizer": self.optimizer.state_dict(),
                    "schedular": self.schedular.state_dict(),
                    "schedular1": self.schedular1.state_dict(),
                    "schedular2": self.schedular2.state_dict(),
                    "scaler": self.scaler.state_dict(),
                    "epoch": epoch,
                }
                torch.save(checkpoint, self.configs["resume"])

            print(f"train loss {running_loss} val loss {val_loss}")

            self.history["train_loss"].append(float(running_loss))
            self.history["val_loss"].append(float(val_loss))

            if epoch % 10 == 0:
                # self.history already includes any prior-run history (loaded
                # in __init__), so dumping it directly is lossless; the old
                # read-merge-then-dump-unmerged logic dropped prior runs.
                historyfile = os.path.join(self.configs["logdir"], "history.json")
                with open(historyfile, "w") as f:
                    json.dump(self.history, f)
                MAETrainer.plot_training_metrics(self.history, epoch + 1, self.configs["logdir"])

            # Point past the last completed epoch so a second train() call
            # resumes instead of repeating it.
            self.current_epoch = epoch + 1
|
| 352 |
+
|
| 353 |
+
class Trainer:
|
| 354 |
+
def __init__(self,configs={}):
|
| 355 |
+
|
| 356 |
+
self.configs=configs
|
| 357 |
+
os.makedirs(configs["logdir"],exist_ok=True)
|
| 358 |
+
log_path_train = os.path.join(configs["logdir"], "training_log.txt")
|
| 359 |
+
log_path_val = os.path.join(configs["logdir"], "val_log.txt")
|
| 360 |
+
log_path_test = os.path.join(configs["logdir"], "test_log.txt")
|
| 361 |
+
#self.log_file = open(log_path, 'w', buffering=1)
|
| 362 |
+
self.traintee = TeeFile(sys.stdout, log_path_train)
|
| 363 |
+
self.valtee = TeeFile(sys.stdout, log_path_val)
|
| 364 |
+
self.testtee = TeeFile(sys.stdout, log_path_test)
|
| 365 |
+
|
| 366 |
+
for dir in self.configs["dirsToMake"]: os.makedirs(dir,exist_ok=True)
|
| 367 |
+
|
| 368 |
+
self.model=XRAYClassifier(
|
| 369 |
+
c=configs["channels"],
|
| 370 |
+
num_classes=configs["num_classes"],
|
| 371 |
+
mask_ratio=configs["mask_ratio"],
|
| 372 |
+
dropout=configs["dropout"],
|
| 373 |
+
img_size=configs["img_size"],
|
| 374 |
+
encoder_dim=configs["encoder_dim"],
|
| 375 |
+
mlp_dim=configs["mlp_dim"],
|
| 376 |
+
decoder_dim=configs["decoder_dim"],
|
| 377 |
+
encoder_depth=configs["encoder_depth"],
|
| 378 |
+
encoder_head=configs["encoder_head"],
|
| 379 |
+
decoder_depth=configs["decoder_depth"],
|
| 380 |
+
decoder_head=configs["decoder_head"],
|
| 381 |
+
patch_size=configs["patch_size"]
|
| 382 |
+
).to(configs["device"])
|
| 383 |
+
|
| 384 |
+
|
| 385 |
+
|
| 386 |
+
self.optimizer=torch.optim.AdamW(self.model.parameters(),configs["lr"], weight_decay=configs["weight_decay"])
|
| 387 |
+
self.schedular1=torch.optim.lr_scheduler.LinearLR(self.optimizer,start_factor=0.1,end_factor=1.0,total_iters=configs["warmup"])
|
| 388 |
+
self.schedular2=torch.optim.lr_scheduler.CosineAnnealingLR(self.optimizer,T_max=configs["num_epochs"]-configs["warmup"])
|
| 389 |
+
self.schedular=torch.optim.lr_scheduler.SequentialLR (self.optimizer,schedulers=[self.schedular1,self.schedular2],milestones=[configs["warmup"]])
|
| 390 |
+
self.scaler=torch.amp.GradScaler()
|
| 391 |
+
|
| 392 |
+
self.train_dataset= CheXpertDataset(zip_path=configs["zip_path"],csv_path=configs["train_csv"],root_dir=configs["datadir"],augment=True,use_frontal_only=True,mask_dir=configs["maskdir"])
|
| 393 |
+
self.val_dataset= CheXpertDataset(zip_path=configs["zip_path"],csv_path=configs["val_csv"],root_dir=configs["datadir"],augment=False,use_frontal_only=True,mask_dir=configs["maskdir"] )
|
| 394 |
+
|
| 395 |
+
self.class_Weights=self.train_dataset.get_class_weights().to(self.configs["device"])
|
| 396 |
+
self.sample_Weights=self.train_dataset.get_sample_weights()
|
| 397 |
+
self.sampler=torch.utils.data.WeightedRandomSampler(self.sample_Weights,num_samples=len(self.sample_Weights))
|
| 398 |
+
self.trainloader=DataLoader(self.train_dataset,batch_size=configs["batch_size"],sampler=self.sampler,num_workers=0,pin_memory=True,persistent_workers=False)
|
| 399 |
+
self.valloader=DataLoader(self.val_dataset,batch_size=configs["batch_size"],shuffle=False,num_workers=0,pin_memory=True,persistent_workers=False)
|
| 400 |
+
self.criterion=AsymmetricLoss(class_weights=self.class_Weights).to(self.configs["device"])
|
| 401 |
+
self.history={"train_loss":[],"val_loss":[],"train_macro_auc":[],"val_macro_auc":[],"train_micro_auc":[],"val_micro_auc":[]}
|
| 402 |
+
if os.path.exists(os.path.join(self.configs["logdir"],"history.json")):
|
| 403 |
+
with open(os.path.join(self.configs["logdir"],"history.json"),'r') as hf:
|
| 404 |
+
self.history=json.load(hf)
|
| 405 |
+
hf.close()
|
| 406 |
+
self.current_epoch=0
|
| 407 |
+
|
| 408 |
+
self.optimal_thresholds =[0.5]*14
|
| 409 |
+
|
| 410 |
+
if os.path.exists(self.configs["resume"]):
|
| 411 |
+
ckpt = torch.load(self.configs["resume"], map_location=self.configs["device"],weights_only=False)
|
| 412 |
+
self.model.load_state_dict(ckpt["model"], strict=False)
|
| 413 |
+
self.optimizer.load_state_dict(ckpt["optimizer"])
|
| 414 |
+
self.schedular.load_state_dict(ckpt["schedular"])
|
| 415 |
+
self.schedular1.load_state_dict(ckpt["schedular1"])
|
| 416 |
+
self.schedular2.load_state_dict(ckpt["schedular2"])
|
| 417 |
+
self.scaler.load_state_dict(ckpt["scaler"])
|
| 418 |
+
self.current_epoch = ckpt.get("epoch", -1) + 1
|
| 419 |
+
self.optimal_thresholds =ckpt.get("thresholds")
|
| 420 |
+
else:
|
| 421 |
+
# Load MAE backbone only (pretrained)
|
| 422 |
+
bb = torch.load(self.configs["backbone"], map_location=self.configs["device"],weights_only=False)
|
| 423 |
+
|
| 424 |
+
# Optional: strip 'module.' if present
|
| 425 |
+
state = bb["model"]
|
| 426 |
+
if any(k.startswith("module.") for k in state.keys()):
|
| 427 |
+
from collections import OrderedDict
|
| 428 |
+
state = OrderedDict((k.replace("module.", "", 1), v) for k, v in state.items())
|
| 429 |
+
|
| 430 |
+
missing, unexpected = self.model.mae.load_state_dict(state, strict=False)
|
| 431 |
+
print("loaded backbone")
|
| 432 |
+
if missing: print(f"Missing keys: {len(missing)} (showing first 5): {missing[:5]}")
|
| 433 |
+
if unexpected: print(f"Unexpected keys: {len(unexpected)} (first 5): {unexpected[:5]}")
|
| 434 |
+
|
| 435 |
+
# (Optional) freeze backbone for warmup
|
| 436 |
+
for p in self.model.mae.parameters():
|
| 437 |
+
p.requires_grad = False
|
| 438 |
+
if os.path.exists(self.configs["densebackbone"]):
|
| 439 |
+
densebb=torch.load(self.configs["densebackbone"], map_location=self.configs["device"])
|
| 440 |
+
densestate = densebb["model"]
|
| 441 |
+
if any(k.startswith("module.") for k in state.keys()):
|
| 442 |
+
from collections import OrderedDict
|
| 443 |
+
state = OrderedDict((k.replace("module.", "", 1), v) for k, v in densestate.items())
|
| 444 |
+
densemissing, denseunexpected = self.model.dense.load_state_dict(densestate, strict=False)
|
| 445 |
+
print("loaded dense backbone")
|
| 446 |
+
if densemissing: print(f"Missing keys: {len(densemissing)} (showing first 5): {densemissing[:5]}")
|
| 447 |
+
if denseunexpected: print(f"Unexpected keys: {len(denseunexpected)} (first 5): {denseunexpected[:5]}")
|
| 448 |
+
|
| 449 |
+
self.test_dataset = None
|
| 450 |
+
self.testloader = None
|
| 451 |
+
if configs.get("test_csv"):
|
| 452 |
+
self.test_dataset = CheXpertDataset(
|
| 453 |
+
zip_path=configs["zip_path"],
|
| 454 |
+
csv_path=configs["test_csv"],
|
| 455 |
+
root_dir=configs["datadir"],
|
| 456 |
+
augment=False,
|
| 457 |
+
use_frontal_only=True
|
| 458 |
+
)
|
| 459 |
+
self.testloader = DataLoader(
|
| 460 |
+
self.test_dataset,
|
| 461 |
+
batch_size=configs["batch_size"],
|
| 462 |
+
shuffle=False,
|
| 463 |
+
num_workers=0,
|
| 464 |
+
pin_memory=True,
|
| 465 |
+
persistent_workers=False
|
| 466 |
+
)
|
| 467 |
+
print(f"Test loader ready – {len(self.test_dataset)} images")
|
| 468 |
+
|
| 469 |
+
torch.backends.cudnn.benchmark = True
|
| 470 |
+
torch.backends.cudnn.enabled = True
|
| 471 |
+
|
| 472 |
+
# FIX: Set memory allocator settings
|
| 473 |
+
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
|
| 474 |
+
|
| 475 |
+
# FIX: Enable gradient checkpointing if model supports it
|
| 476 |
+
if hasattr(self.model, 'enable_gradient_checkpointing'):
|
| 477 |
+
self.model.enable_gradient_checkpointing()
|
| 478 |
+
@staticmethod
def plot_training_metrics(metrics, epoch, figs_path):
    """
    Plot loss and AUC curves from training metrics and save the figure.

    Args:
        metrics (dict): Per-epoch series under the keys
            "train_loss", "val_loss", "train_macro_auc", "val_macro_auc",
            "train_micro_auc", "val_micro_auc". Series may have unequal
            lengths; all are truncated to the shortest before plotting.
        epoch (int): Current epoch number (names the output sub-folder).
        figs_path (str): Directory under which "<epoch>/metrics.png" is saved.
    """
    keys = ["train_loss", "val_loss", "train_macro_auc", "val_macro_auc",
            "train_micro_auc", "val_micro_auc"]
    lengths = [len(metrics[k]) for k in keys if k in metrics]
    if not lengths:
        return

    # Heavy import deferred until we know there is something to plot.
    import matplotlib.pyplot as plt

    # Slice every series to a common length so mismatched lists cannot
    # crash matplotlib (x and y must be the same length).
    n = min(lengths)
    m = {k: metrics[k][:n] for k in keys if k in metrics}
    epochs = list(range(1, n + 1))

    plt.figure(figsize=(14, 6))

    # ---- Loss subplot ----
    plt.subplot(1, 2, 1)
    # FIX: plot the sliced series `m`, not the raw `metrics` — the
    # original sliced into `m` and then ignored it, so unequal series
    # lengths raised a ValueError inside matplotlib.
    plt.plot(epochs, m["train_loss"], label="Train Loss", marker='o')
    plt.plot(epochs, m["val_loss"], label="Val Loss", marker='s')
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.title("Training & Validation Loss")
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.6)

    # ---- AUC subplot ----
    plt.subplot(1, 2, 2)
    plt.plot(epochs, m["train_macro_auc"], label="Train Macro AUC", marker='o')
    plt.plot(epochs, m["val_macro_auc"], label="Val Macro AUC", marker='s')
    plt.plot(epochs, m["train_micro_auc"], label="Train Micro AUC", marker='^')
    plt.plot(epochs, m["val_micro_auc"], label="Val Micro AUC", marker='v')
    plt.xlabel("Epoch")
    plt.ylabel("AUC")
    plt.title("Training & Validation AUC (Macro/Micro)")
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.6)

    plt.tight_layout()
    os.makedirs(os.path.join(figs_path, str(epoch)), exist_ok=True)
    plt.savefig(os.path.join(figs_path, str(epoch), "metrics.png"))
    plt.show()
| 538 |
+
|
| 539 |
+
|
| 540 |
+
|
| 541 |
+
def train_epoch(self, epoch, looper):
    """
    Run one training epoch with gradient accumulation.

    Args:
        epoch (int): Current epoch index (progress display only).
        looper: enumerate(...)-style iterable over self.trainloader,
            typically a tqdm wrapper (must support set_postfix).

    Returns:
        tuple: (mean_loss, macro_auc, micro_auc) over the epoch, where
        mean_loss averages only the batches with a finite loss.
    """
    self.model.train()
    running_loss = 0.0
    finite_batches = 0  # batches that actually contributed to the loss
    all_preds = []
    all_targets = []

    total_batches = len(self.trainloader)

    for batch_idx, data in looper:
        image = data['image'].to(self.configs["device"], non_blocking=True)
        target = data['labels'].to(self.configs["device"], non_blocking=True)

        logits = self.model(image)
        # Criterion is applied to probabilities (see __init__'s
        # AsymmetricLoss), so sigmoid comes first.
        preds = torch.sigmoid(logits.float())
        loss = self.criterion(preds, target)

        # FIX: accumulate the loss ONLY when it is finite. Previously
        # `running_loss += loss.item()` ran before the isfinite check,
        # so a single NaN/Inf batch poisoned the reported loss for the
        # rest of the epoch even though its gradient was skipped.
        if torch.isfinite(loss):
            running_loss += loss.item()
            finite_batches += 1
            (loss / self.configs["accumulation"]).backward()
        else:
            self.optimizer.zero_grad(set_to_none=True)
            continue

        # Step every `accumulation` batches (and on the final batch).
        if (batch_idx + 1) % self.configs["accumulation"] == 0 or batch_idx == total_batches - 1:
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)
            self.optimizer.step()
            self.optimizer.zero_grad(set_to_none=True)

        # Store for AUC
        all_preds.append(preds.detach().cpu())
        all_targets.append(target.detach().cpu())

        # === LIVE METRICS (every 500 batches) ===
        current_loss = running_loss / max(finite_batches, 1)
        if (batch_idx + 1) % 500 == 0 and all_preds:
            preds_np = torch.cat(all_preds).numpy()
            targets_np = torch.cat(all_targets).numpy()
            try:
                macro_auc = roc_auc_score(targets_np, preds_np, average='macro')
                micro_auc = roc_auc_score(targets_np, preds_np, average='micro')
                current_lr = self.optimizer.param_groups[0]['lr']
                looper.set_postfix({
                    "lr": f"{current_lr:.2e}", "batch": f"{batch_idx}/{total_batches}",
                    "epoch": f"{epoch}/{self.configs['num_epochs']}",
                    "loss": f"{current_loss:.3f}",
                    "macro": f"{macro_auc:.3f}",
                    "micro": f"{micro_auc:.3f}"
                })
            except ValueError:
                # Some class has only one label value so far; AUC is
                # undefined — skip this progress update, keep training.
                pass

    # === FINAL FULL EPOCH METRICS ===
    preds_full = torch.cat(all_preds).numpy()
    targets_full = torch.cat(all_targets).numpy()
    final_loss = running_loss / max(finite_batches, 1)
    final_macro_auc = roc_auc_score(targets_full, preds_full, average='macro')
    final_micro_auc = roc_auc_score(targets_full, preds_full, average='micro')

    # Release large buffers before returning to keep peak memory down.
    del all_preds, all_targets, preds_full, targets_full

    return final_loss, final_macro_auc, final_micro_auc
|
| 610 |
+
|
| 611 |
+
def validate(self, epoch, looper):
    """
    Run one validation pass and tune per-class decision thresholds.

    Args:
        epoch (int): Current epoch index (progress display only).
        looper: enumerate(...)-style iterable over self.valloader,
            typically a tqdm wrapper (must support set_postfix).

    Returns:
        tuple: (mean_loss, macro_auc, micro_auc) over the validation set.

    Side effects:
        Replaces self.optimal_thresholds with the per-class threshold
        maximizing Youden's J (sensitivity + specificity - 1) on this
        validation set; classes with no positives keep 0.5.
    """
    self.model.eval()
    val_loss = 0.0
    all_preds = []
    all_targets = []
    lenloader = len(self.valloader)

    with torch.no_grad():
        for batch_idx, data in looper:
            image = data["image"].to(self.configs["device"], non_blocking=True)
            target = data["labels"].to(self.configs["device"], non_blocking=True)

            logits = self.model(image)
            preds = torch.sigmoid(logits.float())
            loss = self.criterion(preds, target)

            val_loss += loss.item()
            all_preds.append(preds.detach().cpu())
            all_targets.append(target.detach().cpu())

            # === LIVE METRICS (every 200 batches) ===
            current_loss = val_loss / (batch_idx + 1)
            if (batch_idx + 1) % 200 == 0 and all_preds:
                preds_np = torch.cat(all_preds).numpy()
                targets_np = torch.cat(all_targets).numpy()
                try:
                    macro_auc = roc_auc_score(targets_np, preds_np, average='macro')
                    micro_auc = roc_auc_score(targets_np, preds_np, average='micro')
                    looper.set_postfix({
                        "epoch": f"{epoch}/{self.configs['num_epochs']}",
                        "batch": f"{batch_idx}/{lenloader}",
                        "loss": f"{current_loss:.3f}",
                        "macro": f"{macro_auc:.3f}",
                        "micro": f"{micro_auc:.3f}"
                    })
                except ValueError:
                    # AUC undefined while some class has a single label
                    # value; skip the progress update.
                    pass

    # === FINAL FULL VALIDATION METRICS ===
    preds_full = torch.cat(all_preds).numpy()
    targets_full = torch.cat(all_targets).numpy()
    # Derive the class count from the data instead of hard-coding 14.
    num_classes = targets_full.shape[1]
    new_thresholds = [0.5] * num_classes  # default

    for class_idx in range(num_classes):
        if targets_full[:, class_idx].sum() == 0:
            # No positive samples: keep the default 0.5.
            continue

        best_score = -1.0
        best_threshold = 0.5
        for threshold in np.arange(0.1, 0.9, 0.02):
            preds_bin = (preds_full[:, class_idx] >= threshold).astype(int)
            # FIX: labels=[0, 1] forces a 2x2 matrix even when preds_bin
            # is all zeros or all ones; without it confusion_matrix
            # returns a 1x1 array and the 4-way unpack raised ValueError.
            tn, fp, fn, tp = confusion_matrix(
                targets_full[:, class_idx].astype(int),
                preds_bin,
                labels=[0, 1]
            ).ravel()
            sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0
            specificity = tn / (tn + fp) if (tn + fp) > 0 else 0
            score = sensitivity + specificity - 1  # Youden's J statistic

            if score > best_score:
                best_score = score
                best_threshold = threshold

        new_thresholds[class_idx] = best_threshold

    self.optimal_thresholds = new_thresholds

    final_loss = val_loss / lenloader
    final_macro_auc = roc_auc_score(targets_full, preds_full, average='macro')
    final_micro_auc = roc_auc_score(targets_full, preds_full, average='micro')

    del all_preds, all_targets, preds_full, targets_full

    return final_loss, final_macro_auc, final_micro_auc
|
| 693 |
+
def train(self):
    """
    Main training loop.

    Runs epochs from self.current_epoch to configs["num_epochs"],
    validating after each epoch, checkpointing whenever a validation AUC
    improves on its best so far, and persisting metric history to
    logdir/history.json.
    """
    for epoch in range(self.current_epoch, self.configs["num_epochs"]):
        trainlooper = tqdm(enumerate(self.trainloader), desc="training: ", leave=True, file=self.traintee)
        vallooper = tqdm(enumerate(self.valloader), desc="validating: ", leave=True, file=self.valtee)

        self.model.train()
        # NOTE(review): scheduler steps BEFORE the epoch's optimizer
        # steps, matching the original code — confirm this ordering
        # matches the intended LR schedule.
        self.schedular.step()
        self.optimizer.zero_grad(set_to_none=True)

        running_loss, macro_auc, micro_auc = self.train_epoch(epoch, trainlooper)

        torch.cuda.synchronize()
        torch.cuda.empty_cache()

        val_loss, val_macro_auc, val_micro_auc = self.validate(epoch, vallooper)

        torch.cuda.synchronize()
        torch.cuda.empty_cache()

        gc.collect()

        # FIX: also checkpoint when the history is still empty —
        # previously the very first epoch could never be saved because
        # the empty-list guard short-circuited the comparison.
        improved = (
            not self.history["val_macro_auc"]
            or val_macro_auc > max(self.history["val_macro_auc"])
            or val_micro_auc > max(self.history["val_micro_auc"])
        )
        if improved:
            checkpoint = {
                "model": self.model.state_dict(),
                "optimizer": self.optimizer.state_dict(),
                "schedular": self.schedular.state_dict(),
                "schedular1": self.schedular1.state_dict(),
                "schedular2": self.schedular2.state_dict(),
                "scaler": self.scaler.state_dict(),
                "epoch": epoch,
                "thresholds": self.optimal_thresholds,
            }
            torch.save(checkpoint, self.configs["resume"])

        print(f"epoch {epoch} train loss {running_loss} val loss {val_loss} val_macro_auc {val_macro_auc} val_micro_auc {val_micro_auc} train_macro_auc {macro_auc} train_micro_auc {micro_auc}")

        self.history["train_loss"].append(float(running_loss))
        self.history["val_loss"].append(float(val_loss))
        self.history["train_macro_auc"].append(float(macro_auc))
        self.history["val_macro_auc"].append(float(val_macro_auc))
        self.history["train_micro_auc"].append(float(micro_auc))
        self.history["val_micro_auc"].append(float(val_micro_auc))

        # FIX: self.history is already seeded from history.json in
        # __init__, so the old code — which re-read the file, appended
        # self.history into it (duplicating every entry and merging only
        # 4 of the 6 keys), then dumped self.history anyway — was dead
        # and misleading. Simply overwrite the file with the
        # authoritative in-memory history.
        historyfile = os.path.join(self.configs["logdir"], "history.json")
        with open(historyfile, "w") as f:
            json.dump(self.history, f)

        if epoch % 10 == 0:
            Trainer.plot_training_metrics(self.history, epoch + 1, self.configs["logdir"])

        # FIX: the next epoch to run is epoch + 1, matching the resume
        # logic in __init__ (ckpt["epoch"] + 1); the old assignment of
        # `epoch` would re-run the last completed epoch.
        self.current_epoch = epoch + 1
|
| 747 |
+
def test(self, model_path=None, return_preds=False):
    """
    Run a complete test evaluation.

    Args:
        model_path (str, optional): Checkpoint to load before evaluating.
        return_preds (bool): If True, also return the raw (preds, targets)
            arrays alongside the metrics.

    Returns:
        (macro_auc, micro_auc, per_class_auc_dict), or with
        return_preds=True: (macro_auc, micro_auc, per_class_auc_dict,
        (preds, targets)).

    Raises:
        RuntimeError: If no test loader was configured (`test_csv` missing).
    """
    if model_path:
        ckpt = torch.load(model_path, map_location=self.configs["device"])
        self.model.load_state_dict(ckpt["model"])
        print(f"Loaded checkpoint {model_path}")

    if self.testloader is None:
        raise RuntimeError("No test loader – provide `test_csv` in config")

    self.model.eval()
    all_preds, all_targets = [], []

    test_loss = 0.0
    looper = tqdm(enumerate(self.testloader), total=len(self.testloader),
                  desc="Testing ", file=self.testtee)

    with torch.inference_mode():
        for batch_idx, data in looper:
            img = data['image'].to(self.configs["device"], non_blocking=True)
            tgt = data['labels'].to(self.configs["device"], non_blocking=True)

            logits = self.model(img)
            if self.optimal_thresholds:
                # Shift logits so each class's decision boundary sits at
                # its tuned probability threshold tau_c:
                # sigmoid(logit - logit(tau_c)) crosses 0.5 exactly at
                # p == tau_c.
                taus = torch.tensor(self.optimal_thresholds, device=logits.device).view(1, -1)
                margins = torch.log(taus / (1 - taus))  # prob -> logit space
                logits = logits - margins
            probs = torch.sigmoid(logits)
            loss = self.criterion(probs, tgt)
            test_loss += loss.item()

            all_preds.append(probs.cpu())
            all_targets.append(tgt.cpu())

            # FIX: the original recomputed AUC over ALL accumulated
            # predictions on EVERY batch (quadratic cost over the run)
            # and crashed with a ValueError whenever an early target
            # column contained a single class. Update periodically and
            # guard the degenerate case instead.
            if (batch_idx + 1) % 200 == 0:
                cur_loss = test_loss / (batch_idx + 1)
                p = torch.cat(all_preds).numpy()
                t = torch.cat(all_targets).numpy()
                try:
                    macro = roc_auc_score(t, p, average='macro')
                    micro = roc_auc_score(t, p, average='micro')
                except ValueError:
                    macro = micro = float('nan')
                looper.set_postfix(loss=f"{cur_loss:.4f}",
                                   macro=f"{macro:.4f}",
                                   micro=f"{micro:.4f}")

    # ---- final metrics ----
    preds = torch.cat(all_preds).numpy()
    targets = torch.cat(all_targets).numpy()
    final_loss = test_loss / len(self.testloader)
    macro_auc = roc_auc_score(targets, preds, average='macro')
    micro_auc = roc_auc_score(targets, preds, average='micro')

    # Per-class AUC; NaN marks classes with no positives in the test set
    # (AUC undefined there).
    per_class = {}
    for i, name in enumerate(self.train_dataset.get_label_names()):
        if targets[:, i].sum() > 0:
            per_class[name] = roc_auc_score(targets[:, i], preds[:, i])
        else:
            per_class[name] = float('nan')

    # ---- pretty table ----
    print("\n" + "="*80)
    print(f"TEST RESULTS (loss={final_loss:.4f})")
    print("="*80)
    print(f"{'Pathology':<30} {'AUC':>8}")
    print("-"*40)
    for name, auc in per_class.items():
        print(f"{name:<30} {auc:>8.4f}" if not np.isnan(auc) else f"{name:<30} {'N/A':>8}")
    print("-"*40)
    print(f"{'Macro AUC':<30} {macro_auc:>8.4f}")
    print(f"{'Micro AUC':<30} {micro_auc:>8.4f}")
    print("="*80)

    if return_preds:
        return macro_auc, micro_auc, per_class, (preds, targets)
    return macro_auc, micro_auc, per_class
|
training logs/classifier/1/metrics.png
ADDED
|
training logs/classifier/11/metrics.png
ADDED
|
Git LFS Details
|
training logs/classifier/Events.docx
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:4c683f158db1946053b66c0a5769962a8d51bad98058490cfa4aedba5582f45d
|
| 3 |
+
size 1680108
|
training logs/classifier/history.json
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
{"train_loss": [2.451549026515934, 2.324592100605649, 2.2527450666496867, 2.2051324946319886, 2.159476125092837, 2.1111394616786736, 2.057536503908261, 1.9841906148749953, 1.9176961825764776, 1.8619107825900996, 1.7461218646035648, 1.6598046678827294, 0.14859689984745142, 0.14480384502754282, 0.1413286671667666, 0.13848335853005814], "val_loss": [1.4677110827817452, 1.414012653402026, 1.3714177432875805, 1.2308028351501579, 1.3173101589027862, 1.327703529975834, 1.3617598478416082, 1.3191542980376254, 1.2529189027799352, 1.5297267407960213, 1.5704542597879037, 1.768142464009117, 0.1340647471810548, 0.13415449232644355, 0.13759738845883238, 0.13678586165817191], "train_macro_auc": [0.5915813508746322, 0.6911077814868211, 0.7180677573787114, 0.7330628389066268, 0.7446141059156149, 0.7555600431027648, 0.7611192999386304, 0.768528268418801, 0.7723525292111644, 0.7753779303902529, 0.7868138491223837, 0.7944627804675177, 0.7901445091432617, 0.8022003143300758, 0.815119789621127, 0.8244893337730623], "val_macro_auc": [0.6697306681955048, 0.7073059079688858, 0.7307953052152676, 0.7387044904612824, 0.7454194006038561, 0.7500732498762482, 0.7486698915393023, 0.7534324811456612, 0.7528406138149186, 0.7499375597482462, 0.7467580969017149, 0.7441320142787182, 0.7530519050114038, 0.7533548440220946, 0.749456570265707, 0.7485324814519589], "train_micro_auc": [0.7215386383447403, 0.7813286014278719, 0.7966446631205264, 0.8059628819522804, 0.8134203450690528, 0.8203666769222709, 0.8240765555633897, 0.8301711223679948, 0.833068249934892, 0.8355527627001551, 0.8440327424520534, 0.8495436085077932, 0.8464713300561717, 0.8555554974362551, 0.8642513599137882, 0.8715251218243751], "val_micro_auc": [0.830096540126698, 0.8465147824166648, 0.85730031114022, 0.8612904284897909, 0.8638291274014742, 0.8621590439819964, 0.8642181574740424, 0.8699097719546053, 0.8705562840474825, 0.8696004009592124, 0.8691057071199912, 0.867407803196298, 0.869385951883187, 0.8707534276030608, 0.8618755289952147, 
0.8671620953710033]}
|
training logs/classifier/test_log.txt
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
training logs/classifier/training_log.txt
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
training logs/classifier/val_log.txt
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
training logs/mae/1/metrics.png
ADDED
|
training logs/mae/101/metrics.png
ADDED
|
training logs/mae/11/metrics.png
ADDED
|