Model Card for AdaLoRA-QAT
AdaLoRA-QAT is an efficient, compact foundation model variant designed for accurate chest X-ray (CXR) lung segmentation. It adapts the Segment Anything Model (SAM) to meet strict clinical computational constraints by combining adaptive low-rank parameter fine-tuning with quantization-aware training.
Model Details
Model Description
AdaLoRA-QAT introduces a two-stage fine-tuning framework for medical image segmentation. Stage 1 utilizes Adaptive Low-Rank Adaptation (AdaLoRA) to dynamically allocate rank capacity to task-relevant transformer layers in full precision. Stage 2 implements full-model quantization-aware fine-tuning (QAT) using a selective mixed-precision strategy, achieving INT8 precision for select layers while preserving fine structural fidelity.
- Developed by: Prantik Deb, Srimanth Dhondy, N. Ramakrishna, Anu Kapoor, Raju S. Bapi, Tapabrata Chakraborti.
- Funded by: IHub-Data, International Institute of Information Technology Hyderabad. Tapabrata Chakraborti is supported by the Turing-Roche Strategic Partnership and the UCL NIHR Biomedical Research Center.
- Model type: Parameter-Efficient Fine-Tuned (PEFT) Foundation Model for Chest X-ray Segmentation.
- License: MIT License
- Finetuned from model: Segment Anything Model (SAM).
Model Sources
- Repository: https://prantik-pdeb.github.io/adaloraqat.github.io/
- Paper: AdaLoRA-QAT: Adaptive Low Rank and Quantization Aware Segmentation.
Uses
Direct Use
- Accurate lung field segmentation for isolating pulmonary parenchyma.
- Enhancing abnormality visibility and enabling quantitative analysis in chest radiographs.
- Improving the reliability of computer-aided diagnosis (CAD) systems.
- Enabling deployable foundation models on resource-constrained clinical hardware.
Bias, Risks, and Limitations
- Robust generalization across deep learning models remains challenging due to anatomical variability, pathological distortions, and imaging artifacts.
- The Structural Similarity Index (SSIM) map indicates minor degradations, primarily associated with severe motion artifacts or extreme pathologies.
Training Details
Training Data
- The model was trained on 64,590 chest X-rays spanning diverse thoracic pathologies.
- The data sources include JSRT, QaTa-COV19, COVID-19 Radiography, Chest X-Ray Pneumothorax, and COVID-QU-Ex datasets.
Training Procedure
- The model utilizes a unified two-stage framework coupling adaptive low-rank encoder tuning with full model quantization-aware fine-tuning.
- Stage 1 learns adaptive and orthogonal low-rank subspaces in full precision (FP32).
- Stage 1 prunes redundant components to identify an efficient task-specific parameter space.
- Stage 2 performs full-model quantization-aware fine-tuning while freezing rank masks.
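The core mechanism of Stage 2 is fake quantization with a straight-through estimator (STE): the forward pass sees INT8-rounded weights, while gradients flow to the underlying FP32 weights unchanged. The following is a minimal illustrative sketch of that idea, not the paper's actual implementation; `fake_quant_int8` is a hypothetical helper using symmetric per-tensor scaling.

```python
import torch

def fake_quant_int8(w: torch.Tensor) -> torch.Tensor:
    """Symmetric per-tensor INT8 fake quantization with a straight-through
    estimator, as commonly used in quantization-aware training (QAT)."""
    scale = w.detach().abs().max() / 127.0
    scale = torch.clamp(scale, min=1e-8)              # avoid divide-by-zero
    q = torch.clamp(torch.round(w / scale), -128, 127)  # integer grid
    w_q = q * scale                                    # dequantized weights
    # Forward uses the quantized weights; backward sees the identity,
    # so the FP32 "shadow" weights keep receiving full-precision gradients.
    return w + (w_q - w).detach()

w = torch.randn(64, 64, requires_grad=True)
w_q = fake_quant_int8(w)
w_q.sum().backward()   # gradient reaches w despite the rounding
```

During QAT, a layer's forward pass would call `fake_quant_int8` on its weights; at export time, the integer grid values and scale are stored directly as INT8.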
Training Hyperparameters
- Training regime: Stage 1 uses FP32 precision. Stage 2 uses a selective mixed-precision strategy. Encoder feed-forward layers, the decoder, and the prompt encoder are quantized to INT8. Attention QKV projections and AdaLoRA parameters (P, Q, A) remain in FP32.
- Batch Size: 16 during Stage 1.
- Learning Rates: In Stage 1, 5e-5 for the encoder and 2e-5 for the decoder. In Stage 2, singular values are fine-tuned at 1e-6.
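The split learning rates above map naturally onto optimizer parameter groups. A minimal sketch of the Stage 1 setup, using placeholder `Linear` modules to stand in for the SAM encoder and decoder (the optimizer choice here is an assumption; the card does not name one):

```python
import torch

# Hypothetical stand-ins for the SAM image encoder and mask decoder.
encoder = torch.nn.Linear(256, 256)
decoder = torch.nn.Linear(256, 2)

# Stage 1: encoder at 5e-5, decoder at 2e-5, as per-parameter groups.
optimizer = torch.optim.AdamW(
    [
        {"params": encoder.parameters(), "lr": 5e-5},
        {"params": decoder.parameters(), "lr": 2e-5},
    ]
)
```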
Speeds, Sizes, Times
- The model yields a 2.24x model compression compared to base-SAM fine-tuning.
- Trainable parameters are reduced by 16.6x, down to 5.4M.
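As a sanity check on the figures above, a 16.6x reduction down to 5.4M trainable parameters implies roughly 90M trainable parameters in the base-SAM fine-tuning baseline (this back-of-envelope figure is derived here, not stated in the card):

```python
trainable_m = 5.4                       # trainable parameters (millions)
reduction = 16.6                        # reduction factor vs. base-SAM
baseline_m = trainable_m * reduction    # implied baseline, ~89.6M
```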
Evaluation
Testing Data, Factors & Metrics
Testing Data
- The 64,590-image CXR dataset was split 80:10:10 into training, validation, and test sets for all experiments.
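The 80:10:10 split works out to the following approximate set sizes (these counts are computed from the stated ratio; the card does not report them explicitly):

```python
total = 64_590
n_train = round(total * 0.8)            # 51,672 images
n_val = round(total * 0.1)              # 6,459 images
n_test = total - n_train - n_val        # 6,459 images
```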
Metrics
- Dice Score (DSC).
- Intersection over Union (IoU).
- Normalized Surface Distance (NSD).
- Structural Similarity Index (SSIM) to evaluate structural agreement and localized improvements.
- Wilcoxon signed-rank test for statistical significance assessment.
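For reference, Dice and IoU on binary masks reduce to simple overlap ratios. A minimal sketch (the helper name and the toy masks are illustrative, not from the paper):

```python
import numpy as np

def dice_iou(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Dice score and IoU for binary masks (values in {0, 1})."""
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    iou = inter / np.logical_or(pred, gt).sum()
    return float(dice), float(iou)

pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
dice, iou = dice_iou(pred, gt)   # dice = 2*1/(2+1) ~ 0.667, iou = 1/2 = 0.5
```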
Results
- AdaLoRA-QAT achieves a 95.6% Dice score.
- The model matches full-precision SAM decoder fine-tuning accuracy.
- Statistical analysis confirms that full INT8 quantization preserves segmentation accuracy without significant degradation.
- SSIM analysis exhibits strong structural agreement along lung boundaries and vascular regions.
Summary
AdaLoRA-QAT effectively balances accuracy, efficiency, and structural trustworthiness. It establishes a proof of concept for substantially compressing foundation models for scalable AI-assisted diagnosis without compromising diagnostic accuracy.
Model Examination
- Quantization error analysis shows that FP32-INT8 quantization noise follows an approximately zero-mean Gaussian distribution.
- There is a strong linear correlation between FP32 and INT8 weights.
- Errors are uniformly distributed across weight magnitudes, confirming preserved numerical fidelity under low-bit quantization.
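The error analysis above can be reproduced in miniature: symmetric INT8 rounding of a weight tensor yields near-zero-mean, bounded noise and a near-perfect linear correlation between FP32 and dequantized values. This sketch uses a synthetic Gaussian weight tensor as a stand-in for the model's weights:

```python
import torch

torch.manual_seed(0)
w = torch.randn(100_000)                 # stand-in FP32 weight tensor
scale = w.abs().max() / 127.0            # symmetric per-tensor scale
w_int8 = torch.clamp(torch.round(w / scale), -128, 127)
w_deq = w_int8 * scale                   # dequantized INT8 weights
err = w_deq - w                          # quantization noise

mean_err = err.mean().item()             # close to zero
corr = torch.corrcoef(torch.stack([w, w_deq]))[0, 1].item()  # close to 1
```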
Model Architecture and Objective
- The architecture is based on the Segment Anything Model (SAM) incorporating an Image Encoder (ViT), Prompt Encoder, and Mask Decoder.
- It uses Adaptive Low-Rank Adaptation (AdaLoRA) where the vision encoder rank is reduced from 48 to 32 via importance-based pruning.
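AdaLoRA parameterizes each low-rank update as an SVD-like triplet and prunes the least important rank components. A simplified sketch of the 48-to-32 pruning step, scoring components by singular-value magnitude alone (AdaLoRA's actual importance score also incorporates gradient sensitivity, and the 768 feature dimension here is an illustrative assumption):

```python
import torch

r_init, r_target = 48, 32
P = torch.randn(768, r_init)        # left singular-vector factor
lam = torch.randn(r_init).abs()     # singular values (importance proxy)
Q = torch.randn(r_init, 768)        # right singular-vector factor

# Keep the 32 highest-importance rank components, drop the rest.
keep = torch.topk(lam, r_target).indices
P_pruned, lam_pruned, Q_pruned = P[:, keep], lam[keep], Q[keep, :]

# The pruned low-rank weight update, rank at most r_target.
delta_w = P_pruned @ torch.diag(lam_pruned) @ Q_pruned
```

In Stage 2, these pruned rank masks are frozen so that quantization-aware fine-tuning cannot reallocate capacity.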
Hardware
- NVIDIA RTX A6000 GPUs (48 GB).
Model Card Authors
Prantik Deb, Srimanth Dhondy, N. Ramakrishna, Anu Kapoor, Raju S. Bapi, Tapabrata Chakraborti.