---
license: apache-2.0
---

# MaxSup: Overcoming Representation Collapse in Label Smoothing

**Max Suppression (MaxSup)** is a novel regularization technique that overcomes the shortcomings of traditional **Label Smoothing (LS)**. While LS prevents overconfidence by softening one-hot labels, it inadvertently collapses intra-class feature diversity and can amplify overconfident errors. In contrast, **MaxSup** applies a uniform smoothing penalty to the model's top prediction, regardless of correctness, preserving richer per-sample information and improving both classification performance and downstream transfer.

---
## Table of Contents

1. [Overview](#overview)
2. [Methodology: MaxSup vs. Label Smoothing](#methodology-maxsup-vs-label-smoothing)
3. [Enhanced Feature Representation](#enhanced-feature-representation)
   - [Qualitative Evaluation](#qualitative-evaluation)
   - [Quantitative Evaluation](#quantitative-evaluation)
4. [Training Vision Transformers with MaxSup](#training-vision-transformers-with-maxsup)
   - [Accelerated Data Loading via Caching (Optional)](#accelerated-data-loading-via-caching-optional)
   - [Preparing Data and Annotations for Caching](#preparing-data-and-annotations-for-caching)
5. [Pretrained Weights](#pretrained-weights)
6. [Training ConvNets with MaxSup](#training-convnets-with-maxsup)
7. [Logit Characteristic Visualization](#logit-characteristic-visualization)
8. [References](#references)

---
## Overview

Traditional Label Smoothing (LS) replaces one-hot labels with a smoothed version to reduce overconfidence. However, LS can over-tighten feature clusters within each class and may reinforce errors by making mispredictions overconfident. **MaxSup** tackles these issues by applying a smoothing penalty to the model's **top-1 logit**, regardless of whether the prediction is correct, thus preserving intra-class diversity and enhancing inter-class separation. The result is improved performance on both classification tasks and downstream applications such as linear transfer and image segmentation.

---
## Methodology: MaxSup vs. Label Smoothing

Label Smoothing softens the target distribution by blending the one-hot vector with a uniform distribution. Although effective at reducing overconfidence, LS inadvertently introduces two effects:
- A **regularization term** that limits the sharpness of predictions.
- An **error-enhancement term** that can make wrong predictions overconfident.

**MaxSup** addresses this by uniformly penalizing the highest logit, whether or not it corresponds to the true class, enforcing a consistent regularization effect across all samples. In formula form:

```math
L_{\text{MaxSup}} = \alpha \left( z_{\max} - \frac{1}{K}\sum_{k=1}^{K} z_k \right),
```

where \( z_{\max} \) is the highest logit among the \( K \) classes. This mechanism prevents the prediction distribution from becoming too peaky while preserving informative signals from non-target classes.

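To make the formula concrete, here is a minimal NumPy sketch of the penalty (an illustration, not the repository's training code; `alpha` is a tunable hyperparameter):

```python
import numpy as np

def maxsup_penalty(logits: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Per-sample MaxSup penalty: alpha * (z_max - mean(z)).

    `logits` has shape (batch, K). The penalty targets the highest
    logit regardless of whether it belongs to the true class.
    """
    z_max = logits.max(axis=-1)    # top-1 logit per sample
    z_mean = logits.mean(axis=-1)  # mean over all K classes
    return alpha * (z_max - z_mean)
```

During training, this term is simply added to the standard cross-entropy loss.
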
---
## Enhanced Feature Representation

### Qualitative Evaluation

MaxSup-trained models display richer intra-class feature diversity than models trained with traditional LS. Feature embedding visualizations show that while LS forces features into tight clusters, MaxSup preserves finer-grained differences among samples. Grad-CAM analyses also demonstrate that MaxSup-trained models focus more precisely on class-discriminative regions.

![Improved Feature Representation](Improved_Feature.png)
**Figure 1:** Feature representations. MaxSup maintains greater intra-class diversity and clear inter-class boundaries.

![Grad-CAM Analysis](gradcam.png)
**Figure 2:** Grad-CAM visualizations. The MaxSup model (row 2) accurately highlights target objects, whereas the LS model (row 3) and the baseline (row 4) show more diffuse activations.

### Quantitative Evaluation

We evaluated the feature representations of a ResNet-50 trained on ImageNet-1K, measuring intra-class variation (the diversity preserved within each class) and inter-class separability (the distinctiveness between classes). We also performed a linear transfer learning task on CIFAR-10.

**Table 1: Feature Representation Metrics (ResNet-50 on ImageNet-1K)**

| Method              | Intra-class Var. (Train) | Intra-class Var. (Val) | Inter-class Sep. (Train) | Inter-class Sep. (Val) |
|---------------------|--------------------------|------------------------|--------------------------|------------------------|
| **Baseline**        | 0.3114                   | 0.3313                 | 0.4025                   | 0.4451                 |
| **Label Smoothing** | 0.2632                   | 0.2543                 | 0.4690                   | 0.4611                 |
| **Online LS**       | 0.2707                   | 0.2820                 | 0.5943                   | 0.5708                 |
| **Zipf’s LS**       | 0.2611                   | 0.2932                 | 0.5522                   | 0.4790                 |
| **MaxSup (ours)**   | **0.2926**               | **0.2998**             | 0.5188                   | 0.4972                 |

*Higher intra-class variation indicates more preserved sample-specific detail; higher inter-class separability indicates better class discrimination.*

**Table 2: Linear Transfer Accuracy on CIFAR-10**

| Pretraining Method   | Accuracy (%) |
|----------------------|--------------|
| **Baseline**         | 81.43        |
| **Label Smoothing**  | 74.58        |
| **MaxSup**           | **81.02**    |

Label Smoothing degrades transfer accuracy due to its over-smoothing effect, whereas MaxSup nearly matches the baseline while still offering improved calibration.

---
## Training Vision Transformers with MaxSup

We integrated MaxSup into the training pipeline for Vision Transformers using the [DeiT](https://github.com/facebookresearch/deit) framework.

### To Train a ViT with MaxSup

```bash
cd Deit
bash train_with_MaxSup.sh
```

This script trains a DeiT-Small model on ImageNet-1K with MaxSup regularization.

### Accelerated Data Loading via Caching (Optional)

For improved data-loading efficiency on systems with slow I/O, a caching mechanism is provided. This feature compresses the ImageNet dataset into ZIP files and loads them into memory. Enable caching by adding the `--cache` flag to the training script.

### Preparing Data and Annotations for Caching

1. **Create ZIP Archives:**
   In your ImageNet data directory, run:
   ```bash
   cd data/ImageNet
   zip -r train.zip train
   zip -r val.zip val
   ```

2. **Mapping Files:**
   Download `train_map.txt` and `val_map.txt` from our release assets and place them in the `data/ImageNet` directory. The directory should look like this:
   ```
   data/ImageNet/
   ├── train_map.txt   # Relative paths and labels for training images
   ├── val_map.txt     # Relative paths and labels for validation images
   ├── train.zip       # Compressed training images
   └── val.zip         # Compressed validation images
   ```
   - **train_map.txt:** Each line has the format `<class_folder>/<image_filename>\t<label>`.
   - **val_map.txt:** Each line has the format `<image_filename>\t<label>`.

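If you prefer to generate `train_map.txt` yourself rather than download it, a hypothetical helper like the following would work, assuming labels are assigned by sorting the class folders alphabetically (the usual torchvision `ImageFolder` convention; verify against the released mapping files before use):

```python
import os

def write_train_map(train_dir: str, out_path: str) -> None:
    # Class folders sorted alphabetically -> integer labels 0..N-1.
    classes = sorted(
        d for d in os.listdir(train_dir)
        if os.path.isdir(os.path.join(train_dir, d))
    )
    with open(out_path, "w") as f:
        for label, cls in enumerate(classes):
            for fname in sorted(os.listdir(os.path.join(train_dir, cls))):
                # One line per image: <class_folder>/<image_filename>\t<label>
                f.write(f"{cls}/{fname}\t{label}\n")
```
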
---
## Pretrained Weights

- **ConvNet (ResNet-50):** Pretrained weights can be downloaded from this page.

These checkpoints can be used for direct evaluation or for fine-tuning on downstream tasks.

---
## Training ConvNets with MaxSup

The `Conv/` directory provides scripts for training convolutional networks with MaxSup:

- **Conv/ffcv:** Scripts to reproduce the ImageNet results using FFCV for efficient data loading. See `Conv/ffcv/README.md` for details.
- **Conv/common_resnet:** Additional experiments with ResNet architectures. See `Conv/common_resnet/README.md` for further instructions.

---
## Logit Characteristic Visualization

The `viz/` directory contains a toolkit for analyzing the distribution of logits produced by models trained with LS versus MaxSup.

### Step 1: Extract Logits

Run the following command to extract logits from your trained model:

```bash
python viz/logits.py \
    --checkpoint /path/to/model_checkpoint.pth \
    --output /path/to/save/logits_labels.pt
```

- `--checkpoint`: Path to your model checkpoint.
- `--output`: Destination file for the extracted logits and labels.

### Step 2: Analyze Logits

After extraction, run:

```bash
python viz/analysis.py --input /path/to/save/logits_labels.pt --output /path/to/analysis_results/
```

This script generates:
- A histogram of near-zero logit proportions.
- A scatter plot comparing top-1 probabilities with near-zero proportions.
- Saved visualizations for side-by-side comparisons.

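The plotted quantities are straightforward to compute from the saved logits; here is a minimal sketch (the authoritative definitions live in `viz/analysis.py`, and the `eps` threshold below is an assumed value):

```python
import numpy as np

def near_zero_proportion(logits: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Fraction of each sample's logits with magnitude below `eps`."""
    return (np.abs(logits) < eps).mean(axis=-1)

def top1_probability(logits: np.ndarray) -> np.ndarray:
    """Softmax probability of the top-1 class, computed stably."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    return probs.max(axis=-1)
```
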
![Logit Visualization](logit.png)
**Figure 3:** Logit distributions comparing LS and MaxSup.

---

## References

- **DeiT (Vision Transformer):** Touvron et al., *Training Data-Efficient Image Transformers & Distillation through Attention*, ICML 2021. [GitHub](https://github.com/facebookresearch/deit).
- **Grad-CAM:** Selvaraju et al., *Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization*, ICCV 2017.
- **Online Label Smoothing:** Zhang et al., *Delving Deep into Label Smoothing*, IEEE TIP 2021.
- **Zipf’s Label Smoothing:** Liang et al., *Efficient One Pass Self-Distillation with Zipf’s Label Smoothing*, ECCV 2022.

---

This repository provides the official implementation of MaxSup. Contributions and discussions are welcome; for questions or issues, please open an issue on GitHub or contact the authors directly.

---