lemousehunter committed on
Commit
a65fa2d
·
verified ·
1 Parent(s): f8cc89b

Upload analysis/findings/verified_citations.md with huggingface_hub

analysis/findings/verified_citations.md ADDED
@@ -0,0 +1,121 @@
+ # Verified Citations
+
+ ## Wu et al. (2024) - Multi-bit SCA
+ - **VERIFIED**: Table 3 shows TGE0 (traces to guessing entropy 0) for ASCAD_F dataset
+ - **VERIFIED**: k12 of ASCAD_F recovered by MMB at 218 traces (line 1027)
+ - **VERIFIED**: MB recovered k12 at 58 traces (line 973)
+ - **VERIFIED**: "none of the pre-defined leakage models could recover the k12 of ASCAD_F, but MMB and MB demonstrated remarkable capabilities by recovering it with just 218 and 58 traces" (lines 972-974)
+ - **NOTE**: The paper uses ASCAD_F (fixed key), not ASCADv1. Need to clarify in report.
+ - **NOTE**: The paper also says "multi-task learning helps generalize each task, leading to robust attack performance" (lines 1239-1240)
+
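For reference when checking the TGE0 figures quoted above, here is a minimal sketch of how "traces to guessing entropy 0" could be read off a guessing-entropy curve. The function name `traces_to_ge_zero` and the rule that the curve must stay at 0 once it gets there are assumptions made for illustration; Wu et al. may define the metric slightly differently.

```python
from typing import Optional, Sequence


def traces_to_ge_zero(ge_curve: Sequence[float]) -> Optional[int]:
    """Return the smallest trace count from which the guessing entropy stays at 0.

    ge_curve[i] is assumed to be the guessing entropy measured with i + 1
    attack traces. Returns None if the curve never settles at 0.
    """
    tge0 = None
    for n_traces, ge in enumerate(ge_curve, start=1):
        if ge == 0:
            if tge0 is None:
                tge0 = n_traces  # first trace count of the current run of zeros
        else:
            tge0 = None  # GE left 0 again, so restart the search
    return tge0


# Hypothetical curve: GE touches 0 at 4 traces, bounces, then settles at 6.
ge_curve = [37.0, 12.5, 3.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(traces_to_ge_zero(ge_curve))
```

Requiring the curve to stay at 0 avoids reporting a lucky single dip as key recovery; drop the reset branch if the report should use "first time GE hits 0" instead.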
+ ## Marquet & Oswald (2023) - Multi-task vs Single-task SCA
+ - **NOTE**: This paper (eprint 2023/006) is actually "Exploring Multi-Task Learning in the Context of Masked AES Implementations"
+ - **NOTE**: They do NOT use "16 separate CNNs"; they compare multi-task vs single-task on ASCAD-r, ASCAD-v2, CHESCTF-2023
+ - **NOTE**: They do NOT work on ASCADv1 (the fixed-key dataset we use). They work on ASCAD-r and ASCAD-v2 (masked implementations)
+ - **WRONG CLAIM IN REPORT**: "Marquet & Oswald achieved full key recovery on ASCADv1 using 16 separate CNNs" (INCORRECT DATASET)
+ - **ACTUAL FINDING**: They show multi-task learning is superior to single-task in masked implementations
+ - **ACTUAL FINDING**: Low-level parameter sharing (md) consistently outperforms baseline multi-task and single-task
+ - **ACTUAL FINDING**: "No seed allowed the single-task models to recover all bytes" (line 681)
+ - **ACTUAL FINDING**: Multi-task learning breaks through initial plateau more consistently
+ - **RELEVANT QUOTE**: "multi-task learning is a natural improvement of single-task learning in a scenario where the knowledge of randomness cannot be accessed" (lines 885-886)
+
+ ## Suteu & Serban (2019) - Cited for TSBN
+ - **WRONG CITATION**: Paper title is "Regularizing Deep Multi-Task Networks using Orthogonal Gradients" (arXiv:1912.06844)
+ - This paper is about orthogonal gradient regularization, NOT task-specific batch normalization
+ - Need to find the actual TSBN reference
+
+ ## Pezeshki et al. (2021) - Cited for gradient interference
+ - Paper title: "Gradient Starvation: A Learning Proclivity in Neural Networks" (NeurIPS 2021)
+ - This is about gradient starvation (some features dominate learning), not gradient interference per se
+ - Still relevant but citation context needs to be more precise
+
+ ## Chen et al. (2018) - GradNorm
+ - Need to verify: "Section 5.4 high alpha causes weight collapse"
+ - Need to verify the exact GradNorm formula
+
+ ## Liu et al. (2019) - MTAN
+ - Paper: "End-to-End Multi-Task Learning with Attention" (CVPR 2019)
+ - Need to verify it exists and describes soft attention masks
+
+ ## Guo et al. (2018) - DTP
+ - Paper: "Dynamic Task Prioritization for Multitask Learning" (ECCV 2018)
+ - Need to verify the DTP formula matches
+
+ ## Chen et al. (2018) - GradNorm - VERIFIED
+ - **Paper exists**: ICML 2018, pages 793-802
+ - **Section 5.4 EXISTS**: "Effects of tuning the asymmetry α"
+ - **DOES NOT say "weight collapse"**; it says: "at α=1.75 (not shown) w_depth(t) is suppressed to below 0.02 at no detriment to network performance"
+ - **Supplementary Section 7.1**: "No points past α > 2 are shown for the VGG16 backbone as GradNorm weights are unstable past this point"
+ - **Footnote 3**: "At large positive values of α, which in the NYUv2 case corresponded to α ≥ 3, some weights were pushed too close to zero and GradNorm updates became unstable."
+ - **CORRECTION NEEDED**: Report says "Section 5.4 high α causes weight collapse"; it should say "Section 5.4 and Supplementary Section 7.1 show that high α pushes weights to near-zero and causes instability (α ≥ 3 for NYUv2)"
+ - **GradNorm formula**: Need to verify exact formula from paper
+
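While the exact formula is still to be checked against the paper, the sketch below shows the GradNorm objective in the form it is usually stated: each task's gradient norm is pulled toward the mean norm scaled by its relative inverse training rate raised to α. The function names and the example numbers are hypothetical, and the formula itself should be treated as an assumption until verified.

```python
from typing import List


def gradnorm_targets(loss_ratios: List[float], mean_grad_norm: float, alpha: float) -> List[float]:
    """Target gradient norms mean_grad_norm * r_i ** alpha (assumed form).

    loss_ratios[i] is L_i(t) / L_i(0); r_i is that ratio divided by its
    mean across tasks (the relative inverse training rate).
    """
    mean_ratio = sum(loss_ratios) / len(loss_ratios)
    rel_rates = [ratio / mean_ratio for ratio in loss_ratios]
    return [mean_grad_norm * r ** alpha for r in rel_rates]


def gradnorm_loss(grad_norms: List[float], loss_ratios: List[float], alpha: float) -> float:
    """L1 distance between per-task gradient norms and their targets."""
    mean_grad_norm = sum(grad_norms) / len(grad_norms)
    targets = gradnorm_targets(loss_ratios, mean_grad_norm, alpha)
    return sum(abs(g - t) for g, t in zip(grad_norms, targets))


# Hypothetical two-task case: task 0 trains fast (ratio 0.2), task 1 slowly (0.8).
for alpha in (0.0, 1.5, 3.0):
    print(alpha, gradnorm_targets([0.2, 0.8], mean_grad_norm=1.0, alpha=alpha))
```

Under this form, a fast-training task has a relative rate below 1, so its target norm shrinks toward zero as α grows. That is consistent with the quoted footnote about weights being pushed too close to zero at large α.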
+ ## Suteu & Serban (2019) - NEED TO FIND ACTUAL TSBN REFERENCE
+ - The Suteu & Serban paper (arXiv:1912.06844) is about orthogonal gradients, NOT TSBN
+ - Need to find the actual paper that proposes task-specific batch normalization
+ - Likely: Bilen & Vedaldi (2017) "Integrated Perception with Recurrent Multi-Task Neural Networks" or similar
+
+ ## Suteu & Serban (2019) - VERIFIED
+ - **Paper**: "Regularizing Deep Multi-Task Networks using Orthogonal Gradients" (arXiv:1912.06844)
+ - **Content**: Proposes orthogonal gradient regularization to reduce task interference
+ - **NOT about TSBN**: this paper is about gradient orthogonality, not batch normalization
+ - **IMPORTANT**: Suteu has a 2025 paper (arXiv:2512.20420) "Simplifying Multi-Task Architectures Through Task-Specific Normalization" which proposes TSσBN (Task-Specific Sigmoid Batch Normalization)
+ - **CORRECTION**: If we cite Suteu for TSBN, we should cite the 2025 paper, not the 2019 one
+ - **HOWEVER**: Our TSBN may have been inspired by the general concept of task-specific BN from Rebuffi et al. (2017) "Learning multiple visual domains with residual adapters" (NeurIPS 2017, 1293 citations) which uses domain-specific BN layers
+
+ ## Actual TSBN lineage:
+ 1. Rebuffi, Bilen & Vedaldi (2017) - domain-specific BN layers for multi-domain learning
+ 2. Suteu & Serban (2019) - orthogonal gradients (different paper, same first author)
+ 3. Suteu et al. (2025) - TSσBN (task-specific sigmoid batch normalization)
+ - Our TSBN concept is closest to Rebuffi et al. (2017) or the general MTL literature on task-specific BN
+
+ ## Pezeshki et al. (2021) - VERIFIED
+ - **Paper**: "Gradient Starvation: A Learning Proclivity in Neural Networks" (NeurIPS 2021)
+ - **NOT "gradient interference"**: the paper is about gradient starvation
+ - **Definition**: "Gradient starvation occurs when the presence of easy-to-learn features in a dataset prevents the learning of other equally informative features"
+ - **Cited by**: 423 (highly cited)
+ - **CORRECTION NEEDED**: Report should cite this for "gradient starvation" not "gradient interference"
+ - **Relevance**: Directly relevant to our HPS failure where bytes 0 and 1 (easy features) dominated learning and starved bytes 2-15
+
+ ## Liu et al. (2019) - MTAN - NEED TO VERIFY
+ - Need to check: "End-to-End Multi-Task Learning with Attention" (CVPR 2019)
+
+ ## Guo et al. (2018) - DTP - NEED TO VERIFY
+ - Need to check: "Dynamic Task Prioritization for Multitask Learning" (ECCV 2018)
+
+ ## Kerkhof et al. - NEED TO VERIFY
+ - Need to check the spectral decoupling reference
+
+ ## Liu et al. (2019) - MTAN - VERIFIED
+ - **Paper**: "End-to-End Multi-Task Learning with Attention" (CVPR 2019)
+ - **Content**: Proposes Multi-Task Attention Network (MTAN) with soft-attention modules for task-specific feature learning from global features
+ - **Quote**: "consists of a single shared network containing a global feature pool, together with a soft-attention module for each task"
+ - **Code**: https://github.com/lorenmt/mtan
+
+ ## Guo et al. (2018) - DTP - VERIFIED
+ - **Paper**: "Dynamic Task Prioritization for Multitask Learning" (ECCV 2018)
+ - **Content**: Proposes dynamic task prioritization where difficulty is inversely proportional to performance
+ - **Quote**: "imbalances in task difficulty can lead to unnecessary emphasis on easier tasks, thus neglecting and slowing progress on difficult tasks"
+ - **Key insight**: "prioritizing difficult tasks first", in contrast to curriculum learning
+
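To make "difficulty is inversely proportional to performance" concrete, here is a small sketch of a focal-style task priority computed from a per-task key performance indicator (KPI) in (0, 1]. The focal form `-(1 - kpi) ** gamma * log(kpi)` is how DTP is commonly summarized and is an assumption here; the notes do not yet confirm the exact formula. The name `dtp_priority` and the default `gamma` are hypothetical.

```python
import math


def dtp_priority(kpi: float, gamma: float = 1.0) -> float:
    """Focal-style priority for one task (assumed form, not confirmed from the paper).

    kpi is a performance measure in (0, 1], e.g. a running accuracy.
    Low-performing (difficult) tasks receive a larger priority.
    """
    if not 0.0 < kpi <= 1.0:
        raise ValueError("kpi must be in (0, 1]")
    return -((1.0 - kpi) ** gamma) * math.log(kpi)


# Hypothetical KPIs: an easy key byte at 0.9 and a hard one at 0.2.
for kpi in (0.9, 0.2):
    print(kpi, dtp_priority(kpi, gamma=2.0))
```

A task that is already solved (KPI of 1) gets zero priority, so the training signal shifts toward the tasks that are still struggling.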
+ ## Spectral Decoupling - VERIFIED
+ - **Paper**: Pohjonen et al. (2022), "Spectral decoupling for training transferable neural networks in medical imaging" (iScience)
+ - **Original**: arXiv:2103.17171 (March 2021)
+ - **Content**: "encourages the neural network to learn more features by simply regularising the networks' unnormalised prediction scores with an L2 penalty"
+ - **Key finding**: "increases the networks' robustness for data distribution shifts and prevents overfitting on easy-to-learn features"
+ - **NOT by Kerkhof**: Kerkhof's paper is about focal loss for SCA (eprint 2021/1408)
+ - **CORRECTION NEEDED**: The spectral decoupling reference should cite Pohjonen et al. (2022), not Kerkhof
+
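The quoted description (an L2 penalty on the unnormalised prediction scores) can be sketched for a single example as below. The `lam / 2` scaling, the sum over classes, and the default `lam` value are assumptions for illustration, not values taken from either paper.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def spectral_decoupling_loss(logits: Sequence[float], label: int, lam: float = 0.1) -> float:
    """Cross-entropy plus an L2 penalty on the raw logits (assumed lam/2 scaling)."""
    penalty = 0.5 * lam * sum(z * z for z in logits)
    return cross_entropy(logits, label) + penalty


# Hypothetical logits for a 3-class example with true class 0.
logits = [4.0, -1.0, 0.5]
print(cross_entropy(logits, 0), spectral_decoupling_loss(logits, 0, lam=0.1))
```

Because the penalty acts on the logits rather than the weights, confident predictions driven by a single easy feature are discouraged directly.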
+ ## Kerkhof et al. (2021) - Focal Loss for SCA - VERIFIED
+ - **Paper**: "A Focal Loss Function for Deep Learning-based Side-channel Analysis" (ePrint 2021/1408)
+ - **Content**: Proposes focal loss adapted for SCA context
+ - **NOT about spectral decoupling**
+
+ ## FULL CITATION CORRECTION TABLE:
+ | Report Claim | Current Citation | Correct Citation |
+ |---|---|---|
+ | TSBN | Suteu & Serban (2019) | Rebuffi et al. (2017) or Suteu et al. (2025) |
+ | Gradient interference | Pezeshki et al. (2021) | Pezeshki et al. (2021), but should say "gradient starvation" not "interference" |
+ | High alpha weight collapse | Chen et al. (2018) Sec 5.4 | Chen et al. (2018) Sec 5.4 + Supplementary 7.1, but "instability" not "collapse" |
+ | Spectral decoupling | Kerkhof et al. | Pohjonen et al. (2022) / arXiv:2103.17171 |
+ | Marquet & Oswald full key recovery on ASCADv1 | Marquet & Oswald (2023), wrong dataset | Marquet & Oswald (2023), but they work on ASCAD-r and ASCAD-v2, NOT ASCADv1 |
+ | Marquet & Oswald full key recovery on ASCADv1 | Wrong dataset | They work on ASCAD-r and ASCAD-v2, NOT ASCADv1 |