# Verified Citations

## Wu et al. (2024) - Multi-bit SCA
- **VERIFIED**: Table 3 reports TGE0 (number of traces needed to reach guessing entropy 0) for the ASCAD_F dataset
- **VERIFIED**: k12 of ASCAD_F recovered by MMB at 218 traces (line 1027)
- **VERIFIED**: MB recovered k12 at 58 traces (line 973)
- **VERIFIED**: "none of the pre-defined leakage models could recover the k12 of ASCAD_F, but MMB and MB demonstrated remarkable capabilities by recovering it with just 218 and 58 traces" (lines 972-974)
- **NOTE**: The paper uses ASCAD_F (fixed key), not ASCADv1. This needs to be clarified in the report.
- **NOTE**: The paper also says "multi-task learning helps generalize each task, leading to robust attack performance" (lines 1239-1240)

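Because several of these findings are stated in terms of TGE0, a minimal sketch of how that metric is usually computed may help when re-checking the numbers. It is not Wu et al.'s code: it assumes the per-trace log-probabilities of every key-byte guess are already available (`log_probs[t][k]`, a hypothetical input), computes guessing entropy as the true key's rank averaged over random trace orderings, and reports the first trace count at which that average reaches 0.

```python
import random
from typing import List, Optional, Sequence


def key_rank(scores: Sequence[float], true_key: int) -> int:
    """0-based rank of the true key: how many guesses score strictly higher."""
    target = scores[true_key]
    return sum(1 for s in scores if s > target)


def guessing_entropy_curve(
    log_probs: Sequence[Sequence[float]],
    true_key: int,
    n_experiments: int = 10,
    seed: int = 0,
) -> List[float]:
    """Average rank of the true key after 1..N traces, over shuffled trace orders.

    log_probs[t][k]: hypothetical model log-probability that key guess k
    explains trace t (already mapped through the leakage model).
    """
    rng = random.Random(seed)
    n_traces, n_keys = len(log_probs), len(log_probs[0])
    totals = [0.0] * n_traces
    for _ in range(n_experiments):
        order = list(range(n_traces))
        rng.shuffle(order)
        acc = [0.0] * n_keys  # accumulated log-likelihood per key guess
        for i, t in enumerate(order):
            for k in range(n_keys):
                acc[k] += log_probs[t][k]
            totals[i] += key_rank(acc, true_key)
    return [total / n_experiments for total in totals]


def traces_to_ge0(ge_curve: Sequence[float]) -> Optional[int]:
    """First trace count at which guessing entropy reaches 0 (None if never)."""
    for i, ge in enumerate(ge_curve):
        if ge == 0:
            return i + 1
    return None


# Toy usage: 4 key guesses, and every trace favours the true key (index 3).
toy = [[-2.0, -2.0, -2.0, -1.0] for _ in range(5)]
print(traces_to_ge0(guessing_entropy_curve(toy, true_key=3)))
```

Ties are counted in the true key's favour here (only strictly higher scores push its rank up); evaluations that break ties differently, or that require the guessing entropy to stay at 0, can report a different TGE0.
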
## Marquet & Oswald (2023) - Multi-task vs Single-task SCA
- **NOTE**: This paper (ePrint 2023/006) is actually titled "Exploring Multi-Task Learning in the Context of Masked AES Implementations"
- **NOTE**: They do NOT use "16 separate CNNs"; they compare multi-task and single-task learning on ASCAD-r, ASCAD-v2, and CHESCTF-2023
- **NOTE**: They do NOT work on ASCADv1 (the fixed-key dataset we use). They work on ASCAD-r and ASCAD-v2 (masked implementations)
- **WRONG CLAIM IN REPORT**: "Marquet & Oswald achieved full key recovery on ASCADv1 using 16 separate CNNs" (INCORRECT DATASET; the paper also does not use 16 separate CNNs)
- **ACTUAL FINDING**: They show that multi-task learning outperforms single-task learning on masked implementations
- **ACTUAL FINDING**: Low-level parameter sharing (md) consistently outperforms both the baseline multi-task and the single-task models
- **ACTUAL FINDING**: "No seed allowed the single-task models to recover all bytes" (line 681)
- **ACTUAL FINDING**: Multi-task learning breaks through the initial plateau more consistently
- **RELEVANT QUOTE**: "multi-task learning is a natural improvement of single-task learning in a scenario where the knowledge of randomness cannot be accessed" (lines 885-886)

## Suteu & Serban (2019) - Cited for TSBN
- **WRONG CITATION**: The paper's title is "Regularizing Deep Multi-Task Networks using Orthogonal Gradients" (arXiv:1912.06844)
- This paper is about orthogonal gradient regularization, NOT task-specific batch normalization
- Need to find the actual TSBN reference

## Pezeshki et al. (2021) - Cited for gradient interference
- Paper title: "Gradient Starvation: A Learning Proclivity in Neural Networks" (NeurIPS 2021)
- This paper is about gradient starvation (some features dominate learning), not gradient interference per se
- Still relevant, but the citation context needs to be more precise

## Chen et al. (2018) - GradNorm
- Need to verify: "Section 5.4 high alpha causes weight collapse"
- Need to verify the exact GradNorm formula

## Liu et al. (2019) - MTAN
- Paper: "End-to-End Multi-Task Learning with Attention" (CVPR 2019)
- Need to verify that it exists and describes soft attention masks

## Guo et al. (2018) - DTP
- Paper: "Dynamic Task Prioritization for Multitask Learning" (ECCV 2018)
- Need to verify that the DTP formula matches

## Chen et al. (2018) - GradNorm - VERIFIED
- **Paper exists**: ICML 2018, pages 793-802
- **Section 5.4 EXISTS**: "Effects of tuning the asymmetry α"
- **DOES NOT say "weight collapse"**; it says: "at α=1.75 (not shown) w_depth(t) is suppressed to below 0.02 at no detriment to network performance"
- **Supplementary Section 7.1**: "No points past α > 2 are shown for the VGG16 backbone as GradNorm weights are unstable past this point"
- **Footnote 3**: "At large positive values of α, which in the NYUv2 case corresponded to α ≥ 3, some weights were pushed too close to zero and GradNorm updates became unstable."
- **CORRECTION NEEDED**: The report says "Section 5.4 high α causes weight collapse"; it should say "Section 5.4 and Supplementary Section 7.1 show that high α pushes weights to near zero and causes instability (α ≥ 3 for NYUv2)"
- **GradNorm formula**: Need to verify the exact formula from the paper

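To support the pending formula check, here is a sketch of the GradNorm gradient loss as we understand it from Chen et al. (2018); every term should be verified against the paper before it is quoted. Assumed definitions: `G_i` is the L2 norm of the gradient of `w_i * L_i` with respect to the last shared layer, `G_bar` is the mean of the `G_i` over tasks, `L~_i = L_i(t) / L_i(0)`, `r_i = L~_i / mean_j(L~_j)`, and the loss is `sum_i |G_i - G_bar * r_i**alpha|`, with the target treated as a constant and the weights renormalised to sum to the number of tasks. As a simplification, the gradient norms are passed in precomputed; no autograd is involved.

```python
from typing import List, Sequence, Tuple


def gradnorm_loss(
    grad_norms: Sequence[float],
    losses: Sequence[float],
    initial_losses: Sequence[float],
    alpha: float,
) -> Tuple[List[float], float]:
    """Return the per-task target norms and the GradNorm L1 loss.

    grad_norms[i]: ||grad_W (w_i * L_i)||_2 at the last shared layer W,
    assumed to be computed elsewhere (e.g. by an autograd framework).
    """
    n = len(grad_norms)
    mean_norm = sum(grad_norms) / n  # G_bar: average gradient norm over tasks
    loss_ratios = [cur / init for cur, init in zip(losses, initial_losses)]  # L~_i
    mean_ratio = sum(loss_ratios) / n
    inverse_rates = [r / mean_ratio for r in loss_ratios]  # r_i: relative inverse training rate
    # Targets are treated as constants; in a framework they would be detached.
    targets = [mean_norm * r ** alpha for r in inverse_rates]
    l_grad = sum(abs(g - t) for g, t in zip(grad_norms, targets))
    return targets, l_grad


def renormalise(weights: Sequence[float]) -> List[float]:
    """Rescale task weights so that they sum to the number of tasks T."""
    scale = len(weights) / sum(weights)
    return [w * scale for w in weights]


# Task A has halved its loss, task B has not moved; equal gradient norms.
for a in (0.0, 3.0):
    targets, _ = gradnorm_loss([1.0, 1.0], [0.5, 1.0], [1.0, 1.0], alpha=a)
    print(a, [round(t, 3) for t in targets])
```

With two tasks whose loss ratios differ, raising α from 0 to 3 shrinks the target gradient norm of the faster-learning task and inflates the other's, which is consistent with the observation that large α pushes some weights toward zero.
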
## Suteu & Serban (2019) - NEED TO FIND ACTUAL TSBN REFERENCE
- The Suteu & Serban paper (arXiv:1912.06844) is about orthogonal gradients, NOT TSBN
- Need to find the actual paper that proposes task-specific batch normalization
- Likely: Bilen & Vedaldi (2017) "Integrated Perception with Recurrent Multi-Task Neural Networks" or similar

## Suteu & Serban (2019) - VERIFIED
- **Paper**: "Regularizing Deep Multi-Task Networks using Orthogonal Gradients" (arXiv:1912.06844)
- **Content**: Proposes orthogonal gradient regularization to reduce task interference
- **NOT about TSBN**: this paper is about gradient orthogonality, not batch normalization
- **IMPORTANT**: Suteu has a 2025 paper (arXiv:2512.20420), "Simplifying Multi-Task Architectures Through Task-Specific Normalization", which proposes TSσBN (Task-Specific Sigmoid Batch Normalization)
- **CORRECTION**: If we cite Suteu for TSBN, we should cite the 2025 paper, not the 2019 one
- **HOWEVER**: Our TSBN may have been inspired by the general concept of task-specific BN from Rebuffi et al. (2017), "Learning multiple visual domains with residual adapters" (NeurIPS 2017, 1293 citations), which uses domain-specific BN layers

## Actual TSBN lineage
1. Rebuffi, Bilen & Vedaldi (2017) - domain-specific BN layers for multi-domain learning
2. Suteu & Serban (2019) - orthogonal gradients (different paper, same first author)
3. Suteu et al. (2025) - TSσBN (task-specific sigmoid batch normalization)

- Our TSBN concept is closest to Rebuffi et al. (2017) or to the general MTL literature on task-specific BN

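Since the lineage above centres on task-specific BN, a minimal sketch of the general idea may help pin down what "TSBN" means in the report: surrounding layers are shared, but each task keeps its own normalisation statistics and affine parameters. This follows the per-domain BN idea attributed above to Rebuffi et al. (2017), not Suteu et al.'s TSσBN; the class name, the task labels such as `"byte0"`, and the list-of-lists batch format are hypothetical, and plain Python stands in for a deep-learning framework.

```python
import math
from typing import List


class TaskSpecificBatchNorm:
    """Batch norm with separate running statistics and gamma/beta per task (sketch)."""

    def __init__(self, num_features: int, tasks: List[str], eps: float = 1e-5, momentum: float = 0.1):
        self.eps, self.momentum = eps, momentum
        self.gamma = {t: [1.0] * num_features for t in tasks}
        self.beta = {t: [0.0] * num_features for t in tasks}
        self.running_mean = {t: [0.0] * num_features for t in tasks}
        self.running_var = {t: [1.0] * num_features for t in tasks}

    def __call__(self, batch: List[List[float]], task: str, training: bool = True) -> List[List[float]]:
        n, c = len(batch), len(batch[0])
        out = [[0.0] * c for _ in range(n)]
        for j in range(c):
            col = [row[j] for row in batch]
            if training:
                mean = sum(col) / n
                var = sum((x - mean) ** 2 for x in col) / n  # biased variance (also used for running stats here)
                m = self.momentum
                # Only the selected task's running statistics are updated.
                self.running_mean[task][j] = (1 - m) * self.running_mean[task][j] + m * mean
                self.running_var[task][j] = (1 - m) * self.running_var[task][j] + m * var
            else:
                mean, var = self.running_mean[task][j], self.running_var[task][j]
            inv_std = 1.0 / math.sqrt(var + self.eps)
            for i in range(n):
                out[i][j] = self.gamma[task][j] * (batch[i][j] - mean) * inv_std + self.beta[task][j]
        return out


bn = TaskSpecificBatchNorm(num_features=2, tasks=["byte0", "byte1"])
print(bn([[1.0, 10.0], [3.0, 14.0]], task="byte0"))
```

Running the same batch under two task labels yields different outputs only through each task's `gamma`/`beta` and running statistics, which is what separates TSBN from a single shared BN layer.
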
## Pezeshki et al. (2021) - VERIFIED
- **Paper**: "Gradient Starvation: A Learning Proclivity in Neural Networks" (NeurIPS 2021)
- **NOT "gradient interference"**: the paper is about gradient starvation
- **Definition**: "Gradient starvation occurs when the presence of easy-to-learn features in a dataset prevents the learning of other equally informative features"
- **Cited by**: 423 (highly cited)
- **CORRECTION NEEDED**: The report should cite this paper for "gradient starvation", not "gradient interference"
- **Relevance**: Directly relevant to our HPS failure, where bytes 0 and 1 (easy features) dominated learning and starved bytes 2-15

## Liu et al. (2019) - MTAN - NEED TO VERIFY
- Need to check: "End-to-End Multi-Task Learning with Attention" (CVPR 2019)

## Guo et al. (2018) - DTP - NEED TO VERIFY
- Need to check: "Dynamic Task Prioritization for Multitask Learning" (ECCV 2018)

## Kerkhof et al. - NEED TO VERIFY
- Need to check the spectral decoupling reference

## Liu et al. (2019) - MTAN - VERIFIED
- **Paper**: "End-to-End Multi-Task Learning with Attention" (CVPR 2019)
- **Content**: Proposes the Multi-Task Attention Network (MTAN), whose per-task soft-attention modules learn task-specific features from a shared pool of global features
- **Quote**: "consists of a single shared network containing a global feature pool, together with a soft-attention module for each task"
- **Code**: https://github.com/lorenmt/mtan

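The soft-attention idea in the quoted description can be illustrated with a toy sketch: each task derives a mask in (0, 1) from the shared features and multiplies it element-wise into those features. This is a simplification, not the MTAN architecture: a hypothetical one-layer map (`weight`, `bias`) stands in for each task's convolutional attention block, and MTAN's reuse of the previous block's task features is omitted.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def task_attention(
    shared: Sequence[float],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Apply a task-specific soft mask in (0, 1) element-wise to shared features.

    weight/bias are a hypothetical one-layer stand-in for the task's attention block.
    """
    mask = [
        sigmoid(sum(w * f for w, f in zip(row, shared)) + b)
        for row, b in zip(weight, bias)
    ]
    return [m * f for m, f in zip(mask, shared)]


shared_features = [2.0, -1.0, 0.5]
zeros = [[0.0] * 3] * 3  # read-only, so sharing the row object is harmless
# Two hypothetical tasks: one keeps feature 0, the other keeps feature 1.
task_a = task_attention(shared_features, zeros, [8.0, -8.0, -8.0])
task_b = task_attention(shared_features, zeros, [-8.0, 8.0, -8.0])
print(task_a, task_b)
```

A mask near 1 passes a shared feature through to the task and a mask near 0 suppresses it, so each task effectively selects its own subset of the global feature pool.
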
## Guo et al. (2018) - DTP - VERIFIED
- **Paper**: "Dynamic Task Prioritization for Multitask Learning" (ECCV 2018)
- **Content**: Proposes dynamic task prioritization, where a task's difficulty is treated as inversely proportional to its performance
- **Quote**: "imbalances in task difficulty can lead to unnecessary emphasis on easier tasks, thus neglecting and slowing progress on difficult tasks"
- **Key insight**: "prioritizing difficult tasks first", in contrast to curriculum learning

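For the pending formula check, here is a sketch of DTP's task-level weighting as we understand it: each task's KPI κ (for example accuracy, in (0, 1]) is smoothed with a moving average, and the task loss is scaled by the focal-style weight `-(1 - κ)**γ * log(κ)`, so tasks with a lower KPI (harder tasks) receive larger weights. The exact form and the smoothing convention should be checked against the paper; the parameter names `smoothing` and `gamma`, and the dictionary interface, are assumptions.

```python
import math
from typing import Dict


def smooth_kpi(previous: float, current: float, smoothing: float = 0.9) -> float:
    """Moving average of a task KPI; `smoothing` weights the newest value (assumed convention)."""
    return smoothing * current + (1.0 - smoothing) * previous


def dtp_weight(kpi: float, gamma: float = 1.0) -> float:
    """Focal-style task weight -(1 - kpi)^gamma * log(kpi); lower KPI -> larger weight."""
    kpi = min(max(kpi, 1e-7), 1.0)  # clamp to avoid log(0)
    return -((1.0 - kpi) ** gamma) * math.log(kpi)


def dtp_total_loss(losses: Dict[str, float], kpis: Dict[str, float], gamma: float = 1.0) -> float:
    """Sum of task losses, each scaled by its priority weight."""
    return sum(dtp_weight(kpis[task], gamma) * loss for task, loss in losses.items())


# Hypothetical byte tasks: byte0 is already easy (high KPI), byte5 is hard.
kpis = {"byte0": 0.9, "byte5": 0.05}
print({task: round(dtp_weight(k, gamma=2.0), 4) for task, k in kpis.items()})
```

Because the weight vanishes as κ approaches 1, well-learned tasks contribute little to the total loss, which matches the paper's stated aim of prioritizing difficult tasks.
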
## Spectral Decoupling - VERIFIED
- **Paper**: Pohjonen et al. (2022), "Spectral decoupling for training transferable neural networks in medical imaging" (iScience)
- **Preprint**: arXiv:2103.17171 (March 2021)
- **Content**: "encourages the neural network to learn more features by simply regularising the networks' unnormalised prediction scores with an L2 penalty"
- **Key finding**: "increases the networks' robustness for data distribution shifts and prevents overfitting on easy-to-learn features"
- **NOT by Kerkhof**: Kerkhof's paper is about focal loss for SCA (ePrint 2021/1408)
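- **CORRECTION NEEDED**: The spectral decoupling reference should cite Pohjonen et al. (2022), not Kerkhof
- **NOTE**: Spectral decoupling itself was introduced by Pezeshki et al. (2021) in the Gradient Starvation paper; Pohjonen et al. apply it to medical imaging, so the report may need to cite both
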
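The quoted definition (an L2 penalty on the unnormalised prediction scores) is short enough to sketch directly. This is a minimal per-sample version in plain Python, assuming a softmax cross-entropy base loss and a penalty of `(lam / 2) * ||z||**2` on the logit vector `z`; the name `lam`, its default value, and the 1/2 factor are assumptions to check against the papers.

```python
import math
from typing import Sequence


def cross_entropy_from_logits(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, via a stable log-sum-exp."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[label]


def spectral_decoupling_loss(logits: Sequence[float], label: int, lam: float = 0.01) -> float:
    """Cross-entropy plus an L2 penalty on the unnormalised logits: CE + (lam / 2) * ||z||^2."""
    penalty = 0.5 * lam * sum(z * z for z in logits)
    return cross_entropy_from_logits(logits, label) + penalty


confident = [6.0, -6.0]
print(cross_entropy_from_logits(confident, 0), spectral_decoupling_loss(confident, 0, lam=0.1))
```

Setting `lam=0` recovers plain cross-entropy, which makes the two losses easy to compare in an ablation.
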
## Kerkhof et al. (2021) - Focal Loss for SCA - VERIFIED
- **Paper**: "A Focal Loss Function for Deep Learning-based Side-channel Analysis" (ePrint 2021/1408)
- **Content**: Proposes a focal loss adapted to the SCA context
- **NOT about spectral decoupling**

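For contrast with spectral decoupling, the generic focal loss (Lin et al., 2017) on which the SCA variant builds down-weights well-classified samples by a factor `(1 - p_t)**gamma`. The sketch below is only that generic per-sample form; the SCA-specific adaptation in ePrint 2021/1408 may differ, and the `alpha`/`gamma` defaults are assumptions.

```python
import math
from typing import Sequence


def focal_loss(probs: Sequence[float], label: int, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Generic focal loss for one sample: -alpha * (1 - p_t)^gamma * log(p_t)."""
    p_t = min(max(probs[label], 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# A confident correct prediction is down-weighted far more than an uncertain one.
print(focal_loss([0.1, 0.9], 1), focal_loss([0.6, 0.4], 1))
```

With `gamma=0` and `alpha=1` the function reduces to ordinary cross-entropy on the predicted probabilities.
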
## FULL CITATION CORRECTION TABLE
| Report Claim | Current Citation | Correct Citation |
|---|---|---|
| TSBN | Suteu & Serban (2019) | Rebuffi et al. (2017) or Suteu et al. (2025) |
| Gradient interference | Pezeshki et al. (2021) | Pezeshki et al. (2021), but the report should say "gradient starvation", not "interference" |
| High alpha weight collapse | Chen et al. (2018), Sec. 5.4 | Chen et al. (2018), Sec. 5.4 + Supplementary Sec. 7.1, but "instability", not "collapse" |
| Spectral decoupling | Kerkhof et al. | Pohjonen et al. (2022) / arXiv:2103.17171 |
| Marquet & Oswald full key recovery on ASCADv1 using 16 separate CNNs | Marquet & Oswald (2023), wrong dataset | Marquet & Oswald (2023), but they work on ASCAD-r and ASCAD-v2, NOT ASCADv1 |