lemousehunter/ascad-v1-models

lemousehunter committed about 1 month ago

Commit a65fa2d (verified) · 1 Parent(s): f8cc89b

Upload analysis/findings/verified_citations.md with huggingface_hub

analysis/findings/verified_citations.md +121 -0

analysis/findings/verified_citations.md ADDED

	@@ -0,0 +1,121 @@

+# Verified Citations
+## Wu et al. (2024) - Multi-bit SCA
+- **VERIFIED**: Table 3 shows TGE0 (traces to guessing entropy 0) for ASCAD_F dataset
+- **VERIFIED**: k12 of ASCAD_F recovered by MMB at 218 traces (line 1027)
+- **VERIFIED**: MB recovered k12 at 58 traces (line 973)
+- **VERIFIED**: "none of the pre-defined leakage models could recover the k12 of ASCAD_F, but MMB and MB demonstrated remarkable capabilities by recovering it with just 218 and 58 traces" (lines 972-974)
+- **NOTE**: The paper names the dataset ASCAD_F (fixed key) and does not use the label ASCADv1. Need to clarify the naming in the report.
+- **NOTE**: The paper also says "multi-task learning helps generalize each task, leading to robust attack performance" (lines 1239-1240)
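For checking the trace counts quoted above against a guessing-entropy curve, the TGE0 metric can be sketched as follows. This is a minimal illustration, not Wu et al.'s evaluation code: the function name `traces_to_ge0`, the `step` parameter, and the rule that guessing entropy must stay at 0 until the end of the curve are all assumptions.

```python
from typing import Optional, Sequence


def traces_to_ge0(ge_curve: Sequence[float], step: int = 1) -> Optional[int]:
    """Return the smallest trace count from which guessing entropy stays at 0.

    ge_curve[i] is the guessing entropy after (i + 1) * step attack traces.
    Returns None if the curve does not end at 0 (key not recovered).
    Assumption: a zero only counts if GE never rises again afterwards.
    """
    tge0 = None
    for i, ge in enumerate(ge_curve):
        if ge == 0:
            if tge0 is None:
                tge0 = (i + 1) * step
        else:
            # GE rose again, so the earlier zero did not hold.
            tge0 = None
    return tge0
```

Under these assumptions, a result such as "recovered at 58 traces" corresponds to `traces_to_ge0(curve) == 58` for a curve sampled at every trace.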
+## Marquet & Oswald (2023) - Multi-task vs Single-task SCA
+- **NOTE**: This paper (eprint 2023/006) is actually "Exploring Multi-Task Learning in the Context of Masked AES Implementations"
+- **NOTE**: They do NOT use "16 separate CNNs" — they compare multi-task vs single-task on ASCAD-r, ASCAD-v2, CHESCTF-2023
+- **NOTE**: They do NOT work on ASCADv1 (the fixed-key dataset we use). They work on ASCAD-r and ASCAD-v2 (masked implementations)
+- **WRONG CLAIM IN REPORT**: "Marquet & Oswald achieved full key recovery on ASCADv1 using 16 separate CNNs" — INCORRECT DATASET
+- **ACTUAL FINDING**: They show multi-task learning is superior to single-task in masked implementations
+- **ACTUAL FINDING**: Low-level parameter sharing (md) consistently outperforms baseline multi-task and single-task
+- **ACTUAL FINDING**: "No seed allowed the single-task models to recover all bytes" (line 681)
+- **ACTUAL FINDING**: Multi-task learning breaks through initial plateau more consistently
+- **RELEVANT QUOTE**: "multi-task learning is a natural improvement of single-task learning in a scenario where the knowledge of randomness cannot be accessed" (lines 885-886)
+## Suteu & Serban (2019) - Cited for TSBN
+- **WRONG CITATION**: Paper title is "Regularizing Deep Multi-Task Networks using Orthogonal Gradients" (arXiv:1912.06844)
+- This paper is about orthogonal gradient regularization, NOT task-specific batch normalization
+- Need to find the actual TSBN reference
+## Pezeshki et al. (2021) - Cited for gradient interference
+- Paper title: "Gradient Starvation: A Learning Proclivity in Neural Networks" (NeurIPS 2021)
+- This is about gradient starvation (some features dominate learning), not gradient interference per se
+- Still relevant but citation context needs to be more precise
+## Chen et al. (2018) - GradNorm
+- Need to verify: "Section 5.4 high alpha causes weight collapse"
+- Need to verify the exact GradNorm formula
+## Liu et al. (2019) - MTAN
+- Paper: "End-to-End Multi-Task Learning with Attention" (CVPR 2019)
+- Need to verify it exists and describes soft attention masks
+## Guo et al. (2018) - DTP
+- Paper: "Dynamic Task Prioritization for Multitask Learning" (ECCV 2018)
+- Need to verify the DTP formula matches
+## Chen et al. (2018) - GradNorm - VERIFIED
+- **Paper exists**: ICML 2018, pages 793-802
+- **Section 5.4 EXISTS**: "Effects of tuning the asymmetry α"
+- **DOES NOT say "weight collapse"** — says: "at α=1.75 (not shown) w_depth(t) is suppressed to below 0.02 at no detriment to network performance"
+- **Supplementary Section 7.1**: "No points past α > 2 are shown for the VGG16 backbone as GradNorm weights are unstable past this point"
+- **Footnote 3**: "At large positive values of α, which in the NYUv2 case corresponded to α ≥ 3, some weights were pushed too close to zero and GradNorm updates became unstable."
+- **CORRECTION NEEDED**: Report says "Section 5.4 high α causes weight collapse" — should say "Section 5.4 and Supplementary Section 7.1 show that high α pushes weights to near-zero and causes instability (α ≥ 3 for NYUv2)"
+- **GradNorm formula**: Need to verify exact formula from paper
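Pending that verification, the role of α can be illustrated with the target-gradient-norm rule as GradNorm is commonly stated: each task's target norm is the mean gradient norm scaled by its relative inverse training rate raised to the power α. The sketch below is written from that common statement and should be checked against the paper before it is used in the report. The function and argument names are hypothetical.

```python
from typing import List, Sequence


def gradnorm_targets(
    grad_norms: Sequence[float],
    loss_ratios: Sequence[float],
    alpha: float,
) -> List[float]:
    """Target gradient norm per task under the commonly stated GradNorm rule.

    grad_norms[i]  : current gradient norm of task i's weighted loss
    loss_ratios[i] : L_i(t) / L_i(0), the inverse training rate of task i
    alpha          : asymmetry hyperparameter (alpha = 0 equalises all norms)
    """
    mean_norm = sum(grad_norms) / len(grad_norms)
    mean_ratio = sum(loss_ratios) / len(loss_ratios)
    # Relative inverse training rate r_i, raised to alpha, scales the mean norm.
    return [mean_norm * (r / mean_ratio) ** alpha for r in loss_ratios]
```

With loss ratios of 0.2 and 1.0, the faster task's target is about a third of the mean norm at α = 1 and under 4% of it at α = 3, which is consistent with the quoted footnote about weights being pushed close to zero at large α.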
+## Suteu & Serban (2019) - NEED TO FIND ACTUAL TSBN REFERENCE
+- The Suteu & Serban paper (arXiv:1912.06844) is about orthogonal gradients, NOT TSBN
+- Need to find the actual paper that proposes task-specific batch normalization
+- Likely: Bilen & Vedaldi (2017) "Integrated Perception with Recurrent Multi-Task Neural Networks" or similar
+## Suteu & Serban (2019) - VERIFIED
+- **Paper**: "Regularizing Deep Multi-Task Networks using Orthogonal Gradients" (arXiv:1912.06844)
+- **Content**: Proposes orthogonal gradient regularization to reduce task interference
+- **NOT about TSBN** — this paper is about gradient orthogonality, not batch normalization
+- **IMPORTANT**: Suteu has a 2025 paper (arXiv:2512.20420) "Simplifying Multi-Task Architectures Through Task-Specific Normalization" which proposes TSσBN (Task-Specific Sigmoid Batch Normalization)
+- **CORRECTION**: If we cite Suteu for TSBN, we should cite the 2025 paper, not the 2019 one
+- **HOWEVER**: Our TSBN may have been inspired by the general concept of task-specific BN from Rebuffi et al. (2017) "Learning multiple visual domains with residual adapters" (NeurIPS 2017, 1293 citations) which uses domain-specific BN layers
+## Actual TSBN lineage:
+1. Rebuffi, Bilen & Vedaldi (2017) - domain-specific BN layers for multi-domain learning
+2. Suteu & Serban (2019) - orthogonal gradients (different paper, same first author)
+3. Suteu et al. (2025) - TSσBN (task-specific sigmoid batch normalization)
+- Our TSBN concept is closest to Rebuffi et al. (2017) or the general MTL literature on task-specific BN
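The general idea of task-specific batch normalization referred to above can be sketched as a normalization step whose scale and shift are selected per task while everything else is shared. These notes do not describe the report's actual TSBN layer, so this is a generic illustration only: the class name, the scalar feature, and the choice to make only the affine parameters task-specific are all assumptions.

```python
import math
from typing import Dict, List, Sequence


class TaskSpecificBatchNorm1d:
    """Batch norm over one scalar feature with per-task scale and shift."""

    def __init__(self, task_names: Sequence[str], eps: float = 1e-5) -> None:
        self.eps = eps
        # One (gamma, beta) pair per task; these would be learned in practice.
        self.gamma: Dict[str, float] = {t: 1.0 for t in task_names}
        self.beta: Dict[str, float] = {t: 0.0 for t in task_names}

    def forward(self, batch: Sequence[float], task: str) -> List[float]:
        # Normalise with batch statistics, then apply the task's own affine map.
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
        scale = self.gamma[task] / math.sqrt(var + self.eps)
        return [scale * (x - mean) + self.beta[task] for x in batch]
```

In a multi-byte SCA model, one such layer per shared convolutional block would let each key-byte head rescale the shared features independently.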
+## Pezeshki et al. (2021) - VERIFIED
+- **Paper**: "Gradient Starvation: A Learning Proclivity in Neural Networks" (NeurIPS 2021)
+- **NOT "gradient interference"** — the paper is about gradient starvation
+- **Definition**: "Gradient starvation occurs when the presence of easy-to-learn features in a dataset prevents the learning of other equally informative features"
+- **Cited by**: 423 (highly cited)
+- **CORRECTION NEEDED**: Report should cite this for "gradient starvation" not "gradient interference"
+- **Relevance**: Directly relevant to our HPS failure where bytes 0,1 (easy features) dominated learning and starved bytes 2-15
+## Liu et al. (2019) - MTAN - NEED TO VERIFY
+- Need to check: "End-to-End Multi-Task Learning with Attention" (CVPR 2019)
+## Guo et al. (2018) - DTP - NEED TO VERIFY
+- Need to check: "Dynamic Task Prioritization for Multitask Learning" (ECCV 2018)
+## Kerkhof et al. - NEED TO VERIFY
+- Need to check the spectral decoupling reference
+## Liu et al. (2019) - MTAN - VERIFIED
+- **Paper**: "End-to-End Multi-Task Learning with Attention" (CVPR 2019)
+- **Content**: Proposes Multi-Task Attention Network (MTAN) with soft-attention modules for task-specific feature learning from global features
+- **Quote**: "consists of a single shared network containing a global feature pool, together with a soft-attention module for each task"
+- **Code**: https://github.com/lorenmt/mtan
+## Guo et al. (2018) - DTP - VERIFIED
+- **Paper**: "Dynamic Task Prioritization for Multitask Learning" (ECCV 2018)
+- **Content**: Proposes dynamic task prioritization where difficulty is inversely proportional to performance
+- **Quote**: "imbalances in task difficulty can lead to unnecessary emphasis on easier tasks, thus neglecting and slowing progress on difficult tasks"
+- **Key insight**: "prioritizing difficult tasks first" — contrast to curriculum learning
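The "prioritize difficult tasks" idea can be illustrated with a focal-style priority computed from a per-task performance indicator in (0, 1]. This is a sketch of the general mechanism rather than a verified transcription of the DTP formula, which the earlier entry flags as still needing to be checked against the report. The names `dtp_priority`, `kpi` and `gamma` are hypothetical.

```python
import math


def dtp_priority(kpi: float, gamma: float = 1.0) -> float:
    """Focal-style task priority: large when performance (kpi) is low.

    kpi   : performance indicator in (0, 1], e.g. a moving-average accuracy
    gamma : focusing parameter; larger values emphasise hard tasks more
    """
    if not 0.0 < kpi <= 1.0:
        raise ValueError("kpi must lie in (0, 1]")
    return -((1.0 - kpi) ** gamma) * math.log(kpi)
```

A task at `kpi = 0.5` receives a higher priority than one at `kpi = 0.9`, and a task at `kpi = 1.0` receives none.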
+## Spectral Decoupling - VERIFIED
+- **Paper**: Pohjonen et al. (2022), "Spectral decoupling for training transferable neural networks in medical imaging" (iScience)
+- **Original**: arXiv:2103.17171 (March 2021)
+- **Content**: "encourages the neural network to learn more features by simply regularising the networks' unnormalised prediction scores with an L2 penalty"
+- **Key finding**: "increases the networks' robustness for data distribution shifts and prevents overfitting on easy-to-learn features"
+- **NOT by Kerkhof** — Kerkhof's paper is about focal loss for SCA (eprint 2021/1408)
+- **CORRECTION NEEDED**: The spectral decoupling reference should cite Pohjonen et al. (2022), not Kerkhof
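The quoted description, an L2 penalty on the unnormalised prediction scores, amounts to adding a term on the logits to the usual classification loss. A minimal per-sample sketch follows. The factor 1/2, the default `lam` value, and the function names are assumptions rather than values taken from Pohjonen et al.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def spectral_decoupling_loss(
    logits: Sequence[float], label: int, lam: float = 0.01
) -> float:
    """Cross-entropy plus an L2 penalty on the raw (pre-softmax) logits."""
    penalty = 0.5 * lam * sum(z * z for z in logits)
    return cross_entropy(logits, label) + penalty
```

Unlike weight decay, the penalty acts on the network outputs, so confident predictions driven by a few dominant features are discouraged directly.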
+## Kerkhof et al. (2021) - Focal Loss for SCA - VERIFIED
+- **Paper**: "A Focal Loss Function for Deep Learning-based Side-channel Analysis" (ePrint 2021/1408)
+- **Content**: Proposes focal loss adapted for SCA context
+- **NOT about spectral decoupling**
+## FULL CITATION CORRECTION TABLE:
+| Report Claim | Current Citation | Correct Citation |
+|---|---|---|
+| TSBN | Suteu & Serban (2019) | Rebuffi et al. (2017) or Suteu et al. (2025) |
+| Gradient interference | Pezeshki et al. (2021) | Pezeshki et al. (2021) — but should say "gradient starvation" not "interference" |
+| High alpha weight collapse | Chen et al. (2018) Sec 5.4 | Chen et al. (2018) Sec 5.4 + Supplementary 7.1 — but "instability" not "collapse" |
+| Spectral decoupling | Kerkhof et al. | Pohjonen et al. (2022) / arXiv:2103.17171 |
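+| Full key recovery on ASCADv1 using 16 separate CNNs | Marquet & Oswald (2023) | Marquet & Oswald (2023), but they work on ASCAD-r, ASCAD-v2 and CHESCTF-2023 (not ASCADv1) and compare multi-task vs single-task models rather than "16 separate CNNs" |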