Title: Physical Adversarial Attack Meets Computer Vision: A Decade Survey
URL Source: https://arxiv.org/html/2209.15179
Published Time: Fri, 22 Nov 2024 01:14:41 GMT
| Adversarial Medium | Manufacture | Instrument | Attack Type | Method | Victim Task | Venue | Year |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sticker | print, paste | eyeglasses | impersonation, dodging | AdvEyeglass[14] | Face Recognition | ACM CCS | 2016 |
| Sticker | cover | car body | hiding | CAMOU[23] | Vehicle Detection | ICLR | 2018 |
| Sticker | display | screen | hiding | InvisibleCloak[50] | Person Detection | UEMCON | 2018 |
| Sticker | paste | camera lens | misclassification | ACS[51] | General Classification | PMLR | 2019 |
| Sticker | print, paste | eyeglasses | impersonation, dodging | AdvEyeglass+[52] | Face Recognition | TOPS | 2019 |
| Sticker | print, affix | hat | impersonation | Advhat[24] | Face Recognition | ICPR | 2020 |
| Sticker | print, paste | eyeglasses | misrecognition | CLBAAttack[53] | Face Recognition | BIOSIG | 2021 |
| Sticker | affix | car body | hiding | DAS[54] | Vehicle Detection | CVPR | 2021 |
| Sticker | affix | road marking | misdirection | AdvMarkings[55] | Lane Detection | USENIX | 2021 |
| Sticker | full cover | car body | hiding | FCA[56] | Vehicle Detection | AAAI | 2022 |
| Sticker | full cover | car body | hiding | DTA[57] | Vehicle Detection | CVPR | 2022 |
| Sticker | imprint | face mask | dodging | AdvMask[58] | Face Recognition | ECML PKDD | 2022 |
| Sticker | print, paste | face | impersonation | AdvSticker[59] | Face Recognition | TPAMI | 2022 |
| Sticker | cover | car body | hiding | CAC[60] | Vehicle Detection | IJCAI | 2022 |
| Sticker | print, paste | face | impersonation | DOPatch[61] | Face Recognition | arXiv | 2023 |
| Sticker | print, paste | car body | false estimation | 3D²Fool[62] | Depth Estimation | CVPR | 2024 |
| Patch | print, put | image patch | misclassification | AdvPatch[21] | General Classification | NIPS | 2017 |
| Patch | print, paste | traffic sign | misclassification | RP2[63] | Sign Classification | CVPR | 2018 |
| Patch | print, paste | traffic sign | misdetection | NestedAE[64] | Sign Detection | CCS | 2019 |
| Patch | print, paste | traffic sign | misclassification | PS-GAN[65] | Sign Classification | AAAI | 2019 |
| Patch | display | screen | lose track | PAT[66] | Object Tracking | ICCV | 2019 |
| Patch | print | image patch | hiding | AdvYOLO[13] | Person Detection | CVPRW | 2019 |
| Patch | print, paste | image patch | mismatching | AdvPattern[67] | Person Re-ID | ICCV | 2019 |
| Patch | imprint | image patch | false estimation | FlowAttack[68] | Flow Estimation | ICCV | 2019 |
| Patch | print, paste | image patch | misclassification | AdvACO[69] | General Classification | ECCV | 2020 |
| Patch | paste | camera lens | misdetection | TransPatch[70] | Sign Detection | CVPR | 2021 |
| Patch | print, paste | image patch | lose track | MTD[71] | Object Tracking | AAAI | 2021 |
| Patch | print, paste | face | impersonation | TAP[72] | Face Recognition | CVPR | 2021 |
| Patch | print, paste | image patch | misclassification | AdvACO+[73] | General Classification | TIP | 2021 |
| Patch | display | screen | misdetection | AITP[74] | Sign Detection | ACM AISec | 2022 |
| Patch | print, paste | image patch | misclassification | CPAttack[75] | General Classification | NIPS | 2022 |
| Patch | print, paste | image patch | misclassification | TnTAttack[76] | General Classification | TIFS | 2022 |
| Patch | print, paste | image patch | false estimation | OAP[77] | Depth Estimation | ECCV | 2022 |
| Patch | print | image patch | false segmentation | RWAEs[78] | Segmentation | WACV | 2022 |
| Patch | print, paste | image patch | false estimation | PAP[79] | Crowd Counting | ACM CCS | 2022 |
| Patch | print, put | image patch | misclassification | DAPatch[80] | General Classification | ECCV | 2022 |
| Patch | print, paste | face | impersonation | SOPP[81] | Face Recognition | TPAMI | 2022 |
| Patch | print, paste | car | hiding | AerialAttack[82] | Vehicle Detection | WACV | 2022 |
| Patch | paste | aerogel patch | hiding | AdvInfrared[83] | Person Detection | CVPR | 2023 |
| Patch | display | screen | hiding | T-SEA[84] | Person Detection | CVPR | 2023 |
| Patch | paste | aerogel patch | hiding | CMPatch[85] | Person Detection | ICCV | 2023 |
| Patch | print | image patch | misstatement | CAPatch[86] | Image Captioning | USENIX | 2023 |
| Patch | paste | aerogel patch | hiding | IAPatch[87] | Person Detection | IJCV | 2023 |
| Patch | print, paste | image patch | mis{det., cla.} | TPatch[88] | Sign Det. & Cla. | USENIX | 2023 |
TABLE IV: Physical adversarial attack methods that use clothing, images, and lights as adversarial mediums. We list them by the adversarial medium and in time order.
(a)Impersonation attack in face recognition task.
(b)Impersonation attack in person re-identification task.
Figure 10: Display of the physical adversarial attack in re-identification (Re-ID) tasks. Adapted from AdvEyeglass[14] (a) and AdvPattern[67] (b).
Lane detection is important for autonomous driving because it supports steering decisions. Jing et al.[55] pioneered the investigation into the security of lane detection modules in real vehicles, focusing on the Tesla Model S. We term their method "Adversarial Markings" as it utilizes small markings on the road surface to mislead the vehicle's visual system. Extensive experiments show that Tesla Autopilot is vulnerable to Adversarial Markings in the physical world and follows the fake lane into oncoming traffic.
Zolfi et al.[70] designed a universal perturbation, called TransPatch, to fool the detector for all instances of a specific object class while maintaining the detection of other objects. TransPatch is a colored translucent sticker that performs attacks when attached to the camera lens, thereby disturbing the camera's imaging. In addition, Lovisotto et al.[111] introduced SLAP, a light-based technique enabling physical attacks in self-driving scenarios. Using a projector, SLAP projects a specific pattern onto a stop sign, causing YOLOv3[30] and Mask-RCNN[28] to misdetect the targeted object in moving-vehicle environments. However, such methods may raise suspicion among human drivers. To address this, Zhu et al.[88] proposed TPatch, innovatively designing a triggered physical adversarial patch. TPatch exhibits malicious behavior only when triggered by acoustic signals; otherwise, it behaves benignly. This approach offers a novel perspective on enhancing the stealthiness of adversarial patches.
5.3 Attacks on Re-Identification Tasks
In this section, we review physical adversarial attacks on re-identification (Re-ID) tasks (see Fig. 10). TABLE X presents the comparative results of these methods based on the hiPAA metric.
5.3.1 Face Recognition
Face Recognition Systems (FRS) are widely used in surveillance and access control[158, 159], so it is valuable to explore their potential risks. Sharif et al.[14] developed a groundbreaking method to attack face recognition algorithms by printing a pair of eyeglass frames. A person wearing the adversarial eyeglasses can evade recognition or impersonate another individual. The non-printability score (NPS) is defined to ensure the perturbations are printable, and it is widely used as a loss function during adversarial perturbation optimization. They demonstrate how an attacker who is unaware of the system's internals can achieve inconspicuous impersonation under a commercial FRS[160].
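The idea behind the NPS can be written compactly: for each perturbation pixel, multiply its distances to every color the printer can reproduce, so a pixel close to some printable color contributes a near-zero term. A minimal NumPy sketch in that spirit, with shapes and the `[0, 1]` value range as illustrative assumptions:

```python
import numpy as np

def non_printability_score(perturbation, printable_colors):
    """Sketch of a non-printability score in the spirit of Sharif et al. [14].

    perturbation: (H, W, 3) array in [0, 1] (assumed layout).
    printable_colors: (K, 3) palette of colors the printer can reproduce.
    A pixel that matches some printable color yields a zero product term.
    """
    pixels = perturbation.reshape(-1, 1, 3)           # (H*W, 1, 3)
    palette = printable_colors.reshape(1, -1, 3)      # (1, K, 3)
    dists = np.linalg.norm(pixels - palette, axis=2)  # (H*W, K) distances
    return float(np.prod(dists, axis=1).sum())        # sum of per-pixel products
```

A fully printable perturbation (every pixel equal to a palette color) scores zero; colors far from every palette entry are penalized, which is why the score works as an additional loss term during optimization.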
TABLE X: Comparison of the hiPAA metric among attack methods for both the face recognition task and the person Re-ID task. We highlight the minimum and maximum values in blue and red, respectively.
| Methods | Venue | Eff. | Rob. | Ste. | Aes. | Pra. | Eco. | hiPAA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AdvEyeglass[14] | CCS16 | 1.00 | 0.33 | 0.65 | 0.67 | 0.89 | 0.99 | 0.75 |
| AdvEyeglass+[52] | TOPS19 | 1.00 | 0.67 | 0.65 | 0.67 | 1.00 | 0.99 | 0.83 |
| Advhat[24] | ICPR20 | 1.00 | 0.83 | 0.25 | 0.47 | 0.69 | 0.99 | 0.73 |
| ALPA[110] | CVPR20 | 1.00 | 0.50 | 0.26 | 0.25 | 0.66 | 0.92 | 0.64 |
| CLBAAttack[53] | BIOSIG21 | 0.95 | 0.33 | 0.29 | 0.25 | 0.63 | 0.99 | 0.60 |
| AdvMask[58] | EP21 | 0.96 | 1.00 | 0.23 | 0.20 | 0.86 | 0.98 | 0.74 |
| AdvMakeup[124] | IJCAI21 | 0.40 | 0.33 | 0.88 | 0.68 | 0.65 | 0.98 | 0.59 |
| TAP[72] | CVPR21 | 1.00 | 0.17 | 0.43 | 0.63 | 0.61 | 0.99 | 0.64 |
| AdvSticker[59] | TPAMI22 | 0.98 | 0.67 | 0.69 | 0.88 | 0.65 | 0.99 | 0.82 |
| SOPP[81] | TPAMI22 | 0.96 | 0.83 | 0.42 | 0.63 | 0.65 | 0.99 | 0.76 |
| SLAttack[118] | CVPR23 | 0.65 | 0.67 | 0.43 | 0.46 | 0.65 | 0.29 | 0.56 |
| AT3D[127] | CVPR23 | 0.48 | 0.67 | 0.27 | 0.47 | 0.65 | 0.99 | 0.54 |
| DOPatch[61] | arXiv23 | 0.88 | 0.83 | 0.42 | 0.63 | 0.65 | 0.99 | 0.74 |
| AdvPattern[67] | ICCV19 | 0.69 | 0.50 | 0.26 | 0.25 | 0.69 | 0.99 | 0.55 |

Eff., Rob., Ste., Aes., Pra., and Eco. are the six hexagonal scores (effectiveness, robustness, stealthiness, aesthetics, practicality, economy); hiPAA is the aggregate.
Pautov et al.[161] explored physical attacks on ArcFace[162] using adversarial patches. They designed a cosine similarity loss to minimize the similarity between the patched photo and the ground truth. The generated gray patch is easily printable. They tested the patch in three forms: eyeglasses, and stickers on the nose and forehead. Numerical experiments demonstrated effective real-world attacks on ArcFace. Light-based attacks have shown feasibility in classification tasks[112]. For FRS, Nguyen et al.[110] designed a real-time adversarial light projection attack using an off-the-shelf camera-projector setup, targeting state-of-the-art FRS such as FaceNet[163] and SphereFace[164].
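The patch placement and the cosine-similarity objective described for [161] can be sketched as follows; the binary mask, the `paste_patch` helper, and the 1-D embedding vectors are illustrative assumptions rather than the authors' exact implementation:

```python
import numpy as np

def paste_patch(face, patch, mask):
    """Overlay a patch on a face image via a binary mask.

    All three arrays are assumed to share shape (H, W, 3); mask is 1
    inside the patch region (e.g., eyeglasses, nose, or forehead) and 0 elsewhere.
    """
    return face * (1 - mask) + patch * mask

def cosine_similarity_loss(embed_patched, embed_truth):
    """Cosine similarity between the patched photo's embedding and the
    ground-truth identity embedding; the attacker minimizes this value
    to push the two apart. Embeddings are assumed 1-D feature vectors."""
    a = embed_patched / np.linalg.norm(embed_patched)
    b = embed_truth / np.linalg.norm(embed_truth)
    return float(np.dot(a, b))
```

In an attack loop, one would feed `paste_patch(...)` through the face encoder, compute `cosine_similarity_loss` against the ground-truth embedding, and descend the gradient with respect to the patch pixels only.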
(a)Optical flow estimation
(b)Crowd counting
(c)Monocular depth estimation
(d)Semantic segmentation
Figure 11: Display of the physical adversarial attack in other tasks. Adapted from FlowAttack[68] (a), PAP[79] (b), OAP[77] (c), and RWAEs[78] (d).
Advhat[24] implements an easily reproducible physical adversarial attack on the state-of-the-art public Face ID system[165, 162]. In the digital space, Advhat uses Spatial Transformer Layer (STL)[166] to project the obtained sticker on the image of the face. In the physical space, Advhat launches attacks by wearing a hat with a special sticker on the forehead area, which significantly reduces the similarity to the ground truth class.
Yin et al.[124] proposed AdvMakeup, a unified adversarial face generation method that addresses a common and practical scenario: applying makeup to eye regions to deceive FRS while maintaining a visually inconspicuous appearance, resembling natural makeup. Concretely, AdvMakeup first introduces a makeup generation module, which can add natural eye shadow over the orbital region. Then, a task-driven fine-grained meta-learning adversarial attack strategy guarantees the attacking effectiveness of the generated makeup. Experimental results show that AdvMakeup's attack effectiveness is substantially higher than that of Advhat[24] and AdvEyeglass[14].
5.3.2 Person Re-Identification
Person Re-ID is the task of identifying and tracking an individual of interest across multiple non-overlapping cameras[167]. This task plays an important role in surveillance and security applications. Wang et al.[67] were the first and only ones to propose a physical attack on the Re-ID model, known as AdvPattern. They accomplished evasion and impersonation attacks by formulating distinct optimization objectives. As shown in Fig.10(b), AdvPattern employs adversarial patches featuring specially crafted patterns as the adversarial medium, which are affixed to a person's chest. The method degrades the rank-1 accuracy of person Re-ID models from 87.9% to 27.1% under the evasion attack. This easily implementable approach exposes the vulnerability of DNNs-based Re-ID systems.
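The distinction between the two AdvPattern objectives can be illustrated with a toy feature-space sketch (not the paper's exact formulation): evasion pushes the adversarial query away from the wearer's own gallery features, while impersonation additionally pulls it toward a target identity.

```python
import numpy as np

def reid_attack_objective(query_feat, own_feats, target_feat=None):
    """Toy Re-ID attack objective to be minimized (illustrative assumption,
    not AdvPattern's actual loss).

    query_feat: feature of the adversarial (patched) query image.
    own_feats: gallery features of the wearer's true identity.
    target_feat: if given, the identity to impersonate.
    """
    # Evasion term: reward distance from the wearer's own gallery features.
    evade = -float(np.mean([np.sum((query_feat - f) ** 2) for f in own_feats]))
    if target_feat is None:
        return evade                                   # evasion attack
    # Impersonation adds a pull toward the target identity's feature.
    pull = float(np.sum((query_feat - target_feat) ** 2))
    return evade + pull
```

A query feature sitting near the target identity and far from the wearer's own features minimizes the impersonation objective, mirroring how the two attacks are formulated as distinct optimization goals.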
5.4 Attacks on Other Tasks
In addition to the three aforementioned mainstream tasks, physical adversarial attacks extend to eight niche tasks, encompassing optical flow estimation[68], steering angle prediction[102], crowd counting[79], segmentation[78], object tracking[66, 71], monocular depth estimation[77], image captioning[86], and X-ray detection[128]. TABLE XI presents the comparative results based on the hiPAA metric.
Optical Flow Estimation (OFE) aims to measure the 2D pixel motion across an image sequence[168]. As shown in Fig.11(a), Ranjan et al.[68] proposed FlowAttack to perturb OFE models. FlowAttack utilizes the gradients from pre-trained optical flow networks to update adversarial patches. Experimental results show that FlowAttack can cause large errors for encoder-decoder networks but only weakly affects spatial pyramid networks. This phenomenon demonstrates the correlation between network structure and vulnerability.
Crowd Counting aims to estimate the number of individuals within images or videos, with significant applications in public safety and traffic management[169]. Liu et al.[79] proposed a Perceptual Adversarial Patch (PAP) for attacking crowd-counting systems in the real world. PAP generates an adversarial patch by maximizing the model loss, leading the target victim model to overestimate the count by up to 100 on 80% of the samples (see Fig.11(b)).
Monocular Depth Estimation (MDE) aims to estimate the distance between the camera and a target object, which is crucial for autonomous driving[170]. Recently, Cheng et al.[77] introduced an attack named OAP against MDE models (see Fig.11(c)). OAP employs a rectangular patch region optimization method to find the optimal patch-pasting region, resulting in a mean depth estimation error of over 6 meters and a 93% attack success rate in downstream tasks. Despite its effectiveness, OAP relies on 2D image patches and cannot achieve multi-viewpoint attacks. To address this limitation, Zheng et al.[62] proposed 3D²Fool, which integrates UV mapping into adversarial texture optimization, creating robust 3D camouflage textures capable of making the car vanish.
Semantic Segmentation aims to classify each pixel into predefined categories without distinguishing between individual object instances[28]. Nesti et al.[78] crafted adversarial patches to perturb semantic segmentation models. As shown in Fig.11(d), they created a large adversarial patch, measuring 1 m × 2 m, which disrupts the predictions of segmentation models in the physical world. The adversarial patch is optimized using a pixel-wise cross-entropy loss on the pre-trained ICNet[171]. Meanwhile, they built abundant and diverse scenes with the CARLA simulator[139] for scene-specific attacks. Experimental results show that their method reduces the model's accuracy in the digital space, but in the real world the attack's effectiveness is greatly degraded.
Steering Angle Prediction assists autonomous driving systems in making informed decisions[172, 173, 174]. To evaluate the safety and robustness of this task, Kong et al.[102] introduced PhysGAN, a method that generates physically resilient adversarial examples to deceive autonomous steering systems. As shown in Fig.12(a), by utilizing the discriminator within the GAN framework to assess the visual disparities between adversarial roadside signs and their original counterparts, PhysGAN can generate realistic adversarial examples. Meanwhile, it can maintain attack effectiveness continuously across all frames throughout the entire trajectory.
TABLE XI: Comparison of the hiPAA metric among attack methods for eight niche tasks.
| Methods | Venue | Eff. | Rob. | Ste. | Aes. | Pra. | Eco. | hiPAA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PAT[66] | ICCV19 | 0.60 | 0.67 | 0.27 | 0.27 | 0.87 | 0.29 | 0.51 |
| MTD[71] | AAAI21 | 0.74 | 0.67 | 0.27 | 0.40 | 0.69 | 0.99 | 0.62 |
| OAP[77] | ECCV22 | 0.94 | 1.00 | 0.87 | 0.61 | 0.65 | 0.98 | 0.88 |
| 3D²Fool[62] | CVPR24 | 0.95 | 1.00 | 0.59 | 0.73 | 0.71 | 0.79 | 0.83 |
| FlowAttack[68] | ICCV19 | 1.00 | 0.67 | 0.25 | 0.25 | 0.64 | 0.99 | 0.67 |
| PhysGAN[102] | CVPR20 | 1.00 | 0.67 | 0.65 | 0.89 | 0.65 | 0.95 | 0.81 |
| RWAEs[78] | WACV22 | 1.00 | 0.33 | 0.20 | 0.46 | 0.25 | 0.95 | 0.57 |
| PAP[79] | CCS22 | 1.00 | 0.33 | 0.64 | 0.47 | 0.61 | 0.99 | 0.70 |
| CAPatch[86] | USENIX23 | 1.00 | 1.00 | 0.20 | 0.20 | 0.63 | 0.99 | 0.72 |
| X-Adv[128] | USENIX23 | 0.74 | 0.33 | 0.93 | 0.86 | 0.69 | 0.93 | 0.72 |
Object Tracking aims to detect moving objects and track them from frame to frame[175]. Wiyatno et al.[66] proposed the first physical adversarial attack on this task. Specifically, they perform optimization to create a distinctive pattern, which is then presented on a large monitor as a background. When a person moves in front of the monitor, the tracker tends to prioritize locking onto the background and disregards the person. Subsequently, Ding et al.[71] proposed a patch-based attack method to launch universal physical attacks on single object tracking. As shown in Fig.12(b), in the presence of the patch, the tracker neglects the originally tracked object. These explorations raise security concerns for real-world visual tracking.
Image Captioning focuses on generating a description of an image, which requires recognizing the important objects, their attributes, and their relationships in an image[176]. Inspired by patch-based attacks, Zhang et al.[86] designed CAPatch, a method capable of inducing errors in final captions within real-world scenarios (see Fig.12(c)). CAPatch deceives image captioning systems, causing them to produce a specified caption or conceal certain keywords. In contrast to existing attack methods, this study represents the initial endeavor to employ an adversarial patch against multi-modal artificial intelligence systems.
X-ray Detection is widely used in security screening to identify prohibited items in safety-critical scenarios[177]. Liu et al.[128] pioneered the exploration of physical adversarial attacks in X-ray imaging. They introduced X-Adv, a technique for creating physically plausible adversarial metal objects. When positioned near the targeted prohibited item, these objects enable the item to evade detection. X-Adv exposes vulnerabilities in X-ray detection systems, emphasizing the necessity for enhanced robustness.
6 Discussion
During the development of this paper, we have observed the diversity and broad scope of physical adversarial attacks. Despite the growth in published works in recent years, there are still gaps to be explored. Here, we discuss the current challenges and opportunities in this field.
6.1 Current Challenges
6.1.1 Existing Domain Gaps
The workflow of physical adversarial attacks (see Fig.1) reveals a process where attackers first design in the digital space, deploy in the physical space, and ultimately execute attacks in the digital domain. This workflow involves the transformation between the digital and physical domains. Some researchers have recognized this; for instance, Jan et al.[100] designed a D2P network to model the transformation of images from the digital domain to the physical domain. However, the current research has paid limited attention to addressing domain gaps, resulting in the unstable attack performance of many methods. The present practices in physical adversarial attacks face significant challenges in reliability and reproducibility.
(a)Steering Angle Prediction
(b)Object Tracking
(c)Image Captioning
Figure 12: Display of the physical adversarial attack across three CV tasks: steering angle prediction, object tracking, and image captioning. Adapted from PhysGAN[102] (a), MTD[71] (b), and CAPatch[86] (c).
6.1.2 Uncontrollable Evaluation Settings
Most existing works evaluate their physical adversarial attack methods in the real world using the adversarial mediums they manufacture. The real-world environment is dynamic, and the process of crafting adversarial mediums involves subjective factors, e.g., the material of the clothing, the quality of the printing, etc., all of which are uncontrollable. Future work with reliable and controllable evaluation setups is anticipated.
6.2 Future Work
6.2.1 New Adversarial Medium
In this survey, we define the adversarial medium as the object that carries the adversarial perturbations in the physical world. The adversarial medium plays a significant role in performing a physical attack. From the above discussion, we see that different tasks have different requirements for the adversarial medium, and a suitable adversarial medium can improve the attack's performance on the trilemma, i.e., effectiveness, robustness, and stealthiness. Attacks using patches[21], light[110], camera ISP[121], makeup[124], 3D-printed objects[125], clothing[89], etc., have emerged in turn. Recently, laser beams[113] and small lighting bulbs have been used to deceive DNNs-based models, inspiring novel attack methods and exposing the potential risks of DNNs-based applications.
6.2.2 Cross-Domain Physical Adversarial Attacks
To address the challenges mentioned in Sec.6.1.1, attackers need to consider two domain gaps.
Digital-to-physical domain gap arises in Step 2 (see Fig.1), specifically denoting the procedure in which attackers manufacture physical perturbations based on digital perturbations. A typical example is the printing loss proposed by Sharif et al.[14], which specifically refers to the inability to accurately and reliably reproduce colors due to the smaller color space of printing devices compared to the RGB color space. They introduced the non-printability score (NPS) to address this issue. Additionally, Jan et al.[100] employed an image-to-image translation network to model the digital-to-physical transformation. Moreover, the adversarial medium, materials, and certain physical constraints all influence this domain gap. A more detailed consideration will facilitate effective cross-domain attacks.
Physical-to-digital domain gap arises in Step 3 (see Fig.1). Adversarial perturbations carried by the adversarial medium are captured by cameras in the real world, converted into digital images, and then used to attack DNNs-based models. Throughout this process, there exists a domain gap between the transformations from physical perturbations to digital images. Phan et al.[121] have studied physical adversarial attacks under specific ISP conditions, but they did not explore the performance of attacks across different ISPs. Differentiable ISP simulation or camera simulation is an ideal solution. Combining these simulators will enable the generated adversarial perturbations to maintain attack stability across various hardware imaging devices.
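As a toy illustration of what such a differentiable camera/ISP stage might model, the sketch below chains per-channel white-balance gains with a gamma curve; the gain and gamma values are arbitrary assumptions for illustration, not any real camera's pipeline:

```python
import numpy as np

def simple_isp(raw, gains=(1.9, 1.0, 1.6), gamma=2.2):
    """Toy two-stage ISP: white-balance gains followed by gamma compression.

    raw: (H, W, 3) linear sensor values in [0, 1] (assumed layout).
    gains, gamma: illustrative placeholder values. Both stages are smooth
    almost everywhere, which is what makes an ISP model usable inside
    end-to-end perturbation optimization.
    """
    balanced = np.clip(raw * np.asarray(gains), 0.0, 1.0)  # per-channel white balance
    return balanced ** (1.0 / gamma)                       # gamma compression
```

Optimizing a perturbation through several such simulated pipelines (different `gains`/`gamma` per virtual device) is one way to pursue the cross-ISP stability discussed above.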
6.2.3 Reproducible Evaluation
To address the challenges mentioned in Sec.6.1.2, researchers should be required to disclose more experimental details. On one hand, this includes production details such as materials, size, and manufacturing processes of the adversarial medium. On the other hand, it involves the environmental conditions of physical experiments, such as lighting, background, and shooting distance. While these details may seem trivial and easily overlooked, they are indispensable for a comprehensive evaluation. Otani et al.[178] have proposed a template for researchers to reference, thereby promoting fairness and reproducibility in evaluation. In the field of physical adversarial attacks, a template for disclosing experimental details is anticipated in the future.
6.2.4 Physical World Simulation
The key characteristic of physical adversarial attacks is their real-world feasibility. Precisely simulating the physical environment enhances attacks’ robustness in dynamic settings. Simulation engines like Unreal Engine and Unity offer various conditions for attacks, such as lighting, backgrounds, camera distances, and view angles. Most existing methods use these simulators to assess attack efficacy[23, 60]. However, due to non-differentiability, they cannot be used in end-to-end optimization for adversarial perturbations. Beyond basic operations like rotation, noise addition, affine transformations, and occlusions[13, 22, 93], integrating advanced physical scene simulation methods into the attack pipeline is crucial for considering dynamic settings during adversarial perturbation design.
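The basic operations listed above can be folded into the optimization loop by sampling a random transformation at each step, in the spirit of Expectation over Transformation; the parameter ranges below are illustrative assumptions:

```python
import numpy as np

def random_physical_transform(patch, rng):
    """Sample one random physical-world transformation of a patch.

    patch: (H, W, 3) array in [0, 1] (square, so rotation preserves shape).
    rng: a numpy random Generator. Rotation, brightness jitter, and sensor
    noise stand in for the rotation/affine/noise/lighting variations the
    text mentions; ranges are illustrative, not tuned values.
    """
    k = int(rng.integers(0, 4))
    out = np.rot90(patch, k)                    # random 90-degree rotation
    out = out * rng.uniform(0.8, 1.2)           # lighting / brightness jitter
    out = out + rng.normal(0.0, 0.02, out.shape)  # sensor-style noise
    return np.clip(out, 0.0, 1.0)
```

Averaging the attack loss over many such sampled transformations per iteration is what makes the resulting perturbation robust to the dynamic conditions a differentiable simulator would otherwise have to model.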
6.2.5 Hexagonal Physical Adversarial Attacks
The proposed evaluation metric, hiPAA, evaluates physical adversarial attacks from six perspectives. However, during the evaluation, we observed that most existing methods lack comprehensive consideration[68, 94]: they tend to focus on individual perspectives while neglecting others. Some approaches trade individual dimensions off against each other, such as effectiveness and stealthiness[22, 92]. In real-world applications, physical adversarial attacks often need to excel from all six perspectives. Future work is expected to take a holistic view, advancing and evaluating methods across all of them.
6.2.6 Physical Adversarial Attacks on New Tasks
As described in this survey, the current mainstream physical adversarial attack methods are oriented to tasks such as person detection[90, 22], traffic sign detection[99, 111], face recognition[14, 161], etc. Although many fields have been covered, some tasks have not yet been explored. For example, Cheng et al.[77] recently proposed an adversarial patch attack against monocular depth estimation (MDE), a critical vision task in real-world driving scenarios; it was the first attack proposed against MDE. Besides, we consider that domains with the following two characteristics can be explored for physical adversarial attacks: 1) they use DNN techniques, and 2) they apply in the physical world. Examples include trajectory prediction[179], pose estimation[180], and action recognition[181].
7 Conclusion
Physical adversarial attacks have cast a shadow over the reliability of deep neural networks, raising security concerns. Consequently, extensive research has proposed various methods for real-world attacks across multiple tasks. We have provided an overview of the field of physical adversarial attacks on computer vision tasks, covering classification, detection, re-identification, and some niche tasks, with a focus on the adversarial mediums and a comprehensive evaluation. We first propose a general workflow for launching a physical adversarial attack, underlining the important role of the adversarial medium. Additionally, we have devised a new metric termed hiPAA, systematically quantifying and assessing attack methods from six distinct perspectives. Correspondingly, we present comparative results for existing methods, offering valuable insights for future improvements. Many challenges remain ahead, and we hope that this paper can motivate further discussion in this field and provide important guidance for future research, ultimately advancing the safety and reliability of machine vision systems.
8 Acknowledgements
This work was supported by Hubei Key R&D Project (2022BAA033), National Natural Science Foundation of China (62171325), the Fundamental Research Funds for the Central Universities, Peking University, JSPS KAKENHI (JP23K24876), and JST ASPIRE (JPMJAP2303).
References
- [1] A.Voulodimos, N.Doulamis, A.Doulamis, and E.Protopapadakis, “Deep learning for computer vision: A brief review,” Computational intelligence and neuroscience, vol. 2018, 2018.
- [2] D.W. Otter, J.R. Medina, and J.K. Kalita, “A survey of the usages of deep learning for natural language processing,” TNNLS, vol.32, no.2, pp. 604–624, 2020.
- [3] A.B. Nassif, I.Shahin, I.Attili, M.Azzeh, and K.Shaalan, “Speech recognition using deep neural networks: A systematic review,” IEEE access, vol.7, pp. 19 143–19 165, 2019.
- [4] C.Szegedy, W.Zaremba, I.Sutskever, J.Bruna, D.Erhan, I.J. Goodfellow, and R.Fergus, “Intriguing properties of neural networks,” in ICLR, Y.Bengio and Y.LeCun, Eds., 2014.
- [5] N.Carlini and D.Wagner, “Towards evaluating the robustness of neural networks,” in 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017, pp. 39–57.
- [6] C.Xiao, B.Li, J.-Y. Zhu, W.He, M.Liu, and D.Song, “Generating adversarial examples with adversarial networks,” arXiv preprint arXiv:1801.02610, 2018.
- [7] N.Inkawhich, W.Wen, H.H. Li, and Y.Chen, “Feature space perturbations yield more transferable adversarial examples,” in CVPR, 2019, pp. 7066–7074.
- [8] X.Yuan, P.He, Q.Zhu, and X.Li, “Adversarial examples: Attacks and defenses for deep learning,” TNNLS, vol.30, no.9, pp. 2805–2824, 2019.
- [9] Z.Zhao, Z.Liu, and M.Larson, “Towards large yet imperceptible adversarial image perturbations with perceptual color distance,” in CVPR, 2020, pp. 1039–1048.
- [10] Y.Diao, T.Shao, Y.-L. Yang, K.Zhou, and H.Wang, “Basar: Black-box attack on skeletal action recognition,” in CVPR, 2021, pp. 7597–7607.
- [11] Z.Cai, S.Rane, A.E. Brito, C.Song, S.V. Krishnamurthy, A.K. Roy-Chowdhury, and M.S. Asif, “Zero-query transfer attacks on context-aware object detectors,” in CVPR, 2022, pp. 15 024–15 034.
- [12] Z.Wei, J.Chen, M.Goldblum, Z.Wu, T.Goldstein, and Y.-G. Jiang, “Towards transferable adversarial attacks on vision transformers,” in AAAI, vol.36, no.3, 2022, pp. 2668–2676.
- [13] S.Thys, W.Van Ranst, and T.Goedemé, “Fooling automated surveillance cameras: adversarial patches to attack person detection,” in CVPR workshops, 2019, pp. 0–0.
- [14] M.Sharif, S.Bhagavatula, L.Bauer, and M.K. Reiter, “Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition,” in Proceedings of the 2016 acm sigsac conference on computer and communications security, 2016, pp. 1528–1540.
- [15] L.Sun, M.Tan, and Z.Zhou, “A survey of practical adversarial example attacks,” Cybersecurity, vol.1, pp. 1–9, 2018.
- [16] X.Wei, B.Pu, J.Lu, and B.Wu, “Physically adversarial attacks and defenses in computer vision: A survey,” arXiv preprint arXiv:2211.01671, 2022.
- [17] D.Wang, W.Yao, T.Jiang, G.Tang, and X.Chen, “A survey on physical adversarial attack in computer vision,” arXiv preprint arXiv:2209.14262, 2022.
- [18] K.Nguyen, T.Fernando, C.Fookes, and S.Sridharan, “Physical adversarial attacks for surveillance: A survey,” arXiv preprint arXiv:2305.01074, 2023.
- [19] Q.Xu, G.Tao, S.Cheng, and X.Zhang, “Towards feature space adversarial attack by style perturbation,” in AAAI, vol.35, no.12, 2021, pp. 10 523–10 531.
- [20] Z.Cai, X.Xie, S.Li, M.Yin, C.Song, S.V. Krishnamurthy, A.K. Roy-Chowdhury, and M.S. Asif, “Context-aware transfer attacks for object detection,” in AAAI, vol.36, no.1, 2022, pp. 149–157.
- [21] T.B. Brown, D.Mané, A.Roy, M.Abadi, and J.Gilmer, “Adversarial patch,” in NIPS workshop, 2017.
- [22] J.Tan, N.Ji, H.Xie, and X.Xiang, “Legitimate adversarial patches: Evading human eyes and detection models in the physical world,” in ACM MM, 2021, pp. 5307–5315.
- [23] Y.Zhang, H.Foroosh, P.David, and B.Gong, “Camou: Learning physical vehicle camouflages to adversarially attack detectors in the wild,” in ICLR, 2018.
- [24] S.Komkov and A.Petiushko, “Advhat: Real-world adversarial attack on arcface face id system,” in ICPR.IEEE, 2021, pp. 819–826.
- [25] O.Russakovsky, J.Deng, H.Su, J.Krause, S.Satheesh, S.Ma, Z.Huang, A.Karpathy, A.Khosla, M.Bernstein et al., “Imagenet large scale visual recognition challenge,” IJCV, vol. 115, no.3, pp. 211–252, 2015.
- [26] Z.Liu, Y.Lin, Y.Cao, H.Hu, Y.Wei, Z.Zhang, S.Lin, and B.Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in ICCV, 2021, pp. 10 012–10 022.
- [27] J.Yu, Z.Wang, V.Vasudevan, L.Yeung, M.Seyedhosseini, and Y.Wu, “Coca: Contrastive captioners are image-text foundation models,” Transactions on Machine Learning Research, 2022.
- [28] K.He, G.Gkioxari, P.Dollár, and R.Girshick, “Mask r-cnn,” in ICCV, 2017, pp. 2961–2969.
- [29] V.Badrinarayanan, A.Kendall, and R.Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” TPAMI, vol.39, no.12, pp. 2481–2495, 2017.
- [30] J.Redmon and A.Farhadi, “Yolov3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
- [31] H.Zhang, F.Li, S.Liu, L.Zhang, H.Su, J.Zhu, L.M. Ni, and H.-Y. Shum, “Dino: Detr with improved denoising anchor boxes for end-to-end object detection,” arXiv preprint arXiv:2203.03605, 2022.
- [32] M.Ye, Z.Wang, X.Lan, and P.C. Yuen, “Visible thermal person re-identification via dual-constrained top-ranking.” in IJCAI, vol.1, 2018, p.2.
- [33] Z.Wang, M.Ye, F.Yang, X.Bai, and S.Satoh, “Cascaded sr-gan for scale-adaptive low resolution person re-identification.” in IJCAI, vol.1, no.2, 2018, p.4.
- [34] Q.Meng, S.Zhao, Z.Huang, and F.Zhou, “Magface: A universal representation for face recognition and quality assessment,” in CVPR, 2021, pp. 14 225–14 234.
- [35] T.Gu, B.Dolan-Gavitt, and S.Garg, “Badnets: Identifying vulnerabilities in the machine learning model supply chain,” arXiv preprint arXiv:1708.06733, 2017.
- [36] Y.Liu, S.Ma, Y.Aafer, W.-C. Lee, J.Zhai, W.Wang, and X.Zhang, “Trojaning attack on neural networks,” 2017.
- [37] E.Wenger, J.Passananti, A.N. Bhagoji, Y.Yao, H.Zheng, and B.Y. Zhao, “Backdoor attacks against deep learning systems in the physical world,” in CVPR, 2021, pp. 6206–6215.
- [38] X.Qi, T.Xie, R.Pan, J.Zhu, Y.Yang, and K.Bu, “Towards practical deployment-stage backdoor attack on deep neural networks,” in CVPR, 2022, pp. 13 347–13 357.
- [39] M.Barreno, B.Nelson, R.Sears, A.D. Joseph, and J.D. Tygar, “Can machine learning be secure?” in Proceedings of the 2006 ACM Symposium on Information, computer and communications security, 2006, pp. 16–25.
- [40] A.Shafahi, W.R. Huang, M.Najibi, O.Suciu, C.Studer, T.Dumitras, and T.Goldstein, “Poison frogs! targeted clean-label poisoning attacks on neural networks,” NIPS, vol.31, 2018.
- [41] X.Zhang, X.Zhu, and L.Lessard, “Online data poisoning attacks,” in Learning for Dynamics and Control.PMLR, 2020, pp. 201–210.
- [42] A.Oprea, A.Singhal, and A.Vassilev, “Poisoning attacks against machine learning: Can machine learning be trustworthy?” Computer, vol.55, no.11, pp. 94–99, 2022.
- [43] Y.Gao, B.G. Doan, Z.Zhang, S.Ma, J.Zhang, A.Fu, S.Nepal, and H.Kim, “Backdoor attacks and countermeasures on deep learning: A comprehensive review,” arXiv preprint arXiv:2007.10760, 2020.
- [44] E.Bagdasaryan and V.Shmatikov, “Blind backdoors in deep learning models,” in USENIX Security, 2021, pp. 1505–1521.
- [45] K.Doan, Y.Lao, W.Zhao, and P.Li, “Lira: Learnable, imperceptible and robust backdoor attacks,” in ICCV, 2021, pp. 11 966–11 976.
- [46] J.Su, D.V. Vargas, and K.Sakurai, “One pixel attack for fooling deep neural networks,” IEEE Transactions on Evolutionary Computation, vol.23, no.5, pp. 828–841, 2019.
- [47] H.Ma, Y.Li, Y.Gao, Z.Zhang, A.Abuadbba, A.Fu, S.F. Al-Sarawi, S.Nepal, and D.Abbott, “Macab: Model-agnostic clean-annotation backdoor to object detection with natural trigger in real-world,” 2022.
- [48] H.Wei, H.Yu, K.Zhang, Z.Wang, J.Zhu, and Z.Wang, “Moiré backdoor attack (mba): A novel trigger for pedestrian detectors in the physical world,” in ACM MM, 2023, pp. 8828–8838.
- [49] M.P. Van Albada and A.Lagendijk, “Observation of weak localization of light in a random medium,” Physical review letters, vol.55, no.24, p. 2692, 1985.
- [50] D.Y. Yang, J.Xiong, X.Li, X.Yan, J.Raiti, Y.Wang, H.Wu, and Z.Zhong, “Building towards ‘invisible cloak’: Robust physical adversarial attack on yolo object detector,” in UEMCON.IEEE, 2018, pp. 368–374.
- [51] J.Li, F.Schmidt, and Z.Kolter, “Adversarial camera stickers: A physical camera-based attack on deep learning systems,” in ICML.PMLR, 2019, pp. 3896–3904.
- [52] M.Sharif, S.Bhagavatula, L.Bauer, and M.K. Reiter, “A general framework for adversarial examples with objectives,” ACM TOPS, vol.22, no.3, pp. 1–30, 2019.
- [53] I.Singh, S.Momiyama, K.Kakizaki, and T.Araki, “On brightness agnostic adversarial examples against face recognition systems,” in BIOSIG.IEEE, 2021, pp. 1–5.
- [54] J.Wang, A.Liu, Z.Yin, S.Liu, S.Tang, and X.Liu, “Dual attention suppression attack: Generate adversarial camouflage in physical world,” in CVPR, 2021, pp. 8565–8574.
- [55] P.Jing, Q.Tang, Y.Du, L.Xue, X.Luo, T.Wang, S.Nie, and S.Wu, “Too good to be safe: Tricking lane detection in autonomous driving with crafted perturbations,” in USENIX Security, 2021, pp. 3237–3254.
- [56] D.Wang, T.Jiang, J.Sun, W.Zhou, Z.Gong, X.Zhang, W.Yao, and X.Chen, “Fca: Learning a 3d full-coverage vehicle camouflage for multi-view physical adversarial attack,” in AAAI, vol.36, no.2, 2022, pp. 2414–2422.
- [57] N.Suryanto, Y.Kim, H.Kang, H.T. Larasati, Y.Yun, T.-T.-H. Le, H.Yang, S.-Y. Oh, and H.Kim, “Dta: Physical camouflage attacks using differentiable transformation network,” in CVPR, 2022, pp. 15 305–15 314.
- [58] A.Zolfi, S.Avidan, Y.Elovici, and A.Shabtai, “Adversarial mask: Real-world adversarial attack against face recognition models,” arXiv preprint arXiv:2111.10759, 2021.
- [59] X.Wei, Y.Guo, and J.Yu, “Adversarial sticker: A stealthy attack method in the physical world,” TPAMI, 2022.
- [60] Y.Duan, J.Chen, X.Zhou, J.Zou, Z.He, J.Zhang, W.Zhang, and Z.Pan, “Learning coated adversarial camouflages for object detectors,” in IJCAI, 2022, pp. 891–897.
- [61] X.Wei, S.Ruan, Y.Dong, and H.Su, “Distributional modeling for location-aware adversarial patches,” arXiv preprint arXiv:2306.16131, 2023.
- [62] J.Zheng, C.Lin, J.Sun, Z.Zhao, Q.Li, and C.Shen, “Physical 3d adversarial attacks against monocular depth estimation in autonomous driving,” in CVPR, 2024, pp. 24 452–24 461.
- [63] K.Eykholt, I.Evtimov, E.Fernandes, B.Li, A.Rahmati, C.Xiao, A.Prakash, T.Kohno, and D.Song, “Robust physical-world attacks on deep learning visual classification,” in CVPR, 2018, pp. 1625–1634.
- [64] Y.Zhao, H.Zhu, R.Liang, Q.Shen, S.Zhang, and K.Chen, “Seeing isn’t believing: Towards more robust adversarial attack against real world object detectors,” in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 1989–2004.
- [65] A.Liu, X.Liu, J.Fan, Y.Ma, A.Zhang, H.Xie, and D.Tao, “Perceptual-sensitive gan for generating adversarial patches,” in AAAI, vol.33, no.01, 2019, pp. 1028–1035.
- [66] R.R. Wiyatno and A.Xu, “Physical adversarial textures that fool visual object tracking,” in ICCV, 2019, pp. 4822–4831.
- [67] Z.Wang, S.Zheng, M.Song, Q.Wang, A.Rahimpour, and H.Qi, “advpattern: physical-world attacks on deep person re-identification via adversarially transformable patterns,” in ICCV, 2019, pp. 8341–8350.
- [68] A.Ranjan, J.Janai, A.Geiger, and M.J. Black, “Attacking optical flow,” in ICCV, 2019, pp. 2404–2413.
- [69] A.Liu, J.Wang, X.Liu, B.Cao, C.Zhang, and H.Yu, “Bias-based universal adversarial patch attack for automatic check-out,” in ECCV.Springer, 2020, pp. 395–410.
- [70] A.Zolfi, M.Kravchik, Y.Elovici, and A.Shabtai, “The translucent patch: A physical and universal attack on object detectors,” in CVPR, 2021, pp. 15 232–15 241.
- [71] L.Ding, Y.Wang, K.Yuan, M.Jiang, P.Wang, H.Huang, and Z.J. Wang, “Towards universal physical attacks on single object tracking,” in AAAI, vol.35, no.2, 2021, pp. 1236–1245.
- [72] Z.Xiao, X.Gao, C.Fu, Y.Dong, W.Gao, X.Zhang, J.Zhou, and J.Zhu, “Improving transferability of adversarial patches on face recognition with generative models,” in CVPR, 2021, pp. 11 845–11 854.
- [73] J.Wang, A.Liu, X.Bai, and X.Liu, “Universal adversarial patch attack for automatic checkout using perceptual and attentional bias,” TIP, vol.31, pp. 598–611, 2021.
- [74] P.A. Sava, J.-P. Schulze, P.Sperl, and K.Böttinger, “Assessing the impact of transformations on physical adversarial attacks,” in Proceedings of the 15th ACM Workshop on Artificial Intelligence and Security, 2022, pp. 79–90.
- [75] S.Casper, M.Nadeau, D.Hadfield-Menell, and G.Kreiman, “Robust feature-level adversaries are interpretability tools,” NIPS, vol.35, pp. 33 093–33 106, 2022.
- [76] B.G. Doan, M.Xue, S.Ma, E.Abbasnejad, and D.C. Ranasinghe, “Tnt attacks! universal naturalistic adversarial patches against deep neural network systems,” TIFS, vol.17, pp. 3816–3830, 2022.
- [77] Z.Cheng, J.Liang, H.Choi, G.Tao, Z.Cao, D.Liu, and X.Zhang, “Physical attack on monocular depth estimation with optimal adversarial patches,” in ECCV.Springer, 2022, pp. 514–532.
- [78] F.Nesti, G.Rossolini, S.Nair, A.Biondi, and G.Buttazzo, “Evaluating the robustness of semantic segmentation for autonomous driving against real-world adversarial patch attacks,” in WACV, 2022, pp. 2280–2289.
- [79] S.Liu, J.Wang, A.Liu, Y.Li, Y.Gao, X.Liu, and D.Tao, “Harnessing perceptual adversarial patches for crowd counting,” in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022, pp. 2055–2069.
- [80] Z.Chen, B.Li, S.Wu, J.Xu, S.Ding, and W.Zhang, “Shape matters: deformable patch attack,” in ECCV.Springer, 2022, pp. 529–548.
- [81] X.Wei, Y.Guo, J.Yu, and B.Zhang, “Simultaneously optimizing perturbations and positions for black-box adversarial patch attacks,” TPAMI, 2022.
- [82] A.Du, B.Chen, T.-J. Chin, Y.W. Law, M.Sasdelli, R.Rajasegaran, and D.Campbell, “Physical adversarial attacks on an aerial imagery object detector,” in WACV, 2022, pp. 1796–1806.
- [83] X.Wei, J.Yu, and Y.Huang, “Physically adversarial infrared patches with learnable shapes and locations,” in CVPR, June 2023, pp. 12 334–12 342.
- [84] H.Huang, Z.Chen, H.Chen, Y.Wang, and K.Zhang, “T-sea: Transfer-based self-ensemble attack on object detection,” in CVPR, 2023.
- [85] X.Wei, Y.Huang, Y.Sun, and J.Yu, “Unified adversarial patch for visible-infrared cross-modal attacks in the physical world,” 2023.
- [86] S.Zhang, Y.Cheng, W.Zhu, X.Ji, and W.Xu, “CAPatch: Physical adversarial patch against image captioning systems,” in USENIX Security, 2023, pp. 679–696.
- [87] X.Wei, J.Yu, and Y.Huang, “Infrared adversarial patches with learnable shapes and locations in the physical world,” IJCV, pp. 1–17, 2023.
- [88] W.Zhu, X.Ji, Y.Cheng, S.Zhang, and W.Xu, “Tpatch: A triggered physical adversarial patch,” arXiv preprint arXiv:2401.00148, 2023.
- [89] K.Xu, G.Zhang, S.Liu, Q.Fan, M.Sun, H.Chen, P.-Y. Chen, Y.Wang, and X.Lin, “Adversarial t-shirt! evading person detectors in a physical world,” in ECCV.Springer, 2020, pp. 665–681.
- [90] L.Huang, C.Gao, Y.Zhou, C.Xie, A.L. Yuille, C.Zou, and N.Liu, “Universal physical camouflage attacks on object detectors,” in CVPR, 2020, pp. 720–729.
- [91] Z.Wu, S.-N. Lim, L.S. Davis, and T.Goldstein, “Making an invisibility cloak: Real world adversarial attacks on object detectors,” in ECCV.Springer, 2020, pp. 1–17.
- [92] Y.-C.-T. Hu, B.-H. Kung, D.S. Tan, J.-C. Chen, K.-L. Hua, and W.-H. Cheng, “Naturalistic physical adversarial patch for object detectors,” in ICCV, 2021, pp. 7848–7857.
- [93] Z.Hu, S.Huang, X.Zhu, F.Sun, B.Zhang, and X.Hu, “Adversarial texture for fooling person detectors in the physical world,” in CVPR, 2022, pp. 13 307–13 316.
- [94] X.Zhu, Z.Hu, S.Huang, J.Li, and X.Hu, “Infrared invisible clothing: Hiding from infrared detectors at multiple angles in real world,” in CVPR, 2022, pp. 13 317–13 326.
- [95] J.Sun, W.Yao, T.Jiang, D.Wang, and X.Chen, “Differential evolution based dual adversarial camouflage: Fooling human eyes and object detectors,” Neural Networks, vol. 163, pp. 256–271, 2023.
- [96] Z.Hu, W.Chu, X.Zhu, H.Zhang, B.Zhang, and X.Hu, “Physically realizable natural-looking clothing textures evade person detectors via 3d modeling,” in CVPR, June 2023, pp. 16 975–16 984.
- [97] A.Kurakin, I.J. Goodfellow, and S.Bengio, “Adversarial examples in the physical world,” 2017. [Online]. Available: https://openreview.net/forum?id=HJGU3Rodl
- [98] S.-T. Chen, C.Cornelius, J.Martin, and D.H.P. Chau, “Shapeshifter: Robust physical adversarial attack on faster r-cnn object detector,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases.Springer, 2018, pp. 52–68.
- [99] D.Song, K.Eykholt, I.Evtimov, E.Fernandes, B.Li, A.Rahmati, F.Tramèr, A.Prakash, and T.Kohno, “Physical adversarial examples for object detectors,” in 12th USENIX Workshop on Offensive Technologies (WOOT 18), 2018.
- [100] S.T. Jan, J.Messou, Y.-C. Lin, J.-B. Huang, and G.Wang, “Connecting the digital and physical world: Improving the robustness of adversarial attacks,” in AAAI, vol.33, no.01, 2019, pp. 962–969.
- [101] Q.Guo, F.Juefei-Xu, X.Xie, L.Ma, J.Wang, B.Yu, W.Feng, and Y.Liu, “Watch out! motion is blurring the vision of your deep neural networks,” NIPS, vol.33, pp. 975–985, 2020.
- [102] Z.Kong, J.Guo, A.Li, and C.Liu, “Physgan: Generating physical-world-resilient adversarial examples for autonomous driving,” in CVPR, 2020, pp. 14 254–14 263.
- [103] R.Duan, X.Ma, Y.Wang, J.Bailey, A.K. Qin, and Y.Yang, “Adversarial camouflage: Hiding physical-world attacks with natural styles,” in CVPR, 2020, pp. 1000–1008.
- [104] K.Yang, T.Tsai, H.Yu, T.-Y. Ho, and Y.Jin, “Beyond digital domain: Fooling deep learning based recognition system in physical world,” in AAAI, vol.34, no.01, 2020, pp. 1088–1095.
- [105] W.Feng, B.Wu, T.Zhang, Y.Zhang, and Y.Zhang, “Meta-attack: Class-agnostic and model-agnostic physical adversarial attack,” in ICCV, 2021, pp. 7787–7796.
- [106] Y.Dong, S.Ruan, H.Su, C.Kang, X.Wei, and J.Zhu, “Viewfool: Evaluating the robustness of visual recognition to adversarial viewpoints,” NIPS, vol.35, pp. 36 789–36 803, 2022.
- [107] W.Feng, N.Xu, T.Zhang, B.Wu, and Y.Zhang, “Robust and generalized physical adversarial attacks via meta-gan,” TIFS, 2023.
- [108] N.Nichols and R.Jasper, “Projecting trouble: Light based adversarial attacks on deep learning classifiers,” 2018.
- [109] Y.Man, M.Li, and R.Gerdes, “Poster: Perceived adversarial examples,” in IEEE Symposium on Security and Privacy, 2019.
- [110] D.-L. Nguyen, S.S. Arora, Y.Wu, and H.Yang, “Adversarial light projection attacks on face recognition systems: A feasibility study,” in CVPR workshops, 2020, pp. 814–815.
- [111] G.Lovisotto, H.Turner, I.Sluganovic, M.Strohmeier, and I.Martinovic, “SLAP: Improving physical adversarial examples with short-lived adversarial perturbations,” in USENIX Security, 2021, pp. 1865–1882.
- [112] A.Gnanasambandam, A.M. Sherman, and S.H. Chan, “Optical adversarial attack,” in ICCV, 2021, pp. 92–101.
- [113] R.Duan, X.Mao, A.K. Qin, Y.Chen, S.Ye, Y.He, and Y.Yang, “Adversarial laser beam: Effective physical-world attack to dnns in a blink,” in CVPR, 2021, pp. 16 062–16 071.
- [114] Y.Zhong, X.Liu, D.Zhai, J.Jiang, and X.Ji, “Shadows can be dangerous: Stealthy and effective physical-world adversarial attack by natural phenomenon,” in CVPR, 2022, pp. 15 345–15 354.
- [115] B.Huang and H.Ling, “Spaa: Stealthy projector-based adversarial attacks on deep image classifiers,” in 2022 IEEE Conference on Virtual Reality and 3D User Interfaces (VR).IEEE, 2022, pp. 534–542.
- [116] H.Wen, S.Chang, and L.Zhou, “Light projection-based physical-world vanishing attack against car detection,” in ICASSP, 2023, pp. 1–5.
- [117] C.Hu, Y.Wang, K.Tiliwalidi, and W.Li, “Adversarial laser spot: Robust and covert physical-world attack to dnns,” in Asian Conference on Machine Learning.PMLR, 2023, pp. 483–498.
- [118] Y.Li, Y.Li, X.Dai, S.Guo, and B.Xiao, “Physical-world optical adversarial attacks on 3d face recognition,” in CVPR, June 2023, pp. 24 699–24 708.
- [119] D.Wang, W.Yao, T.Jiang, C.Li, and X.Chen, “Rfla: A stealthy reflected light adversarial attack in the physical world,” in ICCV, 2023, pp. 4455–4465.
- [120] A.Sayles, A.Hooda, M.Gupta, R.Chatterjee, and E.Fernandes, “Invisible perturbations: Physical adversarial examples exploiting the rolling shutter effect,” in CVPR, 2021, pp. 14 666–14 675.
- [121] B.Phan, F.Mannan, and F.Heide, “Adversarial imaging pipelines,” in CVPR, 2021, pp. 16 051–16 061.
- [122] X.Zhu, X.Li, J.Li, Z.Wang, and X.Hu, “Fooling thermal infrared pedestrian detectors in real world using small bulbs,” in AAAI, vol.35, no.4, 2021, pp. 3616–3624.
- [123] H.Wei, Z.Wang, X.Jia, Y.Zheng, H.Tang, S.Satoh, and Z.Wang, “Hotcold block: Fooling thermal infrared detectors with a novel wearable design,” in AAAI, vol.37, no.12, 2023, pp. 15 233–15 241.
- [124] B.Yin, W.Wang, T.Yao, J.Guo, Z.Kong, S.Ding, J.Li, and C.Liu, “Adv-makeup: A new imperceptible and transferable attack on face recognition,” 2021.
- [125] A.Athalye, L.Engstrom, A.Ilyas, and K.Kwok, “Synthesizing robust adversarial examples,” in ICML.PMLR, 2018, pp. 284–293.
- [126] X.Zeng, C.Liu, Y.-S. Wang, W.Qiu, L.Xie, Y.-W. Tai, C.-K. Tang, and A.L. Yuille, “Adversarial attacks beyond the image space,” in CVPR, 2019, pp. 4302–4311.
- [127] X.Yang, C.Liu, L.Xu, Y.Wang, Y.Dong, N.Chen, H.Su, and J.Zhu, “Towards effective adversarial textured 3d meshes on physical face recognition,” in CVPR, June 2023, pp. 4119–4128.
- [128] A.Liu, J.Guo, J.Wang, S.Liang, R.Tao, W.Zhou, C.Liu, X.Liu, and D.Tao, “X-Adv: Physical adversarial object attacks against x-ray prohibited item detection,” in USENIX Security, 2023, pp. 3781–3798.
- [129] Y.Huang, Y.Dong, S.Ruan, X.Yang, H.Su, and X.Wei, “Towards transferable targeted 3d adversarial attack in the physical world,” in CVPR, 2024, pp. 24 512–24 522.
- [130] N.Hingun, C.Sitawarin, J.Li, and D.Wagner, “Reap: A large-scale realistic adversarial patch benchmark,” ICCV, 2023.
- [131] N.Wang, Y.Luo, T.Sato, K.Xu, and Q.A. Chen, “Does physical adversarial example really matter to autonomous driving? towards system-level effect of adversarial object evasion attack,” ICCV, 2023.
- [132] S.Li, S.Zhang, G.Chen, D.Wang, P.Feng, J.Wang, A.Liu, X.Yi, and X.Liu, “Towards benchmarking and assessing visual naturalness of physical world adversarial attacks,” in CVPR, 2023, pp. 12 324–12 333.
- [133] I.Goodfellow, J.Pouget-Abadie, M.Mirza, B.Xu, D.Warde-Farley, S.Ozair, A.Courville, and Y.Bengio, “Generative adversarial networks,” Communications of the ACM, vol.63, no.11, pp. 139–144, 2020.
- [134] J.Deng, W.Dong, R.Socher, L.-J. Li, K.Li, and L.Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in CVPR.IEEE, 2009, pp. 248–255.
- [135] C.Szegedy, V.Vanhoucke, S.Ioffe, J.Shlens, and Z.Wojna, “Rethinking the inception architecture for computer vision,” in CVPR, 2016, pp. 2818–2826.
- [136] J.-Y. Zhu, T.Park, P.Isola, and A.A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in ICCV, 2017, pp. 2223–2232.
- [137] J.Wang, Z.Yin, P.Hu, A.Liu, R.Tao, H.Qin, X.Liu, and D.Tao, “Defensive patches for robust recognition in the physical world,” in CVPR, 2022, pp. 2456–2465.
- [138] T.Wu, X.Ning, W.Li, R.Huang, H.Yang, and Y.Wang, “Physical adversarial attack on vehicle detector in the carla simulator,” arXiv preprint arXiv:2007.16118, 2020.
- [139] A.Dosovitskiy, G.Ros, F.Codevilla, A.Lopez, and V.Koltun, “Carla: An open urban driving simulator,” in Conference on robot learning.PMLR, 2017, pp. 1–16.
- [140] H.Kato, Y.Ushiku, and T.Harada, “Neural 3d mesh renderer,” in CVPR, 2018, pp. 3907–3916.
- [141] J.Thies, M.Zollhöfer, and M.Nießner, “Deferred neural rendering: Image synthesis using neural textures,” TOG, vol.38, no.4, pp. 1–12, 2019.
- [142] K.Rematas and V.Ferrari, “Neural voxel renderer: Learning an accurate and controllable rendering tool,” in CVPR, 2020, pp. 5417–5427.
- [143] J.Redmon, S.Divvala, R.Girshick, and A.Farhadi, “You only look once: Unified, real-time object detection,” in CVPR, 2016, pp. 779–788.
- [144] J.Redmon and A.Farhadi, “Yolo9000: better, faster, stronger,” in CVPR, 2017, pp. 7263–7271.
- [145] N.Dalal and B.Triggs, “Histograms of oriented gradients for human detection,” in CVPR, vol.1.IEEE, 2005, pp. 886–893.
- [146] Z.Wang, Q.She, and T.E. Ward, “Generative adversarial networks in computer vision: A survey and taxonomy,” ACM Computing Surveys, vol.54, no.2, pp. 1–38, 2021.
- [147] A.Brock, J.Donahue, and K.Simonyan, “Large scale gan training for high fidelity natural image synthesis,” arXiv preprint arXiv:1809.11096, 2018.
- [148] T.Karras, M.Aittala, S.Laine, E.Härkönen, J.Hellsten, J.Lehtinen, and T.Aila, “Alias-free generative adversarial networks,” NIPS, vol.34, pp. 852–863, 2021.
- [149] M.Andriluka, L.Pishchulin, P.Gehler, and B.Schiele, “2d human pose estimation: New benchmark and state of the art analysis,” in CVPR, 2014, pp. 3686–3693.
- [150] J.Long, E.Shelhamer, and T.Darrell, “Fully convolutional networks for semantic segmentation,” in CVPR, 2015, pp. 3431–3440.
- [151] J.T. Springenberg, A.Dosovitskiy, T.Brox, and M.Riedmiller, “Striving for simplicity: The all convolutional net,” arXiv preprint arXiv:1412.6806, 2014.
- [152] A.Hatcher, Algebraic Topology.Cambridge University Press, 2002.
- [153] M.Martin, A.Roitberg, M.Haurilet, M.Horne, S.Reiß, M.Voit, and R.Stiefelhagen, “Drive&act: A multi-modal dataset for fine-grained driver behavior recognition in autonomous vehicles,” in ICCV, 2019, pp. 2801–2810.
- [154] M.Ye, C.Chen, J.Shen, and L.Shao, “Dynamic tri-level relation mining with attentive graph for visible infrared re-identification,” TIFS, vol.17, pp. 386–398, 2021.
- [155] FLIR, “Teledyne FLIR free ADAS thermal datasets v2,” 2022, https://adas-dataset-v2.flirconservator.com/.
- [156] S.Ren, K.He, R.Girshick, and J.Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” NIPS, vol.28, 2015.
- [157] G.Jocher, “Yolov5 detector,” https://github.com/ultralytics/yolov5, 2020, accessed: 2023-08-15.
- [158] MobileSec, “Mobilesec android authentication framework,” https://github.com/mobilesec/authentication-framework-module-face, 2022.
- [159] Neurotechnology, “SentiVeillance SDK,” http://www.neurotechnology.com/sentiveillance.html, 2022.
- [160] Megvii Inc., “Face++,” http://www.faceplusplus.com/, 2022.
- [161] M.Pautov, G.Melnikov, E.Kaziakhmedov, K.Kireev, and A.Petiushko, “On adversarial patches: Real-world attack on arcface-100 face recognition system,” in 2019 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON), 2019, pp. 0391–0396.
- [162] J.Deng, J.Guo, N.Xue, and S.Zafeiriou, “Arcface: Additive angular margin loss for deep face recognition,” in CVPR, June 2019.
- [163] F.Schroff, D.Kalenichenko, and J.Philbin, “Facenet: A unified embedding for face recognition and clustering,” in CVPR, June 2015.
- [164] W.Liu, Y.Wen, Z.Yu, M.Li, B.Raj, and L.Song, “Sphereface: Deep hypersphere embedding for face recognition,” in CVPR, July 2017.
- [165] P.Grother and M.Ngan, “Face recognition vendor test ( frvt ) performance of face identification algorithms,” NIST Interagency/Internal Report (NISTIR) - 8009, 2014.
- [166] M.Jaderberg, K.Simonyan, A.Zisserman et al., “Spatial transformer networks,” NIPS, vol.28, 2015.
- [167] Z.Wang, Z.Wang, Y.Zheng, Y.-Y. Chuang, and S.Satoh, “Learning to reduce dual-level discrepancy for infrared-visible person re-identification,” in CVPR, 2019, pp. 618–626.
- [168] E.Ilg, N.Mayer, T.Saikia, M.Keuper, A.Dosovitskiy, and T.Brox, “Flownet 2.0: Evolution of optical flow estimation with deep networks,” in CVPR, 2017, pp. 2462–2470.
- [169] X.Jiang, L.Zhang, M.Xu, T.Zhang, P.Lv, B.Zhou, X.Yang, and Y.Pang, “Attention scaling for crowd counting,” in CVPR, 2020, pp. 4706–4715.
- [170] Y.Wang, W.-L. Chao, D.Garg, B.Hariharan, M.Campbell, and K.Q. Weinberger, “Pseudo-lidar from visual depth estimation: Bridging the gap in 3d object detection for autonomous driving,” in CVPR, June 2019.
- [171] H.Zhao, X.Qi, X.Shen, J.Shi, and J.Jia, “Icnet for real-time semantic segmentation on high-resolution images,” in ECCV, 2018, pp. 405–420.
- [172] C.Chen, A.Seff, A.Kornhauser, and J.Xiao, “Deepdriving: Learning affordance for direct perception in autonomous driving,” in ICCV, 2015, pp. 2722–2730.
- [173] A.Prakash, K.Chitta, and A.Geiger, “Multi-modal fusion transformer for end-to-end autonomous driving,” in CVPR, 2021, pp. 7077–7087.
- [174] K.Chitta, A.Prakash, and A.Geiger, “Neat: Neural attention fields for end-to-end autonomous driving,” in ICCV, 2021, pp. 15 793–15 803.
- [175] A.Yilmaz, O.Javed, and M.Shah, “Object tracking: A survey,” ACM Computing Surveys, vol.38, no.4, pp. 13–es, 2006.
- [176] M.Z. Hossain, F.Sohel, M.F. Shiratuddin, and H.Laga, “A comprehensive survey of deep learning for image captioning,” ACM Computing Surveys, vol.51, no.6, pp. 1–36, 2019.
- [177] R.Tao, Y.Wei, X.Jiang, H.Li, H.Qin, J.Wang, Y.Ma, L.Zhang, and X.Liu, “Towards real-world x-ray security inspection: A high-quality benchmark and lateral inhibition module for prohibited items detection,” in ICCV, 2021, pp. 10 923–10 932.
- [178] M.Otani, R.Togashi, Y.Sawai, R.Ishigami, Y.Nakashima, E.Rahtu, J.Heikkilä, and S.Satoh, “Toward verifiable and reproducible human evaluation for text-to-image generation,” in CVPR, 2023, pp. 14 277–14 286.
- [179] J.Gu, C.Sun, and H.Zhao, “Densetnt: End-to-end trajectory prediction from dense goal sets,” in ICCV, 2021, pp. 15 303–15 312.
- [180] W.Liu and T.Mei, “Recent advances of monocular 2d and 3d human pose estimation: A deep learning perspective,” ACM Computing Surveys, 2022.
- [181] Z.Wang, Q.She, and A.Smolic, “Action-net: Multipath excitation for action recognition,” in CVPR, 2021, pp. 13 214–13 223.