---
license: mit
---
# LEGION-8B-replicate
## Overview
The [LEGION: Learning to Ground and Explain for Synthetic Image Detection](https://arxiv.org/abs/2503.15264) project open-sourced its code repository but did not release pre-trained weights. We replicated the model from the open-source code and the paper, and are releasing the replicated weights here.
> [!NOTE]
> Due to potential discrepancies in the replication process, the released weights may achieve lower scores than officially reported results on certain benchmarks.
### Training Details
We conducted training on 4x A100 40G GPUs.
For the first training stage, the official configuration uses 8 GPUs with a global batch size of 16 (batch size per device = 2). To maintain the same global batch size, we used 4 GPUs with a per-device batch size of 4.
For the second training stage, the official configuration uses 8 GPUs with a global batch size of 512 (batch size per device = 64). We used 4 GPUs with a per-device batch size of 8 and 16 gradient accumulation steps. This gives an effective per-device batch size of 8 × 16 = 128 and therefore the same global batch size of 4 × 128 = 512.
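
The batch-size arithmetic above can be checked with a short sketch. The `BatchConfig` class and its field names are hypothetical helpers introduced only for illustration; they are not part of the LEGION code base or its configuration files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchConfig:
    """Hypothetical container for the quantities that determine the global batch size."""

    num_gpus: int
    per_device_batch_size: int
    grad_accum_steps: int = 1  # 1 means no gradient accumulation

    @property
    def effective_per_device_batch_size(self) -> int:
        # Samples each GPU contributes to a single optimizer step.
        return self.per_device_batch_size * self.grad_accum_steps

    @property
    def global_batch_size(self) -> int:
        # Samples across all GPUs per optimizer step.
        return self.num_gpus * self.effective_per_device_batch_size


# Stage 1: official (8 GPUs x 2) vs. this replication (4 GPUs x 4).
stage1_official = BatchConfig(num_gpus=8, per_device_batch_size=2)
stage1_replicate = BatchConfig(num_gpus=4, per_device_batch_size=4)

# Stage 2: official (8 GPUs x 64) vs. this replication (4 GPUs x 8 x 16 accumulation steps).
stage2_official = BatchConfig(num_gpus=8, per_device_batch_size=64)
stage2_replicate = BatchConfig(num_gpus=4, per_device_batch_size=8, grad_accum_steps=16)

for name, official, ours in [
    ("stage 1", stage1_official, stage1_replicate),
    ("stage 2", stage2_official, stage2_replicate),
]:
    print(name, official.global_batch_size, ours.global_batch_size)
```

Matching the global batch size keeps the number of samples per optimizer step the same as in the official setup, although gradient accumulation is not guaranteed to be numerically identical to a larger per-device batch (for example, when layers depend on per-batch statistics).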
### Inference Usage
A simple inference script is provided at [infer.py](./infer.py).
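Copy the script into a local clone of the official LEGION repository and run it from there:
```bash
# Copy the script into the LEGION repository, then run it from that directory.
cp infer.py /path/to/LEGION
cd /path/to/LEGION
python infer.py --model_path /path/to/LEGION-8B-replicate --image_root /path/to/images --save_root /path/to/results
```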
### Examples
<table>
<tr>
<td><img src="./examples/image.png" alt="Original Image" style="max-width:100%;"></td>
<td><img src="./examples/image_mask.png" alt="Mask generated by LEGION-8B-replicate" style="max-width:100%;"></td>
</tr>
</table>
Explanation generated by LEGION-8B-replicate for the image above (left: original image, right: predicted mask):
> Upon examining the image. I have found: A cat sits on a rooftop at sunset, with its right front paw missing and the left front paw appearing deformed. To elaborate, I have found the following artifacts. Cat's right front paw :The cat's right front paw is missing. Cat's left front paw :The cat's left front paw is deformed.
## Performance
> [!NOTE]
> Because the official evaluation and metric code has not been open-sourced, the test results reported here may be inaccurate.
> In particular, the mask IoU may be lowered by the mask processing applied during inference.
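
To illustrate why mask processing and metric conventions matter, here is a minimal sketch of pixel-level IoU and F1 on binary masks. It is not the official LEGION evaluation protocol, which is not public. All function names, the 0.5 threshold, and the choice between foreground-only IoU and class-averaged IoU are assumptions made for illustration. Plain Python lists are used to keep the sketch dependency-free.

```python
def binarize(scores, threshold=0.5):
    """Turn a 2D grid of mask scores into a 0/1 mask (the threshold is an assumption)."""
    return [[1 if s > threshold else 0 for s in row] for row in scores]


def confusion_counts(pred, target):
    """Count TP, FP, FN, TN over two same-shaped binary masks."""
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def foreground_iou(pred, target):
    """IoU of the artifact (foreground) class only."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    union = tp + fp + fn
    return tp / union if union else 1.0


def class_averaged_iou(pred, target):
    """Mean of foreground IoU and background IoU (one possible mIoU convention)."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    fg = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    bg = tn / (tn + fp + fn) if (tn + fp + fn) else 1.0
    return (fg + bg) / 2


def pixel_f1(pred, target):
    """Pixel-level F1 score of the foreground class."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Toy 2x4 example: predicted scores and a ground-truth mask.
scores = [[0.9, 0.6, 0.2, 0.1],
          [0.4, 0.1, 0.0, 0.0]]
target = [[1, 0, 0, 0],
          [1, 0, 0, 0]]

for threshold in (0.5, 0.3):
    pred = binarize(scores, threshold)
    print(threshold, foreground_iou(pred, target),
          class_averaged_iou(pred, target), pixel_f1(pred, target))
```

In this toy example, changing only the binarization threshold or the averaging convention changes the reported IoU noticeably, so scores computed under different conventions are not directly comparable.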
### Localization
<table>
<tr>
<th rowspan="2">Method</th>
<th colspan="2">SynthScars</th>
<th colspan="2">LOKI</th>
<th colspan="2">RichHF-18K</th>
</tr>
<tr>
<th>mIoU</th>
<th>F1</th>
<th>mIoU</th>
<th>F1</th>
<th>mIoU</th>
<th>F1</th>
</tr>
<tr>
<td>HiFi-Net</td>
<td>45.65</td>
<td>0.57</td>
<td>39.60</td>
<td>2.41</td>
<td>44.96</td>
<td>0.39</td>
</tr>
<tr>
<td>TruFor</td>
<td>48.60</td>
<td>15.29</td>
<td>46.55</td>
<td>16.70</td>
<td>48.41</td>
<td>18.03</td>
</tr>
<tr>
<td>PAL4VST</td>
<td>56.10</td>
<td>29.21</td>
<td>47.34</td>
<td>11.58</td>
<td>49.88</td>
<td>14.78</td>
</tr>
<tr>
<td>Ferret</td>
<td>27.09</td>
<td>15.24</td>
<td>24.50</td>
<td>18.88</td>
<td>26.52</td>
<td>16.22</td>
</tr>
<tr>
<td>Griffon</td>
<td>27.68</td>
<td>16.67</td>
<td>21.96</td>
<td>20.41</td>
<td>28.13</td>
<td>18.19</td>
</tr>
<tr>
<td>LISA-v1-7B</td>
<td>34.51</td>
<td>18.77</td>
<td>31.10</td>
<td>9.29</td>
<td>35.90</td>
<td>21.94</td>
</tr>
<tr>
<td>InternVL2-8B</td>
<td>41.25</td>
<td>6.39</td>
<td>42.03</td>
<td>10.06</td>
<td>39.90</td>
<td>9.58</td>
</tr>
<tr>
<td>Qwen2-VL-72B</td>
<td>30.20</td>
<td>17.50</td>
<td>26.62</td>
<td>20.99</td>
<td>27.58</td>
<td>19.02</td>
</tr>
<tr style="background-color: #e6ffe6;">
<td>LEGION (Official)</td>
<td>58.13</td>
<td>34.54</td>
<td>48.66</td>
<td>16.71</td>
<td>50.07</td>
<td>17.41</td>
</tr>
<tr style="background-color: #e6ffe6;">
<td>LEGION (Replicate)</td>
<td>23.92</td>
<td>33.47</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
</table>
### Explanation
<table>
<tr>
<th rowspan="2">Method</th>
<th rowspan="2">Params</th>
<th colspan="2">SynthScars</th>
<th colspan="2">LOKI</th>
</tr>
<tr>
<th>ROUGE-L ↑</th>
<th>CSS ↑</th>
<th>ROUGE-L ↑</th>
<th>CSS ↑</th>
</tr>
<tr>
<td>Qwen2-VL</td>
<td>72B</td>
<td>25.84</td>
<td>58.15</td>
<td>11.80</td>
<td>37.64</td>
</tr>
<tr>
<td>LLaVA-v1.6</td>
<td>7B</td>
<td>29.61</td>
<td>61.75</td>
<td>16.07</td>
<td>41.07</td>
</tr>
<tr>
<td>InternVL2</td>
<td>8B</td>
<td>25.93</td>
<td>56.89</td>
<td>10.10</td>
<td>39.62</td>
</tr>
<tr>
<td>Deepseek-VL2</td>
<td>27B</td>
<td>25.50</td>
<td>47.77</td>
<td>6.70</td>
<td>28.76</td>
</tr>
<tr>
<td>GPT-4o</td>
<td>-</td>
<td>22.43</td>
<td>53.55</td>
<td>9.61</td>
<td>38.98</td>
</tr>
<tr style="background-color: #e6ffe6;">
<td>LEGION (Official)</td>
<td>8B</td>
<td>39.50</td>
<td>72.60</td>
<td>18.55</td>
<td>45.96</td>
</tr>
<tr style="background-color: #e6ffe6;">
<td>LEGION (Replicate)</td>
<td>8B</td>
<td>50.57</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
</table>
### Detection
<table>
<tr>
<th rowspan="2">Method</th>
<th rowspan="2">GANs</th>
<th rowspan="2">Deepfakes</th>
<th colspan="2">Perceptual Loss</th>
<th colspan="2">Low Level Vision</th>
<th rowspan="2">Diffusion</th>
</tr>
<tr>
<th>CRN</th>
<th>IMLE</th>
<th>SITD</th>
<th>SAN</th>
</tr>
<tr>
<td>Co-occurrence</td>
<td>75.17</td>
<td>59.14</td>
<td>73.06</td>
<td>87.21</td>
<td>68.98</td>
<td>60.42</td>
<td>85.53</td>
</tr>
<tr>
<td>Freq-spec</td>
<td>75.28</td>
<td>45.18</td>
<td>53.61</td>
<td>50.98</td>
<td>47.46</td>
<td>57.12</td>
<td>69.00</td>
</tr>
<tr>
<td>CNNSpot</td>
<td>85.29</td>
<td>53.47</td>
<td>86.31</td>
<td>86.26</td>
<td>66.67</td>
<td>48.69</td>
<td>58.63</td>
</tr>
<tr>
<td>Patchfor</td>
<td>69.97</td>
<td>75.54</td>
<td>72.33</td>
<td>55.30</td>
<td>75.14</td>
<td>75.28</td>
<td>72.54</td>
</tr>
<tr>
<td>UniFD</td>
<td>95.25</td>
<td>66.60</td>
<td>59.50</td>
<td>72.00</td>
<td>63.00</td>
<td>57.50</td>
<td>82.02</td>
</tr>
<tr>
<td>LDGard</td>
<td>89.17</td>
<td>58.00</td>
<td>50.74</td>
<td>50.78</td>
<td>62.50</td>
<td>50.00</td>
<td>89.79</td>
</tr>
<tr>
<td>FreqNet</td>
<td>94.23</td>
<td>97.40</td>
<td>71.92</td>
<td>67.35</td>
<td>88.92</td>
<td>59.04</td>
<td>83.34</td>
</tr>
<tr>
<td>NPR</td>
<td>94.16</td>
<td>76.89</td>
<td>50.00</td>
<td>50.00</td>
<td>66.94</td>
<td>98.63</td>
<td>94.54</td>
</tr>
<tr style="background-color: #e6ffe6;">
<td>LEGION (Official)</td>
<td>97.01</td>
<td>63.37</td>
<td>90.78</td>
<td>98.93</td>
<td>79.44</td>
<td>57.76</td>
<td>83.10</td>
</tr>
<tr style="background-color: #e6ffe6;">
<td>LEGION (Replicate)</td>
<td>91.48</td>
<td>79.16</td>
<td>84.73</td>
<td>96.71</td>
<td>78.06</td>
<td>53.70</td>
<td>-</td>
</tr>
</table>
## Acknowledgements
Thanks to [Gennadiyev](https://github.com/Gennadiyev) for providing computational resources and moral support, and for helping me complete the reproduction.
Thanks to [draw-your-dream/LEGION](https://github.com/draw-your-dream/LEGION/tree/main) for fixing bugs in the first-stage training.