---
license: mit
---

# LEGION-8B-replicate

## Overview

The authors of [LEGION: Learning to Ground and Explain for Synthetic Image Detection](https://arxiv.org/abs/2503.15264) open-sourced their code but did not release pre-trained weights. We therefore replicated the model by following the open-source code and the paper, and we release the resulting weights here.

> [!NOTE]
> Because of possible discrepancies in the replication process, the released weights may score lower than the officially reported results on some benchmarks.

### Training Details

We trained on 4x A100 (40 GB) GPUs.

For the first training stage, the official configuration uses 8 GPUs with a per-device batch size of 2, for a global batch size of 16. To keep the same global batch size, we used 4 GPUs with a per-device batch size of 4.

For the second training stage, the official configuration uses 8 GPUs with a per-device batch size of 64, for a global batch size of 512. We used 4 GPUs with a per-device batch size of 8 and 16 gradient accumulation steps. This gives an effective batch size of 8 × 16 = 128 per device and 4 × 128 = 512 globally, matching the official setting.
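
The global batch size is the product of the number of GPUs, the per-device batch size, and the number of gradient accumulation steps. The short Python sketch below checks that both replicated configurations consume the same number of samples per optimizer step as the official ones; the helper name `global_batch_size` is ours and is not part of the LEGION code.

```python
def global_batch_size(num_gpus: int, per_device: int, grad_accum: int = 1) -> int:
    """Samples consumed per optimizer step across all devices."""
    return num_gpus * per_device * grad_accum


# Stage 1: official 8 GPUs x 2 vs. replicated 4 GPUs x 4 (no accumulation)
stage1_official = global_batch_size(8, 2)
stage1_replicate = global_batch_size(4, 4)

# Stage 2: official 8 GPUs x 64 vs. replicated 4 GPUs x 8 with 16 accumulation steps
stage2_official = global_batch_size(8, 64)
stage2_replicate = global_batch_size(4, 8, grad_accum=16)

print(stage1_official, stage1_replicate)
print(stage2_official, stage2_replicate)
```

Matching the global batch size keeps the number of samples per update the same; it does not guarantee bit-identical training, since per-device statistics and memory behavior still differ between the two setups.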

### Inference Usage

A simple inference script is provided in [infer.py](./infer.py). Copy it into your local clone of the LEGION repository and run it from there:

```bash
cp infer.py /path/to/LEGION
cd /path/to/LEGION
python infer.py \
    --model_path /path/to/LEGION-8B-replicate \
    --image_root /path/to/images \
    --save_root /path/to/results
```
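
To process several image folders in one go, the command above can be scripted. The following is a minimal Python sketch that only uses the three flags shown above (`--model_path`, `--image_root`, `--save_root`). The helper names `build_infer_commands` and `run_all`, the folder names, and the one-results-folder-per-input layout are hypothetical, and it assumes it is run from the LEGION repository root where `infer.py` was copied.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_commands(model_path: str, image_roots: list[str], save_base: str) -> list[list[str]]:
    """Build one infer.py call per image folder; results go to <save_base>/<folder name>."""
    commands = []
    for image_root in image_roots:
        # Hypothetical layout: results for .../set_a are written to <save_base>/set_a
        save_root = Path(save_base) / Path(image_root).name
        commands.append([
            sys.executable, "infer.py",
            "--model_path", model_path,
            "--image_root", image_root,
            "--save_root", str(save_root),
        ])
    return commands


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # Must be run from the LEGION repository root, where infer.py was copied
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = build_infer_commands(
        "/path/to/LEGION-8B-replicate",
        ["/path/to/images/set_a", "/path/to/images/set_b"],  # hypothetical folders
        "/path/to/results",
    )
    run_all(cmds)  # dry run: only prints the commands
```

Keeping `dry_run=True` as the default lets you inspect the generated commands before launching the (GPU-heavy) inference runs.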

### Examples

<table>
<tr>
<td><img src="./examples/image.png" alt="Original Image" style="max-width:100%;"></td>
<td><img src="./examples/image_mask.png" alt="Mask generated by LEGION-8B-replicate" style="max-width:100%;"></td>
</tr>
</table>

Generated explanation:

> Upon examining the image. I have found: A cat sits on a rooftop at sunset, with its right front paw missing and the left front paw appearing deformed. To elaborate, I have found the following artifacts. Cat's right front paw :The cat's right front paw is missing. Cat's left front paw :The cat's left front paw is deformed.
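
The explanation above follows a recognizable pattern: a free-text summary, then a fixed phrase, then one `<region> :<description>.` entry per artifact. The Python sketch below splits such an output into its summary and a list of (region, description) pairs. It is based only on this single sample: the marker phrase and the entry format are assumptions, and descriptions that themselves contain periods (for example abbreviations) would not be parsed correctly.

```python
import re

SAMPLE = (
    "Upon examining the image. I have found: A cat sits on a rooftop at sunset, "
    "with its right front paw missing and the left front paw appearing deformed. "
    "To elaborate, I have found the following artifacts. "
    "Cat's right front paw :The cat's right front paw is missing. "
    "Cat's left front paw :The cat's left front paw is deformed."
)

# Assumed phrase separating the summary from the per-artifact details
MARKER = "To elaborate, I have found the following artifacts."
# Assumed entry format: "<region> :<description>."
ARTIFACT = re.compile(r"\s*([^.:]+?)\s*:\s*([^.]+)\.")


def parse_explanation(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Return (summary, [(region, description), ...]); no marker means no artifacts."""
    summary, _, details = text.partition(MARKER)
    return summary.strip(), ARTIFACT.findall(details)


if __name__ == "__main__":
    summary, artifacts = parse_explanation(SAMPLE)
    print(summary)
    for region, description in artifacts:
        print(f"- {region}: {description}")
```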

## Performance

> [!NOTE]
> Because the official evaluation and metric code has not been open-sourced, these results may not be accurate.
>
> The mask IoU metric may also be affected by how masks are post-processed during inference, which can lower the scores.

### Localization

<table>
<tr>
<th rowspan="2">Method</th>
<th colspan="2">SynthScars</th>
<th colspan="2">LOKI</th>
<th colspan="2">RichHF-18K</th>
</tr>
<tr>
<th>mIoU</th>
<th>F1</th>
<th>mIoU</th>
<th>F1</th>
<th>mIoU</th>
<th>F1</th>
</tr>
<tr>
<td>HiFi-Net</td>
<td>45.65</td>
<td>0.57</td>
<td>39.60</td>
<td>2.41</td>
<td>44.96</td>
<td>0.39</td>
</tr>
<tr>
<td>TruFor</td>
<td>48.60</td>
<td>15.29</td>
<td>46.55</td>
<td>16.70</td>
<td>48.41</td>
<td>18.03</td>
</tr>
<tr>
<td>PAL4VST</td>
<td>56.10</td>
<td>29.21</td>
<td>47.34</td>
<td>11.58</td>
<td>49.88</td>
<td>14.78</td>
</tr>
<tr>
<td>Ferret</td>
<td>27.09</td>
<td>15.24</td>
<td>24.50</td>
<td>18.88</td>
<td>26.52</td>
<td>16.22</td>
</tr>
<tr>
<td>Griffon</td>
<td>27.68</td>
<td>16.67</td>
<td>21.96</td>
<td>20.41</td>
<td>28.13</td>
<td>18.19</td>
</tr>
<tr>
<td>LISA-v1-7B</td>
<td>34.51</td>
<td>18.77</td>
<td>31.10</td>
<td>9.29</td>
<td>35.90</td>
<td>21.94</td>
</tr>
<tr>
<td>InternVL2-8B</td>
<td>41.25</td>
<td>6.39</td>
<td>42.03</td>
<td>10.06</td>
<td>39.90</td>
<td>9.58</td>
</tr>
<tr>
<td>Qwen2-VL-72B</td>
<td>30.20</td>
<td>17.50</td>
<td>26.62</td>
<td>20.99</td>
<td>27.58</td>
<td>19.02</td>
</tr>
<tr style="background-color: #e6ffe6;">
<td>LEGION (Official)</td>
<td>58.13</td>
<td>34.54</td>
<td>48.66</td>
<td>16.71</td>
<td>50.07</td>
<td>17.41</td>
</tr>
<tr style="background-color: #e6ffe6;">
<td>LEGION (Replicate)</td>
<td>23.92</td>
<td>33.47</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
</table>
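
Since the evaluation code is not public, the exact metric definitions are unknown. The NumPy sketch below shows one common convention, assumed here rather than taken from LEGION: per-image mIoU as the mean of the artifact (foreground) IoU and the background IoU, and F1 computed on foreground pixels. The `binarize` threshold stands in for the mask post-processing mentioned in the note above; the helper names and the toy masks are hypothetical.

```python
import numpy as np


def binarize(prob_mask: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Post-processing step: turn a soft mask into a boolean artifact mask."""
    return prob_mask >= threshold


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    union = np.logical_or(pred, gt).sum()
    # Assumed convention: two empty masks count as a perfect match
    return 1.0 if union == 0 else float(np.logical_and(pred, gt).sum() / union)


def mask_scores(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Return (mIoU, F1) for one image, both in percent.

    mIoU averages the foreground and background IoU (assumed definition);
    F1 is computed on foreground (artifact) pixels only.
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    miou = (iou(pred, gt) + iou(~pred, ~gt)) / 2
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    denom = 2 * tp + fp + fn
    f1 = 1.0 if denom == 0 else 2 * tp / denom
    return 100 * miou, 100 * float(f1)


if __name__ == "__main__":
    gt = np.zeros((4, 4), dtype=bool)
    gt[:2, :2] = True          # 4 ground-truth artifact pixels
    prob = np.zeros((4, 4))
    prob[:2, :2] = 0.6         # confident on the true region
    prob[:2, 2] = 0.4          # weak spill-over next to it
    for t in (0.5, 0.3):
        print(t, mask_scores(binarize(prob, t), gt))
```

On this toy mask, lowering the threshold from 0.5 to 0.3 lets the weak spill-over through and drops mIoU from 100 to 75 and F1 from 100 to 80, which illustrates how the choice of post-processing alone can move the reported scores.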

### Explanation

<table>
<tr>
<th rowspan="2">Method</th>
<th rowspan="2">Params</th>
<th colspan="2">SynthScars</th>
<th colspan="2">LOKI</th>
</tr>
<tr>
<th>ROUGE-L ↑</th>
<th>CSS ↑</th>
<th>ROUGE-L ↑</th>
<th>CSS ↑</th>
</tr>
<tr>
<td>Qwen2-VL</td>
<td>72B</td>
<td>25.84</td>
<td>58.15</td>
<td>11.80</td>
<td>37.64</td>
</tr>
<tr>
<td>LLaVA-v1.6</td>
<td>7B</td>
<td>29.61</td>
<td>61.75</td>
<td>16.07</td>
<td>41.07</td>
</tr>
<tr>
<td>InternVL2</td>
<td>8B</td>
<td>25.93</td>
<td>56.89</td>
<td>10.10</td>
<td>39.62</td>
</tr>
<tr>
<td>DeepSeek-VL2</td>
<td>27B</td>
<td>25.50</td>
<td>47.77</td>
<td>6.70</td>
<td>28.76</td>
</tr>
<tr>
<td>GPT-4o</td>
<td>-</td>
<td>22.43</td>
<td>53.55</td>
<td>9.61</td>
<td>38.98</td>
</tr>
<tr style="background-color: #e6ffe6;">
<td>LEGION (Official)</td>
<td>8B</td>
<td>39.50</td>
<td>72.60</td>
<td>18.55</td>
<td>45.96</td>
</tr>
<tr style="background-color: #e6ffe6;">
<td>LEGION (Replicate)</td>
<td>8B</td>
<td>50.57</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
</table>
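
ROUGE-L scores a generated explanation against a reference through their longest common subsequence (LCS) of tokens. The Python sketch below computes a sentence-level ROUGE-L F1 on lowercase whitespace tokens, plus a plain cosine similarity. CSS is assumed here to be the cosine similarity between sentence embeddings of the generated and reference explanations; the embedding model is not specified in this card, so the sketch only shows the similarity step on precomputed vectors. Established packages apply their own tokenization (and optionally stemming), so this sketch is not expected to reproduce the table values exactly; the table appears to report both metrics scaled to 0 to 100.

```python
import math


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence (dynamic programming, O(len(a) * len(b)))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise keep the best of left/up
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1 on lowercase whitespace tokens, in [0, 1]."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(rouge_l_f1("The cat's right front paw is missing.",
                     "The right front paw of the cat is missing."))
```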

### Detection

<table>
<tr>
<th rowspan="2">Method</th>
<th rowspan="2">GANs</th>
<th rowspan="2">Deepfakes</th>
<th colspan="2">Perceptual Loss</th>
<th colspan="2">Low-Level Vision</th>
<th rowspan="2">Diffusion</th>
</tr>
<tr>
<th>CRN</th>
<th>IMLE</th>
<th>SITD</th>
<th>SAN</th>
</tr>
<tr>
<td>Co-occurrence</td>
<td>75.17</td>
<td>59.14</td>
<td>73.06</td>
<td>87.21</td>
<td>68.98</td>
<td>60.42</td>
<td>85.53</td>
</tr>
<tr>
<td>Freq-spec</td>
<td>75.28</td>
<td>45.18</td>
<td>53.61</td>
<td>50.98</td>
<td>47.46</td>
<td>57.12</td>
<td>69.00</td>
</tr>
<tr>
<td>CNNSpot</td>
<td>85.29</td>
<td>53.47</td>
<td>86.31</td>
<td>86.26</td>
<td>66.67</td>
<td>48.69</td>
<td>58.63</td>
</tr>
<tr>
<td>Patchfor</td>
<td>69.97</td>
<td>75.54</td>
<td>72.33</td>
<td>55.30</td>
<td>75.14</td>
<td>75.28</td>
<td>72.54</td>
</tr>
<tr>
<td>UniFD</td>
<td>95.25</td>
<td>66.60</td>
<td>59.50</td>
<td>72.00</td>
<td>63.00</td>
<td>57.50</td>
<td>82.02</td>
</tr>
<tr>
<td>LDGard</td>
<td>89.17</td>
<td>58.00</td>
<td>50.74</td>
<td>50.78</td>
<td>62.50</td>
<td>50.00</td>
<td>89.79</td>
</tr>
<tr>
<td>FreqNet</td>
<td>94.23</td>
<td>97.40</td>
<td>71.92</td>
<td>67.35</td>
<td>88.92</td>
<td>59.04</td>
<td>83.34</td>
</tr>
<tr>
<td>NPR</td>
<td>94.16</td>
<td>76.89</td>
<td>50.00</td>
<td>50.00</td>
<td>66.94</td>
<td>98.63</td>
<td>94.54</td>
</tr>
<tr style="background-color: #e6ffe6;">
<td>LEGION (Official)</td>
<td>97.01</td>
<td>63.37</td>
<td>90.78</td>
<td>98.93</td>
<td>79.44</td>
<td>57.76</td>
<td>83.10</td>
</tr>
<tr style="background-color: #e6ffe6;">
<td>LEGION (Replicate)</td>
<td>91.48</td>
<td>79.16</td>
<td>84.73</td>
<td>96.71</td>
<td>78.06</td>
<td>53.70</td>
<td>-</td>
</tr>
</table>

## Acknowledgements

Thanks to [Gennadiyev](https://github.com/Gennadiyev) for providing computational resources and moral support, and for helping me complete the replication.

Thanks to [draw-your-dream/LEGION](https://github.com/draw-your-dream/LEGION/tree/main) for fixing bugs in the first-stage training code.