---
tags:
- multimodal
- classification
- content detection
---

# LanguageBind-MLP Model

## Model Description

This is a fine-tuned LanguageBind model for detecting machine-generated content across multiple modalities (text, image, and audio). The model is part of the **RU-AI** project, which introduces a large multimodal dataset for AI-generated content detection.

The model leverages LanguageBind's multimodal semantic alignment to identify whether a given text, image, or audio sample is human-generated or machine-generated.

## Model Details

- **Model Type:** Multi-modal classification model based on LanguageBind
- **Architecture:** LanguageBind backbone with an MLP classifier head (see the sketch below)
- **Paper:** [RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection](https://arxiv.org/abs/2406.04906)
- **GitHub Repository:** [ZhihaoZhang97/RU-AI](https://github.com/ZhihaoZhang97/RU-AI)
- **Accepted at:** WWW'25 Resource Track
- **Modalities Supported:** Text, Image, and Audio
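As a rough illustration of this architecture, here is a minimal PyTorch sketch of an MLP head operating on LanguageBind-style embeddings. The embedding width (768), hidden width (512), dropout rate, and binary output are illustrative assumptions, not values confirmed by the repository; the actual head is defined in the RU-AI code.

```python
import torch
import torch.nn as nn


class LanguageBindMLPClassifier(nn.Module):
    """Sketch of an MLP classification head on top of frozen
    LanguageBind embeddings. All layer sizes here are assumptions
    for illustration, not the repository's actual configuration."""

    def __init__(self, embed_dim: int = 768, hidden_dim: int = 512,
                 num_classes: int = 2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, embed_dim) from any LanguageBind encoder.
        # Because LanguageBind aligns text, image, and audio into one
        # embedding space, a single head can serve all modalities.
        return self.mlp(embeddings)
```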
## Intended Use

This model is designed for detecting AI-generated content in:
- **Text:** Identifying AI-written articles, essays, responses, and general text
- **Images:** Detecting images generated by models such as Stable Diffusion and DALL-E
- **Audio:** Identifying synthetic speech from TTS models

### Use Cases
- Content moderation and authenticity verification
- Academic integrity checking
- Media forensics and fact-checking
- Research on AI-generated content detection

## Training Data

The model was trained on the **RU-AI dataset**, which includes:
- **245,895** real/human-generated samples
- **1,229,475** machine-generated samples
- Multiple data sources: COCO, Flickr8k, and the Places dataset
- AI-generated content from various models:
  - Images: Stable Diffusion (v1.5, v6.0, XL v3.0, AbsoluteReality, EpicRealism)
  - Audio: EfficientSpeech, StyleTTS2, VITS, XTTS2, YourTTS
  - Text: Various LLM-generated captions and descriptions

The dataset is publicly available on [Zenodo](https://zenodo.org/records/11406538).

## Requirements

### Hardware
- NVIDIA GPU with at least **16GB VRAM** (RTX 3090 24GB or higher recommended)
- At least **500GB** of disk space for the full dataset

### Software
- Python >= 3.8
- PyTorch >= 1.13.1
- CUDA >= 11.6

## Installation

```bash
# Clone the repository
git clone https://github.com/ZhihaoZhang97/RU-AI.git
cd RU-AI

# Create virtual environment
conda create -n ruai python=3.8
conda activate ruai

# Install dependencies
pip3 install -r requirements.txt
```

## Usage

### Model Inference

```bash
# See infer_languagebind_model.py in the GitHub repository
python infer_languagebind_model.py
```

Before running inference, you need to:
1. Download the dataset or prepare your own data
2. Update the data paths in `infer_languagebind_model.py` (illustrated below):
   - `image_data_paths`
   - `audio_data_paths`
   - `text_data`
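For illustration only, the three variables might look like the following; the file names are hypothetical placeholders, not paths that ship with the repository:

```python
# Hypothetical placeholder values for the variables defined in
# infer_languagebind_model.py; point them at data you have
# downloaded or prepared yourself.
image_data_paths = ["data/flickr8k/images/example.jpg"]
audio_data_paths = ["data/flickr8k/audio/example.wav"]
text_data = ["A caption whose origin (human or machine) we want to classify."]
```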
### Quick Start with Sample Data

```bash
# Download Flickr8k sample data
python ./download_flickr.py

# Or download the full dataset (157GB compressed, 500GB uncompressed)
python ./download_all.py
```

## Model Performance

This model is designed to detect AI-generated content across multiple modalities simultaneously, leveraging LanguageBind's language-based semantic alignment to create unified representations.
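To make the "unified representations" point concrete, the toy snippet below runs the classifier head sketched under "Model Details" on stand-in embeddings from three modalities. The random tensors and the human/machine label order are assumptions for illustration; in practice the embeddings would come from the LanguageBind text, image, and audio encoders.

```python
import torch

# Reuse the LanguageBindMLPClassifier sketched in "Model Details".
model = LanguageBindMLPClassifier()
model.eval()

# Stand-in embeddings; real ones come from LanguageBind's encoders.
fake_embeddings = {
    "text": torch.randn(1, 768),
    "image": torch.randn(1, 768),
    "audio": torch.randn(1, 768),
}

with torch.no_grad():
    for modality, emb in fake_embeddings.items():
        probs = torch.softmax(model(emb), dim=-1)
        # Label order (0 = human, 1 = machine) is an assumption here.
        print(f"{modality}: P(machine) = {probs[0, 1]:.3f}")
```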
For detailed performance metrics and evaluation results, please refer to the [paper](https://arxiv.org/abs/2406.04906).

## Limitations

- The model's performance depends on the quality and diversity of its training data
- It may not generalize well to AI models or techniques not represented in the training set
- Detection accuracy may vary across modalities
- Inference requires significant computational resources

## Ethical Considerations

This model is intended for research and legitimate content verification purposes. Users should:
- Consider privacy implications when analyzing user-generated content
- Be aware of potential biases in the training data
- Use the model responsibly, and not for censorship without human oversight
- Understand that detection is probabilistic and may produce false positives and false negatives

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{huang2024ruai,
      title={RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection},
      author={Liting Huang and Zhihao Zhang and Yiran Zhang and Xiyue Zhou and Shoujin Wang},
      year={2024},
      eprint={2406.04906},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

## Acknowledgments

This work builds upon:
- [LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment](https://arxiv.org/abs/2310.01852)
- [ImageBind: One Embedding Space To Bind Them All](https://openaccess.thecvf.com/content/CVPR2023/papers/Girdhar_ImageBind_One_Embedding_Space_To_Bind_Them_All_CVPR_2023_paper.pdf)

We appreciate the open-source community for the datasets and models that made this work possible.

## License

Please refer to the [GitHub repository](https://github.com/ZhihaoZhang97/RU-AI) for license information.

## Contact

For questions and issues:
- Open an issue on the [GitHub repository](https://github.com/ZhihaoZhang97/RU-AI)
- Refer to the paper for author contact information