import gradio as gr
import inference_2 as inference
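# The `inference_2` module is assumed (its source is not shown here) to expose
# three predict functions, one per modality, each taking the value produced by
# the matching Gradio input component and returning a result string. A
# hypothetical sketch of that assumed interface, for reference only:
#
#     def deepfakes_image_predict(image) -> str: ...
#     def deepfakes_video_predict(video_path) -> str: ...
#     def deepfakes_spec_predict(audio) -> str: ...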

title = "Multimodal Deepfake Detector"
description = """
Deepfake detection for videos, images, and audio modalities.

**Example Workflow:**
1. **Upload Your Media File:**
   - Choose and upload a video, image, or audio file by clicking the "Upload" button in the respective tab.

2. **Select the Tab for Analysis:**
   - **Image Inference**: For analyzing images for face swapping or facial manipulation.
   - **Video Inference**: For detecting deepfakes in videos (e.g., face swaps or expressions).
   - **Audio Inference**: For detecting voice cloning or other audio manipulations.

3. **Review Results:**
   - The tool processes your file and returns a result indicating whether the media is real or fake, with details on any detected manipulations.

4. **Test Example Files:**
   - You can also try the preloaded example files to see how the model works with real and fake samples.

---

**Types of Deepfakes**

1. **Face Swapping**
   - **Purpose:** Replaces one person’s face with another person’s.
   - **Process:** Extracts facial features (e.g., eyes, nose, mouth) from a source face and blends them into the target while maintaining expressions and structure.
   - **Applications:**
     - Creating deepfake videos where someone appears to be performing actions they didn’t.
     - Used in movies or for entertainment purposes.
     
2. **Facial Manipulation**
   - **Purpose:** Alters or modifies the expressions or movements of the face without changing the person’s identity.
   - **Process:** AI detects facial landmarks and adjusts them to create new appearances or expressions (e.g., changing mouth movements or eye positions).
   - **Applications:**
     - Lip-syncing to match audio in dubbing.
     - Adjusting expressions for storytelling in video or animation.
     
3. **Voice Cloning (Audio Deepfake)**
   - **Purpose:** Replicates a person’s voice, allowing AI to generate speech that sounds exactly like the target person.
   - **Process:** AI models are trained on samples of a person’s voice to mimic tone, pitch, accent, and speech patterns. Text-to-speech tools (e.g., WaveNet, Tacotron) are commonly used.
   - **Applications:**
     - **Positive:** Voiceovers for audiobooks or films, enhancing digital assistants, or helping individuals with speech loss.
     - **Negative:** Impersonation for scams, fake phone calls, or spreading misinformation.
     
---

### Comparison Table of Deepfake Techniques

| **Feature**           | **Face Swapping**                          | **Facial Manipulation**                   | **Voice Cloning**                     |
|------------------------|--------------------------------------------|-------------------------------------------|----------------------------------------|
| **Primary Goal**       | Replace a face entirely with another.      | Modify facial expressions or movements.   | Replicate a person’s voice.           |
| **Identity Impact**    | Changes the person’s identity.             | Retains the same identity.                | Imitates speech, not appearance.      |
| **Complexity**         | Requires blending two separate faces.      | Alters one face.                          | Needs accurate voice data input.      |
| **Example Use Case**   | Fake celebrity videos.                     | Lip-syncing or adjusting emotions.        | Fraudulent calls or voiceovers.       |

---

"""

description_bottom = """

**Acknowledgments:**
This tool is powered by advanced AI algorithms for deepfake detection. 

**Team Project Contribution:**
This project is a collaborative effort by a dedicated team of engineers and AI enthusiasts.  
We contributed significantly to:
- Developing the detection pipeline for images and audio deepfakes.
- Testing and optimizing the model for real-world datasets.
- Designing the user interface to ensure an intuitive experience for users.

We are proud of our combined efforts to create a reliable tool for identifying deepfakes across multiple modalities.

"""

# Define an Interface for each modality, wrapping the corresponding predict function
video_interface = gr.Interface(
    fn=inference.deepfakes_video_predict,
    inputs=gr.Video(),
    outputs="text",
    examples=["videos/celeb_synthesis.mp4", "videos/real-1.mp4"],
    cache_examples=False
)

image_interface = gr.Interface(
    fn=inference.deepfakes_image_predict,
    inputs=gr.Image(),
    outputs="text",
    examples=["images/lady.jpg", "images/fake_image.jpg"],
    cache_examples=False
)

audio_interface = gr.Interface(
    fn=inference.deepfakes_spec_predict,
    inputs=gr.Audio(),
    outputs="text",
    examples=["audios/DF_E_2000027.flac", "audios/DF_E_2000031.flac"],
    cache_examples=False
)

# Combine into a Blocks container to include the description
with gr.Blocks() as app:
    gr.Markdown(f"# <center>{title}</center>")
    gr.Markdown(description)
    
    # Display example figures with their descriptions, side by side
    # (gr.Column is a context manager, not a list constructor)
    with gr.Row():
        with gr.Column():
            gr.Image("images/Deepfake 1.png", label="Real Example", elem_id="real-image", show_label=False, interactive=False)
            gr.Markdown("**Description:** A deepfake example in which one face has been swapped with another, illustrating facial manipulation where expressions are altered.")
        with gr.Column():
            gr.Image("images/fakeface.jpg", label="Deepfake Example", elem_id="fake-image", show_label=False, interactive=False)
            gr.Markdown("**Description:** The deepfake image detection process: the dataset is split into real and fake faces and used to train a hyper-parameterized AI model.")
    
    # Additional figures illustrating the audio detection pipeline
    with gr.Row():
        with gr.Column():
            gr.Image("images/fakeaudio1.png", label="Additional Real Image", elem_id="extra-image-1", show_label=False, interactive=False)
            gr.Markdown("**Description:** Two-phase approach for synthetic speech detection: a Sound Segmentation Phase followed by a Synthetic Speech Detection Phase.")
        with gr.Column():
            gr.Image("images/fakeaudio.png", label="Additional Deepfake Image", elem_id="extra-image-2", show_label=False, interactive=False)
            gr.Markdown("**Description:** Audio deepfake detection: training learns from real/fake data, while detection classifies audio using extracted features.")
    
    gr.TabbedInterface(
        interface_list=[image_interface, video_interface, audio_interface],
        tab_names=['Image Inference', 'Video Inference', 'Audio Inference']
    )
    gr.Markdown(description_bottom)

if __name__ == '__main__':
    app.launch(share=False)