import gradio as gr
import inference_2 as inference
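# The inference_2 module ships with the Space and is not shown here. For
# wiring up or smoke-testing the UI without the real models, a hypothetical
# stand-in with the same three entry points might look like the sketch below
# (the signatures -- a media file path in, a text verdict out -- are an
# assumption based on how the Gradio components are used further down):

```python
# Hypothetical stand-in for inference_2 (illustrative only; the real module
# runs trained detection models). Each predictor is assumed to receive the
# media file path handed over by its Gradio component and to return a
# plain-text verdict.

def deepfakes_image_predict(image_path):
    # A real implementation would run a face-forgery classifier here.
    return "Real" if image_path else "No input provided"

def deepfakes_video_predict(video_path):
    # A real implementation would sample frames and score detected faces.
    return "Real" if video_path else "No input provided"

def deepfakes_spec_predict(audio_path):
    # A real implementation would classify a spectrogram of the audio.
    return "Real" if audio_path else "No input provided"
```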
title = "Multimodal Deepfake Detector"
description = """
Deepfake detection for videos, images, and audio modalities.
**Example Workflow:**
1. **Upload Your Media File:**
- Choose and upload a video, image, or audio file by clicking the "Upload" button in the respective tab.
2. **Select the Tab for Analysis:**
- **Image Inference**: For analyzing images for face swapping or facial manipulation.
- **Video Inference**: For detecting deepfakes in videos (e.g., face swaps or expressions).
- **Audio Inference**: For detecting voice cloning or other audio manipulations.
3. **Review Results:**
- The tool will process your file and provide a result, indicating if the media is real or fake with details on detected manipulations.
4. **Test Example Files:**
- You can also try the preloaded example files to see how the model works with real and fake samples.
---
**Types of Deepfakes**
1. **Face Swapping**
- **Purpose:** Replaces one person’s face with another.
- **Process:** Extracts facial features (e.g., eyes, nose, mouth) from a source face and blends them into the target while maintaining expressions and structure.
- **Applications:**
- Creating deepfake videos where someone appears to be performing actions they didn’t.
- Used in movies or for entertainment purposes.
2. **Facial Manipulation**
- **Purpose:** Alters or modifies the expressions or movements of the face without changing the person’s identity.
- **Process:** AI detects facial landmarks and adjusts them to create new appearances or expressions (e.g., changing mouth movements or eye positions).
- **Applications:**
- Lip-syncing to match audio in dubbing.
- Adjusting expressions for storytelling in video or animation.
3. **Voice Cloning (Audio Deepfake)**
- **Purpose:** Replicates a person’s voice, allowing AI to generate speech that sounds exactly like the target person.
- **Process:** AI models are trained on samples of a person’s voice to mimic tone, pitch, accent, and speech patterns. Text-to-speech tools (e.g., WaveNet, Tacotron) are commonly used.
- **Applications:**
- **Positive:** Voiceovers for audiobooks or films, enhancing digital assistants, or helping individuals with speech loss.
- **Negative:** Impersonation for scams, fake phone calls, or spreading misinformation.
---
### Comparison Table of Deepfake Techniques
| **Feature** | **Face Swapping** | **Facial Manipulation** | **Voice Cloning** |
|------------------------|--------------------------------------------|-------------------------------------------|----------------------------------------|
| **Primary Goal** | Replace a face entirely with another. | Modify facial expressions or movements. | Replicate a person’s voice. |
| **Identity Impact** | Changes the person’s identity. | Retains the same identity. | Imitates speech, not appearance. |
| **Complexity** | Requires blending two separate faces. | Alters one face. | Needs accurate voice data input. |
| **Example Use Case** | Fake celebrity videos. | Lip-syncing or adjusting emotions. | Fraudulent calls or voiceovers. |
---
"""
description_bottom = """
**Acknowledgments:**
This tool is powered by advanced AI algorithms for deepfake detection.
**Team Project Contribution:**
This project is a collaborative effort by a dedicated team of engineers and AI enthusiasts.
We contributed significantly to:
- Developing the detection pipeline for images and audio deepfakes.
- Testing and optimizing the model for real-world datasets.
- Designing the user interface to ensure an intuitive experience for users.
We are proud of our combined efforts to create a reliable tool for identifying deepfakes across multiple modalities.
"""
# Define interfaces for each modality
video_interface = gr.Interface(
    inference.deepfakes_video_predict,
    gr.Video(),
    "text",
    examples=["videos/celeb_synthesis.mp4", "videos/real-1.mp4"],
    cache_examples=False,
)

image_interface = gr.Interface(
    inference.deepfakes_image_predict,
    gr.Image(),
    "text",
    examples=["images/lady.jpg", "images/fake_image.jpg"],
    cache_examples=False,
)

audio_interface = gr.Interface(
    inference.deepfakes_spec_predict,
    gr.Audio(),
    "text",
    examples=["audios/DF_E_2000027.flac", "audios/DF_E_2000031.flac"],
    cache_examples=False,
)
# Combine into a Blocks container to include the description
with gr.Blocks() as app:
    gr.Markdown(f"# <center>{title}</center>")
    gr.Markdown(description)

    # Display example images and descriptions
    with gr.Row():
        with gr.Column():
            gr.Image("images/Deepfake 1.png", label="Real Example", elem_id="real-image", show_label=False, interactive=False)
            gr.Markdown("**Description:** A deepfake example in which the face has been swapped with another, demonstrating facial manipulation where expressions are altered.")
        with gr.Column():
            gr.Image("images/fakeface.jpg", label="Deepfake Example", elem_id="fake-image", show_label=False, interactive=False)
            gr.Markdown("**Description:** The process of detecting deepfake images: the dataset is split into real and fake faces, and a hyperparameter-tuned AI model is trained on it.")

    # Additional images and descriptions
    with gr.Row():
        with gr.Column():
            gr.Image("images/fakeaudio1.png", label="Additional Real Image", elem_id="extra-image-1", show_label=False, interactive=False)
            gr.Markdown("**Description:** Two-phase approach for synthetic speech detection: a Sound Segmentation Phase followed by a Synthetic Speech Detection Phase.")
        with gr.Column():
            gr.Image("images/fakeaudio.png", label="Additional Deepfake Image", elem_id="extra-image-2", show_label=False, interactive=False)
            gr.Markdown("**Description:** Audio deepfake detection: training learns from real and fake data, while detection classifies audio using extracted features.")

    gr.TabbedInterface(
        interface_list=[image_interface, video_interface, audio_interface],
        tab_names=['Image Inference', 'Video Inference', 'Audio Inference'],
    )

    gr.Markdown(description_bottom)

if __name__ == '__main__':
    app.launch(share=False)