import gradio as gr
import inference_2 as inference

title = "Multimodal Deepfake Detector"

description = """
Deepfake detection for videos, images, and audio modalities.

**Example Workflow:**

1. **Upload Your Media File:**
   - Choose and upload a video, image, or audio file by clicking the "Upload" button in the respective tab.
2. **Select the Tab for Analysis:**
   - **Image Inference**: For analyzing images for face swapping or facial manipulation.
   - **Video Inference**: For detecting deepfakes in videos (e.g., face swaps or expressions).
   - **Audio Inference**: For detecting voice cloning or other audio manipulations.
3. **Review Results:**
   - The tool will process your file and report whether the media is real or fake, with details on any detected manipulations.
4. **Test Example Files:**
   - You can also try the preloaded example files to see how the model behaves on real and fake samples.

---

**Types of Deepfakes**

1. **Face Swapping**
   - **Purpose:** Replaces one person’s face with another.
   - **Process:** Extracts facial features (e.g., eyes, nose, mouth) from a source face and blends them into the target while maintaining expressions and structure.
   - **Applications:**
     - Creating deepfake videos where someone appears to be performing actions they didn’t.
     - Used in movies or for entertainment purposes.
2. **Facial Manipulation**
   - **Purpose:** Alters or modifies the expressions or movements of the face without changing the person’s identity.
   - **Process:** AI detects facial landmarks and adjusts them to create new appearances or expressions (e.g., changing mouth movements or eye positions).
   - **Applications:**
     - Lip-syncing to match audio in dubbing.
     - Adjusting expressions for storytelling in video or animation.
3. **Voice Cloning (Audio Deepfake)**
   - **Purpose:** Replicates a person’s voice, allowing AI to generate speech that sounds exactly like the target person.
   - **Process:** AI models are trained on samples of a person’s voice to mimic tone, pitch, accent, and speech patterns. Text-to-speech tools (e.g., WaveNet, Tacotron) are commonly used.
   - **Applications:**
     - **Positive:** Voiceovers for audiobooks or films, enhancing digital assistants, or helping individuals with speech loss.
     - **Negative:** Impersonation for scams, fake phone calls, or spreading misinformation.

---

### Comparison Table of Deepfake Techniques

| **Feature**          | **Face Swapping**                      | **Facial Manipulation**                 | **Voice Cloning**                |
|----------------------|----------------------------------------|-----------------------------------------|----------------------------------|
| **Primary Goal**     | Replace a face entirely with another.  | Modify facial expressions or movements. | Replicate a person’s voice.      |
| **Identity Impact**  | Changes the person’s identity.         | Retains the same identity.              | Imitates speech, not appearance. |
| **Complexity**       | Requires blending two separate faces.  | Alters one face.                        | Needs accurate voice data input. |
| **Example Use Case** | Fake celebrity videos.                 | Lip-syncing or adjusting emotions.      | Fraudulent calls or voiceovers.  |

---
"""

description_bottom = """
**Acknowledgments:** This tool is powered by advanced AI algorithms for deepfake detection.

**Team Project Contribution:**

This project is a collaborative effort by a dedicated team of engineers and AI enthusiasts. We contributed significantly to:

- Developing the detection pipeline for image and audio deepfakes.
- Testing and optimizing the model for real-world datasets.
- Designing the user interface to ensure an intuitive experience for users.

We are proud of our combined efforts to create a reliable tool for identifying deepfakes across multiple modalities.
""" # Define interfaces for each modality video_interface = gr.Interface( inference.deepfakes_video_predict, gr.Video(), "text", examples=["videos/celeb_synthesis.mp4", "videos/real-1.mp4"], cache_examples=False ) image_interface = gr.Interface( inference.deepfakes_image_predict, gr.Image(), "text", examples=["images/lady.jpg", "images/fake_image.jpg"], cache_examples=False ) audio_interface = gr.Interface( inference.deepfakes_spec_predict, gr.Audio(), "text", examples=["audios/DF_E_2000027.flac", "audios/DF_E_2000031.flac"], cache_examples=False ) # Combine into a Blocks container to include the description with gr.Blocks() as app: gr.Markdown(f"#