Spaces: jonloporto/ImageToVoiceForClass

jonloporto committed 27 days ago

Commit

3c5d69c

1 Parent(s): f1a84e4

Upload 3 files

Files changed (3)

README.md +28 -12
app.py +84 -0
requirements.txt +6 -0

README.md CHANGED

@@ -1,12 +1,28 @@
----
-title: ImageToVoiceForClass
-emoji: 🐠
-colorFrom: yellow
-colorTo: green
-sdk: gradio
-sdk_version: 6.2.0
-app_file: app.py
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: Image to Voice
+emoji: 🎤
+colorFrom: blue
+colorTo: purple
+sdk: gradio
+sdk_version: 4.0.0
+app_file: app.py
+pinned: false
+---
+# Image to Voice Converter
+Convert images to text descriptions and then to speech audio!
+## How it works
+1. Upload an image
+2. The AI analyzes the image and generates a text description
+3. The text is converted to speech using a text-to-speech model
+4. Download the audio file
+## Technologies Used
+- **Hugging Face Transformers**: For image-to-text conversion
+- **Supertonic TTS**: For text-to-speech synthesis
+- **Gradio**: For the web interface
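
The four steps in the README's "How it works" list amount to a two-stage pipeline: caption the image, then synthesize the caption. A minimal sketch of that flow follows, with both stages passed in as plain callables so it runs without any model downloads. The names `describe_and_speak`, `caption_fn`, and `speak_fn` are hypothetical and do not appear in the commit; in app.py those roles are played by the Transformers pipeline and the Supertonic TTS object.

```python
from typing import Any, Callable, Optional, Tuple


def describe_and_speak(
    image: Any,
    caption_fn: Callable[[Any], str],
    speak_fn: Callable[[str], str],
) -> Tuple[Optional[str], str]:
    """Run caption -> speech and return (audio_path, text).

    caption_fn and speak_fn are hypothetical stand-ins for the
    image-to-text model and the TTS engine.
    """
    if image is None:
        return None, "Please upload an image."
    text = caption_fn(image).strip()
    if not text:
        # Nothing to synthesize, so skip the speech stage.
        return None, "No description could be generated."
    audio_path = speak_fn(text)
    return audio_path, text


if __name__ == "__main__":
    # Stub stages so the sketch runs without downloading any model.
    fake_caption = lambda img: "a dog on a beach"
    fake_speak = lambda text: "speech.wav"
    print(describe_and_speak(object(), fake_caption, fake_speak))
```

Keeping the stages injectable also makes the flow easy to test separately from the models.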

app.py ADDED

@@ -0,0 +1,84 @@

+# -*- coding: utf-8 -*-
+"""
+Image to Voice - Hugging Face Spaces
+Converts images to text and then to speech
+"""
+import gradio as gr
+from supertonic import TTS
+from transformers import pipeline
+# Initialize the image-to-text pipeline
+image_to_text = pipeline("image-to-text")
+# Initialize TTS (will be loaded on first use)
+tts = None
+def get_tts():
+    """Lazy load TTS to avoid loading on startup"""
+    global tts
+    if tts is None:
+        tts = TTS(auto_download=True)
+    return tts
+def image_to_voice(image):
+    """
+    Convert image to text and then to speech
+    Args:
+        image: PIL Image from Gradio (the input uses type="pil"), or None
+    Returns:
+        tuple: (audio_file_path, text_description)
+    """
+    if image is None:
+        return None, "Please upload an image."
+    try:
+        # Convert image to text
+        result = image_to_text(image)
+        text = result[0]['generated_text']
+        # Convert text to speech
+        tts_model = get_tts()
+        style = tts_model.get_voice_style(voice_name="M5")
+        wav, duration = tts_model.synthesize(text, voice_style=style)
+        # Save audio to a fixed path in the working directory (overwritten on every call)
+        output_path = "output.wav"
+        tts_model.save_audio(wav, output_path)
+        return output_path, text
+    except Exception as e:
+        return None, f"Error: {str(e)}"
+# Create Gradio interface
+with gr.Blocks(title="Image to Voice") as demo:
+    gr.Markdown("# 🖼️ Image to Voice Converter")
+    gr.Markdown("Upload an image and get an audio description of it!")
+    with gr.Row():
+        with gr.Column():
+            image_input = gr.Image(type="pil", label="Upload Image")
+            generate_btn = gr.Button("Generate Audio", variant="primary")
+        with gr.Column():
+            audio_output = gr.Audio(label="Generated Audio", type="filepath")
+            text_output = gr.Textbox(label="Image Description", lines=5)
+    generate_btn.click(
+        fn=image_to_voice,
+        inputs=image_input,
+        outputs=[audio_output, text_output]
+    )
+    gr.Examples(
+        examples=[],
+        inputs=image_input,
+        label="Example Images (add your own examples)"
+    )
+if __name__ == "__main__":
+    demo.launch()
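
app.py writes every result to the fixed path `output.wav`, so two requests handled at the same time could overwrite each other's audio. One possible alternative, sketched here with only the standard library, is to allocate a unique file per request. `new_wav_path` and `write_silence` are hypothetical helpers for illustration; `write_silence` only stands in for the TTS save step so the example runs on its own.

```python
import os
import tempfile
import wave


def new_wav_path(prefix: str = "tts_") -> str:
    """Create an empty, uniquely named .wav file and return its path."""
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".wav")
    os.close(fd)  # only the path is needed; the writer reopens the file
    return path


def write_silence(path: str, seconds: float = 0.5, rate: int = 16000) -> None:
    """Stand-in for the TTS save step: write 16-bit mono silence."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))


if __name__ == "__main__":
    first, second = new_wav_path(), new_wav_path()
    write_silence(first)
    write_silence(second)
    print(first != second)  # each request gets its own file
    os.remove(first)
    os.remove(second)
```

In `image_to_voice`, the returned path would take the place of the literal `"output.wav"`. Cleaning up old files is left out of the sketch.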

requirements.txt ADDED

@@ -0,0 +1,6 @@

+transformers
+supertonic
+gradio
+torch
+torchaudio
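
None of the entries in requirements.txt carries a version specifier, so each rebuild of the Space may resolve to different releases. A small sketch for spotting such entries is shown here. `unpinned` is a hypothetical helper, and it uses a simple regular expression instead of a full requirements parser, so treat it as a rough check only.

```python
import re
from typing import List

# Matches ==, >=, <=, !=, ~=, bare < or >, and "name @ url" references.
_SPECIFIER = re.compile(r"[=<>!~]=|[<>]|@")


def unpinned(requirements_text: str) -> List[str]:
    """Return requirement lines that carry no version specifier."""
    names = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not _SPECIFIER.search(line):
            names.append(line)
    return names


if __name__ == "__main__":
    committed = "transformers\nsupertonic\ngradio\ntorch\ntorchaudio\n"
    print(unpinned(committed))
```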