VoiceToImage / README.md
niddijoris's picture
Update README.md
ae7fe56 verified
metadata
license: afl-3.0
title: Voice to Image
sdk: docker
emoji: ๐Ÿ’ป
colorFrom: red
colorTo: green

๐ŸŽ™๏ธ Voice into Imagination

Voice into Imagination is a sophisticated AI-powered application that transforms your spoken words into vivid, high-quality images. By blending voice transcription, natural language understanding, and state-of-the-art image synthesis, it provides a seamless bridge between thought and visualization.


๐Ÿ› ๏ธ How It Works

The application utilizes a multi-stage pipeline to process your input:

  1. Voice Capture: High-fidelity audio recording through Streamlit's interface.
  2. Transcription: Speech-to-text conversion powered by OpenAI Whisper.
  3. Prompt Engineering: GPT-4o-mini refines your transcript into a detailed, visually-rich image prompt.
  4. Image Synthesis: DALLยทE 3 generates a high-resolution 1024x1024 image based on the refined prompt.

๐Ÿ“ธ Real Usage Example & Workflow

Follow this step-by-step workflow to see the magic in action.

1. Speak Your Idea

Open the app and use the Recorder at the bottom. Simply click the microphone icon and describe the image you want to see.

Recording Stage Status: Capturing audio and preparing for transcription.

2. Live Processing

The system provides real-time feedback. It transcribes your voice and displays what it understood. You'll see a success message showing your captured text.

Processing Stage Status: System logs show the internal operations (sidebar) while the main UI displays the transcribed text.

3. Behold the Result

The agent generates a detailed prompt and creates your image. The results are displayed in a clean, chat-like interface.

Result Stage Example: "Create me a snow leopard" transformed into a stunning mountain scene.

4. Continuous Interaction

You can keep adding more images or refining your ideas. All system logs are tracked in the sidebar to ensure transparency.

Multi-turn Interaction Example: Following up with "A blue robot."


๐Ÿš€ Getting Started

Prerequisites

  • Python 3.9+
  • OpenAI API Key

Installation

  1. Clone the repository (or download the source).
  2. Install dependencies:
    pip install -r requirements.txt
    
  3. Configure environment: Copy .env.example to .env and add your OPENAI_API_KEY:
    cp .env.example .env
    
  4. Run the application:
    streamlit run app.py
    

๐Ÿ“‚ Project Structure

  • app.py: Main Streamlit interface with custom chat UI.
  • agent.py: Core logic for Whisper, GPT-4, and DALLยทE integration.
  • .screenshots/: Documentation assets showing the app in action.