Spaces:
Sleeping
license: afl-3.0
title: Voice to Image
sdk: docker
emoji: ๐ป
colorFrom: red
colorTo: green
๐๏ธ Voice into Imagination
Voice into Imagination is a sophisticated AI-powered application that transforms your spoken words into vivid, high-quality images. By blending voice transcription, natural language understanding, and state-of-the-art image synthesis, it provides a seamless bridge between thought and visualization.
๐ ๏ธ How It Works
The application utilizes a multi-stage pipeline to process your input:
- Voice Capture: High-fidelity audio recording through Streamlit's interface.
- Transcription: Speech-to-text conversion powered by OpenAI Whisper.
- Prompt Engineering: GPT-4o-mini refines your transcript into a detailed, visually-rich image prompt.
- Image Synthesis: DALLยทE 3 generates a high-resolution 1024x1024 image based on the refined prompt.
๐ธ Real Usage Example & Workflow
Follow this step-by-step workflow to see the magic in action.
1. Speak Your Idea
Open the app and use the Recorder at the bottom. Simply click the microphone icon and describe the image you want to see.
Status: Capturing audio and preparing for transcription.
2. Live Processing
The system provides real-time feedback. It transcribes your voice and displays what it understood. You'll see a success message showing your captured text.
Status: System logs show the internal operations (sidebar) while the main UI displays the transcribed text.
3. Behold the Result
The agent generates a detailed prompt and creates your image. The results are displayed in a clean, chat-like interface.
Example: "Create me a snow leopard" transformed into a stunning mountain scene.
4. Continuous Interaction
You can keep adding more images or refining your ideas. All system logs are tracked in the sidebar to ensure transparency.
Example: Following up with "A blue robot."
๐ Getting Started
Prerequisites
- Python 3.9+
- OpenAI API Key
Installation
- Clone the repository (or download the source).
- Install dependencies:
pip install -r requirements.txt - Configure environment:
Copy
.env.exampleto.envand add yourOPENAI_API_KEY:cp .env.example .env - Run the application:
streamlit run app.py
๐ Project Structure
app.py: Main Streamlit interface with custom chat UI.agent.py: Core logic for Whisper, GPT-4, and DALLยทE integration..screenshots/: Documentation assets showing the app in action.