Spaces:

niddijoris
/

VoiceToImage

Sleeping

App Files Files Community

VoiceToImage / README.md

niddijoris

Update README.md

ae7fe56 verified 4 months ago

preview code

raw

history blame contribute delete

2.81 kB

metadata

license: afl-3.0
title: Voice to Image
sdk: docker
emoji: 💻
colorFrom: red
colorTo: green

🎙️ Voice into Imagination

Voice into Imagination is a sophisticated AI-powered application that transforms your spoken words into vivid, high-quality images. By blending voice transcription, natural language understanding, and state-of-the-art image synthesis, it provides a seamless bridge between thought and visualization.

🛠️ How It Works

The application utilizes a multi-stage pipeline to process your input:

Voice Capture: High-fidelity audio recording through Streamlit's interface.
Transcription: Speech-to-text conversion powered by OpenAI Whisper.
Prompt Engineering: GPT-4o-mini refines your transcript into a detailed, visually-rich image prompt.
Image Synthesis: DALL·E 3 generates a high-resolution 1024x1024 image based on the refined prompt.

📸 Real Usage Example & Workflow

Follow this step-by-step workflow to see the magic in action.

1. Speak Your Idea

Open the app and use the Recorder at the bottom. Simply click the microphone icon and describe the image you want to see.

Status: Capturing audio and preparing for transcription.

2. Live Processing

The system provides real-time feedback. It transcribes your voice and displays what it understood. You'll see a success message showing your captured text.

Status: System logs show the internal operations (sidebar) while the main UI displays the transcribed text.

3. Behold the Result

The agent generates a detailed prompt and creates your image. The results are displayed in a clean, chat-like interface.

Example: "Create me a snow leopard" transformed into a stunning mountain scene.

4. Continuous Interaction

You can keep adding more images or refining your ideas. All system logs are tracked in the sidebar to ensure transparency.

Example: Following up with "A blue robot."

🚀 Getting Started

Prerequisites

Python 3.9+
OpenAI API Key

Installation

Clone the repository (or download the source).
Install dependencies:
```
pip install -r requirements.txt
```
Configure environment: Copy .env.example to .env and add your OPENAI_API_KEY:
```
cp .env.example .env
```
Run the application:
```
streamlit run app.py
```

📂 Project Structure

app.py: Main Streamlit interface with custom chat UI.
agent.py: Core logic for Whisper, GPT-4, and DALL·E integration.
.screenshots/: Documentation assets showing the app in action.