VoiceToImage / README.md
niddijoris's picture
Update README.md
ae7fe56 verified
---
license: afl-3.0
title: Voice to Image
sdk: docker
emoji: 💻
colorFrom: red
colorTo: green
---
# 🎙️ Voice into Imagination
**Voice into Imagination** is a sophisticated AI-powered application that transforms your spoken words into vivid, high-quality images. By blending voice transcription, natural language understanding, and state-of-the-art image synthesis, it provides a seamless bridge between thought and visualization.
---
## 🛠️ How It Works
The application utilizes a multi-stage pipeline to process your input:
1. **Voice Capture**: High-fidelity audio recording through Streamlit's interface.
2. **Transcription**: Speech-to-text conversion powered by **OpenAI Whisper**.
3. **Prompt Engineering**: **GPT-4o-mini** refines your transcript into a detailed, visually-rich image prompt.
4. **Image Synthesis**: **DALL·E 3** generates a high-resolution 1024x1024 image based on the refined prompt.
---
## 📸 Real Usage Example & Workflow
Follow this step-by-step workflow to see the magic in action.
### 1. Speak Your Idea
Open the app and use the **Recorder** at the bottom. Simply click the microphone icon and describe the image you want to see.
![Recording Stage](.screenshots/1.png)
*Status: Capturing audio and preparing for transcription.*
### 2. Live Processing
The system provides real-time feedback. It transcribes your voice and displays what it understood. You'll see a success message showing your captured text.
![Processing Stage](.screenshots/3.png)
*Status: System logs show the internal operations (sidebar) while the main UI displays the transcribed text.*
### 3. Behold the Result
The agent generates a detailed prompt and creates your image. The results are displayed in a clean, chat-like interface.
![Result Stage](.screenshots/2.png)
*Example: "Create me a snow leopard" transformed into a stunning mountain scene.*
### 4. Continuous Interaction
You can keep adding more images or refining your ideas. All system logs are tracked in the sidebar to ensure transparency.
![Multi-turn Interaction](.screenshots/4.png)
*Example: Following up with "A blue robot."*
---
## 🚀 Getting Started
### Prerequisites
- Python 3.9+
- OpenAI API Key
### Installation
1. **Clone the repository** (or download the source).
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
3. **Configure environment**:
Copy `.env.example` to `.env` and add your `OPENAI_API_KEY`:
```bash
cp .env.example .env
```
4. **Run the application**:
```bash
streamlit run app.py
```
---
## 📂 Project Structure
- `app.py`: Main Streamlit interface with custom chat UI.
- `agent.py`: Core logic for Whisper, GPT-4, and DALL·E integration.
- `.screenshots/`: Documentation assets showing the app in action.