Spaces:
Sleeping
Sleeping
File size: 2,813 Bytes
ae7fe56 b713a83 464df1f b713a83 464df1f b713a83 464df1f b713a83 464df1f b713a83 ae7fe56 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 | ---
license: afl-3.0
title: Voice to Image
sdk: docker
emoji: 💻
colorFrom: red
colorTo: green
---
# 🎙️ Voice into Imagination
**Voice into Imagination** is a sophisticated AI-powered application that transforms your spoken words into vivid, high-quality images. By blending voice transcription, natural language understanding, and state-of-the-art image synthesis, it provides a seamless bridge between thought and visualization.
---
## 🛠️ How It Works
The application utilizes a multi-stage pipeline to process your input:
1. **Voice Capture**: High-fidelity audio recording through Streamlit's interface.
2. **Transcription**: Speech-to-text conversion powered by **OpenAI Whisper**.
3. **Prompt Engineering**: **GPT-4o-mini** refines your transcript into a detailed, visually-rich image prompt.
4. **Image Synthesis**: **DALL·E 3** generates a high-resolution 1024x1024 image based on the refined prompt.
---
## 📸 Real Usage Example & Workflow
Follow this step-by-step workflow to see the magic in action.
### 1. Speak Your Idea
Open the app and use the **Recorder** at the bottom. Simply click the microphone icon and describe the image you want to see.

*Status: Capturing audio and preparing for transcription.*
### 2. Live Processing
The system provides real-time feedback. It transcribes your voice and displays what it understood. You'll see a success message showing your captured text.

*Status: System logs show the internal operations (sidebar) while the main UI displays the transcribed text.*
### 3. Behold the Result
The agent generates a detailed prompt and creates your image. The results are displayed in a clean, chat-like interface.

*Example: "Create me a snow leopard" transformed into a stunning mountain scene.*
### 4. Continuous Interaction
You can keep adding more images or refining your ideas. All system logs are tracked in the sidebar to ensure transparency.

*Example: Following up with "A blue robot."*
---
## 🚀 Getting Started
### Prerequisites
- Python 3.9+
- OpenAI API Key
### Installation
1. **Clone the repository** (or download the source).
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
3. **Configure environment**:
Copy `.env.example` to `.env` and add your `OPENAI_API_KEY`:
```bash
cp .env.example .env
```
4. **Run the application**:
```bash
streamlit run app.py
```
---
## 📂 Project Structure
- `app.py`: Main Streamlit interface with custom chat UI.
- `agent.py`: Core logic for Whisper, GPT-4, and DALL·E integration.
- `.screenshots/`: Documentation assets showing the app in action. |