File size: 2,813 Bytes
ae7fe56
 
 
 
 
 
 
 
b713a83
 
 
 
 
 
 
 
 
 
 
 
 
 
 
464df1f
b713a83
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
464df1f
 
b713a83
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
464df1f
b713a83
464df1f
b713a83
 
ae7fe56
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
---
license: afl-3.0
title: Voice to Image
sdk: docker
emoji: 💻
colorFrom: red
colorTo: green
---
# 🎙️ Voice into Imagination

**Voice into Imagination** is a sophisticated AI-powered application that transforms your spoken words into vivid, high-quality images. By blending voice transcription, natural language understanding, and state-of-the-art image synthesis, it provides a seamless bridge between thought and visualization.

---

## 🛠️ How It Works

The application utilizes a multi-stage pipeline to process your input:

1.  **Voice Capture**: High-fidelity audio recording through Streamlit's interface.
2.  **Transcription**: Speech-to-text conversion powered by **OpenAI Whisper**.
3.  **Prompt Engineering**: **GPT-4o-mini** refines your transcript into a detailed, visually-rich image prompt.
4.  **Image Synthesis**: **DALL·E 3** generates a high-resolution 1024x1024 image based on the refined prompt.

---

## 📸 Real Usage Example & Workflow

Follow this step-by-step workflow to see the magic in action.

### 1. Speak Your Idea
Open the app and use the **Recorder** at the bottom. Simply click the microphone icon and describe the image you want to see.

![Recording Stage](.screenshots/1.png)
*Status: Capturing audio and preparing for transcription.*

### 2. Live Processing
The system provides real-time feedback. It transcribes your voice and displays what it understood. You'll see a success message showing your captured text.

![Processing Stage](.screenshots/3.png)
*Status: System logs show the internal operations (sidebar) while the main UI displays the transcribed text.*

### 3. Behold the Result
The agent generates a detailed prompt and creates your image. The results are displayed in a clean, chat-like interface.

![Result Stage](.screenshots/2.png)
*Example: "Create me a snow leopard" transformed into a stunning mountain scene.*

### 4. Continuous Interaction
You can keep adding more images or refining your ideas. All system logs are tracked in the sidebar to ensure transparency.

![Multi-turn Interaction](.screenshots/4.png)
*Example: Following up with "A blue robot."*

---

## 🚀 Getting Started

### Prerequisites

- Python 3.9+
- OpenAI API Key

### Installation

1.  **Clone the repository** (or download the source).
2.  **Install dependencies**:
    ```bash
    pip install -r requirements.txt
    ```
3.  **Configure environment**:
    Copy `.env.example` to `.env` and add your `OPENAI_API_KEY`:
    ```bash
    cp .env.example .env
    ```
4.  **Run the application**:
    ```bash
    streamlit run app.py
    ```

---

## 📂 Project Structure

- `app.py`: Main Streamlit interface with custom chat UI.
- `agent.py`: Core logic for Whisper, GPT-4, and DALL·E integration.
- `.screenshots/`: Documentation assets showing the app in action.