Spaces:

niddijoris
/

VoiceToImage

Sleeping

App Files Files Community

VoiceToImage / README.md

niddijoris

Update README.md

ae7fe56 verified 4 months ago

preview code

raw

history blame contribute delete

2.81 kB

	---
	license: afl-3.0
	title: Voice to Image
	sdk: docker
	emoji: 💻
	colorFrom: red
	colorTo: green
	---
	# 🎙️ Voice into Imagination

	Voice into Imagination is a sophisticated AI-powered application that transforms your spoken words into vivid, high-quality images. By blending voice transcription, natural language understanding, and state-of-the-art image synthesis, it provides a seamless bridge between thought and visualization.

	---

	## 🛠️ How It Works

	The application utilizes a multi-stage pipeline to process your input:

	1. Voice Capture: High-fidelity audio recording through Streamlit's interface.
	2. Transcription: Speech-to-text conversion powered by OpenAI Whisper.
	3. Prompt Engineering: GPT-4o-mini refines your transcript into a detailed, visually-rich image prompt.
	4. Image Synthesis: DALL·E 3 generates a high-resolution 1024x1024 image based on the refined prompt.

	---

	## 📸 Real Usage Example & Workflow

	Follow this step-by-step workflow to see the magic in action.

	### 1. Speak Your Idea
	Open the app and use the Recorder at the bottom. Simply click the microphone icon and describe the image you want to see.

	![Recording Stage](.screenshots/1.png)
	Status: Capturing audio and preparing for transcription.

	### 2. Live Processing
	The system provides real-time feedback. It transcribes your voice and displays what it understood. You'll see a success message showing your captured text.

	![Processing Stage](.screenshots/3.png)
	Status: System logs show the internal operations (sidebar) while the main UI displays the transcribed text.

	### 3. Behold the Result
	The agent generates a detailed prompt and creates your image. The results are displayed in a clean, chat-like interface.

	![Result Stage](.screenshots/2.png)
	Example: "Create me a snow leopard" transformed into a stunning mountain scene.

	### 4. Continuous Interaction
	You can keep adding more images or refining your ideas. All system logs are tracked in the sidebar to ensure transparency.

	![Multi-turn Interaction](.screenshots/4.png)
	Example: Following up with "A blue robot."

	---

	## 🚀 Getting Started

	### Prerequisites

	- Python 3.9+
	- OpenAI API Key

	### Installation

	1. Clone the repository (or download the source).
	2. Install dependencies:
	```bash
	pip install -r requirements.txt
	```
	3. Configure environment:
	Copy `.env.example` to `.env` and add your `OPENAI_API_KEY`:
	```bash
	cp .env.example .env
	```
	4. Run the application:
	```bash
	streamlit run app.py
	```

	---

	## 📂 Project Structure

	- `app.py`: Main Streamlit interface with custom chat UI.
	- `agent.py`: Core logic for Whisper, GPT-4, and DALL·E integration.
	- `.screenshots/`: Documentation assets showing the app in action.