---
title: Doc Alive - RAG to Image
emoji: 📦🎨
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: "5.44.1"
app_file: app.py
pinned: false
---
# 📦→🧠→🎨 Doc Alive: RAG-to-Image with OpenAI
This project turns documents into **illustrations** with the help of RAG (Retrieval-Augmented Generation), LLM prompt engineering, and OpenAI's image generation.
Upload a `.txt`, `.md`, or `.pdf` file, describe your goal, and the app will:
1. **Extract text** from your file
2. **Retrieve key excerpts** using embeddings
3. **Ask an LLM** to craft a structured image generation spec
4. **Generate an illustration** with OpenAI's image model
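
The retrieval step (step 2) can be sketched as follows. This is a minimal illustration, not the code in `rag/`: `chunk_text`, `toy_embed`, `cosine`, and `retrieve` are hypothetical names, and `toy_embed` is a bag-of-words stand-in for a real embedding call (the app itself lists `text-embedding-3-small`), so the example runs without an API key.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse word-count vector (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(chunks: list, query: str, k: int = 2) -> list:
    """Return the k chunks most similar to the query, best first."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(toy_embed(c), q), reverse=True)
    return ranked[:k]
```

With real embeddings, only `toy_embed` and the vector type would change; the ranking by cosine similarity stays the same.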
---
## 🚀 Demo
This app runs on **Hugging Face Spaces** using **Gradio**.
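
As a rough idea of how a Gradio front end for an app like this can be wired, here is a small sketch. It is not the contents of `app.py`: `describe_upload` and `build_demo` are hypothetical names, and the handler only validates the upload and goal rather than running the pipeline.

```python
from pathlib import Path
from typing import Optional

ALLOWED_SUFFIXES = {".txt", ".md", ".pdf"}


def describe_upload(file_path: Optional[str], goal: str) -> str:
    """Placeholder handler: validate inputs and report what would be processed."""
    if not file_path:
        return "Please upload a file."
    suffix = Path(file_path).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        return f"Unsupported file type: {suffix or '(none)'}"
    if not goal.strip():
        return "Please describe your goal."
    return f"Ready to process {Path(file_path).name} for goal: {goal.strip()}"


def build_demo():
    """Build a minimal Blocks UI around the placeholder handler."""
    import gradio as gr  # imported lazily so the handler can be used without Gradio

    with gr.Blocks(title="Doc Alive (sketch)") as demo:
        doc = gr.File(label="Document", file_types=[".txt", ".md", ".pdf"])
        goal = gr.Textbox(label="Goal")
        status = gr.Textbox(label="Status", interactive=False)
        gr.Button("Check inputs").click(describe_upload, inputs=[doc, goal], outputs=status)
    return demo
```

Calling `build_demo().launch()` would start the UI locally, assuming Gradio is installed.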
---
## 🔑 API Key
You must provide your own **OpenAI API key** to use this demo.
- Enter your key in the input box (starts with `sk-...`)
- The key is **not stored**; it is held in memory only for your current session
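
One way to handle a user-supplied key in memory only is sketched below. `check_api_key` and `make_client` are hypothetical helpers, not functions from this repository; the sketch assumes the `openai` Python package (v1 or later), whose `OpenAI` client accepts an `api_key` argument.

```python
def check_api_key(raw_key: str) -> str:
    """Trim the key and confirm it has the expected 'sk-' prefix."""
    key = (raw_key or "").strip()
    if not key.startswith("sk-"):
        raise ValueError("Expected an OpenAI API key starting with 'sk-'.")
    return key


def make_client(raw_key: str):
    """Create a per-request client; the key is never written to disk or logged."""
    from openai import OpenAI  # lazy import: only needed when a client is built

    return OpenAI(api_key=check_api_key(raw_key))
```

Building the client inside the request handler, rather than at module level, keeps the key scoped to the call that needs it.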
---
## 📂 Project Structure
    ├─ app.py              # Gradio UI entry
    ├─ requirements.txt    # Dependencies
    ├─ rag/                # Text extraction + retrieval
    ├─ llm/                # Structured LLM call helper
    ├─ generation/         # Image generation helper
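
To illustrate how the `llm/` and `generation/` helpers might hand data to each other, here is a sketch of a structured image spec parsed from an LLM's JSON reply and flattened into a prompt string. The `ImageSpec` class, its field names, and `parse_spec` are assumptions made for this example; the actual spec format used by the project is not documented here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ImageSpec:
    """Hypothetical structured spec produced by the LLM step."""
    subject: str
    style: str = "flat illustration"
    details: list = field(default_factory=list)

    def to_prompt(self) -> str:
        """Flatten the spec into a single prompt string for an image model."""
        return ", ".join([f"{self.style} of {self.subject}", *self.details])


def parse_spec(llm_json: str) -> ImageSpec:
    """Parse the LLM's JSON reply; 'subject' is required, the rest is optional."""
    data = json.loads(llm_json)
    return ImageSpec(
        subject=data["subject"],
        style=data.get("style", "flat illustration"),
        details=list(data.get("details", [])),
    )
```

The resulting prompt string would then be passed to the image generation helper. Keeping the spec as structured data until the last step makes it easier to validate or adjust individual fields.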
---
## 🛠 Tech Stack
- [Gradio](https://www.gradio.app/) – UI framework
- [OpenAI](https://platform.openai.com/) – LLM + image generation
- [RAG (text-embedding-3-small)](https://platform.openai.com/docs/guides/embeddings) – semantic retrieval
---
## ⚠️ Notes
- The OpenAI API key is required for both embeddings and image generation
- We do **not** log or save your key
- OpenAI bills the API calls made with your key according to your usage