Spaces:
Sleeping
Sleeping
| title: PDF to Podcast Converter | |
| emoji: ποΈ | |
| colorFrom: green | |
| colorTo: purple | |
| sdk: docker | |
| app_port: 7860 | |
| # NotebookMg | |
| This project converts PDF documents into engaging podcast conversations using AI. It leverages Google's Gemini Pro for text processing and ElevenLabs for voice synthesis. | |
| ## Features | |
| - PDF text extraction and cleaning | |
| - Conversion of academic/technical content into natural dialogue | |
| - Dynamic conversation generation between two hosts (Alex and Jamie) | |
| - High-quality text-to-speech synthesis | |
| - Web interface for easy interaction | |
| - API endpoints for programmatic access | |
| ## Prerequisites | |
| - Python 3.8+ | |
| - Google Gemini API key | |
| - ElevenLabs API key | |
| ## Installation | |
| 1. Clone the repository: | |
| ```bash | |
| git clone <repository-url> | |
| cd pdf-to-podcast | |
| ``` | |
| 2. Install dependencies: | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| 3. Set up environment variables: | |
| ```bash | |
| # Create a .env file | |
| touch .env | |
| # Add your API keys | |
| echo "GEMINI_API_KEY=your_gemini_api_key" >> .env | |
| echo "ELEVEN_API_KEY=your_elevenlabs_api_key" >> .env | |
| ``` | |
| ## Project Structure | |
| ``` | |
| pdf-to-podcast/ | |
| βββ main.py # Core conversion logic | |
| βββ app.py # FastAPI application | |
| βββ run.py # Server startup script | |
| βββ templates/ # HTML templates | |
| β βββ index.html # Web interface | |
| βββ uploads/ # Temporary PDF storage | |
| βββ outputs/ # Generated files | |
| ``` | |
| ## Usage | |
| ### Web Interface | |
| 1. Start the server: | |
| ```bash | |
| python run.py | |
| ``` | |
| 2. Open your browser and navigate to `http://localhost:8000` | |
| 3. Upload a PDF file | |
| 4. Download the generated files: | |
| - Cleaned text version | |
| - Conversation transcript | |
| - MP3 podcast file | |
| ### API Endpoints | |
| - `POST /upload-pdf/`: Upload PDF and generate podcast | |
| - `GET /download/{filename}`: Download generated files | |
| - `GET /status`: Check API status | |
| ## API Examples | |
| ```python | |
| import requests | |
| # Upload PDF | |
| with open('document.pdf', 'rb') as f: | |
| response = requests.post( | |
| 'http://localhost:8000/upload-pdf/', | |
| files={'file': f} | |
| ) | |
| # Download generated podcast | |
| response = requests.get( | |
| 'http://localhost:8000/download/document_podcast.mp3' | |
| ) | |
| ``` | |
| ## Configuration | |
| Voice IDs can be configured in `main.py`: | |
| ```python | |
| self.alex_voice_id = "21m00Tcm4TlvDq8ikWAM" # Rachel voice | |
| self.jamie_voice_id = "IKne3meq5aSn9XLyUdCD" # Adam voice | |
| ``` | |
| ## Dependencies | |
| - `google-generativeai`: Gemini Pro API | |
| - `elevenlabs`: Text-to-speech synthesis | |
| - `PyPDF2`: PDF processing | |
| - `fastapi`: Web API framework | |
| - `pydub`: Audio processing | |
| - `python-multipart`: File upload handling | |
| - `uvicorn`: ASGI server | |
| - `jinja2`: Template engine | |
| ## Contributing | |
| 1. Fork the repository | |
| 2. Create a feature branch | |
| 3. Commit your changes | |
| 4. Push to the branch | |
| 5. Create a Pull Request | |
| ## License | |
| This project is licensed under the MIT License - see the LICENSE file for details. | |
| ## Acknowledgments | |
| - Google Gemini for AI text processing | |
| - ElevenLabs for voice synthesis | |
| - FastAPI team for the excellent web framework | |
| ## Support | |
| For support, please open an issue in the GitHub repository or contact [your-email]. |