TheM1N9

readme updated

363f523 about 1 year ago

	---
	title: PDF to Podcast Converter
	emoji: 🎙️
	colorFrom: green
	colorTo: purple
	sdk: docker
	app_port: 7860
	---

	# NotebookMg

	NotebookMg converts PDF documents into podcast-style conversations. It uses Google's Gemini Pro for text processing and ElevenLabs for voice synthesis.

	## Features

	- PDF text extraction and cleaning
	- Conversion of academic/technical content into natural dialogue
	- Dynamic conversation generation between two hosts (Alex and Jamie)
	- High-quality text-to-speech synthesis
	- Web interface for easy interaction
	- API endpoints for programmatic access

	## Prerequisites

	- Python 3.8+
	- Google Gemini API key
	- ElevenLabs API key

	## Installation

	1. Clone the repository:
	```bash
	git clone <repository-url>
	cd pdf-to-podcast
	```

	2. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	3. Set up environment variables:
	```bash
	# Create a .env file
	touch .env

	# Add your API keys
	echo "GEMINI_API_KEY=your_gemini_api_key" >> .env
	echo "ELEVEN_API_KEY=your_elevenlabs_api_key" >> .env
	```
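How the application reads these keys is not shown in this README. As a rough sketch, assuming a plain `KEY=value` file like the one created above, a standard-library loader could look like the following. `load_env_file` is a hypothetical helper for illustration, not part of the project.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=value lines and return them as a dict.

    Hypothetical helper for illustration; blank lines and lines starting
    with '#' are skipped, and surrounding quotes on values are removed.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


if __name__ == "__main__":
    # Copy the parsed keys into the process environment without
    # overwriting variables that are already set.
    for key, value in load_env_file().items():
        os.environ.setdefault(key, value)
    print("GEMINI_API_KEY set:", "GEMINI_API_KEY" in os.environ)
```

Using `os.environ.setdefault` means a key exported in the shell takes precedence over the one in `.env`, which is usually the less surprising behavior.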

	## Project Structure

	```
	pdf-to-podcast/
	├── main.py           # Core conversion logic
	├── app.py            # FastAPI application
	├── run.py            # Server startup script
	├── templates/        # HTML templates
	│   └── index.html    # Web interface
	├── uploads/          # Temporary PDF storage
	└── outputs/          # Generated files
	```

	## Usage

	### Web Interface

	1. Start the server:
	```bash
	python run.py
	```

	2. Open your browser and navigate to `http://localhost:8000`
	3. Upload a PDF file
	4. Download the generated files:
	   - Cleaned text version
	   - Conversation transcript
	   - MP3 podcast file

	### API Endpoints

	- `POST /upload-pdf/`: Upload PDF and generate podcast
	- `GET /download/{filename}`: Download generated files
	- `GET /status`: Check API status
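Before uploading anything, a client may want to confirm that the server is reachable. The sketch below polls `GET /status` using only the standard library. The helper name `api_is_up` and the default base URL are assumptions, and because the response body of `/status` is not documented here, only the HTTP status code is checked.

```python
import urllib.request


def api_is_up(base_url="http://localhost:8000", timeout=5.0):
    """Return True if GET {base_url}/status answers with HTTP 200.

    Hypothetical helper: the response body of /status is not documented
    here, so only the status code is checked.
    """
    url = base_url.rstrip("/") + "/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses
        # (urllib's URLError and HTTPError both derive from OSError).
        return False


if __name__ == "__main__":
    print("API reachable:", api_is_up())
```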

	## API Examples

	```python
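	import requests

	# Upload a PDF; the server generates the podcast from it
	with open('document.pdf', 'rb') as f:
	    response = requests.post(
	        'http://localhost:8000/upload-pdf/',
	        files={'file': f}
	    )
	response.raise_for_status()

	# Download the generated podcast and save it locally
	response = requests.get(
	    'http://localhost:8000/download/document_podcast.mp3'
	)
	response.raise_for_status()
	with open('document_podcast.mp3', 'wb') as out:
	    out.write(response.content)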
	```
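The example above downloads `document_podcast.mp3` after uploading `document.pdf`, which suggests a `<stem>_podcast.mp3` naming pattern. Assuming that pattern holds (it is inferred from the example, not confirmed), a small helper could build the download URL for any uploaded file. `podcast_download_url` is a hypothetical name.

```python
from pathlib import PurePath
from urllib.parse import quote


def podcast_download_url(pdf_name, base_url="http://localhost:8000"):
    """Build the download URL for the podcast generated from pdf_name.

    Assumes the '<stem>_podcast.mp3' naming seen in the example above;
    the real naming scheme may differ.
    """
    stem = PurePath(pdf_name).stem
    filename = f"{stem}_podcast.mp3"
    # Percent-encode the filename so spaces and similar characters are URL-safe.
    return f"{base_url.rstrip('/')}/download/{quote(filename)}"


if __name__ == "__main__":
    print(podcast_download_url("document.pdf"))
```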

	## Configuration

	Voice IDs can be configured in `main.py`:
	```python
	self.alex_voice_id = "21m00Tcm4TlvDq8ikWAM" # Rachel voice
	self.jamie_voice_id = "IKne3meq5aSn9XLyUdCD" # Adam voice
	```
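To illustrate how these voice IDs might be paired with the generated dialogue, the sketch below splits a transcript into per-speaker turns. It assumes one `Speaker: text` turn per line, which is a hypothetical format. The transcript layout the project actually produces is not documented here, and `assign_voices` is not a function from `main.py`.

```python
# Voice IDs copied from the configuration above.
VOICE_IDS = {
    "Alex": "21m00Tcm4TlvDq8ikWAM",
    "Jamie": "IKne3meq5aSn9XLyUdCD",
}


def assign_voices(transcript):
    """Split a transcript into (voice_id, text) pairs.

    Assumes one 'Speaker: text' turn per line (a hypothetical format);
    lines from unknown speakers or without a colon are skipped.
    """
    turns = []
    for line in transcript.splitlines():
        speaker, sep, text = line.partition(":")
        speaker, text = speaker.strip(), text.strip()
        if sep and speaker in VOICE_IDS and text:
            turns.append((VOICE_IDS[speaker], text))
    return turns


if __name__ == "__main__":
    sample = "Alex: Welcome to the show.\nJamie: Glad to be here."
    for voice_id, text in assign_voices(sample):
        print(voice_id, "->", text)
```

Each resulting pair could then be passed to the text-to-speech step, one request per turn, with the audio segments joined in order afterwards.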

	## Dependencies

	- `google-generativeai`: Gemini Pro API
	- `elevenlabs`: Text-to-speech synthesis
	- `PyPDF2`: PDF processing
	- `fastapi`: Web API framework
	- `pydub`: Audio processing
	- `python-multipart`: File upload handling
	- `uvicorn`: ASGI server
	- `jinja2`: Template engine

	## Contributing

	1. Fork the repository
	2. Create a feature branch
	3. Commit your changes
	4. Push to the branch
	5. Create a Pull Request

	## License

	This project is licensed under the MIT License - see the LICENSE file for details.

	## Acknowledgments

	- Google Gemini for AI text processing
	- ElevenLabs for voice synthesis
	- FastAPI team for the excellent web framework

	## Support

	For support, please open an issue in the GitHub repository or contact [your-email].