| --- |
| title: SongLift LyrGen2 |
| emoji: 🎵 |
| colorFrom: indigo |
| colorTo: purple |
| sdk: streamlit |
| sdk_version: 1.41.0 |
| app_file: app.py |
| pinned: false |
| --- |
| |
| # SongLift LyrGen2 🎵 |
|
|
| An AI-powered lyrics generation system that uses semantic understanding of existing lyrics to generate new, contextually relevant song lyrics. Built with LangChain, RAG (Retrieval-Augmented Generation), and OpenAI's GPT-4. |
|
|
| ## Deploy Your Own |
|
|
| This app is designed to be easily deployed on HuggingFace Spaces. Follow the setup instructions below to create your own instance. |
|
|
| ## ✨ Features |
|
|
| - **Semantic Lyrics Generation**: Uses vector embeddings of 234K+ lyrics for contextual understanding |
| - **RAG Technology**: Retrieval-Augmented Generation finds similar lyrics to inform new creations |
| - **Modern Sensibilities**: Grounded in a corpus of contemporary pop and hip-hop lyrics |
| - **Interactive Web Interface**: Clean Streamlit interface for easy use |
| - **Source Attribution**: Shows which lyrics influenced the generation |
|
|
| ## 🏗️ Architecture |
|
|
| ### Core Components |
| - **Vector Database**: ChromaDB with OpenAI Ada-002 embeddings |
| - **AI Models**: GPT-4 for generation, Ada-002 for embeddings |
| - **Data Pipeline**: Automated processing of raw lyrics into searchable embeddings |
| - **Dual Deployment**: Local development + HuggingFace Spaces production |
|
|
| ### Workflow |
| ``` |
| Raw Lyrics → Data Cleaning → Text Chunking → Embeddings → ChromaDB → Generation |
| ``` |
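
The retrieval step at the end of this workflow can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: the real app uses ChromaDB and OpenAI embeddings, whereas the toy vectors, the `STORE` list, and the `retrieve` and `build_prompt` helpers below are hypothetical names chosen only to show how similar chunks are ranked and how their metadata can be kept for source attribution.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, store, k=2):
    """Return the k stored chunks most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(request, retrieved):
    """Combine retrieved chunks and the user request into one prompt string."""
    context = "\n---\n".join(item["text"] for item in retrieved)
    return f"Reference lyrics:\n{context}\n\nWrite new lyrics about: {request}"


# Toy store: three chunks with made-up 3-dimensional "embeddings".
STORE = [
    {"text": "city lights at midnight", "artist": "artist1", "song": "song1",
     "embedding": [1.0, 0.0, 0.0]},
    {"text": "ocean waves and sand", "artist": "artist2", "song": "song1",
     "embedding": [0.0, 1.0, 0.0]},
    {"text": "neon streets downtown", "artist": "artist1", "song": "song2",
     "embedding": [0.9, 0.1, 0.0]},
]

if __name__ == "__main__":
    hits = retrieve([1.0, 0.0, 0.0], STORE, k=2)
    print(build_prompt("a night drive", hits))
    # Source attribution: report which chunks informed the prompt.
    for hit in hits:
        print(f"source: {hit['artist']} / {hit['song']}")
```

Keeping `artist` and `song` alongside each embedding is what makes source attribution possible after retrieval.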
|
|
| ## 🛠️ Local Development |
|
|
| ### Prerequisites |
| - Python 3.8+ |
| - OpenAI API key |
| - HuggingFace token (optional, for dataset access) |
|
|
| ### Setup |
| ```bash |
| # Clone the repository |
| git clone <your-repo-url> |
| cd SongLift_LyrGen2 |
| |
| # Create virtual environment |
| python -m venv .venv |
| source .venv/bin/activate # On Windows: .venv\Scripts\activate |
| |
| # Install dependencies |
| pip install -r requirements.txt |
| |
| # Configure environment |
| cp .env.example .env |
| # Edit .env with your API keys |
| ``` |
|
|
| ### Environment Variables |
| Create a `.env` file with: |
| ```env |
| OPENAI_API_KEY=your_openai_api_key_here |
| HF_TOKEN=your_huggingface_token_here |
| DEPLOYMENT_MODE=local |
| ``` |
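
A quick way to catch a misconfigured `.env` before launching the app is to check the variables at startup. The sketch below assumes that `OPENAI_API_KEY` is required, `HF_TOKEN` is optional, and `DEPLOYMENT_MODE` defaults to `local` when unset; the `check_environment` helper is hypothetical and is not part of the repository.

```python
import os

REQUIRED_VARS = ("OPENAI_API_KEY",)
VALID_MODES = ("local", "huggingface")


def check_environment(env=None):
    """Validate the variables listed above and return a small summary.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env

    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )

    # Assumption: an unset DEPLOYMENT_MODE means local development.
    mode = env.get("DEPLOYMENT_MODE", "local")
    if mode not in VALID_MODES:
        raise ValueError(f"DEPLOYMENT_MODE must be one of {VALID_MODES}, got {mode!r}")

    return {"mode": mode, "has_hf_token": bool(env.get("HF_TOKEN"))}


if __name__ == "__main__":
    print(check_environment())
```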
|
|
| ### Run Locally |
| ```bash |
| streamlit run app.py |
| ``` |
| Visit `http://localhost:8501` |
|
|
| ## 🧪 Testing & Validation |
|
|
| ```bash |
| # Test your environment setup |
| python scripts/test_environment.py |
| |
| # Test OpenAI connection |
| python scripts/test_openai_connection.py |
| |
| # Validate embeddings database |
| python scripts/test_embeddings.py |
| ``` |
|
|
| ## Data Processing |
|
|
| The system processes lyrics through a three-stage pipeline: |
|
|
| 1. **Raw Data Loading** (`scripts/process_lyrics.py`) |
| - Multi-encoding support (UTF-8, Latin-1, CP1252) |
| - Section detection ([Verse], [Chorus], etc.) |
| - Metadata preservation |
|
|
| 2. **Text Processing** |
| - Recursive text splitting (300 chars, 75 overlap) |
| - Batch processing with rate limiting |
| - Automatic retry on API limits |
|
|
| 3. **Vector Storage** |
| - ChromaDB collection: "lyrics_v1" |
| - ~234K embedded documents |
| - Metadata tracking (artist, song title) |
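
To make the chunking parameters concrete, here is a simplified sketch of overlapping chunks with a 300-character window and a 75-character overlap, plus a helper that groups chunks into batches for embedding requests. It is a plain sliding window, not the recursive splitter the pipeline uses (which also prefers to break on separators such as newlines), and `chunk_text` and `batched` are hypothetical names.

```python
def chunk_text(text, chunk_size=300, overlap=75):
    """Split text into fixed-size chunks where neighbours share `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


if __name__ == "__main__":
    lyrics = "la " * 250  # 750 characters of placeholder text
    chunks = chunk_text(lyrics)
    print(len(chunks), [len(c) for c in chunks])
    for batch in batched(chunks, 2):
        print("would embed", len(batch), "chunks in one request")
```

The overlap means a line that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.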
| |
| ## Deployment |
| |
| ### HuggingFace Spaces |
| The app auto-deploys to HuggingFace Spaces via GitHub sync: |
| - **Space**: [SongLift/LyrGen2](https://huggingface.co/spaces/SongLift/LyrGen2) |
| - **Dataset**: [SongLift/LyrGen2_DB](https://huggingface.co/datasets/SongLift/LyrGen2_DB) |
| |
| Configure secrets in HF Spaces settings: |
| - `OPENAI_API_KEY` |
| - `HF_TOKEN` |
|
|
| ### Local to Production Sync |
| ```bash |
| # Process and upload embeddings |
| python scripts/process_lyrics.py |
| python scripts/upload_embeddings.py |
| ``` |
|
|
| ## 🔧 Configuration |
|
|
| Key configuration in `config/settings.py`: |
| - **Models**: GPT-4 for generation, Ada-002 for embeddings |
| - **Paths**: Auto-detects local vs HuggingFace environment |
| - **Database**: ChromaDB with persistent storage |
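
The sketch below shows one way such a settings module could pick paths per environment. It is not the contents of `config/settings.py`: the `Settings` dataclass, the `load_settings` function, the `/tmp/lyrgen2/chroma` path, and the use of the `SPACE_ID` variable as a Spaces indicator are all assumptions made for illustration. The collection name, `data/processed` location, and `DEPLOYMENT_MODE` values come from this README.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    mode: str
    chroma_dir: Path
    collection_name: str = "lyrics_v1"
    llm_model: str = "gpt-4"
    embedding_model: str = "text-embedding-ada-002"


def load_settings(env=None, project_root=Path(".")):
    """Build Settings from environment variables (defaults to os.environ)."""
    env = os.environ if env is None else env

    # An explicit DEPLOYMENT_MODE wins; otherwise guess from SPACE_ID
    # (assumed here to be set when running inside a Space).
    mode = env.get("DEPLOYMENT_MODE")
    if not mode:
        mode = "huggingface" if env.get("SPACE_ID") else "local"

    if mode == "huggingface":
        # Hypothetical location for a database downloaded from the HF dataset.
        chroma_dir = Path("/tmp") / "lyrgen2" / "chroma"
    else:
        chroma_dir = Path(project_root) / "data" / "processed"

    return Settings(mode=mode, chroma_dir=chroma_dir)


if __name__ == "__main__":
    print(load_settings())
```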
|
|
| ## Project Structure |
|
|
| ``` |
| SongLift_LyrGen2/ |
| ├── app.py                 # Main Streamlit application |
| ├── config/ |
| │   └── settings.py        # Central configuration |
| ├── src/ |
| │   ├── generator/         # Core generation logic |
| │   └── utils/             # Utility functions |
| ├── scripts/               # Data processing & testing |
| ├── data/ |
| │   ├── raw/lyrics/        # Place your lyrics files here (organized by artist folders) |
| │   └── processed/         # Generated embeddings & ChromaDB files |
| └── .env.example           # Environment variables template |
| ``` |
|
|
| ### Data Directory Setup |
|
|
| The repository keeps the `data/` directory structure in place so you can add your own lyrics, with one folder per artist and one text file per song: |
|
|
| ``` |
| data/raw/lyrics/ |
| ├── artist1/ |
| │   ├── song1.txt |
| │   └── song2.txt |
| ├── artist2/ |
| │   ├── song1.txt |
| │   └── song2.txt |
| └── ... |
| ``` |
|
|
| After adding lyrics, run the processing pipeline: |
| ```bash |
| python scripts/process_lyrics.py |
| ``` |
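
For orientation, the following sketch shows how files in this layout could be read with an encoding fallback and scanned for section markers such as `[Verse]` and `[Chorus]`. It is an illustration under assumptions, not the code in `scripts/process_lyrics.py`; `read_lyrics`, `find_sections`, and `load_corpus` are hypothetical names. Note that Latin-1 can decode any byte sequence, so it is tried last here; otherwise CP1252 would never be reached.

```python
import re
from pathlib import Path

# Order matters: latin-1 accepts every byte, so it must come last.
ENCODINGS = ("utf-8", "cp1252", "latin-1")

# A section marker is a line consisting only of text in square brackets.
SECTION_RE = re.compile(r"^\[([^\]]+)\][ \t]*$", re.MULTILINE)


def read_lyrics(path):
    """Decode a lyrics file, trying each encoding in turn."""
    raw = Path(path).read_bytes()
    for encoding in ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Not reachable while latin-1 is in ENCODINGS; kept as a safety net.
    return raw.decode("utf-8", errors="replace")


def find_sections(text):
    """Return section names in order of appearance, e.g. ['Verse', 'Chorus']."""
    return SECTION_RE.findall(text)


def load_corpus(root):
    """Collect one record per song from <root>/<artist>/<song>.txt."""
    docs = []
    for path in sorted(Path(root).glob("*/*.txt")):
        text = read_lyrics(path)
        docs.append({
            "artist": path.parent.name,  # folder name doubles as artist metadata
            "title": path.stem,
            "text": text,
            "sections": find_sections(text),
        })
    return docs


if __name__ == "__main__":
    for doc in load_corpus("data/raw/lyrics"):
        print(doc["artist"], doc["title"], doc["sections"])
```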
|
|
| ## Browser Compatibility |
| ⚠️ **Recommended**: Chrome or Chromium-based browsers for optimal performance. Some features may not work correctly in Safari. |
|
|
| ## HuggingFace Spaces Setup |
|
|
| ### Deploy Your Own Space |
|
|
| 1. **Create a HuggingFace Space**: |
| - Go to [HuggingFace Spaces](https://huggingface.co/spaces) |
| - Click "Create new Space" |
| - Choose "Streamlit" as SDK |
| - Set `app_file: app.py` |
|
|
| 2. **Configure Secrets**: |
| - In your Space settings, add these secrets: |
| - `OPENAI_API_KEY`: Your OpenAI API key |
| - `HF_TOKEN`: Your HuggingFace token (for dataset access) |
|
|
| 3. **Upload Your Dataset**: |
| ```bash |
| # Process and upload embeddings to HF dataset |
| python scripts/process_lyrics.py |
| python scripts/upload_embeddings.py |
| ``` |
|
|
| 4. **Sync with GitHub** (optional): |
| - Connect your Space to a GitHub repo for automatic deployments |
| - Push changes to GitHub → auto-deploys to HF Spaces |
|
|
| ### Running in HuggingFace Mode Locally |
|
|
| You can test the HuggingFace environment locally: |
|
|
| ```bash |
| # Set HuggingFace mode |
| export DEPLOYMENT_MODE=huggingface |
| |
| # Run locally (will use HF dataset paths) |
| streamlit run app.py |
| ``` |
|
|
| This helps debug HF-specific issues before deploying. |
|
|
| ## 🤝 Contributing |
|
|
| 1. Fork the repository |
| 2. Create a feature branch |
| 3. Make your changes |
| 4. Add tests if applicable |
| 5. Submit a pull request |
|
|
| ## License |
|
|
| MIT License |
|
|
| Copyright (c) 2024 SongLift |
|
|
| Permission is hereby granted, free of charge, to any person obtaining a copy |
| of this software and associated documentation files (the "Software"), to deal |
| in the Software without restriction, including without limitation the rights |
| to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
| copies of the Software, and to permit persons to whom the Software is |
| furnished to do so, subject to the following conditions: |
|
|
| The above copyright notice and this permission notice shall be included in all |
| copies or substantial portions of the Software. |
|
|
| THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
| IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
| FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
| AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
| LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
| OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE |
| SOFTWARE. |
|
|
| ## Acknowledgments |
|
|
| - Built with [LangChain](https://langchain.com/) and [Streamlit](https://streamlit.io/) |
| - Powered by [OpenAI](https://openai.com/) and [HuggingFace](https://huggingface.co/) |
| - Vector storage by [ChromaDB](https://www.trychroma.com/) |
|
|
|
|