lxtung95 committed
Commit ec08c78 · verified · 1 Parent(s): 9d142db

Complete project documentation and architecture overhaul


Refined README with detailed project objective and comprehensive technical methodology. Added a transparent directory structure highlighting modular package logic, external dataset links, and Google Colab workspace access.

Files changed (1)
  1. README.md +71 -33
README.md CHANGED
@@ -1,34 +1,72 @@
  ---
- title: LyricLoop Studio
- emoji: 🎤
- colorFrom: indigo
- colorTo: blue
- sdk: streamlit
- app_file: app.py
- pinned: false
- short_description: AI studio for structured lyrics, fine-tuned on Gemma-2b.
- license: gemma
- sdk_version: 1.53.0
- ---
-
- # LyricLoop v2.0
- Author: Alexander Tung (Columbia University)
-
- ## Project Objective
- LyricLoo bridges the gap between semantic LLM text and professional musical phrasing. This framework fine-tunes Google's Gemma-2b-it to generate lyrics adhering to specific structures (*Verse, Chorus, Bridge*) and genre-specific stylings (Electronic, Pop, Rock, Hip-Hop).
-
- ## Technical Methodology
- - Fine-Tuning: Implemented Low-Rank Adaptation (LoRA) to specialize the model in rhythmic patterns while preserving base reasoning.
- - Optimization: Used 4-bit Quantization (QLoRA) via `bitsandbytes` to reduce the memory footprint during training.
- - Instruction Tuning: Supervised Fine-Tuning (SFT) with custom templates to enforce structural and genre constraints.
-
- ## Data & Stack
- - Corpus: 5mm+ Song Lyrics ([Genius Dataset](https://www.kaggle.com/datasets/carlosgdcj/genius-song-lyrics-with-language-information)).
- - Metadata: Artist mapping via [Pitchfork Reviews](https://www.kaggle.com/datasets/timstafford/pitchfork-reviews).
- - Stack: Python, Hugging Face (`Transformers`, `PEFT`, `TRL`), PyTorch, and Google Colab (L4 GPU).
-
- ## Studio Guide
- 1. Details: Enter a song title and an Artist Aesthetic (e.g., *Taylor Swift*) to set the tone.
- 2. Genre: Select your target genre to adjust rhythmic density.
- 3. Compose: Use the *Creativity (Temperature)* slider to control experimental word choice.
- 4. Export: Download the final composition as a `.txt` file for your creative workflow.
 
+ NAME
+
+ LyricLoop LLM
+
+ ---
+
+ PROJECT OBJECTIVE
+
+ LyricLoop bridges the gap between semantic LLM text and professional musical phrasing. This framework fine-tunes Google's Gemma-2b-it to generate lyrics adhering to specific structures (Verse, Chorus, Bridge) and genre-specific stylings (Electronic, Pop, Rock, Hip-Hop).
+
+ ---
+
+ LANGUAGE / STACK
+
+ Python | PyTorch, Hugging Face (Transformers, PEFT, TRL), Streamlit
+
+ ---
+
+ TECHNICAL METHODOLOGY
+
+ - Fine-Tuning: Implemented Low-Rank Adaptation (LoRA) to specialize the model in rhythmic patterns while preserving base reasoning.
+ - Optimization: Used 4-bit Quantization (QLoRA) via bitsandbytes to reduce the memory footprint during training.
+ - Instruction Tuning: Supervised Fine-Tuning (SFT) with custom templates to enforce structural and genre constraints.
+
+ ---
+
+ PROJECT STRUCTURE
+
+ - app.py: main streamlit application entry point and UI logic.
+ - src/lyricloop/: core modular package containing engine logic:
+ - config.py: global constants and path management.
+ - data.py: prompt engineering and dataset preprocessing.
+ - environment.py: hardware-aware setup (MPS/CPU/CUDA).
+ - metrics.py: inference execution and perplexity scoring.
+ - viz.py: standardized plotting and visual utilities.
+ - notebooks/: development playground, training workflows, and EDA.
+ - reports/: written technical documentation and project summaries.
+ - assets/: visual artifacts and plots used in documentation.
+ - requirements.txt: dependency management for environment parity.
+
+ ---
+
+ DATA & SOURCE
+
+ - Corpus: 5mm+ Song Lyrics (Genius Dataset).
+ - Metadata: Artist mapping via Pitchfork Reviews.
+ - Stack: Python, Hugging Face (Transformers, PEFT, TRL), PyTorch, and Google Colab (L4 GPU).
+
+ ---
+
+ EXTERNAL RESOURCES
+
+ - Full Project Workspace (Google Drive): [Access the Notebooks & Raw Data](https://drive.google.com/drive/folders/1M5SJRaaK8OaskUgEsBupgGVN_-fQS3i4?usp=sharing)
+ - Training Environment: Google Colab (L4 GPU)
+
+ ---
+
+ STUDIO GUIDE
+
+ - Run on Hugging Face lxtung95/lyricloop
+ - App URL: https://lxtung95-lyricloop.hf.space/
+ 1. Details: Enter a song title and an Artist Aesthetic (e.g., Taylor Swift) to set the tone.
+ 2. Genre: Select your target genre to adjust rhythmic density.
+ 3. Compose: Use the Creativity (Temperature) slider to control experimental word choice.
+ 4. Export: Download the final composition as a .txt file for your creative workflow.
+
  ---
+
+ SUPPORT
+
+ Visit my GitHub repository for the latest scripts and downloads:
+ https://github.com/lxntung95
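
The new README lists metrics.py as handling perplexity scoring, but this commit only touches README.md, so the module itself is not shown. As a minimal sketch, assuming per-token natural-log probabilities have already been collected from the model, perplexity can be computed as the exponential of the mean negative log-likelihood. The function name and signature below are hypothetical and are not taken from the repository.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) over the given tokens.

    token_log_probs: natural-log probabilities the model assigned to each
    reference token (hypothetical input; how they are gathered is not shown).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that spreads probability uniformly over 4 choices per token.
    uniform = [math.log(0.25)] * 8
    print(perplexity(uniform))
```

Lower values mean the model found the lyrics less surprising; a model that assigns probability 1 to every token scores exactly 1.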