Commit eb7dc0e
Parent: 4159150
Fixed pydantic version issue

Files changed:
- app.py (+2 -2)
- requirements.txt (+2 -3)
app.py CHANGED

@@ -468,7 +468,7 @@ with gr.Blocks(title="ARIA - Art to Music Generator") as demo:
     results = gr.Markdown()

     gr.Markdown("""
-
+    ## About ARIA

     ARIA is a deep learning system that generates music from artwork by:
     1. Using a image-emotion model to extract emotional content from images

@@ -478,7 +478,7 @@ with gr.Blocks(title="ARIA - Art to Music Generator") as demo:
     ["Symbolic music generation conditioned on continuous-valued emotions"](https://ieeexplore.ieee.org/document/9762257).
     Original implementation: [github.com/serkansulun/midi-emotion](https://github.com/serkansulun/midi-emotion)

-
+    ## Conditioning Types

     **continuous_concat (Recommended)**
     Creates a single vector from valence and arousal values, repeats it across the sequence, and concatenates it with every music token embedding. This approach gives the emotion information *global influence* throughout the entire generation process, allowing the transformer to access emotional context at every timestep. Research shows this method achieves the best performance in both note prediction accuracy and emotional coherence.
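The continuous_concat paragraph added above describes a tensor operation that can be sketched in a few lines. The PyTorch snippet below is only an illustration of that idea; the dimensions, the `emotion_proj` layer, and all variable names are assumptions, not code from this Space or from serkansulun/midi-emotion:

```python
import torch
import torch.nn as nn

# Illustrative sizes only; the real model's dimensions are not shown in this diff.
batch, seq_len, d_model, d_emotion = 4, 128, 512, 32

token_emb = torch.randn(batch, seq_len, d_model)        # music token embeddings
valence_arousal = torch.tensor([[0.7, -0.2]] * batch)   # one (valence, arousal) pair per item

# Map the two emotion values to a single emotion vector (hypothetical projection).
emotion_proj = nn.Linear(2, d_emotion)
emotion_vec = emotion_proj(valence_arousal)             # (batch, d_emotion)

# Repeat the vector across the sequence and concatenate it with every token
# embedding, so emotional context is available to the transformer at every timestep.
emotion_rep = emotion_vec.unsqueeze(1).expand(-1, seq_len, -1)  # (batch, seq_len, d_emotion)
conditioned = torch.cat([token_emb, emotion_rep], dim=-1)       # (batch, seq_len, d_model + d_emotion)
```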
requirements.txt CHANGED

@@ -2,7 +2,7 @@ torch>=2.1.0
 torchvision>=0.16.0
 numpy>=1.21.0
 Pillow>=10.0.0
-gradio>=
+gradio>=5.30
 matplotlib>=3.7.0
 huggingface_hub>=0.19.0
 pretty-midi>=0.2.9

@@ -12,5 +12,4 @@ midi2audio>=0.1.1
 transformers>=4.35.0
 spaces>=0.32.0
 numba>=0.60.0
-llvmlite>=0.43.0
-pydantic==1.10.15
+llvmlite>=0.43.0
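A likely reading of the requirements.txt change, inferred from the commit message rather than stated in the diff: Gradio 5.x depends on pydantic 2.x, so the old `pydantic==1.10.15` pin would conflict with `gradio>=5.30` and is dropped instead of updated. A quick sanity check of the resolved environment might look like:

```python
# Hypothetical check, not part of the Space: confirm the installed versions
# after dropping the pydantic pin and requiring gradio>=5.30.
import gradio
import pydantic

print("gradio:", gradio.__version__)      # expected: >= 5.30
print("pydantic:", pydantic.__version__)  # expected: a 2.x release
```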