Commit eb7dc0e
Parent: 4159150
Fixed pydantic version issue

Files changed:
- app.py (+2 -2)
- requirements.txt (+2 -3)
app.py CHANGED

@@ -468,7 +468,7 @@ with gr.Blocks(title="ARIA - Art to Music Generator") as demo:
     results = gr.Markdown()

     gr.Markdown("""
-
+    ## About ARIA

     ARIA is a deep learning system that generates music from artwork by:
     1. Using a image-emotion model to extract emotional content from images

@@ -478,7 +478,7 @@ with gr.Blocks(title="ARIA - Art to Music Generator") as demo:
     ["Symbolic music generation conditioned on continuous-valued emotions"](https://ieeexplore.ieee.org/document/9762257).
     Original implementation: [github.com/serkansulun/midi-emotion](https://github.com/serkansulun/midi-emotion)

-
+    ## Conditioning Types

     **continuous_concat (Recommended)**
     Creates a single vector from valence and arousal values, repeats it across the sequence, and concatenates it with every music token embedding. This approach gives the emotion information *global influence* throughout the entire generation process, allowing the transformer to access emotional context at every timestep. Research shows this method achieves the best performance in both note prediction accuracy and emotional coherence.
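The continuous_concat paragraph added above describes a tensor operation that can be sketched in a few lines. The PyTorch snippet below is only an illustration of that idea; the dimensions, the `emotion_proj` layer, and all variable names are assumptions, not code from this Space or from serkansulun/midi-emotion:

```python
import torch
import torch.nn as nn

# Illustrative sizes only; the real model's dimensions are not shown in this diff.
batch, seq_len, d_model, d_emotion = 4, 128, 512, 32

token_emb = torch.randn(batch, seq_len, d_model)        # music token embeddings
valence_arousal = torch.tensor([[0.7, -0.2]] * batch)   # one (valence, arousal) pair per item

# Map the two emotion values to a single emotion vector (hypothetical projection).
emotion_proj = nn.Linear(2, d_emotion)
emotion_vec = emotion_proj(valence_arousal)             # (batch, d_emotion)

# Repeat the vector across the sequence and concatenate it with every token
# embedding, so emotional context is available to the transformer at every timestep.
emotion_rep = emotion_vec.unsqueeze(1).expand(-1, seq_len, -1)  # (batch, seq_len, d_emotion)
conditioned = torch.cat([token_emb, emotion_rep], dim=-1)       # (batch, seq_len, d_model + d_emotion)
```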
requirements.txt CHANGED

@@ -2,7 +2,7 @@ torch>=2.1.0
 torchvision>=0.16.0
 numpy>=1.21.0
 Pillow>=10.0.0
-gradio>=
+gradio>=5.30
 matplotlib>=3.7.0
 huggingface_hub>=0.19.0
 pretty-midi>=0.2.9

@@ -12,5 +12,4 @@ midi2audio>=0.1.1
 transformers>=4.35.0
 spaces>=0.32.0
 numba>=0.60.0
-llvmlite>=0.43.0
-pydantic==1.10.15
+llvmlite>=0.43.0
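A likely reading of the requirements.txt change, inferred from the commit message rather than stated in the diff: Gradio 5.x depends on pydantic 2.x, so the old `pydantic==1.10.15` pin would conflict with `gradio>=5.30` and is dropped instead of updated. A quick sanity check of the resolved environment might look like:

```python
# Hypothetical check, not part of the Space: confirm the installed versions
# after dropping the pydantic pin and requiring gradio>=5.30.
import gradio
import pydantic

print("gradio:", gradio.__version__)      # expected: >= 5.30
print("pydantic:", pydantic.__version__)  # expected: a 2.x release
```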