susnato/pop2piano_dev

Automatic Speech Recognition

Transformers

PyTorch

susnato committed on Aug 28, 2023

Commit

bfedd54

1 Parent(s): 352d412

Upload README.md

Files changed (1)

README.md +67 -14

@@ -4,8 +4,6 @@
 {}
 ---
-DISCLAIMER : I don't own the weights of Pop2Piano, this repo was created during the integration of Pop2Piano to HF transformers.
 # POP2PIANO
 Pop2Piano, a Transformer network that generates piano covers given waveforms of pop
@@ -14,44 +12,99 @@ music.
 Pop2Piano was proposed in the paper [Pop2Piano : Pop Audio-based Piano Cover Generation](https://arxiv.org/abs/2211.00895) by Jongho Choi and Kyogu Lee.
-Inspired by [T5](https://arxiv.org/abs/1910.10683), Pop2Piano
-is the first model to generate a piano cover directly from pop audio without using melody and
-chord extraction modules.
 ## Model Sources
-- [**Original Repository**](https://github.com/sweetcocoa/pop2piano)
 - [**Paper**](https://arxiv.org/abs/2211.00895)
-- [**Demo**]# TODO (after the ongoing PR is merged)
 # Usage
-First, install the required packages:
 ```
-pip install --upgrade transformers
 ```
 ## Pop music to Piano
- TODO (after the ongoing PR is merged)
 ## Example
-### Pop Music
 <audio controls>
     <source src="https://datasets-server.huggingface.co/assets/sweetcocoa/pop2piano_ci/--/sweetcocoa--pop2piano_ci/test/0/audio/audio.mp3" type="audio/mpeg">
 Your browser does not support the audio element.
 </audio>
-### Generated MIDI
-TODO (after the MIDI version is uploaded to the same repo above)
 ## Tips
-TODO
 # Citation

 {}
 ---
 # POP2PIANO
 Pop2Piano, a Transformer network that generates piano covers given waveforms of pop
 Pop2Piano was proposed in the paper [Pop2Piano : Pop Audio-based Piano Cover Generation](https://arxiv.org/abs/2211.00895) by Jongho Choi and Kyogu Lee.
+Piano covers of pop music are widely enjoyed, but generating them from music is not a trivial task. It requires great
+expertise in playing the piano as well as knowledge of a song's characteristics and melodies. With Pop2Piano you
+can generate a cover directly from a song's audio waveform. It is the first model to generate a piano cover directly
+from pop audio without melody and chord extraction modules.
+Pop2Piano is an encoder-decoder Transformer model based on [T5](https://arxiv.org/pdf/1910.10683.pdf). The input audio
+is transformed to its waveform and passed to the encoder, which transforms it to a latent representation. The decoder
+uses these latent representations to generate token ids in an autoregressive way. Each token id corresponds to one of four
+different token types: time, velocity, note and 'special'. The token ids are then decoded to their equivalent MIDI file.
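The decoding step described above can be illustrated with a toy example. This is a minimal sketch, not the actual Pop2Piano tokenizer: the `(type, value)` tuples, the `Note` class, and the `decode_tokens` function are all hypothetical, and it assumes a velocity of 0 means "note off". The real model packs all four token types into a single integer id space and writes the result to a MIDI file.

```python
from dataclasses import dataclass


@dataclass
class Note:
    pitch: int  # MIDI pitch number
    start: int  # time step at which the note starts
    end: int    # time step at which the note ends


def decode_tokens(tokens):
    """Turn a stream of hypothetical (type, value) tokens into notes."""
    current_time = 0
    current_velocity = 0
    active = {}  # pitch -> start step of a note that is still sounding
    notes = []
    for token_type, value in tokens:
        if token_type == "special":
            continue  # e.g. padding or end-of-sequence: carries no music
        elif token_type == "time":
            current_time = value  # move the cursor to this time step
        elif token_type == "velocity":
            current_velocity = value  # assumed: 0 = note off, >0 = note on
        elif token_type == "note":
            if current_velocity > 0:
                active[value] = current_time  # note on
            elif value in active:
                notes.append(Note(value, active.pop(value), current_time))
        else:
            raise ValueError(f"unknown token type: {token_type}")
    return notes


tokens = [
    ("time", 0), ("velocity", 1), ("note", 60), ("note", 64),
    ("time", 2), ("velocity", 0), ("note", 60),
    ("time", 4), ("note", 64),
    ("special", 1),
]
notes = decode_tokens(tokens)
for note in notes:
    print(note)
```

Here the time and velocity tokens only update decoder state, and each note token is interpreted against that state, which is why a short token stream can describe overlapping notes.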
 ## Model Sources
 - [**Paper**](https://arxiv.org/abs/2211.00895)
+- [**Original Repository**](https://github.com/sweetcocoa/pop2piano)
+- [**HuggingFace Space Demo**](https://huggingface.co/spaces/sweetcocoa/pop2piano)
 # Usage
+To use Pop2Piano, you will need to install the 🤗 Transformers library, as well as the following third-party modules:
 ```
+pip install git+https://github.com/huggingface/transformers.git
+pip install pretty-midi==0.2.9 essentia==2.1b6.dev1034 librosa scipy
 ```
+Please note that you may need to restart your runtime after installation.
 ## Pop music to Piano
+### Code Example
+- Using your own Audio
+```python
+>>> import librosa
+>>> from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor
+>>> audio, sr = librosa.load("<your_audio_file_here>", sr=44100)  # feel free to change the sr to a suitable value.
+>>> model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano")
+>>> processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano")
+>>> inputs = processor(audio=audio, sampling_rate=sr, return_tensors="pt")
+>>> model_output = model.generate(input_features=inputs["input_features"], composer="composer1")
+>>> tokenizer_output = processor.batch_decode(
+...     token_ids=model_output, feature_extractor_output=inputs
+... )["pretty_midi_objects"][0]
+>>> tokenizer_output.write("./Outputs/midi_output.mid")
+```
+- Audio from Hugging Face Hub
+```python
+>>> from datasets import load_dataset
+>>> from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor
+>>> model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano")
+>>> processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano")
+>>> ds = load_dataset("sweetcocoa/pop2piano_ci", split="test")
+>>> inputs = processor(
+...     audio=ds["audio"][0]["array"], sampling_rate=ds["audio"][0]["sampling_rate"], return_tensors="pt"
+... )
+>>> model_output = model.generate(input_features=inputs["input_features"], composer="composer1")
+>>> tokenizer_output = processor.batch_decode(
+...     token_ids=model_output, feature_extractor_output=inputs
+... )["pretty_midi_objects"][0]
+>>> tokenizer_output.write("./Outputs/midi_output.mid")
+```
 ## Example
+ Here we present an example of generated MIDI.
+- Actual Pop Music
 <audio controls>
     <source src="https://datasets-server.huggingface.co/assets/sweetcocoa/pop2piano_ci/--/sweetcocoa--pop2piano_ci/test/0/audio/audio.mp3" type="audio/mpeg">
 Your browser does not support the audio element.
 </audio>
+- Generated MIDI
+<audio controls>
+    <source src="https://datasets-server.huggingface.co/assets/sweetcocoa/pop2piano_ci/--/sweetcocoa--pop2piano_ci/test/1/audio/audio.mp3" type="audio/mpeg">
+Your browser does not support the audio element.
+</audio>
 ## Tips
+1. Pop2Piano is an Encoder-Decoder based model like T5.
+2. Pop2Piano can be used to generate MIDI files for a given audio sequence.
+3. Choosing different composers in `Pop2PianoForConditionalGeneration.generate()` can lead to a variety of different results.
+4. Setting the sampling rate to 44.1 kHz when loading the audio file can give good performance.
+5. Though Pop2Piano was mainly trained on Korean Pop music, it also performs well on Western Pop and Hip Hop songs.
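To compare the effect of different composers (tip 3), one option is to wrap the calls from the code examples in a small helper. This is a sketch only: `generate_covers` is a hypothetical function, not part of Transformers, and any composer name other than `"composer1"` is an assumption, since the names available depend on the checkpoint.

```python
def generate_covers(model, processor, audio, sampling_rate, composers):
    """Generate one piano cover per composer name.

    Returns a dict mapping each composer name to the first decoded
    pretty_midi object, using the same calls as the code examples.
    """
    # Feature extraction does not depend on the composer, so do it once.
    inputs = processor(audio=audio, sampling_rate=sampling_rate, return_tensors="pt")
    covers = {}
    for composer in composers:
        token_ids = model.generate(
            input_features=inputs["input_features"], composer=composer
        )
        decoded = processor.batch_decode(
            token_ids=token_ids, feature_extractor_output=inputs
        )
        covers[composer] = decoded["pretty_midi_objects"][0]
    return covers
```

With `model`, `processor`, `audio` and `sr` loaded as in the code examples, `generate_covers(model, processor, audio, sr, ["composer1", "composer2"])` would return one MIDI object per name, each of which can be saved with `.write(...)`.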
 # Citation