---
license: apache-2.0
---

# LocalSong

LocalSong is a 700M-parameter audio generation model focused on melodic instrumental music, conditioned on text tags. It was trained from scratch in 3 days on a single H100, reusing the ACE-Step VAE.

## Installation

### Prerequisites

- Python 3.10 or higher
- CUDA-capable GPU with at least 8 GB of VRAM recommended

### Setup

```
hf download Localsong/LocalSong --local-dir LocalSong
cd LocalSong
python3 -m venv venv
source venv/bin/activate
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --extra-index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
```

### Run

```
python gradio_app.py
```

The interface will be available at `http://localhost:7860`.

### Generation Advice

Prompts should include one of the soundtrack, soundtrack1, or soundtrack2 tags, plus at least one other tag. Up to 8 tags can be used; try combining genres and instruments.

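For example, a prompt might pair the required tag with a genre and an instrument. The extra tags below are purely illustrative; check the tag list exposed in the Gradio interface for what the model was actually trained on:

```
soundtrack, orchestral, piano
```
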
The default settings (CFG 3.5, 200 steps) have been tested as optimal.

If generation is too slow on your system, try lowering the step count to 100.

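For context, CFG here is classifier-free guidance: the final prediction is pushed from the unconditional output toward the tag-conditioned one. A generic one-line sketch of the idea, not this repo's exact implementation:

```python
def cfg_combine(uncond, cond, scale=3.5):
    """Classifier-free guidance: amplify the direction from the unconditional
    prediction toward the conditional one. scale=1.0 recovers the plain
    conditional prediction; higher values follow the tags more strongly."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]
```

In practice `uncond` and `cond` are two forward passes of the same model, one with the tags dropped, so higher CFG roughly doubles compute per step.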
The first generation will be slower while torch.compile warms up; subsequent generations will be faster.

The model was trained on vocals but not lyrics, so vocals will not contain recognizable words.

## LoRA Training

- Prepare a folder of .mp3 files.
- Run `python train_lora_encode_latents.py --audio-dir=/path/to/your/mp3s --output-dir=latents` to save the latents.
- Run `python train_lora.py --latents_dir=latents` to train the LoRA. You may need to adjust the learning rate, step count, or batch size depending on your dataset.
- Run `python merge_lora.py --lora-checkpoint=lora_step1000.safetensors --output-checkpoint=merged.safetensors` to merge the LoRA checkpoint into the base model for inference.
- Run `python gradio_app.py --checkpoint=merged.safetensors` to run inference with the merged checkpoint.
- Test inference with the tag "soundtrack"; LoRA training uses this tag. Additional tags may also work.

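The merge step above folds the low-rank update into the base weights; conceptually, each adapted layer becomes W + scale * (B @ A). A pure-Python sketch of that computation (the actual merge_lora.py operates on safetensors checkpoints, not nested lists):

```python
def merge_lora(W, A, B, scale=1.0):
    """Fold a LoRA update into a base weight matrix: W + scale * (B @ A).
    Shapes: W is d x k, B is d x r, A is r x k, with rank r << min(d, k).
    Conceptual sketch only; real checkpoints use tensors, not lists."""
    r, k = len(A), len(A[0])
    d = len(B)
    return [[W[i][j] + scale * sum(B[i][p] * A[p][j] for p in range(r))
             for j in range(k)] for i in range(d)]
```

After merging, inference pays no extra cost for the adapter, since the low-rank matrices disappear into the base weights.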
## Credits

This project builds upon the following open-source projects:

- **Model Architecture**: Adapted from [DDT](https://github.com/MCG-NJU/DDT)
- **Flow Matching**: Adapted from [minRF](https://github.com/cloneofsimo/minRF)
- **Audio VAE**: [ACE-Step](https://github.com/ACE-Step/ACE-Step)

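For reference, the rectified-flow objective popularized by minRF trains the network to predict the velocity between data and noise along a straight-line interpolation. A minimal single-sample sketch, where `model` stands in for the actual diffusion transformer:

```python
import random

def rectified_flow_loss(x, model, rng=random):
    """One flow-matching training step on a single sample (a list of floats).
    Interpolate between data and Gaussian noise at a random timestep, then
    regress the model onto the velocity (noise - data). Sketch only; the
    exact formulation in this repo may differ."""
    t = rng.random()                                            # t in (0, 1)
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    z_t = [(1 - t) * xi + t * ni for xi, ni in zip(x, noise)]   # noisy input
    target = [ni - xi for xi, ni in zip(x, noise)]              # velocity
    pred = model(z_t, t)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(x)  # MSE
```

At sampling time the learned velocity field is integrated from pure noise back to data, which is what the step count in the Gradio interface controls.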
## License

This project is licensed under the Apache License 2.0.