	---
	license: mit
	tags:
	- audio
	- audio-super-resolution
	- upscaling
	- audiosr
	- safetensors
	- maestraea
	pipeline_tag: audio-to-audio
	---

	# AudioSR Models (Safetensors)

	Audio Super-Resolution — Upscale Any Audio to 48kHz

	[Original Source](https://github.com/haoheliu/versatile_audio_super_resolution) by [Haohe Liu](https://github.com/haoheliu) · MIT License

	> Converted from `pytorch_model.bin` to safetensors format for faster loading and safer deserialization. For use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea).

	## Available Models

	| Variant | Files | Size | Description |
	|---------|-------|------|-------------|
	| basic | `basic/audiosr_basic.safetensors` | 6.2 GB | General audio (music, SFX, speech) |
	| speech | `speech/audiosr_speech-*.safetensors` (3 shards) | 6.2 GB | Optimized for spoken word |
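
The file patterns in the table can be used to pick out the weights for one variant from a repository file listing. The sketch below is only an illustration: `files_for_variant` is a hypothetical helper, and the shard names in `example_listing` are made up, since this card only gives the `audiosr_speech-*.safetensors` pattern.

```python
from fnmatch import fnmatchcase

# Patterns copied from the Available Models table.
VARIANT_PATTERNS = {
    "basic": "basic/audiosr_basic.safetensors",
    "speech": "speech/audiosr_speech-*.safetensors",
}


def files_for_variant(variant, repo_files):
    """Hypothetical helper: return the sorted repo paths that belong to `variant`."""
    if variant not in VARIANT_PATTERNS:
        raise ValueError(f"unknown variant: {variant!r}")
    pattern = VARIANT_PATTERNS[variant]
    return sorted(path for path in repo_files if fnmatchcase(path, pattern))


# Hypothetical listing: the speech shard names are invented for illustration.
example_listing = [
    "README.md",
    "basic/audiosr_basic.safetensors",
    "speech/audiosr_speech-a.safetensors",
    "speech/audiosr_speech-b.safetensors",
    "speech/audiosr_speech-c.safetensors",
]

print(files_for_variant("basic", example_listing))
print(len(files_for_variant("speech", example_listing)), "speech shards")
```

Matching on the documented pattern rather than on exact shard names keeps the helper independent of how the shards are actually numbered.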

	## What AudioSR Does

	AudioSR uses latent diffusion to upscale any audio to 48kHz, restoring high-frequency content that was lost to:

	- Low sample rate recording (8kHz, 16kHz, 22kHz → 48kHz)
	- Lossy compression (MP3, AAC artifacts)
	- Bandwidth-limited audio

	### Key Parameters

	| Parameter | Range | Default | Description |
	|-----------|-------|---------|-------------|
	| `ddim_steps` | 10–200 | 50 | Number of DDIM sampling steps; more steps generally improve quality but slow down inference |
	| `guidance_scale` | 1–10 | 3.5 | Classifier-free guidance strength |
	| `model_name` | basic/speech | basic | Which variant to use |
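
A caller may want to check values against these ranges before starting a long inference run. The sketch below is one way to do that; `resolve_params` is a hypothetical helper, not part of the `audiosr` package, and the ranges are simply the ones documented in the table (whether the library itself enforces them is not stated here).

```python
# Ranges and defaults copied from the Key Parameters table.
PARAM_SPECS = {
    "ddim_steps": {"min": 10, "max": 200, "default": 50},
    "guidance_scale": {"min": 1.0, "max": 10.0, "default": 3.5},
}
MODEL_NAMES = ("basic", "speech")


def resolve_params(ddim_steps=None, guidance_scale=None, model_name="basic"):
    """Hypothetical helper: fill in defaults and reject out-of-range values."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"model_name must be one of {MODEL_NAMES}, got {model_name!r}")
    resolved = {"model_name": model_name}
    for name, value in (("ddim_steps", ddim_steps), ("guidance_scale", guidance_scale)):
        spec = PARAM_SPECS[name]
        if value is None:
            value = spec["default"]  # fall back to the documented default
        if not spec["min"] <= value <= spec["max"]:
            raise ValueError(
                f"{name}={value} is outside the documented range "
                f"{spec['min']} to {spec['max']}"
            )
        resolved[name] = value
    return resolved


print(resolve_params())
print(resolve_params(ddim_steps=100, model_name="speech"))
```

The returned dictionary uses the same keyword names as the table, so it can be inspected or logged before the values are passed on.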

	### VRAM Requirements

	- Minimum: ~4 GB
	- Recommended: ~6 GB (for longer audio)

	## Usage with Mæstræa

	These models are automatically downloaded by the Mæstræa AI Workstation backend.

	### Direct Usage

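	```python
	import audiosr

	# Load a variant: "basic" (general audio) or "speech" (spoken word).
	model = audiosr.build_model(model_name="basic")

	# Upscale input.wav to 48 kHz; a fixed seed makes the sampling repeatable.
	waveform = audiosr.super_resolution(
	    model, "input.wav",
	    seed=42, guidance_scale=3.5, ddim_steps=50
	)
	```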

	## Original Source

	| Variant | Original Repo |
	|---------|--------------|
	| basic | [haoheliu/audiosr_basic](https://huggingface.co/haoheliu/audiosr_basic) |
	| speech | [haoheliu/audiosr_speech](https://huggingface.co/haoheliu/audiosr_speech) |

	## License

	MIT — same as the original AudioSR release.

	## Credits

	- Model: [AudioSR](https://github.com/haoheliu/versatile_audio_super_resolution) by Haohe Liu et al.
	- Paper: [Versatile Audio Super Resolution](https://arxiv.org/abs/2309.07314)
	- Conversion & Mirror by: [AEmotionStudio](https://huggingface.co/AEmotionStudio)