	---
	license: other
	license_name: sam-license
	license_link: LICENSE
	tags:
	- audio
	- audio-separation
	- sound-separation
	- sam-audio
	- meta
	- pytorch
	- safetensors
	- bf16
	pipeline_tag: audio-to-audio
	base_model: facebook/sam-audio-large-tv
	---

	# SAM-Audio Models (BF16 Safetensors)

	Ungated mirrors of Meta's [SAM-Audio](https://github.com/facebookresearch/sam-audio) model weights, converted to BF16 safetensors format and redistributed under the [SAM License](LICENSE) for easier access.

	## What is SAM-Audio?

	SAM-Audio (Segment Anything Model for Audio) is Meta AI's foundation model for isolating any sound in audio using text, visual, or temporal prompts. It can separate specific sounds from complex audio mixtures.

	- Text prompts: isolate sounds by describing them (e.g. "drums", "vocals", "piano")
	- Visual prompts: point at objects in a video to extract their sound
	- Span prompts (temporal): specify the time ranges in which the target sound occurs

	The `-tv` variants are optimized for target correctness and visual prompting.

	## Available Models

	| Model | Parameters | File Size | Original |
	|---|---|---|---|
	| `sam-audio-large-tv-bf16.safetensors` | 3,715,221,638 | 6.92 GiB | [facebook/sam-audio-large-tv](https://huggingface.co/facebook/sam-audio-large-tv) |
	| `sam-audio-base-tv-bf16.safetensors` | 1,931,243,654 | 3.60 GiB | [facebook/sam-audio-base-tv](https://huggingface.co/facebook/sam-audio-base-tv) |
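The file sizes in the table follow directly from the parameter counts, since BF16 stores each parameter in 2 bytes. The short check below reproduces them; it ignores the small safetensors header, and the helper name `expected_size_gib` is only illustrative.

```python
# BF16 stores each parameter in 2 bytes.
BYTES_PER_BF16 = 2
GIB = 1024 ** 3

# Parameter counts copied from the table above.
PARAMS = {
    "sam-audio-large-tv-bf16.safetensors": 3_715_221_638,
    "sam-audio-base-tv-bf16.safetensors": 1_931_243_654,
}


def expected_size_gib(num_params: int) -> float:
    """Approximate size of the tensor data in GiB (header not included)."""
    return num_params * BYTES_PER_BF16 / GIB


for name, count in PARAMS.items():
    print(f"{name}: {expected_size_gib(count):.2f} GiB")
```

A download that is noticeably smaller than these figures is likely incomplete.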

	## Files

	| File | Description |
	|---|---|
	| `sam-audio-large-tv-bf16.safetensors` | Large-TV model weights (BF16) |
	| `sam-audio-base-tv-bf16.safetensors` | Base-TV model weights (BF16) |
	| `config.json` | Model configuration |
	| `LICENSE` | SAM License (required for redistribution) |
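To confirm that a downloaded weights file really is BF16 without loading it into memory, you can read only the safetensors header. A safetensors file begins with an 8-byte little-endian header length, followed by a JSON header that lists each tensor's dtype and shape. The sketch below uses only the standard library; `summarize` is a hypothetical helper, and the element count it reports may differ from the parameter counts listed above if the checkpoint also stores non-parameter buffers.

```python
import json
import struct
import sys
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict of tensor entries."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize(path):
    """Return (total element count, set of dtypes) without reading tensor data."""
    header = read_safetensors_header(path)
    dtypes = {entry["dtype"] for entry in header.values()}
    num_elements = sum(prod(entry["shape"]) for entry in header.values())
    return num_elements, dtypes


if __name__ == "__main__":
    # Usage: python check_weights.py sam-audio-base-tv-bf16.safetensors
    for weights_path in sys.argv[1:]:
        count, dtypes = summarize(weights_path)
        print(f"{weights_path}: {count:,} elements, dtypes={sorted(dtypes)}")
```

For the files in this repository, the reported dtype set would be expected to contain only `BF16`.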

	## Usage

	```python
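	# Option 1: ComfyUI-FFMPEGA (downloads the weights automatically)
	# Set no_llm_mode = "audio_separate" and prompt = "vocals"

	# Option 2: standalone, using the sam_audio package
	from sam_audio import SAMAudio

	# Replace the placeholder with the path to a local copy of this repo
	model = SAMAudio.from_pretrained("path/to/this/repo")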
	```
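For standalone use you first need a local copy of the files. One way to fetch them is `hf_hub_download` from the `huggingface_hub` package, sketched below. The repo id `AEmotionStudio/sam-audio-models` is an assumption based on this page's location, and `fetch_weights` is a hypothetical helper. Whether `SAMAudio.from_pretrained` accepts the resulting cache directory as-is is not confirmed here.

```python
# Assumed repo id for this mirror; adjust if the repository is renamed.
REPO_ID = "AEmotionStudio/sam-audio-models"

# File names taken from the Files table.
WEIGHT_FILES = {
    "large": "sam-audio-large-tv-bf16.safetensors",
    "base": "sam-audio-base-tv-bf16.safetensors",
}


def fetch_weights(variant: str = "base") -> str:
    """Download config.json and one checkpoint; return the local weights path."""
    if variant not in WEIGHT_FILES:
        raise ValueError(f"unknown variant {variant!r}; choose from {sorted(WEIGHT_FILES)}")
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    hf_hub_download(repo_id=REPO_ID, filename="config.json")
    return hf_hub_download(repo_id=REPO_ID, filename=WEIGHT_FILES[variant])


# Example (downloads several GiB on first call):
# weights_path = fetch_weights("base")
```

Files are stored in the local Hugging Face cache, so repeated calls do not download them again.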

	## License

	These models are distributed under the SAM License; see the [LICENSE](LICENSE) file. Key points:

	- ✅ Commercial use permitted
	- ✅ Redistribution permitted (with license included)
	- ✅ Derivative works permitted
	- ❌ No military/warfare, nuclear, or espionage use
	- ❌ No reverse engineering

	## Credits

	- Original model by: [Meta AI (FAIR)](https://github.com/facebookresearch/sam-audio)
	- Paper: SAM-Audio: Segment Anything in Audio
	- Redistributed by: [Æmotion Studio](https://huggingface.co/AEmotionStudio) for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA)