---
license: other
license_name: sam-license
license_link: LICENSE
tags:
- audio
- audio-separation
- sound-separation
- sam-audio
- meta
- pytorch
- safetensors
- bf16
pipeline_tag: audio-to-audio
base_model: facebook/sam-audio-large-tv
---

# SAM-Audio Models (BF16 Safetensors)

**Ungated mirrors** of Meta's [SAM-Audio](https://github.com/facebookresearch/sam-audio) model weights, converted to BF16 safetensors format and redistributed under the [SAM License](LICENSE) for easier access.

## What is SAM-Audio?

SAM-Audio (Segment Anything Model for Audio) is Meta AI's foundation model for **isolating any sound in audio** using text, visual, or temporal prompts. It can separate specific sounds from complex audio mixtures.

- **Text prompts**: isolate sounds by describing them (e.g. *"drums"*, *"vocals"*, *"piano"*)
- **Visual prompts**: point at objects in video to extract their sound
- **Span prompts**: specify time ranges where the target sound occurs

The `-tv` variants are optimized for **target correctness** and **visual prompting**.

## Available Models

| Model | Parameters | File Size | Original |
|---|---|---|---|
| `sam-audio-large-tv-bf16.safetensors` | 3,715,221,638 | 6.92 GiB | [facebook/sam-audio-large-tv](https://huggingface.co/facebook/sam-audio-large-tv) |
| `sam-audio-base-tv-bf16.safetensors` | 1,931,243,654 | 3.60 GiB | [facebook/sam-audio-base-tv](https://huggingface.co/facebook/sam-audio-base-tv) |

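
The file sizes in the table follow from the parameter counts, since BF16 stores each parameter in 2 bytes. The sketch below reproduces that arithmetic; it assumes every tensor is stored in BF16 and ignores the small safetensors header, and the helper name `bf16_size_gib` is only illustrative.

```python
BYTES_PER_BF16 = 2   # bfloat16 uses 16 bits per value
GIB = 2**30          # bytes in one GiB


def bf16_size_gib(num_params: int) -> float:
    """Approximate on-disk size in GiB for a model stored entirely in BF16."""
    return num_params * BYTES_PER_BF16 / GIB


# Parameter counts copied from the table above.
models = {
    "sam-audio-large-tv-bf16.safetensors": 3_715_221_638,
    "sam-audio-base-tv-bf16.safetensors": 1_931_243_654,
}

for name, params in models.items():
    print(f"{name}: {bf16_size_gib(params):.2f} GiB")
```

This is also a rough lower bound on the memory needed just to hold the weights, before activations or any other runtime overhead.
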
## Files

| File | Description |
|---|---|
| `sam-audio-large-tv-bf16.safetensors` | Large-TV model weights (BF16) |
| `sam-audio-base-tv-bf16.safetensors` | Base-TV model weights (BF16) |
| `config.json` | Model configuration |
| `LICENSE` | SAM License (required for redistribution) |

## Usage

```python
# Option 1: ComfyUI-FFMPEGA (downloads the weights automatically)
# Set no_llm_mode = "audio_separate" and prompt = "vocals"

# Option 2: standalone, via the sam_audio package
from sam_audio import SAMAudio

# Point this at a local copy of this repository
model = SAMAudio.from_pretrained("path/to/this/repo")
```

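
To sanity-check a downloaded file before loading it, you can read the safetensors header directly: the format begins with an 8-byte little-endian header length, followed by a JSON header that lists each tensor's dtype and shape. The following is a minimal sketch using only the standard library; the function names `read_safetensors_header` and `summarize` are illustrative and not part of `sam_audio` or `safetensors`.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def summarize(path):
    """Return (set of dtypes, total element count) for all tensors in the file."""
    header = read_safetensors_header(path)
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    dtypes = {entry["dtype"] for entry in header.values()}
    num_params = sum(prod(entry["shape"]) for entry in header.values())
    return dtypes, num_params
```

Calling `summarize("sam-audio-base-tv-bf16.safetensors")` on a local copy should report `{"BF16"}` as the only dtype. The element count should be close to the parameter count listed under Available Models, assuming that figure counts every stored tensor element.
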
## License

These models are distributed under the **SAM License**; see the [LICENSE](LICENSE) file for the full terms. Key points:

- ✅ Commercial use permitted
- ✅ Redistribution permitted (with license included)
- ✅ Derivative works permitted
- ❌ No military/warfare, nuclear, or espionage use
- ❌ No reverse engineering

## Credits

- **Original model by**: [Meta AI (FAIR)](https://github.com/facebookresearch/sam-audio)
- **Paper**: *SAM-Audio: Segment Anything in Audio*
- **Redistributed by**: [Æmotion Studio](https://huggingface.co/AEmotionStudio) for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA)