---
license: other
license_name: sam-license
license_link: LICENSE
tags:
  - audio
  - audio-separation
  - sound-separation
  - sam-audio
  - meta
  - pytorch
  - safetensors
  - bf16
pipeline_tag: audio-to-audio
base_model: facebook/sam-audio-large-tv
---

# SAM-Audio Models (BF16 Safetensors)

**Ungated mirrors** of Meta's [SAM-Audio](https://github.com/facebookresearch/sam-audio) model weights, converted to BF16 safetensors format and redistributed under the [SAM License](LICENSE) for easier access.

## What is SAM-Audio?

SAM-Audio (Segment Anything Model for Audio) is Meta AI's foundation model for **isolating any sound in audio**: given a complex audio mixture and a text, visual, or temporal prompt, it separates out the sound the prompt describes.

- **Text prompts** — isolate sounds by describing them (e.g. *"drums"*, *"vocals"*, *"piano"*)
- **Visual prompts** — point at objects in video to extract their sound
- **Span prompts** — specify time ranges where the target sound occurs

The `-tv` variants are optimized for **target correctness** and **visual prompting**.

## Available Models

| Model | Parameters | File Size | Original |
|---|---|---|---|
| `sam-audio-large-tv-bf16.safetensors` | 3,715,221,638 | 6.92 GiB | [facebook/sam-audio-large-tv](https://huggingface.co/facebook/sam-audio-large-tv) |
| `sam-audio-base-tv-bf16.safetensors` | 1,931,243,654 | 3.60 GiB | [facebook/sam-audio-base-tv](https://huggingface.co/facebook/sam-audio-base-tv) |
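The File Size column follows from the parameter count: BF16 (bfloat16) stores each weight in 2 bytes, so the weights take roughly `parameters × 2` bytes. The sketch below checks the table's figures; the helper name `bf16_weights_gib` is illustrative only, and it ignores the small JSON header that every safetensors file carries.

```python
BYTES_PER_BF16 = 2   # bfloat16 is a 16-bit (2-byte) floating-point format
GIB = 1024 ** 3      # 1 GiB = 2**30 bytes


def bf16_weights_gib(num_params: int) -> float:
    """Approximate size of the raw BF16 weights in GiB (safetensors header not counted)."""
    return num_params * BYTES_PER_BF16 / GIB


for name, params in [("large-tv", 3_715_221_638), ("base-tv", 1_931_243_654)]:
    print(f"{name}: {bf16_weights_gib(params):.2f} GiB")
# large-tv: 6.92 GiB
# base-tv: 3.60 GiB
```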

## Files

| File | Description |
|---|---|
| `sam-audio-large-tv-bf16.safetensors` | Large-TV model weights (BF16) |
| `sam-audio-base-tv-bf16.safetensors` | Base-TV model weights (BF16) |
| `config.json` | Model configuration |
| `LICENSE` | SAM License (required for redistribution) |
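To confirm a downloaded weights file really is BF16 without loading several GiB into memory, you can read only its header. A safetensors file starts with an 8-byte little-endian integer giving the length of a JSON header, which maps each tensor name to its `dtype`, `shape`, and `data_offsets` (plus an optional `__metadata__` entry). The minimal sketch below uses only the standard library; the function names are hypothetical helpers, not part of any package.

```python
import json
import struct
from math import prod


def read_safetensors_header(path: str) -> dict:
    """Return the tensor entries of a .safetensors header without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # Free-form metadata lives under this reserved key and is not a tensor.
    header.pop("__metadata__", None)
    return header


def count_params_by_dtype(path: str) -> dict:
    """Sum tensor element counts per dtype, e.g. {"BF16": <total>}."""
    counts: dict = {}
    for info in read_safetensors_header(path).values():
        n = prod(info["shape"])  # prod([]) == 1, so scalar tensors count as 1
        counts[info["dtype"]] = counts.get(info["dtype"], 0) + n
    return counts
```

For example, `count_params_by_dtype("sam-audio-base-tv-bf16.safetensors")` should report a single `"BF16"` key; whether its total exactly matches the Parameters column above depends on how that column was computed, which is an assumption here.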

## Usage

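With [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA), the weights are downloaded automatically: set `no_llm_mode = "audio_separate"` and `prompt = "vocals"`.

Standalone, with Meta's [`sam_audio`](https://github.com/facebookresearch/sam-audio) package:

```python
from sam_audio import SAMAudio

model = SAMAudio.from_pretrained("path/to/this/repo")
```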

## License

This model is distributed under the **SAM License** — see the [LICENSE](LICENSE) file. Key points:

- ✅ Commercial use permitted
- ✅ Redistribution permitted (with license included)
- ✅ Derivative works permitted
- ❌ No military/warfare, nuclear, or espionage use
- ❌ No reverse engineering

## Credits

- **Original model by**: [Meta AI (FAIR)](https://github.com/facebookresearch/sam-audio)
- **Paper**: *SAM-Audio: Segment Anything in Audio*
- **Redistributed by**: [Æmotion Studio](https://huggingface.co/AEmotionStudio) for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA)