Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,76 @@
---
license: other
license_name: sam-license
license_link: LICENSE
tags:
- audio
- audio-separation
- sound-separation
- sam-audio
- meta
- pytorch
- safetensors
- bf16
pipeline_tag: audio-to-audio
base_model: facebook/sam-audio-large-tv
---

# SAM-Audio Large-TV (BF16)

This is an **ungated mirror** of Meta's [SAM-Audio Large-TV](https://huggingface.co/facebook/sam-audio-large-tv) model weights, converted to BF16 safetensors format and redistributed under the [SAM License](LICENSE) for easier access.

## What is SAM-Audio?

SAM-Audio (Segment Anything Model for Audio) is Meta AI's foundation model for **isolating any sound in audio** using text, visual, or temporal (span) prompts. It can separate specific sounds from complex audio mixtures.

- **Text prompts**: isolate sounds by describing them (e.g. *"drums"*, *"vocals"*, *"piano"*)
- **Visual prompts**: point at objects in video to extract their sound
- **Span prompts**: specify time ranges where the target sound occurs

The `-tv` variant is optimized for **target correctness** and **visual prompting**.
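
To make the span prompt idea concrete, here is a minimal sketch that converts time ranges in seconds into sample index ranges at the 48,000 Hz sample rate listed under Model Info. The helper name `span_to_samples` and the choice of half-open ranges are assumptions made for illustration; this is not the SAM-Audio API, which may represent spans differently.

```python
SAMPLE_RATE_HZ = 48_000  # sample rate listed in the Model Info table


def span_to_samples(start_s, end_s, sample_rate=SAMPLE_RATE_HZ):
    """Convert a span in seconds to a half-open [start, end) sample range.

    Hypothetical helper for illustration only; not part of sam_audio.
    """
    if start_s < 0 or end_s <= start_s:
        raise ValueError("span must satisfy 0 <= start_s < end_s")
    return round(start_s * sample_rate), round(end_s * sample_rate)


# Two example spans where the target sound is assumed to occur.
spans_s = [(0.5, 2.0), (3.25, 4.0)]
sample_ranges = [span_to_samples(start, end) for start, end in spans_s]
print(sample_ranges)  # [(24000, 96000), (156000, 192000)]
```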

## Files

| File | Description |
|---|---|
| `sam-audio-large-tv-bf16.safetensors` | Model weights (BF16 safetensors format) |
| `config.json` | Model configuration |
| `LICENSE` | SAM License (required for redistribution) |
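
One way to check a downloaded weights file without loading it into memory is to read only its safetensors header, which is an 8-byte little-endian length followed by a JSON object describing each tensor. The sketch below uses only the standard library; the function names are illustrative, and the demo runs on a tiny synthetic file rather than the real weights.

```python
import json
import os
import struct
import tempfile
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file; tensor data is not read."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte little-endian size
        return json.loads(f.read(header_len))


def summarize(header):
    """Collect the dtypes in use and the total element count across tensors."""
    tensors = [v for k, v in header.items() if k != "__metadata__"]
    dtypes = {t["dtype"] for t in tensors}
    n_params = sum(prod(t["shape"]) for t in tensors)
    return dtypes, n_params


# Demo on a synthetic file holding one 2x3 BF16 tensor (12 bytes of data).
demo_header = json.dumps(
    {"w": {"dtype": "BF16", "shape": [2, 3], "data_offsets": [0, 12]}}
).encode()
demo_path = os.path.join(tempfile.mkdtemp(), "demo.safetensors")
with open(demo_path, "wb") as f:
    f.write(struct.pack("<Q", len(demo_header)) + demo_header + bytes(12))

dtypes, n_params = summarize(read_safetensors_header(demo_path))
print(dtypes, n_params)  # {'BF16'} 6
```

Pointing `read_safetensors_header` at `sam-audio-large-tv-bf16.safetensors` should, if the file matches the description above, report `BF16` as the dtype; the element count can then be compared with the parameter count this card lists.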

## Model Info

| Property | Value |
|---|---|
| Source | [`facebook/sam-audio-large-tv`](https://huggingface.co/facebook/sam-audio-large-tv) |
| Dtype | `bf16` (`torch.bfloat16`) |
| Parameters | 3,715,221,638 |
| File size | 6.92 GiB (original: 13.84 GiB) |
| Sample rate | 48,000 Hz |
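
The file sizes in the table follow from the parameter count. As a rough check, assuming the original checkpoint stores 4 bytes per parameter (FP32, inferred from the 2x size ratio rather than stated by the source) and ignoring the small safetensors header:

```python
N_PARAMS = 3_715_221_638  # parameter count from the table above
BYTES_PER_PARAM = {"bf16": 2, "fp32": 4}  # fp32 for the original is an assumption
GIB = 1024 ** 3


def weights_size_gib(n_params, dtype):
    """Approximate size of the raw tensor data in GiB (file header ignored)."""
    return n_params * BYTES_PER_PARAM[dtype] / GIB


print(f"bf16: {weights_size_gib(N_PARAMS, 'bf16'):.2f} GiB")  # bf16: 6.92 GiB
print(f"fp32: {weights_size_gib(N_PARAMS, 'fp32'):.2f} GiB")  # fp32: 13.84 GiB
```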

## Usage

```python
# With ComfyUI-FFMPEGA (automatic download):
# set no_llm_mode = "audio_separate" and prompt = "vocals"

# Or standalone, pointing at a local copy of this repo:
from sam_audio import SAMAudio

model = SAMAudio.from_pretrained("path/to/this/repo")
```
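
For standalone use, the files first need to be on disk. A minimal sketch of fetching them with `huggingface_hub` follows; the `repo_id` argument is a placeholder you must supply (this README does not state the mirror's repo id), `download_mirror` is a hypothetical helper, and whether `SAMAudio.from_pretrained` accepts the resulting directory layout has not been verified here.

```python
from pathlib import Path

# File names taken from the Files table in this README.
WEIGHTS_FILE = "sam-audio-large-tv-bf16.safetensors"
CONFIG_FILE = "config.json"


def download_mirror(repo_id):
    """Download config and weights into the local Hugging Face cache.

    Hypothetical helper; `repo_id` is the id of this mirror on the Hub.
    Returns the snapshot directory that contains both files.
    """
    # Imported lazily so the module can be inspected without the dependency.
    from huggingface_hub import hf_hub_download

    paths = [
        hf_hub_download(repo_id=repo_id, filename=name)
        for name in (CONFIG_FILE, WEIGHTS_FILE)
    ]
    return Path(paths[0]).parent
```

The returned directory can then be tried as the `"path/to/this/repo"` argument in the snippet above.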

## License

This model is distributed under the **SAM License**; see the [LICENSE](LICENSE) file. Key points:

- ✅ Commercial use permitted
- ✅ Redistribution permitted (with license included)
- ✅ Derivative works permitted
- ❌ No military/warfare, nuclear, or espionage use
- ❌ No reverse engineering

## Credits

- **Original model by**: [Meta AI (FAIR)](https://github.com/facebookresearch/sam-audio)
- **Original Hugging Face repo**: [facebook/sam-audio-large-tv](https://huggingface.co/facebook/sam-audio-large-tv)
- **Paper**: *SAM-Audio: Segment Anything in Audio*
- **Redistributed by**: [Æmotion Studio](https://huggingface.co/AEmotionStudio) for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA)