Upload README.md with huggingface_hub
README.md
CHANGED
@@ -15,9 +15,9 @@ pipeline_tag: audio-to-audio
 base_model: facebook/sam-audio-large-tv
 ---
 
-# SAM-Audio
+# SAM-Audio Models (BF16 Safetensors)
 
-
+**Ungated mirrors** of Meta's [SAM-Audio](https://github.com/facebookresearch/sam-audio) model weights, converted to BF16 safetensors format and redistributed under the [SAM License](LICENSE) for easier access.
 
 ## What is SAM-Audio?
 
@@ -27,26 +27,24 @@ SAM-Audio (Segment Anything Model for Audio) is Meta AI's foundation model for *
 - **Visual prompts** — point at objects in video to extract their sound
 - **Span prompts** — specify time ranges where the target sound occurs
 
-The `-tv`
+The `-tv` variants are optimized for **target correctness** and **visual prompting**.
+
+## Available Models
+
+| Model | Parameters | File Size | Original |
+|---|---|---|---|
+| `sam-audio-large-tv-bf16.safetensors` | 3,715,221,638 | 6.92 GiB | [facebook/sam-audio-large-tv](https://huggingface.co/facebook/sam-audio-large-tv) |
+| `sam-audio-base-tv-bf16.safetensors` | 1,931,243,654 | 3.60 GiB | [facebook/sam-audio-base-tv](https://huggingface.co/facebook/sam-audio-base-tv) |
 
 ## Files
 
 | File | Description |
 |---|---|
-| `sam-audio-large-tv-bf16.safetensors` |
+| `sam-audio-large-tv-bf16.safetensors` | Large-TV model weights (BF16) |
+| `sam-audio-base-tv-bf16.safetensors` | Base-TV model weights (BF16) |
 | `config.json` | Model configuration |
 | `LICENSE` | SAM License (required for redistribution) |
 
-## Model Info
-
-| Property | Value |
-|---|---|
-| Source | [`facebook/sam-audio-large-tv`](https://huggingface.co/facebook/sam-audio-large-tv) |
-| Dtype | `bf16` (`torch.bfloat16`) |
-| Parameters | 3,715,221,638 |
-| File size | 6.92 GiB (original: 13.84 GiB) |
-| Sample rate | 48,000 Hz |
-
 ## Usage
 
 ```python
@@ -71,6 +69,5 @@ This model is distributed under the **SAM License** — see the [LICENSE](LICENS
 ## Credits
 
 - **Original model by**: [Meta AI (FAIR)](https://github.com/facebookresearch/sam-audio)
-- **Original HuggingFace repo**: [facebook/sam-audio-large-tv](https://huggingface.co/facebook/sam-audio-large-tv)
 - **Paper**: *SAM-Audio: Segment Anything in Audio*
 - **Redistributed by**: [Æmotion Studio](https://huggingface.co/AEmotionStudio) for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA)
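
As a sanity check on the parameter counts and file sizes quoted in this change, the sketch below recomputes the sizes from the parameter counts. It assumes that the safetensors file is dominated by the raw weights at 2 bytes per parameter for BF16, and that the 13.84 GiB "original" figure corresponds to 4 bytes per parameter (FP32). Both are inferences from the numbers, not something the README states. The helper name `weights_size_gib` is made up for this illustration.

```python
GIB = 1024 ** 3  # bytes per GiB


def weights_size_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in GiB.

    Ignores the safetensors header and any non-weight metadata,
    which is assumed to be negligible at this scale.
    """
    return num_params * bytes_per_param / GIB


# Parameter counts copied from the tables in the diff.
PARAMS = {
    "sam-audio-large-tv": 3_715_221_638,
    "sam-audio-base-tv": 1_931_243_654,
}

for name, count in PARAMS.items():
    bf16 = weights_size_gib(count, 2)  # assumed: 2 bytes per BF16 value
    print(f"{name}: about {bf16:.2f} GiB in BF16")
```

Under these assumptions the Large-TV count gives roughly 6.92 GiB in BF16 and 13.84 GiB at 4 bytes per parameter, and the Base-TV count gives roughly 3.60 GiB in BF16, which agrees with the sizes listed in the tables.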