AEmotionStudio/sam-audio-models

@@ -15,9 +15,9 @@ pipeline_tag: audio-to-audio
 base_model: facebook/sam-audio-large-tv
 ---
-# SAM-Audio Large-TV (BF16)
-This is an **ungated mirror** of Meta's [SAM-Audio Large-TV](https://huggingface.co/facebook/sam-audio-large-tv) model weights, converted to BF16 safetensors format and redistributed under the [SAM License](LICENSE) for easier access.
 ## What is SAM-Audio?
@@ -27,26 +27,24 @@ SAM-Audio (Segment Anything Model for Audio) is Meta AI's foundation model for *
 - **Visual prompts** — point at objects in video to extract their sound
 - **Span prompts** — specify time ranges where the target sound occurs
-The `-tv` variant is optimized for **target correctness** and **visual prompting**.
 ## Files
 | File | Description |
 |---|---|
-| `sam-audio-large-tv-bf16.safetensors` | Model weights (BF16 safetensors format) |
 | `config.json` | Model configuration |
 | `LICENSE` | SAM License (required for redistribution) |
-## Model Info
-| Property | Value |
-|---|---|
-| Source | [`facebook/sam-audio-large-tv`](https://huggingface.co/facebook/sam-audio-large-tv) |
-| Dtype | `bf16` (`torch.bfloat16`) |
-| Parameters | 3,715,221,638 |
-| File size | 6.92 GiB (original: 13.84 GiB) |
-| Sample rate | 48,000 Hz |
 ## Usage
 ```python
@@ -71,6 +69,5 @@ This model is distributed under the **SAM License** — see the [LICENSE](LICENS
 ## Credits
 - **Original model by**: [Meta AI (FAIR)](https://github.com/facebookresearch/sam-audio)
-- **Original HuggingFace repo**: [facebook/sam-audio-large-tv](https://huggingface.co/facebook/sam-audio-large-tv)
 - **Paper**: *SAM-Audio: Segment Anything in Audio*
 - **Redistributed by**: [Æmotion Studio](https://huggingface.co/AEmotionStudio) for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA)
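
You can check the dtype and parameter count of a downloaded checkpoint without loading it into memory. The minimal, stdlib-only sketch below parses the safetensors header directly. It assumes the published safetensors layout: an 8-byte little-endian header length, followed by a JSON header that gives `dtype`, `shape`, and `data_offsets` for each tensor. `read_safetensors_header` and `summarize` are hypothetical helpers written for illustration. They are not part of the `safetensors` library.

```python
import json
import math
import struct

# Bytes per element for a few common safetensors dtype strings.
DTYPE_BYTES = {"F32": 4, "F16": 2, "BF16": 2}


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict (hypothetical helper)."""
    with open(path, "rb") as f:
        # The file starts with an unsigned 64-bit little-endian header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def summarize(path):
    """Count parameters and collect dtypes across all tensors (hypothetical helper)."""
    header = read_safetensors_header(path)
    total_params = 0
    dtypes = set()
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        numel = math.prod(info["shape"])  # empty shape (scalar) gives 1
        start, end = info["data_offsets"]
        itemsize = DTYPE_BYTES.get(info["dtype"])
        # Sanity check: the byte span must match shape x element size.
        if itemsize is not None and end - start != numel * itemsize:
            raise ValueError(f"{name}: byte span does not match shape/dtype")
        total_params += numel
        dtypes.add(info["dtype"])
    return total_params, dtypes
```

For example, `summarize("sam-audio-large-tv-bf16.safetensors")` should return the parameter count listed in the Model Info table and a dtype set of `{"BF16"}`. This assumes the file contains only the model weights.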

 base_model: facebook/sam-audio-large-tv
 ---
+# SAM-Audio Models (BF16 Safetensors)
+**Ungated mirrors** of Meta's [SAM-Audio](https://github.com/facebookresearch/sam-audio) model weights, converted to BF16 safetensors format and redistributed under the [SAM License](LICENSE) for easier access.
 ## What is SAM-Audio?
 - **Visual prompts** — point at objects in video to extract their sound
 - **Span prompts** — specify time ranges where the target sound occurs
+The `-tv` variants are optimized for **target correctness** and **visual prompting**.
+## Available Models
+| Model | Parameters | File Size | Original |
+|---|---|---|---|
+| `sam-audio-large-tv-bf16.safetensors` | 3,715,221,638 | 6.92 GiB | [facebook/sam-audio-large-tv](https://huggingface.co/facebook/sam-audio-large-tv) |
+| `sam-audio-base-tv-bf16.safetensors` | 1,931,243,654 | 3.60 GiB | [facebook/sam-audio-base-tv](https://huggingface.co/facebook/sam-audio-base-tv) |
 ## Files
 | File | Description |
 |---|---|
+| `sam-audio-large-tv-bf16.safetensors` | Large-TV model weights (BF16) |
+| `sam-audio-base-tv-bf16.safetensors` | Base-TV model weights (BF16) |
 | `config.json` | Model configuration |
 | `LICENSE` | SAM License (required for redistribution) |
 ## Usage
 ```python
 ## Credits
 - **Original model by**: [Meta AI (FAIR)](https://github.com/facebookresearch/sam-audio)
 - **Paper**: *SAM-Audio: Segment Anything in Audio*
 - **Redistributed by**: [Æmotion Studio](https://huggingface.co/AEmotionStudio) for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA)
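
The file sizes in the Available Models table follow from storing each parameter in 2 bytes (BF16) instead of 4 (FP32). The sketch below checks that arithmetic. `size_gib` is a hypothetical helper. It ignores the small safetensors header, so its results are approximate.

```python
# Parameter counts copied from the Available Models table.
PARAMS = {
    "sam-audio-large-tv-bf16.safetensors": 3_715_221_638,
    "sam-audio-base-tv-bf16.safetensors": 1_931_243_654,
}
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def size_gib(n_params, dtype):
    """Approximate weight size in GiB (hypothetical helper; ignores the header)."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


for name, n in PARAMS.items():
    print(f"{name}: bf16 {size_gib(n, 'bf16'):.2f} GiB, "
          f"fp32 {size_gib(n, 'fp32'):.2f} GiB")
```

For the large model, the FP32 estimate (13.84 GiB) matches the original size given in the removed Model Info rows. The FP32 figure printed for the base model is only a computed estimate, not a published size.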