license: other
license_name: sam-license
license_link: LICENSE
tags:
  - audio
  - audio-separation
  - sound-separation
  - sam-audio
  - meta
  - pytorch
  - safetensors
  - bf16
pipeline_tag: audio-to-audio
base_model: facebook/sam-audio-large-tv

SAM-Audio Large-TV (BF16)

This is an ungated mirror of Meta's SAM-Audio Large-TV model weights, converted to BF16 safetensors format and redistributed under the SAM License for easier access.

What is SAM-Audio?

SAM-Audio (Segment Anything Model for Audio) is Meta AI's foundation model for isolating any sound in audio using text, visual, or temporal prompts. It can separate specific sounds from complex audio mixtures.

  • Text prompts — isolate sounds by describing them (e.g. "drums", "vocals", "piano")
  • Visual prompts — point at objects in video to extract their sound
  • Span prompts — specify time ranges where the target sound occurs

The -tv variant is optimized for target correctness and visual prompting.
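To make the span-prompt idea concrete, here is a minimal sketch of how a time range in seconds maps to sample positions at the 48,000 Hz sample rate listed for this model. The helper `span_to_samples` is hypothetical and purely illustrative. It is not part of the `sam_audio` package and does not reflect how the model encodes span prompts internally.

```python
SAMPLE_RATE_HZ = 48_000  # sample rate listed for this model


def span_to_samples(start_s, end_s, total_samples, sample_rate=SAMPLE_RATE_HZ):
    """Convert a time span in seconds to (start, end) sample indices.

    Hypothetical helper for illustration only. Indices are clamped to the
    clip length so a span that runs past the end of the audio stays valid.
    """
    if end_s <= start_s:
        raise ValueError("span end must be after span start")
    start = max(0, round(start_s * sample_rate))
    end = min(total_samples, round(end_s * sample_rate))
    return start, end


# A 10-second clip in which the target sound occurs between 1.5 s and 4.0 s.
clip_samples = 10 * SAMPLE_RATE_HZ
print(span_to_samples(1.5, 4.0, clip_samples))  # (72000, 192000)
```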

Files

| File | Description |
| --- | --- |
| sam-audio-large-tv-bf16.safetensors | Model weights (BF16 safetensors format) |
| config.json | Model configuration |
| LICENSE | SAM License (required for redistribution) |
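To check the weights file without loading several gigabytes into memory, you can read only the safetensors header. The format starts with an 8-byte little-endian length, followed by a JSON header that lists each tensor's dtype and shape. The sketch below uses only the standard library. The function names `read_safetensors_header` and `summarize` are illustrative and do not come from any library.

```python
import json
import struct
from collections import Counter
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian uint64
        return json.loads(f.read(header_len))


def summarize(path):
    """Count parameters per dtype using the header alone."""
    header = read_safetensors_header(path)
    counts = Counter()
    for name, entry in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        counts[entry["dtype"]] += prod(entry["shape"])
    return dict(counts)
```

Calling `summarize("sam-audio-large-tv-bf16.safetensors")` on a local copy should, if the file matches the description above, report a single `BF16` entry. The total will equal the parameter count in the Model Info table only if the checkpoint stores nothing but model parameters.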

Model Info

| Property | Value |
| --- | --- |
| Source | facebook/sam-audio-large-tv |
| Dtype | bf16 (torch.bfloat16) |
| Parameters | 3,715,221,638 |
| File size | 6.92 GiB (original: 13.84 GiB) |
| Sample rate | 48,000 Hz |
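The file sizes follow directly from the parameter count. As a quick sanity check, the sketch below multiplies the number of parameters by the bytes per parameter. It assumes the original checkpoint stores 4-byte float32 values, which this card does not state but which is consistent with the 13.84 GiB figure. It also ignores the small safetensors header.

```python
PARAMS = 3_715_221_638  # parameter count listed in Model Info
GIB = 1024 ** 3         # bytes per GiB


def weights_size_gib(num_params, bytes_per_param):
    """Approximate size of the raw weights in GiB (file header not included)."""
    return num_params * bytes_per_param / GIB


bf16_gib = weights_size_gib(PARAMS, 2)  # bfloat16: 2 bytes per parameter
fp32_gib = weights_size_gib(PARAMS, 4)  # assumed float32 original: 4 bytes
print(f"bf16: {bf16_gib:.2f} GiB, fp32: {fp32_gib:.2f} GiB")
# bf16: 6.92 GiB, fp32: 13.84 GiB
```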

Usage

# With ComfyUI-FFMPEGA (automatic download)
# Set no_llm_mode = "audio_separate" and prompt = "vocals"

# Or standalone:
from sam_audio import SAMAudio
model = SAMAudio.from_pretrained("path/to/this/repo")

License

This model is distributed under the SAM License — see the LICENSE file. Key points:

  • ✅ Commercial use permitted
  • ✅ Redistribution permitted (with license included)
  • ✅ Derivative works permitted
  • ❌ No military/warfare, nuclear, or espionage use
  • ❌ No reverse engineering

Credits