AEmotionStudio committed on
Commit
9a76fcd
·
verified ·
1 Parent(s): b556a11

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +76 -0
README.md ADDED
@@ -0,0 +1,76 @@
+ ---
+ license: other
+ license_name: sam-license
+ license_link: LICENSE
+ tags:
+ - audio
+ - audio-separation
+ - sound-separation
+ - sam-audio
+ - meta
+ - pytorch
+ - safetensors
+ - bf16
+ pipeline_tag: audio-to-audio
+ base_model: facebook/sam-audio-large-tv
+ ---
+
+ # SAM-Audio Large-TV (BF16)
+
+ This is an **ungated mirror** of Meta's [SAM-Audio Large-TV](https://huggingface.co/facebook/sam-audio-large-tv) model weights, converted to BF16 safetensors format and redistributed under the [SAM License](LICENSE) for easier access.
+
+ ## What is SAM-Audio?
+
+ SAM-Audio (Segment Anything Model for Audio) is Meta AI's foundation model for **isolating any sound in audio** using text, visual, or temporal prompts. It can separate specific sounds from complex audio mixtures.
+
+ - **Text prompts**: isolate sounds by describing them (e.g. *"drums"*, *"vocals"*, *"piano"*)
+ - **Visual prompts**: point at objects in video to extract their sound
+ - **Span prompts**: specify time ranges where the target sound occurs
+
+ The `-tv` variant is optimized for **target correctness** and **visual prompting**.
+
+ ## Files
+
+ | File | Description |
+ |---|---|
+ | `sam-audio-large-tv-bf16.safetensors` | Model weights (BF16 safetensors format) |
+ | `config.json` | Model configuration |
+ | `LICENSE` | SAM License (required for redistribution) |
+
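To check what the weights file listed above contains without loading several gigabytes of tensors, you can read only the safetensors header. The sketch below uses just the Python standard library and relies on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The helper names `read_safetensors_header` and `summarize` are illustrative and are not part of this repo or of any library.

```python
import json
import struct
from collections import Counter
from math import prod


def read_safetensors_header(path):
    """Return the per-tensor JSON header of a .safetensors file.

    Only the header is read; tensor data is never loaded.
    """
    with open(path, "rb") as f:
        # First 8 bytes: header length as an unsigned little-endian 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize(header):
    """Return (tensor count per dtype, total number of parameters)."""
    dtypes = Counter(info["dtype"] for info in header.values())
    n_params = sum(prod(info["shape"]) for info in header.values())
    return dtypes, n_params


# Example call (the path is an assumption about where you saved the file):
# dtypes, n_params = summarize(
#     read_safetensors_header("sam-audio-large-tv-bf16.safetensors"))
```

For a BF16 conversion you would expect the dtype counter to be dominated by `BF16` entries, and the total can be compared against the parameter count reported in the model card.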
+ ## Model Info
+
+ | Property | Value |
+ |---|---|
+ | Source | [`facebook/sam-audio-large-tv`](https://huggingface.co/facebook/sam-audio-large-tv) |
+ | Dtype | `bf16` (`torch.bfloat16`) |
+ | Parameters | 3,715,221,638 |
+ | File size | 6.92 GiB (original: 13.84 GiB) |
+ | Sample rate | 48,000 Hz |
+
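The file sizes in the table follow from the parameter count: BF16 stores each parameter in 2 bytes. The quick check below reproduces both figures. Treating the original checkpoint as 4 bytes per parameter is an assumption inferred from the 13.84 GiB figure and is not stated in the model card; the small safetensors header is ignored.

```python
# Parameter count taken from the Model Info table.
PARAMS = 3_715_221_638
GIB = 1024 ** 3  # bytes per GiB


def size_gib(n_params: int, bytes_per_param: int) -> float:
    """Approximate weights size in GiB, ignoring the file header."""
    return n_params * bytes_per_param / GIB


bf16_gib = size_gib(PARAMS, 2)       # bfloat16: 2 bytes per parameter
original_gib = size_gib(PARAMS, 4)   # assumed 4 bytes per parameter

print(f"BF16:     {bf16_gib:.2f} GiB")      # 6.92 GiB
print(f"Original: {original_gib:.2f} GiB")  # 13.84 GiB
```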
+ ## Usage
+
+ ```python
+ # With ComfyUI-FFMPEGA (automatic download):
+ # set no_llm_mode = "audio_separate" and prompt = "vocals"
+
+ # Or standalone, loading from a local copy of this repo:
+ from sam_audio import SAMAudio
+ model = SAMAudio.from_pretrained("path/to/this/repo")
+ ```
+
+ ## License
+
+ This model is distributed under the **SAM License**; see the [LICENSE](LICENSE) file. Key points:
+
+ - ✅ Commercial use permitted
+ - ✅ Redistribution permitted (with license included)
+ - ✅ Derivative works permitted
+ - ❌ No military/warfare, nuclear, or espionage use
+ - ❌ No reverse engineering
+
+ ## Credits
+
+ - **Original model by**: [Meta AI (FAIR)](https://github.com/facebookresearch/sam-audio)
+ - **Original Hugging Face repo**: [facebook/sam-audio-large-tv](https://huggingface.co/facebook/sam-audio-large-tv)
+ - **Paper**: *SAM-Audio: Segment Anything in Audio*
+ - **Redistributed by**: [Æmotion Studio](https://huggingface.co/AEmotionStudio) for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA)