prince-canuma committed · Commit 060fd4a · verified · 1 Parent(s): d20305b

Upload README.md with huggingface_hub

  1. README.md +72 -0
README.md ADDED
@@ -0,0 +1,72 @@
+ ---
+ license: other
+ language:
+ - en
+ base_model:
+ - facebook/sam-audio-base
+ pipeline_tag: audio-to-audio
+ library_name: mlx-audio
+ tags:
+ - audio-to-audio
+ - speech
+ - speech generation
+ - voice isolation
+ - mlx
+ ---
+ # mlx-community/sam-audio-base
+ This model was converted to MLX format from [`facebook/sam-audio-base`](https://huggingface.co/facebook/sam-audio-base) using mlx-audio version **0.3.2**.
+ Refer to the [original model card](https://huggingface.co/facebook/sam-audio-base) for more details on the model.
+
+ ## Use with mlx
+ ```bash
+ pip install -U mlx-audio
+ ```
+
+ ## Voice Isolation
+ ```python
+ from mlx_audio.sts import SAMAudio, SAMAudioProcessor, save_audio
+ import mlx.core as mx
+
+ # Load model and processor
+ processor = SAMAudioProcessor.from_pretrained("mlx-community/sam-audio-base")
+ model = SAMAudio.from_pretrained("mlx-community/sam-audio-base")
+
+ # Process inputs: one text description per input audio file
+ batch = processor(
+     descriptions=["speech"],
+     audios=["path/to/audio.mp3"],
+     # anchors=[[("+", 0.2, 0.5)]],  # Optional: temporal anchors
+ )
+
+ # Separate audio
+ result = model.separate(
+     audios=batch.audios,
+     descriptions=batch.descriptions,
+     sizes=batch.sizes,
+     anchor_ids=batch.anchor_ids,
+     anchor_alignment=batch.anchor_alignment,
+     ode_decode_chunk_size=50,  # Chunked decoding for memory efficiency
+ )
+
+ # For long audio files, use separate_long().
+ # Note: this is slower than separate() but more memory efficient.
+ # result = model.separate_long(
+ #     audios=batch.audios,
+ #     descriptions=batch.descriptions,
+ #     chunk_seconds=10.0,
+ #     overlap_seconds=3.0,
+ #     anchor_ids=batch.anchor_ids,
+ #     anchor_alignment=batch.anchor_alignment,
+ #     ode_decode_chunk_size=50,  # Chunked decoding for memory efficiency
+ # )
+
+ # Save output
+ # Isolated speech
+ save_audio(result.target[0], "separated.wav", sample_rate=model.sample_rate)
+
+ # Residual audio (background music/noise/other sounds)
+ save_audio(result.residual[0], "residual.wav", sample_rate=model.sample_rate)
+
+ # Check memory usage
+ print(f"Peak memory: {result.peak_memory:.2f} GB")
+ ```
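
Note on the commented-out `separate_long()` call: it passes `chunk_seconds=10.0` and `overlap_seconds=3.0`. To give a rough feel for what overlapping windows with those values look like, here is a minimal sketch in plain Python. It is not mlx-audio's implementation: the helper `chunk_bounds`, its hop logic, and the 16 kHz sample rate are all assumptions made for illustration, and the library may window and stitch audio differently.

```python
from typing import List, Tuple


def chunk_bounds(
    num_samples: int,
    sample_rate: int,
    chunk_seconds: float = 10.0,
    overlap_seconds: float = 3.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows.

    Hypothetical helper: consecutive windows share `overlap_seconds` of
    audio, and the last window is truncated at the end of the signal.
    """
    if overlap_seconds >= chunk_seconds:
        raise ValueError("overlap_seconds must be smaller than chunk_seconds")
    if num_samples <= 0:
        return []

    chunk = int(round(chunk_seconds * sample_rate))
    hop = chunk - int(round(overlap_seconds * sample_rate))
    if chunk <= 0 or hop <= 0:
        raise ValueError("chunk and hop must be at least one sample long")

    bounds = []
    start = 0
    while True:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end >= num_samples:
            break
        start += hop  # advance by the non-overlapping part of the window
    return bounds


# 25 seconds of audio at an assumed 16 kHz sample rate
for start, end in chunk_bounds(num_samples=400_000, sample_rate=16_000):
    print(f"{start / 16_000:5.1f}s -> {end / 16_000:5.1f}s")
```

With the README's values, each window advances by 7 seconds (10 s chunk minus 3 s overlap), so longer overlaps mean more windows and more total compute, which is consistent with the README's remark that `separate_long()` is slower than `separate()`.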