---
license: other
language:
- en
base_model:
- facebook/sam-audio-small
pipeline_tag: audio-to-audio
library_name: mlx-audio
tags:
- audio-to-audio
- speech
- speech generation
- voice isolation
- mlx
---
# mlx-community/sam-audio-small
This model was converted to MLX format from [`facebook/sam-audio-small`](https://huggingface.co/facebook/sam-audio-small) using mlx-audio version **0.2.10**.
Refer to the [original model card](https://huggingface.co/facebook/sam-audio-small) for more details on the model.

## Use with mlx
```bash
pip install -U mlx-audio
```

## Voice Isolation
```python
from mlx_audio.sts import SAMAudio, SAMAudioProcessor, save_audio
import mlx.core as mx

# Load model and processor
processor = SAMAudioProcessor.from_pretrained("mlx-community/sam-audio-small")
model = SAMAudio.from_pretrained("mlx-community/sam-audio-small")

# Process inputs
batch = processor(
    descriptions=["speech"],
    audios=["path/to/audio.mp3"],
    # anchors=[[("+", 0.2, 0.5)]],  # Optional: temporal anchors
)

# Separate audio
result = model.separate(
    audios=batch.audios,
    descriptions=batch.descriptions,
    sizes=batch.sizes,
    anchor_ids=batch.anchor_ids,
    anchor_alignment=batch.anchor_alignment,
    ode_decode_chunk_size=50,  # Chunked decoding for memory efficiency
)

# For long audio files, use separate_long().
# Note: slower than separate(), but more memory-efficient.
# result = model.separate_long(
#     audios=batch.audios,
#     descriptions=batch.descriptions,
#     chunk_seconds=10.0,
#     overlap_seconds=3.0,
#     anchor_ids=batch.anchor_ids,
#     anchor_alignment=batch.anchor_alignment,
#     ode_decode_chunk_size=50,  # Chunked decoding for memory efficiency
# )

# Save output
## Isolated speech
save_audio(result.target[0], "separated.wav", sample_rate=model.sample_rate)

## Residual audio (background music/noise/other sounds)
save_audio(result.residual[0], "residual.wav", sample_rate=model.sample_rate)

# Check memory usage
print(f"Peak memory: {result.peak_memory:.2f} GB")
```
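The anchor argument in the snippet above is easy to get wrong, so here is a minimal sketch of how a small batch could be laid out. Note the assumptions: that `descriptions`, `audios`, and `anchors` are parallel lists (one entry per input clip), that each anchor is a `(sign, start_seconds, end_seconds)` tuple with `"+"` marking a window where the target sound is audible, and that a clip without hints can use `None`. These details are inferred from the call signature shown above, not from confirmed API documentation.

```python
# Hedged sketch: laying out a small batch for the processor call shown above.
# Assumption: the three lists are parallel, one entry per input clip.
descriptions = ["speech", "speech"]
audios = ["interview.mp3", "podcast.mp3"]
anchors = [
    [("+", 0.2, 0.5)],  # first clip: target speech audible at 0.2-0.5 s
    None,               # second clip: no temporal hint
]

# Sanity-check the batch before handing it to the processor.
assert len(descriptions) == len(audios) == len(anchors)
for spans in anchors:
    for sign, start, end in spans or []:
        assert sign in ("+", "-") and 0.0 <= start < end
```

These lists would then be passed as the `descriptions`, `audios`, and `anchors` keyword arguments of the `processor(...)` call in the snippet above.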