---
license: other
language:
- en
base_model:
- facebook/sam-audio-small
pipeline_tag: audio-to-audio
library_name: mlx-audio
tags:
- audio-to-audio
- speech
- speech generation
- voice isolation
- mlx
---
# mlx-community/sam-audio-small
This model was converted to MLX format from [`facebook/sam-audio-small`](https://huggingface.co/facebook/sam-audio-small) using mlx-audio version **0.2.10**.
Refer to the [original model card](https://huggingface.co/facebook/sam-audio-small) for more details on the model.
## Use with mlx
```bash
pip install -U mlx-audio
```
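To confirm the install worked, a quick check using only the Python standard library can report which `mlx-audio` version is present. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of mlx-audio.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string for `package`, or None if absent.

    Hypothetical helper for illustration; not part of mlx-audio.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("mlx-audio")
print(f"mlx-audio version: {found or 'not installed'}")
```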
## Voice Isolation
```python
from mlx_audio.sts import SAMAudio, SAMAudioProcessor, save_audio
import mlx.core as mx
# Load model and processor
processor = SAMAudioProcessor.from_pretrained("facebook/sam-audio-small")
model = SAMAudio.from_pretrained("facebook/sam-audio-small")
# Process inputs
batch = processor(
    descriptions=["speech"],
    audios=["path/to/audio.mp3"],
    # anchors=[[("+", 0.2, 0.5)]],  # Optional: temporal anchors
)
# Separate audio
result = model.separate(
    audios=batch.audios,
    descriptions=batch.descriptions,
    sizes=batch.sizes,
    anchor_ids=batch.anchor_ids,
    anchor_alignment=batch.anchor_alignment,
    ode_decode_chunk_size=50,  # Chunked decoding for memory efficiency
)
# For long audio files, use separate_long().
# Note: this is slower than separate() but more memory efficient.
# result = model.separate_long(
#     audios=batch.audios,
#     descriptions=batch.descriptions,
#     chunk_seconds=10.0,
#     overlap_seconds=3.0,
#     anchor_ids=batch.anchor_ids,
#     anchor_alignment=batch.anchor_alignment,
#     ode_decode_chunk_size=50,  # Chunked decoding for memory efficiency
# )
# Save output
## Isolated speech
save_audio(result.target[0], "separated.wav", sample_rate=model.sample_rate)
## Residual audio (background music/noise/other sounds)
save_audio(result.residual[0], "residual.wav", sample_rate=model.sample_rate)
# Check memory usage
print(f"Peak memory: {result.peak_memory:.2f} GB")
```
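To build intuition for the `chunk_seconds` and `overlap_seconds` arguments of `separate_long()`, the sketch below computes the overlapping time windows such settings would imply. It is an illustration under stated assumptions, not the mlx-audio implementation: the `chunk_windows` helper is hypothetical, and it assumes each chunk starts `chunk_seconds - overlap_seconds` after the previous one.

```python
def chunk_windows(total_seconds, chunk_seconds=10.0, overlap_seconds=3.0):
    """Return (start, end) windows in seconds covering `total_seconds`.

    Hypothetical helper for illustration only. Assumes consecutive chunks
    share `overlap_seconds` of audio; the library may chunk differently.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    if not 0 <= overlap_seconds < chunk_seconds:
        raise ValueError("overlap_seconds must be in [0, chunk_seconds)")

    hop = chunk_seconds - overlap_seconds  # distance between chunk starts
    windows = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        windows.append((start, end))
        if end >= total_seconds:
            break  # the last window reached the end of the audio
        start += hop
    return windows


# A 25-second clip with the settings from the example above
for start, end in chunk_windows(25.0, chunk_seconds=10.0, overlap_seconds=3.0):
    print(f"{start:5.1f}s -> {end:5.1f}s")
```

Under this assumption, a larger overlap means more chunks for the same clip, which is consistent with `separate_long()` being slower than `separate()` while keeping per-chunk memory bounded.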