|
|
--- |
|
|
license: other |
|
|
language: |
|
|
- en |
|
|
base_model: |
|
|
- facebook/sam-audio-small |
|
|
pipeline_tag: audio-to-audio |
|
|
library_name: mlx-audio |
|
|
tags: |
|
|
- audio-to-audio |
|
|
- speech |
|
|
- speech generation |
|
|
- voice isolation |
|
|
- mlx |
|
|
--- |
|
|
# mlx-community/sam-audio-small |
|
|
This model was converted to MLX format from [`facebook/sam-audio-small`](https://huggingface.co/facebook/sam-audio-small) using mlx-audio version **0.2.10**. |
|
|
Refer to the [original model card](https://huggingface.co/facebook/sam-audio-small) for more details on the model. |
|
|
|
|
|
## Use with mlx |
|
|
```bash |
|
|
pip install -U mlx-audio |
|
|
``` |
|
|
|
|
|
## Voice Isolation: |
|
|
```python |
|
|
from mlx_audio.sts import SAMAudio, SAMAudioProcessor, save_audio |
|
|
import mlx.core as mx |
|
|
|
|
|
# Load model and processor |
|
|
processor = SAMAudioProcessor.from_pretrained("facebook/sam-audio-small") |
|
|
model = SAMAudio.from_pretrained("facebook/sam-audio-small") |
|
|
|
|
|
# Process inputs |
|
|
batch = processor( |
|
|
descriptions=["speech"], |
|
|
audios=["path/to/audio.mp3"], |
|
|
# anchors=[[("+", 0.2, 0.5)]], # Optional: temporal |
|
|
) |
|
|
|
|
|
# Separate audio |
|
|
result = model.separate( |
|
|
audios=batch.audios, |
|
|
descriptions=batch.descriptions, |
|
|
sizes=batch.sizes, |
|
|
anchor_ids=batch.anchor_ids, |
|
|
anchor_alignment=batch.anchor_alignment, |
|
|
ode_decode_chunk_size=50, # Chunked decoding for memory efficiency |
|
|
) |
|
|
|
|
|
# For long audio files, use separate_long(). |
|
|
# Note: This is slower than separate() but it is more memory efficient. |
|
|
# result = model.separate_long( |
|
|
# audios=batch.audios, |
|
|
# descriptions=batch.descriptions, |
|
|
# chunk_seconds=10.0, |
|
|
# overlap_seconds=3.0, |
|
|
# anchor_ids=batch.anchor_ids, |
|
|
# anchor_alignment=batch.anchor_alignment, |
|
|
# ode_decode_chunk_size=50, # Chunked decoding for memory efficiency |
|
|
# ) |
|
|
|
|
|
# Save output |
|
|
## Isolated speech |
|
|
save_audio(result.target[0], "separated.wav", sample_rate=model.sample_rate) |
|
|
|
|
|
## Residual audio (background music/noise/other sounds) |
|
|
save_audio(result.residual[0], "residual.wav", sample_rate=model.sample_rate) |
|
|
|
|
|
# Check memory usage |
|
|
print(f"Peak memory: {result.peak_memory:.2f} GB") |
|
|
``` |