---
license: other
license_name: license
license_link: LICENSE
pipeline_tag: audio-classification
---

# MMM — Multi-Mixture Model for Speaker Identification

**MMM (Multi-Mixture Model)** is a PyTorch-based framework implementing a hybrid time-series architecture that combines **Variational Autoencoders (VAEs)**, **Recurrent Neural Networks (RNNs)**, **Hidden Markov Models (HMMs)**, **Gaussian Mixture Models (GMMs)**, and an optional **Transformer** component.

The framework is designed primarily for **audio tasks**, with a reference implementation focused on **speaker identification**. This repository includes model code, training scripts, speaker identification utilities, and a demo web application.

**Designed and trained by:** **Chance Brownfield**

---

## Model Overview

- **Model type:** Hybrid generative sequential model
- **Framework:** PyTorch
- **Primary domain:** Audio / time-series
- **Main use case:** Speaker identification and embedding extraction
- **Input:** 1-D audio signals or time-series features
- **Output:** Latent embeddings, likelihood scores, predictions

---

## Architecture Summary

### VariationalRecurrentMarkovGaussianTransformer

The core MMM model integrates:

- **Variational Autoencoder (VAE)**
  Encodes each time step into a latent variable and reconstructs the input.
- **RNN Emission Network**
  Produces emission parameters for the HMM from latent sequences.
- **Hidden Markov Model (HMM)**
  Models temporal structure in latent space using Gaussian Mixture emissions.
- **Gaussian Mixture Models (GMMs)**
  Used both internally (HMM emissions) and externally for speaker enrollment.
- **Transformer**
  Operates on latent sequences for recognition or domain mapping.
- **Latent Weight Vectors**
  Learnable vectors (`pred_weights`, `recog_weights`, `gen_weights`) used to reweight latent dimensions for prediction, recognition, and generation.
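To make the HMM component concrete: an HMM scores a latent sequence by marginalizing over hidden state paths with the forward algorithm. The sketch below is illustrative only (it is not the repository's code): a two-state toy model with single-Gaussian emissions in NumPy, where all parameters and the scalar latent sequence are invented for the example.

```python
import numpy as np

def forward_log_likelihood(x, log_pi, log_A, means, variances):
    """Forward algorithm: log p(x) for a 1-D sequence under a Gaussian-emission HMM."""
    # Per-state Gaussian log-densities for every observation: shape (T, n_states)
    log_b = -0.5 * (np.log(2 * np.pi * variances) + (x[:, None] - means) ** 2 / variances)
    # Initialise with the start distribution
    log_alpha = log_pi + log_b[0]
    for t in range(1, len(x)):
        # logsumexp over previous states, then add the emission term
        m = log_alpha.max()
        log_alpha = m + np.log(np.exp(log_alpha - m) @ np.exp(log_A)) + log_b[t]
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())

# Toy 2-state model over a scalar latent sequence (all values made up)
log_pi = np.log(np.array([0.6, 0.4]))
log_A = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))
means = np.array([0.0, 3.0])
variances = np.array([1.0, 1.0])
x = np.array([0.1, -0.2, 2.9, 3.1])
print(forward_log_likelihood(x, log_pi, log_A, means, variances))
```

In the full model, the observations would be latent vectors produced by the VAE encoder, and the emissions would be Gaussian mixtures rather than single Gaussians.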
## Capabilities

- **Embedding extraction** for speaker identification
- **Speaker enrollment** using GMM, HMM, or full MMM models
- **Sequence prediction**
- **Latent sequence generation** via HMM sampling
- **Recognition / mapping** using Transformer layers
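"Latent sequence generation via HMM sampling" generically means ancestral sampling: draw a hidden state path from the transition model, then draw one emission per visited state. A minimal NumPy sketch (not the repository's code; the two-state parameters are invented):

```python
import numpy as np

def sample_hmm(pi, A, means, stds, T, rng):
    """Ancestral sampling: draw a state path, then one Gaussian emission per state."""
    states = np.empty(T, dtype=int)
    states[0] = rng.choice(len(pi), p=pi)
    for t in range(1, T):
        states[t] = rng.choice(len(pi), p=A[states[t - 1]])
    # Emit a scalar latent value from each visited state's Gaussian
    return states, rng.normal(means[states], stds[states])

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])
A = np.array([[0.95, 0.05], [0.05, 0.95]])  # sticky states -> smooth latent runs
means = np.array([-2.0, 2.0])
stds = np.array([0.3, 0.3])
states, latents = sample_hmm(pi, A, means, stds, T=10, rng=rng)
print(states)
print(np.round(latents, 2))
```

In MMM the sampled sequence would live in the VAE's latent space, so it could be pushed through the decoder to generate observations.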
---

## Repository Contents

### `MMM.py`

Core model definitions and manager classes:

- `MMTransformer`
- `MMModel`
- `MMM`

### `ASI.py`

Automatic Speaker Identification wrapper:

- Generates embeddings
- Enrolls speakers using GMM/HMM/MMM
- Scores and identifies query audio
## Installation

Clone the repository:

```bash
git clone https://huggingface.co/HiMind/Multi-Mixture_Speaker_ID
```
## Using the Pre-Trained Model

### Load a Saved Model

```python
from MMM import MMM

# The manager bundles the trained models; "unknown" is the base model key
manager = MMM.load("mmm.pt")
base_model = manager.models["unknown"]
base_model.eval()  # inference mode
```

---
### Load from Hugging Face Hub

```python
from huggingface_hub import hf_hub_download
from MMM import MMM

pt_file = hf_hub_download(
    repo_id="HiMind/Multi-Mixture_Speaker_ID",
    filename="mmm.pt",
)

manager = MMM.load(pt_file)
```

---
## Speaker Identification

### Generate an Embedding

```python
from ASI import Speaker_ID

speaker_system = Speaker_ID(
    mmm_manager=manager,
    base_model_id="unknown",
    seq_len=1200,
    sr=1200,
)

embedding = speaker_system.generate_embedding("audio.wav")
```
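How the embedding is consumed is up to the caller. One common pattern for comparing speaker embeddings, shown here as a sketch (this is not the repository's scoring code, and the vectors stand in for `generate_embedding` outputs), is cosine similarity:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors; 1.0 means identical direction."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings standing in for generate_embedding() outputs
enrolled = np.array([0.2, 0.9, -0.4])
query_same = np.array([0.22, 0.88, -0.38])
query_other = np.array([-0.7, 0.1, 0.7])
print(cosine_similarity(enrolled, query_same))   # close to 1
print(cosine_similarity(enrolled, query_other))  # much lower
```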

---

### Enroll a Speaker

```python
speaker_system.enroll_speaker(
    speaker_id="Alice",
    audio_input="alice.wav",
    model_type="gmm",
    n_components=4,
    epochs=50,
    lr=1e-3,
)
```

Supported `model_type` values:

* `"gmm"`
* `"hmm"`
* `"mmm"`
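Conceptually, enrollment fits a generative model to a speaker's embedding frames and later scores queries by log-likelihood under that model. A minimal stand-in for the idea (a single diagonal Gaussian, i.e. a 1-component GMM; the frame arrays below are invented, not real embeddings):

```python
import numpy as np

def enroll(frames):
    """Fit a diagonal Gaussian (1-component GMM) to a speaker's embedding frames."""
    frames = np.asarray(frames, dtype=float)
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6  # variance floor

def log_likelihood(frames, mean, var):
    """Average per-frame Gaussian log-density of query frames under the enrolled model."""
    frames = np.asarray(frames, dtype=float)
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)
    return float(ll.sum(axis=1).mean())

rng = np.random.default_rng(1)
alice_frames = rng.normal([0.0, 1.0], 0.1, size=(200, 2))  # invented "Alice" embeddings
mean, var = enroll(alice_frames)
query = rng.normal([0.0, 1.0], 0.1, size=(50, 2))          # same "speaker"
impostor = rng.normal([1.0, -1.0], 0.1, size=(50, 2))      # different "speaker"
print(log_likelihood(query, mean, var))
print(log_likelihood(impostor, mean, var))
```

The `"hmm"` and `"mmm"` options generalize this idea with temporal structure and richer emission models.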

---

### Identify a Query

```python
best_speaker, best_score, scores = speaker_system.identify("query.wav")

print("Predicted speaker:", best_speaker)
print("Scores:", scores)
```
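The return shape (best speaker, its score, the full score map) suggests a scan over per-speaker scores. As a sketch of that selection step, with invented scores rather than real `identify` output:

```python
# Hypothetical per-speaker scores (e.g. log-likelihoods); higher is better
scores = {"Alice": -412.7, "Bob": -389.2, "Carol": -455.1}

# Pick the best-scoring enrolled speaker
best_speaker, best_score = max(scores.items(), key=lambda kv: kv[1])
print(best_speaker, best_score)  # Bob: the highest (least negative) score
```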

## Bias, Risks, and Limitations

* Performance depends heavily on audio quality and data distribution
* Out-of-distribution speakers and noisy recordings may reduce accuracy
* Speaker identification involves biometric data — use responsibly and with consent
* Not intended for high-stakes or security-critical deployment without extensive validation

---

## License

### Dual License: Non-Commercial Free Use + Commercial License Required

**Non-Commercial Use (Free):**

* Research
* Education
* Personal projects
* Non-monetized demos
* Open-source experimentation

Attribution to **Chance Brownfield** is required.

**Commercial Use (Permission Required):**

* SaaS products
* Paid APIs
* Monetized applications
* Enterprise/internal commercial tools
* Advertising-supported systems

Unauthorized commercial use is prohibited.

**Author:** Chance Brownfield  
**Contact:** [HiMindAi@proton.me](mailto:HiMindAi@proton.me)

---

## Citation

If you use this work, please credit:

> Chance Brownfield. (2025). *MMM: Multi-Mixture Model for Speaker Identification*.

---

## Author

**Chance Brownfield**  
Designer and trainer of the MMM architecture  
Email: [HiMindAi@proton.me](mailto:HiMindAi@proton.me)