---
license: other
license_name: license
license_link: LICENSE
pipeline_tag: audio-classification
---
# MMM — Multi-Mixture Model for Speaker Identification
**MMM (Multi-Mixture Model)** is a PyTorch-based framework implementing a hybrid time-series architecture that combines **Variational Autoencoders (VAE)**, **Recurrent Neural Networks (RNNs)**, **Hidden Markov Models (HMMs)**, **Gaussian Mixture Models (GMMs)**, and an optional **Transformer** component.
The framework is designed primarily for **audio tasks**, with a reference implementation focused on **speaker identification**. This repository includes model code, training scripts, speaker identification utilities, and a demo web application.
**Designed and trained by:** **Chance Brownfield**
---
## Model Overview
- **Model type:** Hybrid generative sequential model
- **Framework:** PyTorch
- **Primary domain:** Audio / time-series
- **Main use case:** Speaker identification and embedding extraction
- **Input:** 1-D audio signals or time-series features
- **Output:** Latent embeddings, likelihood scores, predictions
---
## Architecture Summary
### VariationalRecurrentMarkovGaussianTransformer
The core MMM model integrates:
- **Variational Autoencoder (VAE)**
Encodes each time step into a latent variable and reconstructs the input.
- **RNN Emission Network**
Produces emission parameters for the HMM from latent sequences.
- **Hidden Markov Model (HMM)**
Models temporal structure in latent space using Gaussian Mixture emissions.
- **Gaussian Mixture Models (GMMs)**
Used both internally (HMM emissions) and externally for speaker enrollment.
- **Transformer**
Operates on latent sequences for recognition or domain mapping.
- **Latent Weight Vectors**
  Learnable vectors `pred_weights`, `recog_weights`, and `gen_weights` that reweight latent dimensions for prediction, recognition, and generation, respectively.
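As a rough illustration of the reweighting idea, each task-specific weight vector scales the latent dimensions elementwise before the downstream head consumes them. A minimal sketch (not the repo's implementation; the values below are made up):

```python
# Minimal sketch of latent-dimension reweighting. The weight values
# here are hypothetical, not learned parameters from the model.
def reweight(latent, weights):
    """Scale each latent dimension by its corresponding weight."""
    return [z * w for z, w in zip(latent, weights)]

latent = [0.5, -1.0, 2.0]        # one time step's latent vector
pred_weights = [1.0, 0.0, 0.5]   # hypothetical learned weights
print(reweight(latent, pred_weights))
```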
## Capabilities
- **Embedding extraction** for speaker identification
- **Speaker enrollment** using GMM, HMM, or full MMM models
- **Sequence prediction**
- **Latent sequence generation** via HMM sampling
- **Recognition / mapping** using Transformer layers
---
## Repository Contents
### `MMM.py`
Core model definitions and manager classes:
- `MMTransformer`
- `MMModel`
- `MMM`
### `ASI.py`
Automatic speaker identification wrapper:
- Generates embeddings
- Enrolls speakers using GMM/HMM/MMM
- Scores and identifies query audio
---
## Installation
### Clone the repository
```bash
git clone https://huggingface.co/HiMind/Multi-Mixture_Speaker_ID
```
## Using the Pre-Trained Model
### Load a Saved Model
```python
from MMM import MMM
manager = MMM.load("mmm.pt")
base_model = manager.models["unknown"]
base_model.eval()
```
---
### Load from Hugging Face Hub
```python
from huggingface_hub import hf_hub_download
from MMM import MMM
pt_file = hf_hub_download(
    repo_id="HiMind/Multi-Mixture_Speaker_ID",
    filename="mmm.pt",
)
manager = MMM.load(pt_file)
```
---
## Speaker Identification
### Generate an Embedding
```python
from ASI import Speaker_ID
speaker_system = Speaker_ID(
    mmm_manager=manager,
    base_model_id="unknown",
    seq_len=1200,
    sr=1200,
)
embedding = speaker_system.generate_embedding("audio.wav")
```
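Embeddings from different utterances can then be compared directly; a common approach is cosine similarity, sketched here in plain Python (this assumes `generate_embedding` returns a flat 1-D vector):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# e.g. compare two embeddings produced by generate_embedding()
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 0.0]))
```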
---
### Enroll a Speaker
```python
speaker_system.enroll_speaker(
    speaker_id="Alice",
    audio_input="alice.wav",
    model_type="gmm",
    n_components=4,
    epochs=50,
    lr=1e-3,
)
```
Supported `model_type` values:
* `"gmm"`
* `"hmm"`
* `"mmm"`
---
### Identify a Query
```python
best_speaker, best_score, scores = speaker_system.identify("query.wav")
print("Predicted speaker:", best_speaker)
print("Scores:", scores)
```
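In an open-set setting you will usually also want to reject queries whose best score falls below a threshold rather than always returning an enrolled speaker. A hypothetical helper (the threshold value depends entirely on your score scale and data):

```python
def identify_with_threshold(scores, threshold):
    """Return the best-scoring speaker, or None if the score is too low.

    `scores` maps speaker IDs to scores (higher is better), e.g. the
    dict returned by speaker_system.identify(). Hypothetical helper,
    not part of ASI.py.
    """
    best_speaker = max(scores, key=scores.get)
    best_score = scores[best_speaker]
    if best_score < threshold:
        return None, best_score
    return best_speaker, best_score

scores = {"Alice": -3.2, "Bob": -7.8}  # made-up scores
print(identify_with_threshold(scores, threshold=-5.0))
```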
## Bias, Risks, and Limitations
* Performance depends heavily on audio quality and data distribution
* Out-of-distribution speakers and noisy recordings may reduce accuracy
* Speaker identification involves biometric data — use responsibly and with consent
* Not intended for high-stakes or security-critical deployment without extensive validation
---
## License
### Dual License: Non-Commercial Free Use + Commercial License Required
**Non-Commercial Use (Free):**
* Research
* Education
* Personal projects
* Non-monetized demos
* Open-source experimentation
Attribution to **Chance Brownfield** is required.
**Commercial Use (Permission Required):**
* SaaS products
* Paid APIs
* Monetized applications
* Enterprise/internal commercial tools
* Advertising-supported systems
Unauthorized commercial use is prohibited.
**Author:** Chance Brownfield
**Contact:** [HiMindAi@proton.me](mailto:HiMindAi@proton.me)
---
## Citation
If you use this work, please credit:
> Chance Brownfield. (2025). *MMM: Multi-Mixture Model for Speaker Identification*.
---
## Author
**Chance Brownfield**
Designer and trainer of the MMM architecture
Email: [HiMindAi@proton.me](mailto:HiMindAi@proton.me)