---
license: other
license_name: license
license_link: LICENSE
pipeline_tag: audio-classification
---
# MMM — Multi-Mixture Model for Speaker Identification
**MMM (Multi-Mixture Model)** is a PyTorch-based framework implementing a hybrid time-series architecture that combines **Variational Autoencoders (VAE)**, **Recurrent Neural Networks (RNNs)**, **Hidden Markov Models (HMMs)**, **Gaussian Mixture Models (GMMs)**, and an optional **Transformer** component.
The framework is designed primarily for **audio tasks**, with a reference implementation focused on **speaker identification**. This repository includes model code, training scripts, speaker identification utilities, and a demo web application.
**Designed and trained by:** **Chance Brownfield**
---
## Model Overview
- **Model type:** Hybrid generative sequential model
- **Framework:** PyTorch
- **Primary domain:** Audio / time-series
- **Main use case:** Speaker identification and embedding extraction
- **Input:** 1-D audio signals or time-series features
- **Output:** Latent embeddings, likelihood scores, predictions
---
## Architecture Summary
### VariationalRecurrentMarkovGaussianTransformer
The core MMM model integrates:
- **Variational Autoencoder (VAE)**
Encodes each time step into a latent variable and reconstructs the input.
- **RNN Emission Network**
Produces emission parameters for the HMM from latent sequences.
- **Hidden Markov Model (HMM)**
Models temporal structure in latent space using Gaussian Mixture emissions.
- **Gaussian Mixture Models (GMMs)**
Used both internally (HMM emissions) and externally for speaker enrollment.
- **Transformer**
Operates on latent sequences for recognition or domain mapping.
- **Latent Weight Vectors**
  Learnable vectors `pred_weights`, `recog_weights`, and `gen_weights` that reweight latent dimensions for prediction, recognition, and generation, respectively.
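As a rough illustration of the reweighting idea, each task-specific weight vector scales the latent dimensions elementwise before the downstream head consumes them. A minimal sketch (not the repo's implementation; the values below are made up):

```python
# Minimal sketch of latent-dimension reweighting. The weight values
# here are hypothetical, not learned parameters from the model.
def reweight(latent, weights):
    """Scale each latent dimension by its corresponding weight."""
    return [z * w for z, w in zip(latent, weights)]

latent = [0.5, -1.0, 2.0]        # one time step's latent vector
pred_weights = [1.0, 0.0, 0.5]   # hypothetical learned weights
print(reweight(latent, pred_weights))
```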
## Capabilities
- **Embedding extraction** for speaker identification
- **Speaker enrollment** using GMM, HMM, or full MMM models
- **Sequence prediction**
- **Latent sequence generation** via HMM sampling
- **Recognition / mapping** using Transformer layers
---
## Repository Contents
### `MMM.py`
Core model definitions and manager classes:
- `MMTransformer`
- `MMModel`
- `MMM`
### `ASI.py`
Automatic speaker identification wrapper:
- Generates embeddings
- Enrolls speakers using GMM/HMM/MMM
- Scores and identifies query audio
---
## Installation
### Clone the repository
```bash
git clone https://huggingface.co/HiMind/Multi-Mixture_Speaker_ID
```
## Using the Pre-Trained Model
### Load a Saved Model
```python
from MMM import MMM
manager = MMM.load("mmm.pt")
base_model = manager.models["unknown"]
base_model.eval()
```
---
### Load from Hugging Face Hub
```python
from huggingface_hub import hf_hub_download
from MMM import MMM
pt_file = hf_hub_download(
    repo_id="HiMind/Multi-Mixture_Speaker_ID",
    filename="mmm.pt",
)
manager = MMM.load(pt_file)
```
---
## Speaker Identification
### Generate an Embedding
```python
from ASI import Speaker_ID
speaker_system = Speaker_ID(
    mmm_manager=manager,
    base_model_id="unknown",
    seq_len=1200,
    sr=1200,
)
embedding = speaker_system.generate_embedding("audio.wav")
```
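Embeddings from different utterances can then be compared directly; a common approach is cosine similarity, sketched here in plain Python (this assumes `generate_embedding` returns a flat 1-D vector):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# e.g. compare two embeddings produced by generate_embedding()
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 0.0]))
```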
---
### Enroll a Speaker
```python
speaker_system.enroll_speaker(
    speaker_id="Alice",
    audio_input="alice.wav",
    model_type="gmm",
    n_components=4,
    epochs=50,
    lr=1e-3,
)
```
Supported `model_type` values:
* `"gmm"`
* `"hmm"`
* `"mmm"`
---
### Identify a Query
```python
best_speaker, best_score, scores = speaker_system.identify("query.wav")
print("Predicted speaker:", best_speaker)
print("Scores:", scores)
```
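In an open-set setting you will usually also want to reject queries whose best score falls below a threshold rather than always returning an enrolled speaker. A hypothetical helper (the threshold value depends entirely on your score scale and data):

```python
def identify_with_threshold(scores, threshold):
    """Return the best-scoring speaker, or None if the score is too low.

    `scores` maps speaker IDs to scores (higher is better), e.g. the
    dict returned by speaker_system.identify(). Hypothetical helper,
    not part of ASI.py.
    """
    best_speaker = max(scores, key=scores.get)
    best_score = scores[best_speaker]
    if best_score < threshold:
        return None, best_score
    return best_speaker, best_score

scores = {"Alice": -3.2, "Bob": -7.8}  # made-up scores
print(identify_with_threshold(scores, threshold=-5.0))
```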
## Bias, Risks, and Limitations
* Performance depends heavily on audio quality and data distribution
* Out-of-distribution speakers and noisy recordings may reduce accuracy
* Speaker identification involves biometric data — use responsibly and with consent
* Not intended for high-stakes or security-critical deployment without extensive validation
---
## License
### Dual License: Non-Commercial Free Use + Commercial License Required
**Non-Commercial Use (Free):**
* Research
* Education
* Personal projects
* Non-monetized demos
* Open-source experimentation
Attribution to **Chance Brownfield** is required.
**Commercial Use (Permission Required):**
* SaaS products
* Paid APIs
* Monetized applications
* Enterprise/internal commercial tools
* Advertising-supported systems
Unauthorized commercial use is prohibited.
**Author:** Chance Brownfield
**Contact:** [HiMindAi@proton.me](mailto:HiMindAi@proton.me)
---
## Citation
If you use this work, please credit:
> Chance Brownfield. (2025). *MMM: Multi-Mixture Model for Speaker Identification*.
---
## Author
**Chance Brownfield**
Designer and trainer of the MMM architecture
Email: [HiMindAi@proton.me](mailto:HiMindAi@proton.me)