---
license: other
license_name: license
license_link: LICENSE
pipeline_tag: voice-activity-detection
---

# MMM — Multi-Mixture Model for Speaker Identification

**MMM (Multi-Mixture Model)** is a PyTorch-based framework implementing a hybrid time-series architecture that combines **Variational Autoencoders (VAE)**, **Recurrent Neural Networks (RNNs)**, **Hidden Markov Models (HMMs)**, **Gaussian Mixture Models (GMMs)**, and an optional **Transformer** component.

The framework is designed primarily for **audio tasks**, with a reference implementation focused on **speaker identification**. This repository includes model code, training scripts, speaker identification utilities, and a demo web application.

**Designed and trained by:** **Chance Brownfield**

---

## Model Overview

- **Model type:** Hybrid generative sequential model  
- **Framework:** PyTorch  
- **Primary domain:** Audio / time-series  
- **Main use case:** Speaker identification and embedding extraction  
- **Input:** 1-D audio signals or time-series features  
- **Output:** Latent embeddings, likelihood scores, predictions  

---

## Architecture Summary

### VariationalRecurrentMarkovGaussianTransformer

The core MMM model integrates:

- **Variational Autoencoder (VAE)**  
  Encodes each time step into a latent variable and reconstructs the input.

- **RNN Emission Network**  
  Produces emission parameters for the HMM from latent sequences.

- **Hidden Markov Model (HMM)**  
  Models temporal structure in latent space using Gaussian Mixture emissions.

- **Gaussian Mixture Models (GMMs)**  
  Used both internally (HMM emissions) and externally for speaker enrollment.

- **Transformer**  
  Operates on latent sequences for recognition or domain mapping.

- **Latent Weight Vectors**  
  Learnable vectors:
  - `pred_weights`
  - `recog_weights`
  - `gen_weights`  
  Used to reweight latent dimensions for prediction, recognition, and generation.
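The reweighting idea can be illustrated as elementwise scaling of a latent vector. This is a hypothetical sketch: the names `pred_weights`, `recog_weights`, and `gen_weights` mirror the model's learnable vectors, but the values and the elementwise-scaling form are assumptions for illustration, not the actual implementation.

```python
# Hypothetical sketch: task-specific reweighting of a latent vector.
# The elementwise-scaling form and the example values are assumptions.

def reweight(latent, weights):
    """Scale each latent dimension by its task-specific weight."""
    return [z * w for z, w in zip(latent, weights)]

latent = [0.5, -1.0, 2.0]
pred_weights = [1.0, 0.0, 0.5]  # e.g. emphasise dims useful for prediction

reweighted = reweight(latent, pred_weights)  # [0.5, -0.0, 1.0]
```

In the actual model these weights are learned jointly with the rest of the network, so each downstream head (prediction, recognition, generation) can focus on the latent dimensions most useful to it.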

## Capabilities

- **Embedding extraction** for speaker identification  
- **Speaker enrollment** using GMM, HMM, or full MMM models  
- **Sequence prediction**  
- **Latent sequence generation** via HMM sampling  
- **Recognition / mapping** using Transformer layers  

---

## Repository Contents

### `MMM.py`
Core model definitions and manager classes:
- `MMTransformer`
- `MMModel`
- `MMM`

### `ASI.py`
Automatic Speaker Identification wrapper:
- Generates embeddings
- Enrolls speakers using GMM/HMM/MMM
- Scores and identifies query audio

## Installation

### Clone the repository

```bash
git clone https://huggingface.co/HiMind/Multi-Mixture_Speaker_ID
```

## Using the Pre-Trained Model

### Load a Saved Model

```python
from MMM import MMM

manager = MMM.load("mmm.pt")
base_model = manager.models["unknown"]
base_model.eval()
```

---

### Load from Hugging Face Hub

```python
from huggingface_hub import hf_hub_download
from MMM import MMM

pt_file = hf_hub_download(
    repo_id="HiMind/Multi-Mixture_Speaker_ID",
    filename="mmm.pt"
)

manager = MMM.load(pt_file)
```

---

## Speaker Identification

### Generate an Embedding

```python
from ASI import Speaker_ID

speaker_system = Speaker_ID(
    mmm_manager=manager,
    base_model_id="unknown",
    seq_len=1200,
    sr=1200,
)

embedding = speaker_system.generate_embedding("audio.wav")
```
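Embeddings from `generate_embedding` can be compared directly, for example with cosine similarity. This is a generic post-processing sketch, not part of the `ASI` API; the helper name `cosine_similarity` and the toy vectors are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (plain Python lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings:
emb_a = [0.1, 0.3, 0.5]
emb_b = [0.1, 0.3, 0.5]

sim = cosine_similarity(emb_a, emb_b)  # identical vectors -> ~1.0
```

Scores near 1.0 indicate similar speakers; a decision threshold would normally be tuned on held-out enrollment data.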

---

### Enroll a Speaker

```python
speaker_system.enroll_speaker(
    speaker_id="Alice",
    audio_input="alice.wav",
    model_type="gmm",
    n_components=4,
    epochs=50,
    lr=1e-3,
)
```

Supported `model_type` values:

* `"gmm"`
* `"hmm"`
* `"mmm"`

---

### Identify a Query

```python
best_speaker, best_score, scores = speaker_system.identify("query.wav")

print("Predicted speaker:", best_speaker)
print("Scores:", scores)
```
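In open-set settings you may want to reject queries whose best score is too low rather than force-match an enrolled speaker. A minimal sketch, assuming `scores` is the speaker-to-score mapping returned by `identify`; the rejection logic and the threshold value are illustrative, not part of the `ASI` API.

```python
# Hypothetical open-set rejection: treat the query as an unknown speaker
# when even the best score falls below a threshold. The threshold here is
# illustrative and should be tuned on held-out enrollment data.

def identify_with_rejection(scores, threshold=-50.0):
    """Return (speaker, score), or ("unknown", score) below the threshold."""
    best_speaker = max(scores, key=scores.get)
    best_score = scores[best_speaker]
    if best_score < threshold:
        return "unknown", best_score
    return best_speaker, best_score

scores = {"Alice": -12.3, "Bob": -48.7}
result = identify_with_rejection(scores)  # ("Alice", -12.3)
```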

## Bias, Risks, and Limitations

* Performance depends heavily on audio quality and data distribution
* Out-of-distribution speakers and noisy recordings may reduce accuracy
* Speaker identification involves biometric data — use responsibly and with consent
* Not intended for high-stakes or security-critical deployment without extensive validation

---

## License

### Dual License: Non-Commercial Free Use + Commercial License Required

**Non-Commercial Use (Free):**

* Research
* Education
* Personal projects
* Non-monetized demos
* Open-source experimentation

Attribution to **Chance Brownfield** is required.

**Commercial Use (Permission Required):**

* SaaS products
* Paid APIs
* Monetized applications
* Enterprise/internal commercial tools
* Advertising-supported systems

Unauthorized commercial use is prohibited.

**Author:** Chance Brownfield
**Contact:** [HiMindAi@proton.me](mailto:HiMindAi@proton.me)

---

## Citation

If you use this work, please credit:

> Chance Brownfield. (2025). *MMM: Multi-Mixture Model for Speaker Identification*.

---

## Author

**Chance Brownfield**
Designer and trainer of the MMM architecture
Email: [HiMindAi@proton.me](mailto:HiMindAi@proton.me)
