---
license: other
license_name: license
license_link: LICENSE
pipeline_tag: voice-activity-detection
---

# MMM — Multi-Mixture Model for Speaker Identification

**MMM (Multi-Mixture Model)** is a PyTorch-based framework implementing a hybrid time-series architecture that combines **Variational Autoencoders (VAEs)**, **Recurrent Neural Networks (RNNs)**, **Hidden Markov Models (HMMs)**, **Gaussian Mixture Models (GMMs)**, and an optional **Transformer** component.

The framework is designed primarily for **audio tasks**, with a reference implementation focused on **speaker identification**. This repository includes the model code, training scripts, speaker identification utilities, and a demo web application.

**Designed and trained by:** Chance Brownfield

---

## Model Overview

- **Model type:** Hybrid generative sequential model
- **Framework:** PyTorch
- **Primary domain:** Audio / time-series
- **Main use case:** Speaker identification and embedding extraction
- **Input:** 1-D audio signals or time-series features
- **Output:** Latent embeddings, likelihood scores, predictions

---

## Architecture Summary

### VariationalRecurrentMarkovGaussianTransformer

The core MMM model integrates:

- **Variational Autoencoder (VAE)**
  Encodes each time step into a latent variable and reconstructs the input.

- **RNN Emission Network**
  Produces emission parameters for the HMM from latent sequences.

- **Hidden Markov Model (HMM)**
  Models temporal structure in latent space using Gaussian mixture emissions.

- **Gaussian Mixture Models (GMMs)**
  Used both internally (HMM emissions) and externally for speaker enrollment.

- **Transformer**
  Operates on latent sequences for recognition or domain mapping.

- **Latent Weight Vectors**
  Learnable vectors (`pred_weights`, `recog_weights`, `gen_weights`) used to reweight latent dimensions for prediction, recognition, and generation.
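
The role of the latent weight vectors can be shown with a minimal sketch (the vectors below are made-up illustrative values, not the model's learned parameters):

```python
# Illustrative only: task-specific reweighting of one latent time step.
z = [0.5, -1.2, 0.3, 2.0]            # latent vector for one time step
pred_weights = [1.0, 0.1, 1.0, 0.5]  # hypothetical learned prediction weights

# Element-wise reweighting emphasizes some latent dimensions and damps others.
z_pred = [zi * wi for zi, wi in zip(z, pred_weights)]
```

The same latent sequence can thus be reweighted separately for prediction, recognition, and generation.
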

## Capabilities

- **Embedding extraction** for speaker identification
- **Speaker enrollment** using GMM, HMM, or full MMM models
- **Sequence prediction**
- **Latent sequence generation** via HMM sampling
- **Recognition / mapping** using Transformer layers

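For intuition, "latent sequence generation via HMM sampling" can be sketched with a toy, self-contained example (a hypothetical 2-state HMM with 1-D Gaussian emissions, not the repository's implementation):

```python
import random

random.seed(0)

# Toy 2-state HMM: row-stochastic transition matrix and per-state
# Gaussian emission parameters in a 1-D "latent" space (illustrative only).
transitions = [[0.9, 0.1],
               [0.2, 0.8]]
emission_means = [0.0, 4.0]
emission_std = 0.5

def sample_latents(num_steps, start_state=0):
    """Sample a latent sequence by walking the Markov chain and
    drawing one Gaussian emission per visited state."""
    state = start_state
    seq = []
    for _ in range(num_steps):
        seq.append(random.gauss(emission_means[state], emission_std))
        state = random.choices([0, 1], weights=transitions[state])[0]
    return seq

latents = sample_latents(50)
```
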
---

## Repository Contents

### `MMM.py`

Core model definitions and manager classes:

- `MMTransformer`
- `MMModel`
- `MMM`

### `ASI.py`

Automatic Speaker Identification (ASI) wrapper:

- Generates embeddings
- Enrolls speakers using GMM/HMM/MMM models
- Scores and identifies query audio

---

## Installation

Clone the repository:

```bash
git clone https://huggingface.co/HiMind/Multi-Mixture_Speaker_ID
```

## Using the Pre-Trained Model

### Load a Saved Model

```python
from MMM import MMM

# Load the saved manager; it holds one or more named models
manager = MMM.load("mmm.pt")

# "unknown" is the base model id used by the speaker-ID utilities
base_model = manager.models["unknown"]
base_model.eval()
```

---

### Load from Hugging Face Hub

```python
from huggingface_hub import hf_hub_download
from MMM import MMM

# Download the checkpoint from this repository on the Hub
pt_file = hf_hub_download(
    repo_id="HiMind/Multi-Mixture_Speaker_ID",
    filename="mmm.pt",
)

manager = MMM.load(pt_file)
```

---

## Speaker Identification

### Generate an Embedding

```python
from ASI import Speaker_ID

# Wrap the loaded manager in the speaker-identification system
speaker_system = Speaker_ID(
    mmm_manager=manager,
    base_model_id="unknown",
    seq_len=1200,
    sr=1200,
)

embedding = speaker_system.generate_embedding("audio.wav")
```
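
Embeddings from different utterances can then be compared directly. A common choice for this is cosine similarity; the sketch below uses made-up vectors in place of real `generate_embedding` output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings standing in for generate_embedding() output
emb_a = [0.2, 0.9, -0.4]
emb_b = [0.25, 0.8, -0.5]

similarity = cosine_similarity(emb_a, emb_b)  # close to 1.0 for similar vectors
```
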

---

### Enroll a Speaker

```python
speaker_system.enroll_speaker(
    speaker_id="Alice",
    audio_input="alice.wav",
    model_type="gmm",
    n_components=4,
    epochs=50,
    lr=1e-3,
)
```

Supported `model_type` values:

* `"gmm"`
* `"hmm"`
* `"mmm"`

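Conceptually, `"gmm"` enrollment fits a Gaussian mixture to the speaker's embedding frames via expectation-maximization. The toy sketch below fits a two-component 1-D mixture to synthetic data; it illustrates the idea, not the repository's trainer:

```python
import math
import random

random.seed(1)

# Synthetic 1-D values drawn from two clusters, standing in
# for one speaker's embedding frames.
data = ([random.gauss(0.0, 0.5) for _ in range(100)]
        + [random.gauss(4.0, 0.5) for _ in range(100)])

# Two-component GMM parameters: means, variances, mixing weights
means, variances, weights = [0.5, 3.5], [1.0, 1.0], [0.5, 0.5]

for _ in range(30):  # EM iterations
    # E-step: per-point component responsibilities
    resp = []
    for x in data:
        probs = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for m, v, w in zip(means, variances, weights)]
        total = sum(probs)
        resp.append([p / total for p in probs])
    # M-step: re-estimate parameters from responsibilities
    for k in range(2):
        nk = sum(r[k] for r in resp)
        means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        variances[k] = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
        weights[k] = nk / len(data)
```

After convergence the component means sit near the two cluster centers, which is the enrolled "model" for that speaker in this toy setting.
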
---

### Identify a Query

```python
best_speaker, best_score, scores = speaker_system.identify("query.wav")

print("Predicted speaker:", best_speaker)
print("Scores:", scores)
```

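Identification of this kind typically scores the query's frames against each enrolled model and returns the speaker with the highest average log-likelihood. The sketch below illustrates that with identity-covariance Gaussians and made-up speaker means; it is not the repository's scoring code:

```python
import math
import random

random.seed(0)

DIM = 8

def avg_log_likelihood(frames, mean):
    """Average log N(x; mean, I) over frames, dropping the constant term."""
    return sum(-0.5 * sum((x - m) ** 2 for x, m in zip(f, mean))
               for f in frames) / len(frames)

# Hypothetical enrolled speaker models: one Gaussian mean per speaker
enrolled = {"Alice": [0.0] * DIM, "Bob": [3.0] * DIM}

# Query frames drawn near Alice's model
query = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(20)]

scores = {spk: avg_log_likelihood(query, mean) for spk, mean in enrolled.items()}
best_speaker = max(scores, key=scores.get)
print(best_speaker)  # prints "Alice"
```
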
## Bias, Risks, and Limitations

* Performance depends heavily on audio quality and data distribution
* Out-of-distribution speakers and noisy recordings may reduce accuracy
* Speaker identification involves biometric data — use responsibly and with consent
* Not intended for high-stakes or security-critical deployment without extensive validation

---


## License

### Dual License: Non-Commercial Free Use + Commercial License Required

**Non-Commercial Use (Free):**

* Research
* Education
* Personal projects
* Non-monetized demos
* Open-source experimentation

Attribution to **Chance Brownfield** is required.

**Commercial Use (Permission Required):**

* SaaS products
* Paid APIs
* Monetized applications
* Enterprise/internal commercial tools
* Advertising-supported systems

Unauthorized commercial use is prohibited.

**Author:** Chance Brownfield
**Contact:** [HiMindAi@proton.me](mailto:HiMindAi@proton.me)

---

## Citation

If you use this work, please credit:

> Chance Brownfield. (2025). *MMM: Multi-Mixture Model for Speaker Identification*.

---

## Author

**Chance Brownfield**
Designer and trainer of the MMM architecture
Email: [HiMindAi@proton.me](mailto:HiMindAi@proton.me)