---
license: other
license_name: license
license_link: LICENSE
pipeline_tag: voice-activity-detection
---
# MMM — Multi-Mixture Model for Speaker Identification
**MMM (Multi-Mixture Model)** is a PyTorch-based framework implementing a hybrid time-series architecture that combines **Variational Autoencoders (VAE)**, **Recurrent Neural Networks (RNNs)**, **Hidden Markov Models (HMMs)**, **Gaussian Mixture Models (GMMs)**, and an optional **Transformer** component.
The framework is designed primarily for **audio tasks**, with a reference implementation focused on **speaker identification**. This repository includes model code, training scripts, speaker identification utilities, and a demo web application.
**Designed and trained by:** **Chance Brownfield**
---
## Model Overview
- **Model type:** Hybrid generative sequential model
- **Framework:** PyTorch
- **Primary domain:** Audio / time-series
- **Main use case:** Speaker identification and embedding extraction
- **Input:** 1-D audio signals or time-series features
- **Output:** Latent embeddings, likelihood scores, predictions
---
## Architecture Summary
### VariationalRecurrentMarkovGaussianTransformer
The core MMM model integrates:
- **Variational Autoencoder (VAE)**
Encodes each time step into a latent variable and reconstructs the input.
- **RNN Emission Network**
Produces emission parameters for the HMM from latent sequences.
- **Hidden Markov Model (HMM)**
Models temporal structure in latent space using Gaussian Mixture emissions.
- **Gaussian Mixture Models (GMMs)**
Used both internally (HMM emissions) and externally for speaker enrollment.
- **Transformer**
Operates on latent sequences for recognition or domain mapping.
- **Latent Weight Vectors**
Learnable vectors:
- `pred_weights`
- `recog_weights`
- `gen_weights`
Used to reweight latent dimensions for prediction, recognition, and generation.
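As an illustration, reweighting a latent sequence by one of these vectors amounts to a per-dimension scaling before the corresponding head. The sketch below shows the idea on dummy data; the shapes and variable names are assumptions for illustration, not the repository's actual API:

```python
import numpy as np

# Hypothetical shapes: a latent sequence of T time steps, each D-dimensional.
T, D = 8, 16
latent_seq = np.random.randn(T, D)

# A learnable per-dimension weight vector (e.g. pred_weights), random here.
pred_weights = np.random.rand(D)

# Reweighting scales each latent dimension before the prediction head;
# (T, D) * (D,) broadcasts the weights across all time steps.
weighted_seq = latent_seq * pred_weights
print(weighted_seq.shape)
```

The same pattern would apply to `recog_weights` and `gen_weights`, each emphasizing the latent dimensions most useful for its task.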
## Capabilities
- **Embedding extraction** for speaker identification
- **Speaker enrollment** using GMM, HMM, or full MMM models
- **Sequence prediction**
- **Latent sequence generation** via HMM sampling
- **Recognition / mapping** using Transformer layers
---
## Repository Contents
### `MMM.py`
Core model definitions and manager classes:
- `MMTransformer`
- `MMModel`
- `MMM`
### `ASI.py`
Automatic speaker identification wrapper:
- Generates embeddings
- Enrolls speakers using GMM/HMM/MMM
- Scores and identifies query audio
## Installation
### Clone the repository
```bash
git clone https://huggingface.co/HiMind/Multi-Mixture_Speaker_ID
```
## Using the Pre-Trained Model
### Load a Saved Model
```python
from MMM import MMM
manager = MMM.load("mmm.pt")
base_model = manager.models["unknown"]
base_model.eval()
```
---
### Load from Hugging Face Hub
```python
from huggingface_hub import hf_hub_download
from MMM import MMM
pt_file = hf_hub_download(
    repo_id="HiMind/Multi-Mixture_Speaker_ID",
    filename="mmm.pt",
)
manager = MMM.load(pt_file)
```
---
## Speaker Identification
### Generate an Embedding
```python
from ASI import Speaker_ID
speaker_system = Speaker_ID(
    mmm_manager=manager,
    base_model_id="unknown",
    seq_len=1200,
    sr=1200,
)
embedding = speaker_system.generate_embedding("audio.wav")
```
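Once embeddings are extracted, speakers can also be compared directly. For instance, a cosine similarity between two embeddings might be used as a simple matching score; the sketch below uses dummy vectors, since the actual embedding dimension is implementation-specific:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Dummy embeddings standing in for speaker_system.generate_embedding(...) output.
emb_a = np.array([0.2, 0.9, -0.1])
emb_b = np.array([0.25, 0.85, -0.05])

score = cosine_similarity(emb_a, emb_b)
print(f"similarity: {score:.3f}")
```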
---
### Enroll a Speaker
```python
speaker_system.enroll_speaker(
    speaker_id="Alice",
    audio_input="alice.wav",
    model_type="gmm",
    n_components=4,
    epochs=50,
    lr=1e-3,
)
```
Supported `model_type` values:
* `"gmm"`
* `"hmm"`
* `"mmm"`
---
### Identify a Query
```python
best_speaker, best_score, scores = speaker_system.identify("query.wav")
print("Predicted speaker:", best_speaker)
print("Scores:", scores)
```
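In open-set scenarios, a query may come from a speaker who was never enrolled. A common pattern is to reject queries whose best score falls below a threshold; the sketch below shows that logic on hypothetical scores (the threshold value and score scale are assumptions and should be tuned on held-out data):

```python
# Hypothetical per-speaker scores as returned by identify(); higher is better.
scores = {"Alice": -12.4, "Bob": -35.1}
REJECT_THRESHOLD = -20.0  # assumed value; tune on validation data

# Pick the highest-scoring enrolled speaker, then reject if it is too weak.
best_speaker = max(scores, key=scores.get)
decision = best_speaker if scores[best_speaker] >= REJECT_THRESHOLD else "unknown"
print(decision)
```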
## Bias, Risks, and Limitations
* Performance depends heavily on audio quality and data distribution
* Out-of-distribution speakers and noisy recordings may reduce accuracy
* Speaker identification involves biometric data — use responsibly and with consent
* Not intended for high-stakes or security-critical deployment without extensive validation
---
## License
### Dual License: Non-Commercial Free Use + Commercial License Required
**Non-Commercial Use (Free):**
* Research
* Education
* Personal projects
* Non-monetized demos
* Open-source experimentation
Attribution to **Chance Brownfield** is required.
**Commercial Use (Permission Required):**
* SaaS products
* Paid APIs
* Monetized applications
* Enterprise/internal commercial tools
* Advertising-supported systems
Unauthorized commercial use is prohibited.
**Author:** Chance Brownfield
**Contact:** [HiMindAi@proton.me](mailto:HiMindAi@proton.me)
---
## Citation
If you use this work, please credit:
> Chance Brownfield. (2025). *MMM: Multi-Mixture Model for Speaker Identification*.
---
## Author
**Chance Brownfield**
Designer and trainer of the MMM architecture
Email: [HiMindAi@proton.me](mailto:HiMindAi@proton.me)