MMM: Multi-Mixture Model for Speaker Identification
MMM (Multi-Mixture Model) is a PyTorch-based framework implementing a hybrid time-series architecture that combines Variational Autoencoders (VAE), Recurrent Neural Networks (RNNs), Hidden Markov Models (HMMs), Gaussian Mixture Models (GMMs), and an optional Transformer component.
The framework is designed primarily for audio tasks, with a reference implementation focused on speaker identification. This repository includes model code, training scripts, speaker identification utilities, and a demo web application.
Designed and trained by: Chance Brownfield
Model Overview
- Model type: Hybrid generative sequential model
- Framework: PyTorch
- Primary domain: Audio / time-series
- Main use case: Speaker identification and embedding extraction
- Input: 1-D audio signals or time-series features
- Output: Latent embeddings, likelihood scores, predictions
Architecture Summary
VariationalRecurrentMarkovGaussianTransformer
The core MMM model integrates:
- Variational Autoencoder (VAE): Encodes each time step into a latent variable and reconstructs the input.
- RNN Emission Network: Produces emission parameters for the HMM from latent sequences.
- Hidden Markov Model (HMM): Models temporal structure in latent space using Gaussian Mixture emissions.
- Gaussian Mixture Models (GMMs): Used both internally (HMM emissions) and externally for speaker enrollment.
- Transformer: Operates on latent sequences for recognition or domain mapping.
- Latent Weight Vectors: Learnable vectors (pred_weights, recog_weights, gen_weights) used to reweight latent dimensions for prediction, recognition, and generation.
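The reweighting role of the latent weight vectors can be pictured as an element-wise scaling of each latent vector. The sketch below is only an illustration in dependency-free Python, not the code in MMM.py: the helper name `reweight_latents` and the example values are hypothetical, and the actual model presumably keeps these vectors as learnable PyTorch parameters.

```python
from typing import List, Sequence


def reweight_latents(latents: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[List[float]]:
    """Scale every latent dimension by its weight (hypothetical helper).

    latents: one latent vector per time step, shape (T, D).
    weights: one weight per latent dimension, shape (D,).
    """
    for z in latents:
        if len(z) != len(weights):
            raise ValueError("latent and weight dimensions must match")
    return [[w * v for w, v in zip(weights, z)] for z in latents]


# Two time steps with three latent dimensions each (made-up values).
latents = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
# A recognition-style weight vector that mutes the last dimension.
recog_weights = [1.0, 0.5, 0.0]
weighted = reweight_latents(latents, recog_weights)
```

Keeping a separate vector per task would let prediction, recognition, and generation each emphasize different latent dimensions while sharing one encoder.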
Capabilities
- Embedding extraction for speaker identification
- Speaker enrollment using GMM, HMM, or full MMM models
- Sequence prediction
- Latent sequence generation via HMM sampling
- Recognition / mapping using Transformer layers
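Latent sequence generation via HMM sampling can be sketched as follows. This is a simplified illustration rather than the repository's implementation: it assumes one diagonal Gaussian per state instead of a Gaussian mixture, and the function name `sample_hmm` and all parameter values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sample_hmm(start_probs: Sequence[float],
               trans_probs: Sequence[Sequence[float]],
               means: Sequence[Sequence[float]],
               stds: Sequence[Sequence[float]],
               length: int,
               seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Sample a state path and one latent vector per time step."""
    rng = random.Random(seed)
    n_states = len(start_probs)
    states: List[int] = []
    latents: List[List[float]] = []
    # Draw the initial state from the start distribution.
    state = rng.choices(range(n_states), weights=start_probs)[0]
    for _ in range(length):
        states.append(state)
        # Emit a latent vector from the current state's diagonal Gaussian.
        latents.append([rng.gauss(m, s)
                        for m, s in zip(means[state], stds[state])])
        # Move to the next state using the current row of the transition matrix.
        state = rng.choices(range(n_states), weights=trans_probs[state])[0]
    return states, latents


# Two states with 2-D latents (illustrative numbers only).
states, latents = sample_hmm(
    start_probs=[0.6, 0.4],
    trans_probs=[[0.9, 0.1], [0.2, 0.8]],
    means=[[0.0, 0.0], [3.0, -3.0]],
    stds=[[1.0, 1.0], [0.5, 0.5]],
    length=5,
)
```

In the full model, the sampled latent sequence would then be passed through the VAE decoder to obtain features in the input space.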
Repository Contents
MMM.py
Core model definitions and manager classes:
- MMTransformer
- MMModel
- MMM
ASI.py
Automatic speaker identification wrapper:
- Generates embeddings
- Enrolls speakers using GMM/HMM/MMM
- Scores and identifies query audio
Clone the repository
git clone https://huggingface.co/HiMind/Multi-Mixture_Speaker_ID
Using the Pre-Trained Model
Load a Saved Model
from MMM import MMM
manager = MMM.load("mmm.pt")
base_model = manager.models["unknown"]
base_model.eval()
Load from Hugging Face Hub
from huggingface_hub import hf_hub_download
from MMM import MMM
pt_file = hf_hub_download(
    repo_id="username/Multi-Mixture_Speaker_ID",
    filename="mmm.pt"
)
manager = MMM.load(pt_file)
Speaker Identification
Generate an Embedding
from ASI import Speaker_ID
speaker_system = Speaker_ID(
    mmm_manager=manager,
    base_model_id="unknown",
    seq_len=1200,
    sr=1200,
)
embedding = speaker_system.generate_embedding("audio.wav")
Enroll a Speaker
speaker_system.enroll_speaker(
    speaker_id="Alice",
    audio_input="alice.wav",
    model_type="gmm",
    n_components=4,
    epochs=50,
    lr=1e-3,
)
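For intuition about what a "gmm" enrollment like the one above is later used for, the sketch below scores embedding frames under a diagonal-covariance Gaussian mixture. It is not the code in ASI.py: the function names, the two-component model, and all numbers are hypothetical, and it assumes the enrolled model is queried by average log-likelihood.

```python
import math
from typing import Sequence


def diag_gmm_log_likelihood(x: Sequence[float],
                            weights: Sequence[float],
                            means: Sequence[Sequence[float]],
                            variances: Sequence[Sequence[float]]) -> float:
    """Log-likelihood of one vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            # Log-density of a 1-D Gaussian, summed over dimensions.
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def average_score(frames: Sequence[Sequence[float]],
                  weights: Sequence[float],
                  means: Sequence[Sequence[float]],
                  variances: Sequence[Sequence[float]]) -> float:
    """Mean per-frame log-likelihood; higher means a better match."""
    total = sum(diag_gmm_log_likelihood(f, weights, means, variances)
                for f in frames)
    return total / len(frames)


# Hypothetical two-component model standing in for an enrolled speaker.
alice = dict(weights=[0.5, 0.5],
             means=[[0.0, 0.0], [1.0, 1.0]],
             variances=[[1.0, 1.0], [1.0, 1.0]])
near_score = average_score([[0.1, -0.1], [0.9, 1.2]], **alice)
far_score = average_score([[6.0, 6.0], [7.0, 5.0]], **alice)
```

Frames close to the enrolled speaker's components receive a higher average log-likelihood than distant ones, which is the property that per-speaker scoring relies on.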
Supported model_type values:
- "gmm"
- "hmm"
- "mmm"
Identify a Query
best_speaker, best_score, scores = speaker_system.identify("query.wav")
print("Predicted speaker:", best_speaker)
print("Scores:", scores)
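The relationship between the per-speaker scores and the predicted speaker can be sketched as a simple arg-max. The snippet assumes that `scores` is a dictionary mapping speaker IDs to scores where higher is better; the source does not confirm this format. The helper `pick_speaker` and its optional `min_score` rejection threshold are hypothetical and not part of ASI.py.

```python
from typing import Dict, Optional, Tuple


def pick_speaker(scores: Dict[str, float],
                 min_score: Optional[float] = None) -> Tuple[Optional[str], float]:
    """Return the highest-scoring speaker, or None if below the threshold."""
    if not scores:
        raise ValueError("no enrolled speakers to compare against")
    best = max(scores, key=scores.get)
    best_score = scores[best]
    # Optionally reject weak matches, e.g. for unenrolled speakers.
    if min_score is not None and best_score < min_score:
        return None, best_score
    return best, best_score


# Made-up scores for two enrolled speakers.
example_scores = {"Alice": -41.2, "Bob": -57.9}
accepted = pick_speaker(example_scores)
rejected = pick_speaker(example_scores, min_score=-30.0)
```

A rejection threshold of this kind would need to be tuned on held-out recordings, since raw likelihood scores vary with audio length and quality.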
Bias, Risks, and Limitations
- Performance depends heavily on audio quality and data distribution
- Out-of-distribution speakers and noisy recordings may reduce accuracy
- Speaker identification involves biometric data; use responsibly and with consent
- Not intended for high-stakes or security-critical deployment without extensive validation
License
Dual License: Non-Commercial Free Use + Commercial License Required
Non-Commercial Use (Free):
- Research
- Education
- Personal projects
- Non-monetized demos
- Open-source experimentation
Attribution to Chance Brownfield is required.
Commercial Use (Permission Required):
- SaaS products
- Paid APIs
- Monetized applications
- Enterprise/internal commercial tools
- Advertising-supported systems
Unauthorized commercial use is prohibited.
Author: Chance Brownfield
Contact: HiMindAi@proton.me
Citation
If you use this work, please credit:
Chance Brownfield. (2025). MMM: Multi-Mixture Model for Speaker Identification.
Author
Chance Brownfield
Designer and trainer of the MMM architecture
Email: HiMindAi@proton.me