MMM: Multi-Mixture Model for Speaker Identification

MMM (Multi-Mixture Model) is a PyTorch-based framework implementing a hybrid time-series architecture that combines Variational Autoencoders (VAE), Recurrent Neural Networks (RNNs), Hidden Markov Models (HMMs), Gaussian Mixture Models (GMMs), and an optional Transformer component.

The framework is designed primarily for audio tasks, with a reference implementation focused on speaker identification. This repository includes model code, training scripts, speaker identification utilities, and a demo web application.

Designed and trained by: Chance Brownfield


Model Overview

  • Model type: Hybrid generative sequential model
  • Framework: PyTorch
  • Primary domain: Audio / time-series
  • Main use case: Speaker identification and embedding extraction
  • Input: 1-D audio signals or time-series features
  • Output: Latent embeddings, likelihood scores, predictions

Architecture Summary

VariationalRecurrentMarkovGaussianTransformer

The core MMM model integrates:

  • Variational Autoencoder (VAE)
    Encodes each time step into a latent variable and reconstructs the input.

  • RNN Emission Network
    Produces emission parameters for the HMM from latent sequences.

  • Hidden Markov Model (HMM)
    Models temporal structure in latent space using Gaussian Mixture emissions.

  • Gaussian Mixture Models (GMMs)
    Used both internally (HMM emissions) and externally for speaker enrollment.

  • Transformer
    Operates on latent sequences for recognition or domain mapping.

  • Latent Weight Vectors
    Learnable vectors:

    • pred_weights
    • recog_weights
    • gen_weights
      Used to reweight latent dimensions for prediction, recognition, and generation.
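The reweighting idea above can be sketched as an element-wise product between a latent vector and a learnable per-dimension weight vector. The snippet below is a minimal NumPy illustration only; the names `pred_weights` and `recog_weights` follow the list above, but the actual model's reweighting may be more involved:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 8

# A latent vector for one time step, plus task-specific weight vectors
# (in the real model these would be learnable parameters).
z = rng.normal(size=latent_dim)
pred_weights = np.ones(latent_dim)                  # identity reweighting
recog_weights = np.linspace(0.0, 1.0, latent_dim)   # emphasize later dims

def reweight(z, w):
    """Scale each latent dimension by its task-specific weight."""
    return z * w

z_pred = reweight(z, pred_weights)
z_recog = reweight(z, recog_weights)

# With identity weights the latent vector is unchanged.
assert np.allclose(z_pred, z)
```

In the full model, separate weight vectors let one shared latent space serve prediction, recognition, and generation heads without retraining the encoder.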

Capabilities

  • Embedding extraction for speaker identification
  • Speaker enrollment using GMM, HMM, or full MMM models
  • Sequence prediction
  • Latent sequence generation via HMM sampling
  • Recognition / mapping using Transformer layers

Repository Contents

MMM.py

Core model definitions and manager classes:

  • MMTransformer
  • MMModel
  • MMM

ASI.py

Automatic Speaker Identification wrapper:

  • Generates embeddings
  • Enrolls speakers using GMM/HMM/MMM
  • Scores and identifies query audio

Clone the repository

git clone https://huggingface.co/HiMind/Multi-Mixture_Speaker_ID

Using the Pre-Trained Model

Load a Saved Model

from MMM import MMM

manager = MMM.load("mmm.pt")
base_model = manager.models["unknown"]
base_model.eval()

Load from Hugging Face Hub

from huggingface_hub import hf_hub_download
from MMM import MMM

pt_file = hf_hub_download(
    repo_id="HiMind/Multi-Mixture_Speaker_ID",
    filename="mmm.pt"
)

manager = MMM.load(pt_file)

Speaker Identification

Generate an Embedding

from ASI import Speaker_ID

speaker_system = Speaker_ID(
    mmm_manager=manager,
    base_model_id="unknown",
    seq_len=1200,
    sr=1200,
)

embedding = speaker_system.generate_embedding("audio.wav")
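Once embeddings are extracted they can be compared directly. Cosine similarity is a common choice for speaker embeddings; this is an assumed post-processing step, not something the repository is confirmed to use, and the toy vectors stand in for real `generate_embedding` output:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings standing in for speaker_system.generate_embedding(...) output.
emb_a = np.array([1.0, 0.0, 1.0])
emb_b = np.array([1.0, 0.1, 0.9])
sim = cosine_similarity(emb_a, emb_b)
```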

Enroll a Speaker

speaker_system.enroll_speaker(
    speaker_id="Alice",
    audio_input="alice.wav",
    model_type="gmm",
    n_components=4,
    epochs=50,
    lr=1e-3,
)

Supported model_type values:

  • "gmm"
  • "hmm"
  • "mmm"

Identify a Query

best_speaker, best_score, scores = speaker_system.identify("query.wav")

print("Predicted speaker:", best_speaker)
print("Scores:", scores)

Bias, Risks, and Limitations

  • Performance depends heavily on audio quality and data distribution
  • Out-of-distribution speakers and noisy recordings may reduce accuracy
  • Speaker identification involves biometric data; use responsibly and with consent
  • Not intended for high-stakes or security-critical deployment without extensive validation

License

Dual License: Non-Commercial Free Use + Commercial License Required

Non-Commercial Use (Free):

  • Research
  • Education
  • Personal projects
  • Non-monetized demos
  • Open-source experimentation

Attribution to Chance Brownfield is required.

Commercial Use (Permission Required):

  • SaaS products
  • Paid APIs
  • Monetized applications
  • Enterprise/internal commercial tools
  • Advertising-supported systems

Unauthorized commercial use is prohibited.

Author: Chance Brownfield
Contact: HiMindAi@proton.me


Citation

If you use this work, please credit:

Chance Brownfield. (2025). MMM: Multi-Mixture Model for Speaker Identification.


Author

Chance Brownfield
Designer and trainer of the MMM architecture
Email: HiMindAi@proton.me

