MMM: Multi-Mixture Model for Speaker Identification

MMM (Multi-Mixture Model) is a PyTorch-based framework implementing a hybrid time-series architecture that combines Variational Autoencoders (VAE), Recurrent Neural Networks (RNNs), Hidden Markov Models (HMMs), Gaussian Mixture Models (GMMs), and an optional Transformer component.

The framework is designed primarily for audio tasks, with a reference implementation focused on speaker identification. This repository includes model code, training scripts, speaker identification utilities, and a demo web application.

Designed and trained by: Chance Brownfield


Model Overview

  • Model type: Hybrid generative sequential model
  • Framework: PyTorch
  • Primary domain: Audio / time-series
  • Main use case: Speaker identification and embedding extraction
  • Input: 1-D audio signals or time-series features
  • Output: Latent embeddings, likelihood scores, predictions

Architecture Summary

VariationalRecurrentMarkovGaussianTransformer

The core MMM model integrates:

  • Variational Autoencoder (VAE)
    Encodes each time step into a latent variable and reconstructs the input.

  • RNN Emission Network
    Produces emission parameters for the HMM from latent sequences.

  • Hidden Markov Model (HMM)
    Models temporal structure in latent space using Gaussian Mixture emissions.

  • Gaussian Mixture Models (GMMs)
    Used both internally (HMM emissions) and externally for speaker enrollment.

  • Transformer
    Operates on latent sequences for recognition or domain mapping.

  • Latent Weight Vectors
    Learnable vectors:

    • pred_weights
    • recog_weights
    • gen_weights
      Used to reweight latent dimensions for prediction, recognition, and generation.
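The reweighting idea above can be sketched as an element-wise product between a latent vector and a learnable per-dimension weight vector. The snippet below is a minimal NumPy illustration only; the names `pred_weights` and `recog_weights` follow the list above, but the actual model's reweighting may be more involved:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 8

# A latent vector for one time step, plus task-specific weight vectors
# (in the real model these would be learnable parameters).
z = rng.normal(size=latent_dim)
pred_weights = np.ones(latent_dim)                  # identity reweighting
recog_weights = np.linspace(0.0, 1.0, latent_dim)   # emphasize later dims

def reweight(z, w):
    """Scale each latent dimension by its task-specific weight."""
    return z * w

z_pred = reweight(z, pred_weights)
z_recog = reweight(z, recog_weights)

# With identity weights the latent vector is unchanged.
assert np.allclose(z_pred, z)
```

In the full model, separate weight vectors let one shared latent space serve prediction, recognition, and generation heads without retraining the encoder.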

Capabilities

  • Embedding extraction for speaker identification
  • Speaker enrollment using GMM, HMM, or full MMM models
  • Sequence prediction
  • Latent sequence generation via HMM sampling
  • Recognition / mapping using Transformer layers

Repository Contents

MMM.py

Core model definitions and manager classes:

  • MMTransformer
  • MMModel
  • MMM

ASI.py

Automatic Speaker Identification wrapper:

  • Generates embeddings
  • Enrolls speakers using GMM/HMM/MMM
  • Scores and identifies query audio

Clone the repository

git clone https://huggingface.co/HiMind/Multi-Mixture_Speaker_ID

Using the Pre-Trained Model

Load a Saved Model

from MMM import MMM

manager = MMM.load("mmm.pt")
base_model = manager.models["unknown"]
base_model.eval()

Load from Hugging Face Hub

from huggingface_hub import hf_hub_download
from MMM import MMM

pt_file = hf_hub_download(
    repo_id="HiMind/Multi-Mixture_Speaker_ID",
    filename="mmm.pt"
)

manager = MMM.load(pt_file)

Speaker Identification

Generate an Embedding

from ASI import Speaker_ID

speaker_system = Speaker_ID(
    mmm_manager=manager,
    base_model_id="unknown",
    seq_len=1200,
    sr=1200,
)

embedding = speaker_system.generate_embedding("audio.wav")
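Once embeddings are extracted they can be compared directly. Cosine similarity is a common choice for speaker embeddings; this is an assumed post-processing step, not something the repository is confirmed to use, and the toy vectors stand in for real `generate_embedding` output:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings standing in for speaker_system.generate_embedding(...) output.
emb_a = np.array([1.0, 0.0, 1.0])
emb_b = np.array([1.0, 0.1, 0.9])
sim = cosine_similarity(emb_a, emb_b)
```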

Enroll a Speaker

speaker_system.enroll_speaker(
    speaker_id="Alice",
    audio_input="alice.wav",
    model_type="gmm",
    n_components=4,
    epochs=50,
    lr=1e-3,
)

Supported model_type values:

  • "gmm"
  • "hmm"
  • "mmm"

Identify a Query

best_speaker, best_score, scores = speaker_system.identify("query.wav")

print("Predicted speaker:", best_speaker)
print("Scores:", scores)

Bias, Risks, and Limitations

  • Performance depends heavily on audio quality and data distribution
  • Out-of-distribution speakers and noisy recordings may reduce accuracy
  • Speaker identification involves biometric data; use responsibly and with consent
  • Not intended for high-stakes or security-critical deployment without extensive validation

License

Dual License: Non-Commercial Free Use + Commercial License Required

Non-Commercial Use (Free):

  • Research
  • Education
  • Personal projects
  • Non-monetized demos
  • Open-source experimentation

Attribution to Chance Brownfield is required.

Commercial Use (Permission Required):

  • SaaS products
  • Paid APIs
  • Monetized applications
  • Enterprise/internal commercial tools
  • Advertising-supported systems

Unauthorized commercial use is prohibited.

Author: Chance Brownfield
Contact: HiMindAi@proton.me


Citation

If you use this work, please credit:

Chance Brownfield. (2025). MMM: Multi-Mixture Model for Speaker Identification.


Author

Chance Brownfield
Designer and trainer of the MMM architecture
Email: HiMindAi@proton.me

