---
license: other
license_name: license
license_link: LICENSE
pipeline_tag: voice-activity-detection
---

# MMM — Multi-Mixture Model for Speaker Identification

**MMM (Multi-Mixture Model)** is a PyTorch-based framework implementing a hybrid time-series architecture that combines **Variational Autoencoders (VAEs)**, **Recurrent Neural Networks (RNNs)**, **Hidden Markov Models (HMMs)**, **Gaussian Mixture Models (GMMs)**, and an optional **Transformer** component.

The framework is designed primarily for **audio tasks**, with a reference implementation focused on **speaker identification**. This repository includes model code, training scripts, speaker identification utilities, and a demo web application.

**Designed and trained by:** **Chance Brownfield**

---

## Model Overview

- **Model type:** Hybrid generative sequential model
- **Framework:** PyTorch
- **Primary domain:** Audio / time-series
- **Main use case:** Speaker identification and embedding extraction
- **Input:** 1-D audio signals or time-series features
- **Output:** Latent embeddings, likelihood scores, predictions

---

## Architecture Summary

### VariationalRecurrentMarkovGaussianTransformer

The core MMM model integrates:

- **Variational Autoencoder (VAE):** encodes each time step into a latent variable and reconstructs the input.
- **RNN Emission Network:** produces emission parameters for the HMM from latent sequences.
- **Hidden Markov Model (HMM):** models temporal structure in latent space using Gaussian Mixture emissions.
- **Gaussian Mixture Models (GMMs):** used both internally (HMM emissions) and externally for speaker enrollment.
- **Transformer:** operates on latent sequences for recognition or domain mapping.
- **Latent Weight Vectors:** learnable vectors used to reweight latent dimensions for prediction, recognition, and generation:
  - `pred_weights`
  - `recog_weights`
  - `gen_weights`
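The latent-reweighting idea described above can be sketched in a few lines of PyTorch. This is a hedged illustration only: the tensor shapes, the softmax normalization, and everything other than the vector names `pred_weights`, `recog_weights`, and `gen_weights` are assumptions for the sketch, not the repository's actual implementation.

```python
import torch

# Illustrative sketch (not the repository's code): learnable
# per-dimension weight vectors rescale a latent sequence before it
# is consumed by the prediction, recognition, or generation heads.
latent_dim = 8
z = torch.randn(1, 5, latent_dim)  # (batch, time, latent)

pred_weights = torch.nn.Parameter(torch.ones(latent_dim))
recog_weights = torch.nn.Parameter(torch.ones(latent_dim))
gen_weights = torch.nn.Parameter(torch.ones(latent_dim))

# Softmax keeps the weights positive and summing to one, so each
# head emphasizes some latent dimensions and suppresses others.
z_pred = z * torch.softmax(pred_weights, dim=0)
z_recog = z * torch.softmax(recog_weights, dim=0)
z_gen = z * torch.softmax(gen_weights, dim=0)

print(z_pred.shape)  # torch.Size([1, 5, 8])
```

Because the weights are `nn.Parameter`s, they are trained jointly with the rest of the model rather than fixed by hand.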
## Capabilities

- **Embedding extraction** for speaker identification
- **Speaker enrollment** using GMM, HMM, or full MMM models
- **Sequence prediction**
- **Latent sequence generation** via HMM sampling
- **Recognition / mapping** using Transformer layers

---

## Repository Contents

### `MMM.py`

Core model definitions and manager classes:

- `MMTransformer`
- `MMModel`
- `MMM`

### `ASI.py`

Automatic Speaker Identification wrapper:

- Generates embeddings
- Enrolls speakers using GMM/HMM/MMM
- Scores and identifies query audio

---

## Installation

### Clone the repository

```bash
git clone https://huggingface.co/HiMind/Multi-Mixture_Speaker_ID
```

---

## Using the Pre-Trained Model

### Load a Saved Model

```python
from MMM import MMM

manager = MMM.load("mmm.pt")
base_model = manager.models["unknown"]
base_model.eval()
```

---

### Load from Hugging Face Hub

```python
from huggingface_hub import hf_hub_download
from MMM import MMM

pt_file = hf_hub_download(
    repo_id="HiMind/Multi-Mixture_Speaker_ID",
    filename="mmm.pt",
)
manager = MMM.load(pt_file)
```

---

## Speaker Identification

### Generate an Embedding

```python
from ASI import Speaker_ID

speaker_system = Speaker_ID(
    mmm_manager=manager,
    base_model_id="unknown",
    seq_len=1200,
    sr=1200,
)

embedding = speaker_system.generate_embedding("audio.wav")
```

---

### Enroll a Speaker

```python
speaker_system.enroll_speaker(
    speaker_id="Alice",
    audio_input="alice.wav",
    model_type="gmm",
    n_components=4,
    epochs=50,
    lr=1e-3,
)
```

Supported `model_type` values:

* `"gmm"`
* `"hmm"`
* `"mmm"`

---

### Identify a Query

```python
best_speaker, best_score, scores = speaker_system.identify("query.wav")

print("Predicted speaker:", best_speaker)
print("Scores:", scores)
```

---

## Bias, Risks, and Limitations

* Performance depends heavily on audio quality and data distribution
* Out-of-distribution speakers and noisy recordings may reduce accuracy
* Speaker identification involves biometric data — use responsibly and with consent
* Not intended for high-stakes or
  security-critical deployment without extensive validation

---

## License

### Dual License: Non-Commercial Free Use + Commercial License Required

**Non-Commercial Use (Free):**

* Research
* Education
* Personal projects
* Non-monetized demos
* Open-source experimentation

Attribution to **Chance Brownfield** is required.

**Commercial Use (Permission Required):**

* SaaS products
* Paid APIs
* Monetized applications
* Enterprise/internal commercial tools
* Advertising-supported systems

Unauthorized commercial use is prohibited.

**Author:** Chance Brownfield
**Contact:** [HiMindAi@proton.me](mailto:HiMindAi@proton.me)

---

## Citation

If you use this work, please credit:

> Chance Brownfield. (2025). *MMM: Multi-Mixture Model for Speaker Identification*.

---

## Author

**Chance Brownfield**
Designer and trainer of the MMM architecture
Email: [HiMindAi@proton.me](mailto:HiMindAi@proton.me)
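---

## Appendix: Scoring Sketch (Illustrative)

To make the GMM-based scoring behind `identify()` concrete, here is a minimal, self-contained sketch. It simplifies each enrolled speaker's GMM to a single diagonal-covariance Gaussian over embeddings and picks the speaker with the highest log-likelihood; the function name, the enrolled statistics, and the query data are all illustrative assumptions, not the repository's API.

```python
import numpy as np

def log_likelihood(x, mean, var):
    # Log-density of embedding x under a diagonal Gaussian
    # (a one-component stand-in for a speaker's GMM).
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

# Toy enrolled speakers: (mean, variance) of their embedding model.
enrolled = {
    "Alice": (np.zeros(4), np.ones(4)),
    "Bob": (np.full(4, 3.0), np.ones(4)),
}

# Toy query embedding drawn near Alice's model.
rng = np.random.default_rng(0)
query = rng.normal(0.0, 1.0, size=4)

# Score the query against every enrolled speaker, then take the best.
scores = {s: log_likelihood(query, m, v) for s, (m, v) in enrolled.items()}
best_speaker = max(scores, key=scores.get)
print(best_speaker)  # Alice
```

The real system scores audio rather than toy vectors, but the shape of the result is the same: a per-speaker score dictionary plus the arg-max speaker, matching the `(best_speaker, best_score, scores)` tuple returned by `identify()`.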