---
license: other
license_name: license
license_link: LICENSE
pipeline_tag: voice-activity-detection
---

# MMM — Multi-Mixture Model for Speaker Identification

**MMM (Multi-Mixture Model)** is a PyTorch-based framework implementing a hybrid time-series architecture that combines **Variational Autoencoders (VAEs)**, **Recurrent Neural Networks (RNNs)**, **Hidden Markov Models (HMMs)**, **Gaussian Mixture Models (GMMs)**, and an optional **Transformer** component.

The framework is designed primarily for **audio tasks**, with a reference implementation focused on **speaker identification**. This repository includes the model code, training scripts, speaker identification utilities, and a demo web application.

**Designed and trained by:** Chance Brownfield

---

## Model Overview

- **Model type:** Hybrid generative sequential model
- **Framework:** PyTorch
- **Primary domain:** Audio / time-series
- **Main use case:** Speaker identification and embedding extraction
- **Input:** 1-D audio signals or time-series features
- **Output:** Latent embeddings, likelihood scores, predictions

---

## Architecture Summary

### VariationalRecurrentMarkovGaussianTransformer

The core MMM model integrates:

- **Variational Autoencoder (VAE)**
  Encodes each time step into a latent variable and reconstructs the input.

- **RNN Emission Network**
  Produces emission parameters for the HMM from latent sequences.

- **Hidden Markov Model (HMM)**
  Models temporal structure in latent space using Gaussian mixture emissions.

- **Gaussian Mixture Models (GMMs)**
  Used both internally (as HMM emissions) and externally for speaker enrollment.

- **Transformer**
  Operates on latent sequences for recognition or domain mapping.

- **Latent Weight Vectors**
  Learnable vectors `pred_weights`, `recog_weights`, and `gen_weights`, used to reweight latent dimensions for prediction, recognition, and generation.
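
The role of the latent weight vectors can be pictured with a small NumPy sketch. This is an illustrative assumption about the mechanism (element-wise scaling of latent dimensions before each task head), not the repository's actual code; `reweight` and the random stand-in weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 8
z = rng.standard_normal((5, latent_dim))  # a latent sequence of 5 time steps

# Stand-in for one of the learned vectors (pred_weights / recog_weights / gen_weights).
pred_weights = rng.uniform(0.5, 1.5, size=latent_dim)

def reweight(latents, weights):
    """Scale each latent dimension by its task-specific weight."""
    return latents * weights  # broadcasts over the time axis

z_pred = reweight(z, pred_weights)  # same shape, dimensions re-scaled per task
```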

## Capabilities

- **Embedding extraction** for speaker identification
- **Speaker enrollment** using GMM, HMM, or full MMM models
- **Sequence prediction**
- **Latent sequence generation** via HMM sampling
- **Recognition / mapping** using Transformer layers

---

## Repository Contents

### `MMM.py`

Core model definitions and manager classes:

- `MMTransformer`
- `MMModel`
- `MMM`

### `ASI.py`

Automatic speaker identification wrapper:

- Generates embeddings
- Enrolls speakers using GMM/HMM/MMM
- Scores and identifies query audio

---

## Installation

### Clone the repository

```bash
git clone https://huggingface.co/HiMind/Multi-Mixture_Speaker_ID
```

## Using the Pre-Trained Model

### Load a Saved Model

```python
from MMM import MMM

# Load the saved manager, then select the base model used for embeddings.
manager = MMM.load("mmm.pt")
base_model = manager.models["unknown"]
base_model.eval()
```

---

### Load from Hugging Face Hub

```python
from huggingface_hub import hf_hub_download
from MMM import MMM

# Download the checkpoint from the Hub, then load it as above.
pt_file = hf_hub_download(
    repo_id="HiMind/Multi-Mixture_Speaker_ID",
    filename="mmm.pt",
)

manager = MMM.load(pt_file)
```

---

## Speaker Identification

### Generate an Embedding

```python
from ASI import Speaker_ID

speaker_system = Speaker_ID(
    mmm_manager=manager,
    base_model_id="unknown",
    seq_len=1200,
    sr=1200,
)

embedding = speaker_system.generate_embedding("audio.wav")
```
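
Embeddings from different files can also be compared directly, for example with cosine similarity. This is a generic sketch, not a documented API of this repository:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors (1.0 = same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# e.g. compare the embedding above against a second file's embedding:
# score = cosine_similarity(embedding, speaker_system.generate_embedding("other.wav"))
```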

---

### Enroll a Speaker

```python
speaker_system.enroll_speaker(
    speaker_id="Alice",
    audio_input="alice.wav",
    model_type="gmm",
    n_components=4,
    epochs=50,
    lr=1e-3,
)
```

Supported `model_type` values:

* `"gmm"`
* `"hmm"`
* `"mmm"`
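
Conceptually, a `"gmm"` enrollment fits a Gaussian mixture to a speaker's embedding frames and later scores queries by average log-likelihood. The scoring step can be sketched as follows; this is a NumPy illustration of the idea (diagonal covariances assumed), not the repository's implementation:

```python
import numpy as np

def gmm_avg_log_likelihood(frames, means, variances, weights):
    """Average per-frame log-likelihood of `frames` (T, D) under a
    diagonal-covariance GMM with K components."""
    diff = frames[:, None, :] - means[None, :, :]                  # (T, K, D)
    log_comp = -0.5 * np.sum(diff**2 / variances
                             + np.log(2.0 * np.pi * variances), axis=-1)
    log_joint = log_comp + np.log(weights)                         # (T, K)
    m = log_joint.max(axis=1, keepdims=True)                       # log-sum-exp
    log_px = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    return float(log_px.mean())

# Frames near a component mean score higher than distant frames:
means = np.zeros((1, 2)); variances = np.ones((1, 2)); weights = np.array([1.0])
close = gmm_avg_log_likelihood(np.zeros((4, 2)), means, variances, weights)
far = gmm_avg_log_likelihood(5.0 * np.ones((4, 2)), means, variances, weights)
```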

---

### Identify a Query

```python
best_speaker, best_score, scores = speaker_system.identify("query.wav")

print("Predicted speaker:", best_speaker)
print("Scores:", scores)
```
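
`identify` always returns the best-scoring enrolled speaker, so in open-set use you may want to reject low-confidence matches. A minimal sketch; the wrapper, its threshold value, and the `"unknown"` label are assumptions, not part of the `ASI` API:

```python
def identify_with_rejection(identify_fn, audio_path, threshold=-100.0):
    """Run an identify function and reject results below a score threshold."""
    best_speaker, best_score, scores = identify_fn(audio_path)
    if best_score < threshold:
        return "unknown", best_score, scores
    return best_speaker, best_score, scores

# Usage: identify_with_rejection(speaker_system.identify, "query.wav")
```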

## Bias, Risks, and Limitations

* Performance depends heavily on audio quality and data distribution
* Out-of-distribution speakers and noisy recordings may reduce accuracy
* Speaker identification involves biometric data; use it responsibly and with consent
* Not intended for high-stakes or security-critical deployment without extensive validation

---

## License

### Dual License: Non-Commercial Free Use + Commercial License Required

**Non-Commercial Use (Free):**

* Research
* Education
* Personal projects
* Non-monetized demos
* Open-source experimentation

Attribution to **Chance Brownfield** is required.

**Commercial Use (Permission Required):**

* SaaS products
* Paid APIs
* Monetized applications
* Enterprise/internal commercial tools
* Advertising-supported systems

Unauthorized commercial use is prohibited.

**Author:** Chance Brownfield
**Contact:** [HiMindAi@proton.me](mailto:HiMindAi@proton.me)

---

## Citation

If you use this work, please credit:

> Chance Brownfield. (2025). *MMM: Multi-Mixture Model for Speaker Identification*.

---

## Author

**Chance Brownfield**
Designer and trainer of the MMM architecture
Email: [HiMindAi@proton.me](mailto:HiMindAi@proton.me)