---
license: other
license_name: license
license_link: LICENSE
pipeline_tag: audio-classification
---

# MMM — Multi-Mixture Model for Speaker Identification

**MMM (Multi-Mixture Model)** is a PyTorch-based framework implementing a hybrid time-series architecture that combines **Variational Autoencoders (VAEs)**, **Recurrent Neural Networks (RNNs)**, **Hidden Markov Models (HMMs)**, **Gaussian Mixture Models (GMMs)**, and an optional **Transformer** component.

The framework is designed primarily for **audio tasks**, with a reference implementation focused on **speaker identification**. This repository includes model code, training scripts, speaker identification utilities, and a demo web application.

**Designed and trained by:** **Chance Brownfield**

---

## Model Overview

- **Model type:** Hybrid generative sequential model
- **Framework:** PyTorch
- **Primary domain:** Audio / time-series
- **Main use case:** Speaker identification and embedding extraction
- **Input:** 1-D audio signals or time-series features
- **Output:** Latent embeddings, likelihood scores, predictions

---

## Architecture Summary

### VariationalRecurrentMarkovGaussianTransformer

The core MMM model integrates:

- **Variational Autoencoder (VAE)**
  Encodes each time step into a latent variable and reconstructs the input.
- **RNN Emission Network**
  Produces emission parameters for the HMM from latent sequences.
- **Hidden Markov Model (HMM)**
  Models temporal structure in latent space using Gaussian Mixture emissions.
- **Gaussian Mixture Models (GMMs)**
  Used both internally (HMM emissions) and externally for speaker enrollment.
- **Transformer**
  Operates on latent sequences for recognition or domain mapping.
- **Latent Weight Vectors**
  Learnable vectors (`pred_weights`, `recog_weights`, `gen_weights`) used to reweight latent dimensions for prediction, recognition, and generation.
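To make the HMM component concrete: an HMM scores a latent sequence by marginalizing over hidden state paths with the forward algorithm. The sketch below is illustrative only (it is not the repository's code): a two-state toy model with single-Gaussian emissions in NumPy, where all parameters and the scalar latent sequence are invented for the example.

```python
import numpy as np

def forward_log_likelihood(x, log_pi, log_A, means, variances):
    """Forward algorithm: log p(x) for a 1-D sequence under a Gaussian-emission HMM."""
    # Per-state Gaussian log-densities for every observation: shape (T, n_states)
    log_b = -0.5 * (np.log(2 * np.pi * variances) + (x[:, None] - means) ** 2 / variances)
    # Initialise with the start distribution
    log_alpha = log_pi + log_b[0]
    for t in range(1, len(x)):
        # logsumexp over previous states, then add the emission term
        m = log_alpha.max()
        log_alpha = m + np.log(np.exp(log_alpha - m) @ np.exp(log_A)) + log_b[t]
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())

# Toy 2-state model over a scalar latent sequence (all values made up)
log_pi = np.log(np.array([0.6, 0.4]))
log_A = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))
means = np.array([0.0, 3.0])
variances = np.array([1.0, 1.0])
x = np.array([0.1, -0.2, 2.9, 3.1])
print(forward_log_likelihood(x, log_pi, log_A, means, variances))
```

In the full model, the observations would be latent vectors produced by the VAE encoder, and the emissions would be Gaussian mixtures rather than single Gaussians.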
## Capabilities

- **Embedding extraction** for speaker identification
- **Speaker enrollment** using GMM, HMM, or full MMM models
- **Sequence prediction**
- **Latent sequence generation** via HMM sampling
- **Recognition / mapping** using Transformer layers
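"Latent sequence generation via HMM sampling" generically means ancestral sampling: draw a hidden state path from the transition model, then draw one emission per visited state. A minimal NumPy sketch (not the repository's code; the two-state parameters are invented):

```python
import numpy as np

def sample_hmm(pi, A, means, stds, T, rng):
    """Ancestral sampling: draw a state path, then one Gaussian emission per state."""
    states = np.empty(T, dtype=int)
    states[0] = rng.choice(len(pi), p=pi)
    for t in range(1, T):
        states[t] = rng.choice(len(pi), p=A[states[t - 1]])
    # Emit a scalar latent value from each visited state's Gaussian
    return states, rng.normal(means[states], stds[states])

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])
A = np.array([[0.95, 0.05], [0.05, 0.95]])  # sticky states -> smooth latent runs
means = np.array([-2.0, 2.0])
stds = np.array([0.3, 0.3])
states, latents = sample_hmm(pi, A, means, stds, T=10, rng=rng)
print(states)
print(np.round(latents, 2))
```

In MMM the sampled sequence would live in the VAE's latent space, so it could be pushed through the decoder to generate observations.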
---

## Repository Contents

### `MMM.py`

Core model definitions and manager classes:

- `MMTransformer`
- `MMModel`
- `MMM`

### `ASI.py`

Automatic Speaker Identification wrapper:

- Generates embeddings
- Enrolls speakers using GMM/HMM/MMM
- Scores and identifies query audio
## Installation

Clone the repository:

```bash
git clone https://huggingface.co/HiMind/Multi-Mixture_Speaker_ID
```
## Using the Pre-Trained Model

### Load a Saved Model

```python
from MMM import MMM

# The manager bundles the trained models; "unknown" is the base model key
manager = MMM.load("mmm.pt")
base_model = manager.models["unknown"]
base_model.eval()  # inference mode
```

---
### Load from Hugging Face Hub

```python
from huggingface_hub import hf_hub_download
from MMM import MMM

pt_file = hf_hub_download(
    repo_id="HiMind/Multi-Mixture_Speaker_ID",
    filename="mmm.pt",
)

manager = MMM.load(pt_file)
```

---
## Speaker Identification

### Generate an Embedding

```python
from ASI import Speaker_ID

speaker_system = Speaker_ID(
    mmm_manager=manager,
    base_model_id="unknown",
    seq_len=1200,
    sr=1200,
)

embedding = speaker_system.generate_embedding("audio.wav")
```
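How the embedding is consumed is up to the caller. One common pattern for comparing speaker embeddings, shown here as a sketch (this is not the repository's scoring code, and the vectors stand in for `generate_embedding` outputs), is cosine similarity:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors; 1.0 means identical direction."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings standing in for generate_embedding() outputs
enrolled = np.array([0.2, 0.9, -0.4])
query_same = np.array([0.22, 0.88, -0.38])
query_other = np.array([-0.7, 0.1, 0.7])
print(cosine_similarity(enrolled, query_same))   # close to 1
print(cosine_similarity(enrolled, query_other))  # much lower
```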

---

### Enroll a Speaker

```python
speaker_system.enroll_speaker(
    speaker_id="Alice",
    audio_input="alice.wav",
    model_type="gmm",
    n_components=4,
    epochs=50,
    lr=1e-3,
)
```

Supported `model_type` values:

* `"gmm"`
* `"hmm"`
* `"mmm"`
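Conceptually, enrollment fits a generative model to a speaker's embedding frames and later scores queries by log-likelihood under that model. A minimal stand-in for the idea (a single diagonal Gaussian, i.e. a 1-component GMM; the frame arrays below are invented, not real embeddings):

```python
import numpy as np

def enroll(frames):
    """Fit a diagonal Gaussian (1-component GMM) to a speaker's embedding frames."""
    frames = np.asarray(frames, dtype=float)
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6  # variance floor

def log_likelihood(frames, mean, var):
    """Average per-frame Gaussian log-density of query frames under the enrolled model."""
    frames = np.asarray(frames, dtype=float)
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)
    return float(ll.sum(axis=1).mean())

rng = np.random.default_rng(1)
alice_frames = rng.normal([0.0, 1.0], 0.1, size=(200, 2))  # invented "Alice" embeddings
mean, var = enroll(alice_frames)
query = rng.normal([0.0, 1.0], 0.1, size=(50, 2))          # same "speaker"
impostor = rng.normal([1.0, -1.0], 0.1, size=(50, 2))      # different "speaker"
print(log_likelihood(query, mean, var))
print(log_likelihood(impostor, mean, var))
```

The `"hmm"` and `"mmm"` options generalize this idea with temporal structure and richer emission models.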

---

### Identify a Query

```python
best_speaker, best_score, scores = speaker_system.identify("query.wav")

print("Predicted speaker:", best_speaker)
print("Scores:", scores)
```
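The return shape (best speaker, its score, the full score map) suggests a scan over per-speaker scores. As a sketch of that selection step, with invented scores rather than real `identify` output:

```python
# Hypothetical per-speaker scores (e.g. log-likelihoods); higher is better
scores = {"Alice": -412.7, "Bob": -389.2, "Carol": -455.1}

# Pick the best-scoring enrolled speaker
best_speaker, best_score = max(scores.items(), key=lambda kv: kv[1])
print(best_speaker, best_score)  # Bob: the highest (least negative) score
```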

## Bias, Risks, and Limitations

* Performance depends heavily on audio quality and data distribution
* Out-of-distribution speakers and noisy recordings may reduce accuracy
* Speaker identification involves biometric data — use responsibly and with consent
* Not intended for high-stakes or security-critical deployment without extensive validation

---

## License

### Dual License: Non-Commercial Free Use + Commercial License Required

**Non-Commercial Use (Free):**

* Research
* Education
* Personal projects
* Non-monetized demos
* Open-source experimentation

Attribution to **Chance Brownfield** is required.

**Commercial Use (Permission Required):**

* SaaS products
* Paid APIs
* Monetized applications
* Enterprise/internal commercial tools
* Advertising-supported systems

Unauthorized commercial use is prohibited.

**Author:** Chance Brownfield  
**Contact:** [HiMindAi@proton.me](mailto:HiMindAi@proton.me)

---

## Citation

If you use this work, please credit:

> Chance Brownfield. (2025). *MMM: Multi-Mixture Model for Speaker Identification*.

---

## Author

**Chance Brownfield**  
Designer and trainer of the MMM architecture  
Email: [HiMindAi@proton.me](mailto:HiMindAi@proton.me)