How to download mlx-community/emotion2vec-plus-large-mlx for use with MLX:
# Download the model from the Hub (the extra is quoted so zsh does not expand the brackets)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir emotion2vec-plus-large-mlx mlx-community/emotion2vec-plus-large-mlx
metadata
license: other
license_name: funasr-model-license
license_link: https://huggingface.co/emotion2vec/emotion2vec_plus_large/blob/main/LICENSE
library_name: mlx
base_model: emotion2vec/emotion2vec_plus_large
pipeline_tag: audio-classification
tags:
- mlx
- audio
- audio-classification
- speech-emotion-recognition
- emotion-recognition
- emotion2vec
- data2vec
- apple-silicon
mlx-community/emotion2vec-plus-large-mlx
The emotion2vec+ large speech-emotion-recognition model, converted to MLX format for native
inference on Apple Silicon. These weights are consumed by the xocialize/emotion2vec-mlx-swift
Swift port. Refer to the original model card for details.
Model
- Family: emotion2vec / emotion2vec+ (Ma et al., "emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation," arXiv:2312.15185)
- Architecture: Data2Vec 2.0 — conv feature extractor → transformer encoder → 9-class linear head
- Output: 9-class categorical emotion (angry, disgusted, fearful, happy, neutral, other, sad, surprised, unknown)
- Sample rate: 16000 Hz, mono
- Precision: fp16 (233 tensors)
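To make the 9-class output concrete, here is a minimal sketch of turning raw head logits into a label and a confidence with a softmax followed by an argmax. It assumes the head's output indices follow the label order listed above, which this card does not confirm; `emotionLabels`, `softmax`, and `topEmotion` are hypothetical names for illustration and are not part of the Swift package.

```swift
import Foundation

// Label order as listed in this card. ASSUMPTION: the head's output
// indices follow this same order.
let emotionLabels = ["angry", "disgusted", "fearful", "happy", "neutral",
                     "other", "sad", "surprised", "unknown"]

/// Numerically stable softmax over raw logits.
func softmax(_ logits: [Float]) -> [Float] {
    guard let maxLogit = logits.max() else { return [] }
    // Subtracting the maximum keeps exp() from overflowing.
    let exps = logits.map { exp($0 - maxLogit) }
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

/// Returns the most probable label and its probability,
/// or nil when the logit count does not match the label count.
func topEmotion(from logits: [Float]) -> (label: String, confidence: Float)? {
    guard logits.count == emotionLabels.count else { return nil }
    let probs = softmax(logits)
    guard let best = probs.indices.max(by: { probs[$0] < probs[$1] }) else { return nil }
    return (emotionLabels[best], probs[best])
}
```

Calling `topEmotion(from:)` with nine logits returns the label at the largest logit's index together with its softmax probability. The Swift package may compute its confidence differently.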
Files
- emotion2vec_large.safetensors: the MLX weights (fp16).
- emotion2vec_large_config.json: model config consumed by the loader.
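Before handing a downloaded directory to a loader, it can help to confirm that both files are present. The sketch below checks only for existence and does not parse either file. `missingModelFiles` is a hypothetical helper, not part of any package.

```swift
import Foundation

// File names as listed in this card.
let requiredModelFiles = ["emotion2vec_large.safetensors",
                          "emotion2vec_large_config.json"]

/// Returns the names of required files that are absent from `directory`.
/// Hypothetical helper for illustration only.
func missingModelFiles(in directory: URL) -> [String] {
    requiredModelFiles.filter { name in
        let path = directory.appendingPathComponent(name).path
        return !FileManager.default.fileExists(atPath: path)
    }
}
```

An empty result means both files were found. Anything else names what still needs to be downloaded.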
Usage (Swift / MLX)
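import Foundation
import Emotion2VecMLX
import Hub

// Download the model snapshot from the Hub.
let dir = try await HubApi().snapshot(from: "mlx-community/emotion2vec-plus-large-mlx")

// Configure the recogniser for categorical output.
let recogniser = try await EmotionRecogniser(weightsDirectory: dir,
                                             config: EmotionRecogniserConfig(models: .categorical))

// speechURL is a placeholder: point it at a speech recording of your own.
let speechURL = URL(fileURLWithPath: "speech.wav")
let result = try await recogniser.classify(audioURL: speechURL)
print(result.categorical.label, result.categorical.confidence)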
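The model expects 16000 Hz mono input. This card does not say whether the Swift package converts audio internally, so the following is only a rough sketch of preparing raw samples yourself, assuming interleaved Float samples. `downmixToMono` and `resample` are hypothetical helpers. The resampler uses plain linear interpolation with no anti-alias filtering, so a proper resampler is preferable for real use.

```swift
import Foundation

/// Averages interleaved multi-channel samples into a single mono channel.
func downmixToMono(_ interleaved: [Float], channels: Int) -> [Float] {
    guard channels > 1 else { return interleaved }
    let frames = interleaved.count / channels
    return (0..<frames).map { frame -> Float in
        let start = frame * channels
        return interleaved[start..<start + channels].reduce(0, +) / Float(channels)
    }
}

/// Linear-interpolation resampler (sketch only: no anti-alias filtering).
func resample(_ samples: [Float],
              from sourceRate: Double,
              to targetRate: Double = 16_000) -> [Float] {
    guard sourceRate > 0, targetRate > 0,
          samples.count > 1, sourceRate != targetRate else { return samples }
    let outCount = Int(Double(samples.count) * targetRate / sourceRate)
    let step = sourceRate / targetRate
    return (0..<outCount).map { i -> Float in
        // Position of output sample i on the input time axis.
        let position = Double(i) * step
        let lower = min(Int(position), samples.count - 1)
        let upper = min(lower + 1, samples.count - 1)
        let fraction = Float(position - Double(lower))
        return samples[lower] * (1 - fraction) + samples[upper] * fraction
    }
}
```

A stereo 44.1 kHz buffer, for example, would go through `downmixToMono(_:channels:)` with `channels: 2` and then `resample(_:from:)` with `from: 44_100`.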
Source
- Original model: https://huggingface.co/emotion2vec/emotion2vec_plus_large
- Swift consumer: https://github.com/xocialize/emotion2vec-mlx-swift
License
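FunASR's custom MODEL_LICENSE permits use, copying, modification, and redistribution, provided that attribution and the model name are retained. It also includes a no-denigration clause and disclaims warranty. The license is not SPDX-listed but is permissive. See the original license: https://huggingface.co/emotion2vec/emotion2vec_plus_large/blob/main/LICENSE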