metadata
license: other
license_name: funasr-model-license
license_link: https://huggingface.co/emotion2vec/emotion2vec_plus_large/blob/main/LICENSE
library_name: mlx
base_model: emotion2vec/emotion2vec_plus_large
pipeline_tag: audio-classification
tags:
  - mlx
  - audio
  - audio-classification
  - speech-emotion-recognition
  - emotion-recognition
  - emotion2vec
  - data2vec
  - apple-silicon

mlx-community/emotion2vec-plus-large-mlx

The emotion2vec+ large speech emotion recognition model, converted to MLX format for native inference on Apple Silicon. The weights are loaded by the xocialize/emotion2vec-mlx-swift Swift port. Refer to the original model card (emotion2vec/emotion2vec_plus_large) for training and evaluation details.

Model

  • Family: emotion2vec / emotion2vec+ (Ma et al., "emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation," arXiv:2312.15185)
  • Architecture: Data2Vec 2.0 — conv feature extractor → transformer encoder → 9-class linear head
  • Output: 9-class categorical emotion (angry, disgusted, fearful, happy, neutral, other, sad, surprised, unknown)
  • Sample rate: 16000 Hz, mono
  • Precision: fp16 (233 tensors)
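The input format above (16000 Hz, mono) can be illustrated with a small sketch. It assumes the caller already holds decoded, interleaved floating-point samples; `downmixToMono` and `durationSeconds` are hypothetical helpers written for this example, not functions of the Swift port, and this card does not say whether the port downmixes or resamples on its own.

```swift
import Foundation

// Sample rate stated in this model card.
let expectedSampleRate = 16_000.0

/// Averages interleaved multi-channel samples into one mono channel.
/// Hypothetical helper for illustration only.
func downmixToMono(_ interleaved: [Float], channels: Int) -> [Float] {
    precondition(channels > 0, "channel count must be positive")
    guard channels > 1 else { return interleaved }
    let frameCount = interleaved.count / channels  // a trailing partial frame is dropped
    var mono = [Float](repeating: 0, count: frameCount)
    for frame in 0..<frameCount {
        var sum: Float = 0
        for channel in 0..<channels {
            sum += interleaved[frame * channels + channel]
        }
        mono[frame] = sum / Float(channels)
    }
    return mono
}

/// Duration in seconds of a mono buffer at the card's sample rate.
func durationSeconds(ofMonoSamples count: Int) -> Double {
    Double(count) / expectedSampleRate
}

// Two stereo frames: (left, right), (left, right).
let stereo: [Float] = [0.5, -0.5, 1.0, 0.0]
let mono = downmixToMono(stereo, channels: 2)
print(mono, durationSeconds(ofMonoSamples: 16_000))
```

Averaging the channels is only one possible downmix; audio recorded at another rate would still need resampling to 16000 Hz, which this sketch does not attempt.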

Files

  • emotion2vec_large.safetensors — the MLX weights (fp16).
  • emotion2vec_large_config.json — model config consumed by the loader.
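Before handing a downloaded directory to a loader, it can help to confirm that both files listed above are present. The following is a minimal sketch using only Foundation; `missingFiles(in:)` is a hypothetical helper and not part of Emotion2VecMLX.

```swift
import Foundation

// File names as listed in this model card.
let requiredFiles = [
    "emotion2vec_large.safetensors",
    "emotion2vec_large_config.json",
]

/// Returns the required file names that are absent from `directory`.
/// Hypothetical helper for illustration only.
func missingFiles(in directory: URL) -> [String] {
    requiredFiles.filter { name in
        let path = directory.appendingPathComponent(name).path
        return !FileManager.default.fileExists(atPath: path)
    }
}
```

Called with the local directory that holds the downloaded snapshot, an empty result means both files were found; any returned names identify what is missing.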

Usage (Swift / MLX)

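import Emotion2VecMLX
import Hub

// Download (or reuse a cached copy of) the model repository snapshot.
let dir = try await HubApi().snapshot(from: "mlx-community/emotion2vec-plus-large-mlx")

// Load the weights and configure the recogniser for categorical output.
let recogniser = try await EmotionRecogniser(weightsDirectory: dir,
                                             config: EmotionRecogniserConfig(models: .categorical))

// `speechURL` is a file URL to the speech recording, supplied by the caller.
let result = try await recogniser.classify(audioURL: speechURL)

// Print the predicted emotion label and its confidence.
print(result.categorical.label, result.categorical.confidence)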
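To illustrate how a label and a confidence value could be derived from the 9-class linear head, here is a minimal sketch that applies a softmax to raw logits and picks the largest probability. It is an assumption that the Swift port computes confidence this way, and the label index order below simply follows the list in this card; `softmax` and `topEmotion` are hypothetical helpers and the logits are made-up values.

```swift
import Foundation

// Labels from this model card; the index order is an assumption.
let emotionLabels = ["angry", "disgusted", "fearful", "happy", "neutral",
                     "other", "sad", "surprised", "unknown"]

/// Numerically stable softmax: subtracts the maximum logit before exponentiating.
func softmax(_ logits: [Float]) -> [Float] {
    guard let maxLogit = logits.max() else { return [] }
    let exps = logits.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

/// Returns the most probable label and its probability,
/// or nil if the logit count does not match the label count.
func topEmotion(from logits: [Float]) -> (label: String, confidence: Float)? {
    guard logits.count == emotionLabels.count else { return nil }
    let probabilities = softmax(logits)
    guard let best = probabilities.indices.max(by: { probabilities[$0] < probabilities[$1] }) else {
        return nil
    }
    return (emotionLabels[best], probabilities[best])
}

// Made-up logits for illustration; index 3 ("happy") is the largest.
let exampleLogits: [Float] = [0.1, -1.2, 0.0, 2.5, 1.0, -0.5, 0.3, 0.2, -2.0]
if let top = topEmotion(from: exampleLogits) {
    print(top.label, top.confidence)
}
```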

Source

  • Original model: emotion2vec/emotion2vec_plus_large
  • Swift port: xocialize/emotion2vec-mlx-swift

License

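This model is distributed under FunASR's custom MODEL_LICENSE. It permits use, copying, modification, and redistribution, provided that attribution and the model name are retained. It also contains a no-denigration clause and disclaims any warranty. The license has no SPDX identifier, but its terms are permissive. See the original license: https://huggingface.co/emotion2vec/emotion2vec_plus_large/blob/main/LICENSE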