Add emotion2vec+ large MLX weights (fp16) + config + model card

Files changed (3)

README.md +61 -0
emotion2vec_large.safetensors +3 -0
emotion2vec_large_config.json +76 -0

README.md ADDED

	@@ -0,0 +1,61 @@

+---
+license: other
+license_name: funasr-model-license
+license_link: https://huggingface.co/emotion2vec/emotion2vec_plus_large/blob/main/LICENSE
+library_name: mlx
+base_model: emotion2vec/emotion2vec_plus_large
+pipeline_tag: audio-classification
+tags:
+- mlx
+- audio
+- audio-classification
+- speech-emotion-recognition
+- emotion-recognition
+- emotion2vec
+- data2vec
+- apple-silicon
+---
+# mlx-community/emotion2vec-plus-large-mlx
+This repository contains the **emotion2vec+ large** speech-emotion-recognition model, converted to MLX format for native
+inference on Apple Silicon. The weights are consumed by the [`xocialize/emotion2vec-mlx-swift`](https://github.com/xocialize/emotion2vec-mlx-swift)
+Swift port. Refer to the [original model card](https://huggingface.co/emotion2vec/emotion2vec_plus_large)
+for details on the original model.
+## Model
+- **Family:** emotion2vec / emotion2vec+ (Ma et al., "emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation," [arXiv:2312.15185](https://arxiv.org/abs/2312.15185))
+- **Architecture:** Data2Vec 2.0 — conv feature extractor → transformer encoder → 9-class linear head
+- **Output:** 9-class categorical emotion (`angry`, `disgusted`, `fearful`, `happy`, `neutral`, `other`, `sad`, `surprised`, `unknown`)
+- **Sample rate:** 16000 Hz, mono
+- **Precision:** fp16 (233 tensors)
+## Files
+- `emotion2vec_large.safetensors` — the MLX weights (fp16).
+- `emotion2vec_large_config.json` — model config consumed by the loader.
+## Usage (Swift / MLX)
+```swift
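+import Foundation
+import Emotion2VecMLX
+import Hub
+// Fetch the weights and config from the Hugging Face Hub.
+let dir = try await HubApi().snapshot(from: "mlx-community/emotion2vec-plus-large-mlx")
+// Configure the recogniser for categorical (9-class) output.
+let recogniser = try await EmotionRecogniser(weightsDirectory: dir,
+                                             config: EmotionRecogniserConfig(models: .categorical))
+// Placeholder path; replace with the URL of your own speech recording.
+let speechURL = URL(fileURLWithPath: "speech.wav")
+let result = try await recogniser.classify(audioURL: speechURL)
+print(result.categorical.label, result.categorical.confidence)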
+```
+## Source
+- **Original model:** https://huggingface.co/emotion2vec/emotion2vec_plus_large
+- **Swift consumer:** https://github.com/xocialize/emotion2vec-mlx-swift
+## License
+Released under FunASR's custom MODEL_LICENSE, which permits use, copying, modification, and redistribution provided that
+attribution and the model name are retained. It includes a no-denigration clause and disclaims warranty. The license is not
+SPDX-listed but is permissive. See the [original license](https://huggingface.co/emotion2vec/emotion2vec_plus_large/blob/main/LICENSE).
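
The Usage snippet in the README prints a `label` and a `confidence`, which presumably derive from the 9-class head listed under Model. The sketch below is not the Swift package's code. It assumes the head emits raw logits in the order of `emotion_labels` and that confidence is the softmax probability of the top class. `softmax`, `topEmotion`, and the example logits are hypothetical names and values used only for illustration.

```swift
import Foundation

// Label order copied from `emotion_labels` in emotion2vec_large_config.json.
let emotionLabels = ["angry", "disgusted", "fearful", "happy", "neutral",
                     "other", "sad", "surprised", "unknown"]

/// Numerically stable softmax: subtract the max logit before exponentiating.
func softmax(_ logits: [Float]) -> [Float] {
    guard let maxLogit = logits.max() else { return [] }
    let exps = logits.map { Float(exp(Double($0 - maxLogit))) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

/// Hypothetical helper: returns the most probable label and its probability,
/// or nil when the logit count does not match the label count.
func topEmotion(logits: [Float], labels: [String]) -> (label: String, confidence: Float)? {
    guard logits.count == labels.count, !logits.isEmpty else { return nil }
    let probs = softmax(logits)
    var best = 0
    for i in probs.indices where probs[i] > probs[best] { best = i }
    return (labels[best], probs[best])
}

// Made-up logits for illustration only; index 3 ("happy") is the largest.
let exampleLogits: [Float] = [0.1, -1.2, -0.5, 2.3, 1.1, -0.8, 0.0, 0.4, -2.0]
if let top = topEmotion(logits: exampleLogits, labels: emotionLabels) {
    print(top.label, top.confidence)
}
```

Returning `nil` on a length mismatch keeps a misconfigured label list from silently mislabelling outputs. Whether the real loader performs a similar check is not stated in this commit.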

emotion2vec_large.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ecfbf0d668bc86d963332bb72871744184c24c84a9ac7c5d9b113d0bf55fbb94
+size 324305186
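
The three lines above are a Git LFS pointer, not the weights themselves. As a minimal sketch, the pointer can be parsed to recover the expected digest and byte size, for example to check a downloaded file. `LFSPointer` and `parseLFSPointer` are hypothetical names, not part of any library. The final division gives only an upper bound on the number of fp16 values, assuming every tensor is 2 bytes per element as the README states and ignoring the safetensors header.

```swift
import Foundation

/// The three fields of a Git LFS pointer file (spec v1).
struct LFSPointer: Equatable {
    let version: String
    let sha256: String
    let size: Int
}

/// Parses "key value" lines; returns nil if a field is missing or malformed.
func parseLFSPointer(_ text: String) -> LFSPointer? {
    var fields: [String: String] = [:]
    for line in text.split(separator: "\n") {
        let parts = line.split(separator: " ", maxSplits: 1)
        guard parts.count == 2 else { return nil }
        fields[String(parts[0])] = String(parts[1])
    }
    guard let version = fields["version"],
          let oid = fields["oid"], oid.hasPrefix("sha256:"),
          let sizeText = fields["size"], let size = Int(sizeText) else { return nil }
    let digest = String(oid.dropFirst("sha256:".count))
    guard digest.count == 64 else { return nil }  // SHA-256 is 64 hex characters
    return LFSPointer(version: version, sha256: digest, size: size)
}

// Pointer contents copied from this commit.
let pointerText = """
version https://git-lfs.github.com/spec/v1
oid sha256:ecfbf0d668bc86d963332bb72871744184c24c84a9ac7c5d9b113d0bf55fbb94
size 324305186
"""

if let pointer = parseLFSPointer(pointerText) {
    // Upper bound on the fp16 element count: 2 bytes each, header not subtracted.
    let maxHalfValues = pointer.size / 2
    print(pointer.sha256, pointer.size, maxHalfValues)
}
```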

emotion2vec_large_config.json ADDED

	@@ -0,0 +1,76 @@

+{
+  "model_type": "emotion2vec_plus_large",
+  "architecture": "data2vec2",
+  "num_classes": 9,
+  "hidden_dim": 1024,
+  "ffn_dim": 4096,
+  "num_layers": 12,
+  "num_context_blocks": 4,
+  "num_shared_blocks": 8,
+  "num_heads": 16,
+  "conv_feature_layers": [
+    [
+      1,
+      512,
+      10,
+      5
+    ],
+    [
+      512,
+      512,
+      3,
+      2
+    ],
+    [
+      512,
+      512,
+      3,
+      2
+    ],
+    [
+      512,
+      512,
+      3,
+      2
+    ],
+    [
+      512,
+      512,
+      3,
+      2
+    ],
+    [
+      512,
+      512,
+      2,
+      2
+    ],
+    [
+      512,
+      512,
+      2,
+      2
+    ]
+  ],
+  "pos_conv_depth": 5,
+  "pos_conv_kernel": 19,
+  "pos_conv_groups": 16,
+  "feature_dim": 512,
+  "has_context_norm": true,
+  "has_extra_tokens": true,
+  "has_alibi_scale": true,
+  "num_extra_tokens": 10,
+  "layer_norm_first": false,
+  "dtype": "float16",
+  "emotion_labels": [
+    "angry",
+    "disgusted",
+    "fearful",
+    "happy",
+    "neutral",
+    "other",
+    "sad",
+    "surprised",
+    "unknown"
+  ]
+}
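
The `conv_feature_layers` entries determine how many frames the transformer encoder sees per second of audio. The sketch below works this out under two assumptions that the config does not state explicitly: each row is read as `[in_channels, out_channels, kernel, stride]`, and the convolutions are unpadded. `totalStride` and `frameCount` are hypothetical helper names.

```swift
import Foundation

// Values copied from emotion2vec_large_config.json. Reading each row as
// [in_channels, out_channels, kernel, stride] is an assumption.
let convFeatureLayers = [[1, 512, 10, 5], [512, 512, 3, 2], [512, 512, 3, 2],
                         [512, 512, 3, 2], [512, 512, 3, 2], [512, 512, 2, 2],
                         [512, 512, 2, 2]]
let sampleRate = 16_000  // Hz, from the README's "Sample rate" entry

/// Total downsampling factor: the product of the per-layer strides.
func totalStride(_ layers: [[Int]]) -> Int {
    layers.reduce(1) { $0 * $1[3] }
}

/// Output length of an unpadded conv stack:
/// floor((n - kernel) / stride) + 1, applied layer by layer.
func frameCount(samples: Int, layers: [[Int]]) -> Int {
    var n = samples
    for layer in layers {
        let kernel = layer[2], stride = layer[3]
        guard n >= kernel else { return 0 }  // input too short for this layer
        n = (n - kernel) / stride + 1
    }
    return n
}

let hop = totalStride(convFeatureLayers)
let frames = frameCount(samples: sampleRate, layers: convFeatureLayers)
print(hop, sampleRate / hop, frames)  // prints "320 50 49"
```

Under these assumptions the extractor advances 320 samples per frame, a nominal 50 frames per second. One second of audio yields 49 frames rather than 50 because the unpadded kernels consume samples at the edges.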