---
license: mit
pipeline_tag: feature-extraction
tags:
- wav2vec2
- conformer
- speech
datasets:
- openslr/librispeech_asr
language:
- en
---

# wav2vec2-conformer-base

## Dataset

- [librispeech](https://www.openslr.org/12)

## Framework

- [fairseq](https://github.com/facebookresearch/fairseq)

## Model Info

```yaml
model:
  _name: wav2vec2
  quantize_targets: true
  final_dim: 256
  encoder_layerdrop: 0.05
  dropout_input: 0.1
  dropout_features: 0.1
  feature_grad_mult: 0.1

  encoder_layers: 12
  encoder_embed_dim: 768
  encoder_ffn_embed_dim: 3072
  encoder_attention_heads: 12

  layer_type: conformer
  attn_type: espnet
  pos_enc_type: rel_pos
```
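
The encoder dimensions in the config above (12 conformer layers, 768-dim embeddings, 3072-dim FFN, 12 attention heads, relative positional encoding) can be mirrored with the 🤗 `transformers` Wav2Vec2Conformer classes. A minimal sketch, assuming the `transformers` and `torch` packages; this builds a randomly initialized model of the same shape for feature extraction, not the released checkpoint:

```python
import torch
from transformers import Wav2Vec2ConformerConfig, Wav2Vec2ConformerModel

# Mirror the fairseq config: encoder_layers=12, encoder_embed_dim=768,
# encoder_ffn_embed_dim=3072, encoder_attention_heads=12, pos_enc_type=rel_pos.
config = Wav2Vec2ConformerConfig(
    num_hidden_layers=12,
    hidden_size=768,
    intermediate_size=3072,
    num_attention_heads=12,
    position_embeddings_type="relative",  # transformers' name for rel_pos
)
model = Wav2Vec2ConformerModel(config)  # random weights, architecture only
model.eval()

# One second of 16 kHz audio in, a sequence of 768-dim frame features out.
waveform = torch.randn(1, 16000)
with torch.no_grad():
    features = model(input_values=waveform).last_hidden_state
print(features.shape)  # (batch, frames, 768)
```

To use the trained weights instead, load the actual checkpoint (via fairseq or a converted `transformers` checkpoint) rather than initializing from the config.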

This model is used in [emotion-conformer](https://github.com/poyu39/emotion-conformer).