litagin ly-ospo-ops committed
Commit cc4a1aa · 0 parent(s)

Duplicate from ly-corporation/PASQA

Co-authored-by: LY Corporation <ly-ospo-ops@users.noreply.huggingface.co>

Files changed (4)
  1. .gitattributes +35 -0
  2. README.md +28 -0
  3. checkpoint-100000steps.pkl +3 -0
  4. config.yml +32 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
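These rules decide which files in the commit are stored through Git LFS rather than as ordinary blobs. The sketch below checks the four changed files against a handful of the patterns. It is only an approximation: it uses Python's `fnmatch` on the base name instead of Git's own attribute matching, it copies just a few of the 35 patterns, and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A few of the patterns added to .gitattributes in this commit (not the full list).
LFS_PATTERNS = ["*.bin", "*.ckpt", "*.pkl", "*.pt", "*.safetensors", "*.zip"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: does the base name match any of the LFS patterns?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


# The four files touched by this commit.
for path in [".gitattributes", "README.md", "checkpoint-100000steps.pkl", "config.yml"]:
    print(path, is_lfs_tracked(path))
```

Under these assumptions only `checkpoint-100000steps.pkl` matches (via `*.pkl`), which is consistent with it showing up in this diff as a three-line pointer rather than as binary content.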
README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: apache-2.0
+ language:
+ - ja
+ tags:
+ - mos
+ - speech-quality-assessment
+ - pitch-accent
+ - wav2vec2
+ - audio
+ ---
+
+ # PASQA — Pitch-Accent-Focused Speech Quality Assessment Model
+
+ Pretrained checkpoint for **[PASQA](https://github.com/lycorp-jp/PASQA)**, an SSL-based MOS prediction model
+ that estimates the perceptual quality of synthesized Japanese speech with a focus on
+ **pitch-accent correctness**.
+
+ The model is developed by LY Corporation and released alongside our INTERSPEECH 2026 paper.
+
+ ## Usage
+
+ Inference is done with the code in the [PASQA](https://github.com/lycorp-jp/PASQA) repository. See the repository for setup and usage instructions.
+
+ ## License
+
+ These model parameters are released under the
+ [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
checkpoint-100000steps.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:03c9e8880a28f65fd9b8611f3fe3e179020b067d892cd6f6a4c311572b8a8bc7
+ size 753466693
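The three lines above are a Git LFS pointer, not the checkpoint itself: `oid` records the SHA-256 of the real file and `size` its length in bytes (about 753 MB). A minimal sketch of how a downloaded copy could be checked against the pointer follows. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of the PASQA repository.

```python
import hashlib
from pathlib import Path

# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:03c9e8880a28f65fd9b8611f3fe3e179020b067d892cd6f6a4c311572b8a8bc7
size 753466693
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file has the size and SHA-256 digest the pointer records."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so a large checkpoint is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

With a local copy of the checkpoint, `matches_pointer(Path("checkpoint-100000steps.pkl"), pointer)` would be expected to return `True` only if the download is complete and unmodified.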
config.yml ADDED
@@ -0,0 +1,32 @@
+ model_input: waveform
+ model_params:
+   attn_alpha_init: 0.1
+   attn_dim: 256
+   attn_dropout: 0.1
+   attn_heads: 4
+   error_head_dnn_dim: 64
+   mean_net_dnn_dim: 64
+   mean_net_output_type: scalar
+   mean_net_range_clipping: true
+   mora_dropout: 0.1
+   mora_emb_dim: 256
+   mora_ffn_dim: 512
+   mora_max_len: 128
+   mora_pos_encoding: rope
+   mora_transformer_heads: 4
+   mora_transformer_layers: 1
+   mora_vocab_size: 141
+   s3prl_name: wav2vec2
+   speaker_grl_dropout: 0.1
+   speaker_grl_hidden_dim: 128
+   ssl_model_layer_idx: -1
+   ssl_model_output_dim: 768
+   ssl_module: s3prl
+   use_error_head: true
+   use_listener_modeling: false
+   use_mean_listener: false
+   use_mora: true
+   use_speaker_grl: true
+ mora_vocab_path: data/vocab.txt
+ num_speakers: 13
+ sampling_rate: 16000
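A few of these settings imply derived quantities that are easy to sanity-check. The sketch below rests on two assumptions that this commit does not confirm: that each attention block splits its width evenly across heads (the usual multi-head convention), and that the `wav2vec2` upstream emits one frame per 320 input samples, as the wav2vec 2.0 feature encoder does. The values are copied by hand from `config.yml`, and `per_head_dim` is a hypothetical helper, not part of the PASQA code.

```python
# Values copied from config.yml above; only the keys used here are included.
config = {
    "sampling_rate": 16000,
    "model_params": {
        "attn_dim": 256,
        "attn_heads": 4,
        "mora_emb_dim": 256,
        "mora_transformer_heads": 4,
    },
}


def per_head_dim(total_dim: int, heads: int) -> int:
    """Width of one attention head, assuming the total is split evenly."""
    if total_dim % heads != 0:
        raise ValueError(f"{total_dim} is not divisible by {heads} heads")
    return total_dim // heads


params = config["model_params"]
attn_head_dim = per_head_dim(params["attn_dim"], params["attn_heads"])
mora_head_dim = per_head_dim(params["mora_emb_dim"], params["mora_transformer_heads"])

# Assumption: one SSL frame per 320 waveform samples (wav2vec 2.0 encoder stride).
WAV2VEC2_STRIDE_SAMPLES = 320
frames_per_second = config["sampling_rate"] // WAV2VEC2_STRIDE_SAMPLES

print(attn_head_dim, mora_head_dim, frames_per_second)  # 64 64 50
```

If either assumption does not hold for the actual model code, the derived numbers do not apply. The divisibility check is still a reasonable guard when editing `attn_heads` or `mora_transformer_heads`.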