Commit 0091ffa (verified), committed by mmcauliffe · 1 Parent(s): 39655e7

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ acoustic/final.alimdl filter=lfs diff=lfs merge=lfs -text
37
+ acoustic/final.mdl filter=lfs diff=lfs merge=lfs -text
38
+ acoustic/tree filter=lfs diff=lfs merge=lfs -text
39
+ g2p/korean_mfa/model.fst filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,316 @@
1
+ ---
2
+ language:
3
+ - ko
4
+ pipeline_tag: other
5
+ library_name: montreal-forced-aligner
6
+ tags:
7
+ - montreal-forced-aligner
8
+ - forced-alignment
9
+ license: cc-by-4.0
10
+ ---
11
+ # Model Card for korean_mfa
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+ This MFA model is for aligning Korean speech.
16
+
17
+ - [Model details](#model-details)
18
+ - [Uses](#uses)
19
+ - [How to Get Started with the Model](#how-to-get-started-with-the-model)
20
+ - [Dictionary Details](#dictionary-details)
21
+ - [Training Details](#training-details)
22
+ - [Evaluation](#evaluation)
23
+ - [Contact](#contact)
24
+
25
+ ## Model Details
26
+
27
+ ### Model Description
28
+
29
+ <!-- Provide a longer summary of what this model is. -->
30
+
31
+
32
+
33
+ - **Developed by:** Michael McAuliffe
34
+ - **Funded by:** N/A
35
+ - **Model type:** Montreal Forced Aligner model
36
+ - **Language(s) (NLP):** Korean
37
+ - **License:** cc-by-4.0
38
+
39
+ ## Uses
40
+
41
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
42
+
43
+ ### Direct Use
44
+
45
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
46
+
47
+ This model is intended for forced alignment of Korean speech. See https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html for common alignment problems and their fixes.
48
+
49
+ ### Out-of-Scope Use
50
+
51
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
52
+
53
+ This model cannot provide accurate assessments of pronunciation quality, and it cannot produce transcripts. It is trained to accept variation in pronunciation so that it can produce a reasonable alignment for Korean speech.
54
+
55
+ ## Bias, Risks, and Limitations
56
+
57
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
58
+
59
+ This model will perform best on the variety of speech that it was trained on. The speakers in the training data are all adult speakers, so child speech alignment may not be accurate.
60
+
61
+ ### Recommendations
62
+
63
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
64
+
65
+ When using this model on a variety that it was not trained on, better results can be attained by adapting the model to the data to be aligned first. See https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/workflows/adapt_acoustic_model.html and https://github.com/mmcauliffe/mfa-adaptation for example usage and scripts.
66
+
67
+ ## How to Get Started with the Model
68
+
69
+ Use the code below to get started with the model.
70
+
71
+ To get started, follow the instructions for [installing MFA](https://montreal-forced-aligner.readthedocs.io/en/latest/getting_started.html). To align files using this model, use the [mfa align](https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/workflows/alignment.html) command.
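
As a rough sketch of the workflow described above, the snippet below assembles the MFA command lines for downloading the pretrained models and running `mfa align`. The helper name `build_mfa_commands` and the example directories are hypothetical. It assumes the dictionary and the acoustic model are both published under the name `korean_mfa`, and that `mfa align` takes its arguments in the order corpus, dictionary, acoustic model, output, as in the alignment documentation linked above.

```python
import shlex


def build_mfa_commands(corpus_dir, output_dir, model_name="korean_mfa"):
    """Return the MFA command lines (as argv lists) for a basic alignment run."""
    return [
        # Fetch the pretrained acoustic model and pronunciation dictionary.
        ["mfa", "model", "download", "acoustic", model_name],
        ["mfa", "model", "download", "dictionary", model_name],
        # mfa align CORPUS_DIRECTORY DICTIONARY ACOUSTIC_MODEL OUTPUT_DIRECTORY
        ["mfa", "align", str(corpus_dir), model_name, model_name, str(output_dir)],
    ]


if __name__ == "__main__":
    # Hypothetical directories; replace them with your own corpus and output paths.
    for argv in build_mfa_commands("./korean_corpus", "./korean_aligned"):
        print(shlex.join(argv))
```

The commands are only printed here, so the sketch runs without MFA installed. Pass each list to `subprocess.run` once MFA is set up.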
72
+
73
+ ## Dictionary Details
74
+
75
+ #### Details for korean_mfa dictionary and G2P model
76
+
77
+ - **Source:** wikipron
78
+ - **Orthography:** Hangul
79
+ - **Phone set:** MFA
80
+ - **Words:** 79,188
81
+ - **Phones:** 95
82
+ - **Graphemes:** 2,250
83
+
84
+ ##### IPA chart
85
+
86
+ ###### Consonants
87
+
88
+ | Manner | Labial | Alveolar | Retroflex | Palatal | Velar | Glottal |
89
+ | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
90
+ | **Nasal** | m mː | n nː | | ɲ | ŋ | |
91
+ | Palatalized | mʲ mʲː | | | | | |
92
+ | **Stop** | p b | t d | | c ɟ | k ɡ | |
93
+ | Tense | p͈ | t͈ t͈ː | | c͈ | k͈ k͈ː | |
94
+ | Aspirated | pʰ pʰː | tʰ tʰː | | cʰ cʰː | kʰ kʰː | |
95
+ | Unreleased | p̚ | t̚ | | | k̚ | |
96
+ | Labialized | pʷ bʷ | tʷ tʷː dʷ | | | kʷ kʷː ɡʷ | |
97
+ | Palatalized | pʲ pʲː bʲ | tʲ dʲ | | | | |
98
+ | **Affricate** | | | | tɕ tɕː dʑ | | |
99
+ | Tense | | | | tɕ͈ tɕ͈ː | | |
100
+ | Aspirated | | | | tɕʰ tɕʰː | | |
101
+ | Labialized | | | | tɕʷ tɕʷː dʑʷ | | |
102
+ | **Sibilant** | | s sː | | | | |
103
+ | Tense | | s͈ | | ɕ͈ | | |
104
+ | Aspirated | | sʰ sʰː | | ɕʰ | | |
105
+ | Labialized | | sʷ | | | | |
106
+ | **Fricative** | | | | ç ʝ | | h ɦ|
107
+ | Labialized | ɸʷ | | | | | |
108
+ | **Approximant** | w | | | j ɥ | ɰ | |
109
+ | **Tap** | | ɾ | | | | |
110
+ | Labialized | | ɾʷ | | | | |
111
+ | Palatalized | | ɾʲ | | | | |
112
+ | **Lateral** | | | | ʎ ʎː | | |
113
+ | **Lateral Tap** | | | ɭ ɭː | | | |
114
+
115
+ ###### Vowels
116
+
117
+ | | Front | Near-Front | Central | Near-Back | Back |
118
+ | :----: | :----: | :----: | :----: | :----: | :----: |
119
+ | **Close** | i iː | | ɨ ɨː | | u uː|
120
+ | | | | | | |
121
+ | **Close-Mid** | e eː | | | | o oː|
122
+ | | | | | | |
123
+ | **Open-Mid** | ɛ ɛː | | | | ʌ ʌː|
124
+ | | | | ɐ | | |
125
+ | **Open** | | | | | |
126
+
127
+ ## Training Details
128
+
129
+ ### Training Data
130
+
131
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
132
+
133
+ #### Common Voice Korean
134
+
135
+ - **Source:** https://voice.mozilla.org/en/datasets
136
+ - **License:** [CC-0](https://creativecommons.org/publicdomain/zero/1.0/)
137
+ - **Dialects:** N/A
138
+ - **Number of hours:** 1.41
139
+ - **Number of utterances:** 916
140
+ - **Number of speakers:** 46
141
+ - **Female speakers:** 7
142
+ - **Male speakers:** 15
143
+ - **Unknown speakers:** 24
144
+
145
+
146
+ #### GlobalPhone Korean
147
+
148
+ - **Source:** https://catalogue.elra.info/en-us/repository/browse/ELRA-S0200/
149
+ - **License:** [ELRA](https://www.elra.info/en/services-around-lrs/distribution/licensing/)
150
+ - **Dialects:** N/A
151
+ - **Number of hours:** 20.88
152
+ - **Number of utterances:** 8,007
153
+ - **Number of speakers:** 99
154
+ - **Female speakers:** 50
155
+ - **Male speakers:** 49
156
+ - **Unknown speakers:** 0
157
+
158
+
159
+ #### Deeply Korean read speech corpus public sample
160
+
161
+ - **Source:** https://www.openslr.org/97/
162
+ - **License:** [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/)
163
+ - **Dialects:** N/A
164
+ - **Number of hours:** 2.93
165
+ - **Number of utterances:** 1,697
166
+ - **Number of speakers:** 2
167
+ - **Female speakers:** 2
168
+ - **Male speakers:** 0
169
+ - **Unknown speakers:** 0
170
+
171
+
172
+ #### ASR-KCSC A Korean Conversational Speech Corpus
173
+
174
+ - **Source:** https://magichub.com/datasets/korean-conversational-speech-corpus/
175
+ - **License:** [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/)
176
+ - **Dialects:** N/A
177
+ - **Number of hours:** 2.38
178
+ - **Number of utterances:** 4,404
179
+ - **Number of speakers:** 14
180
+ - **Female speakers:** 4
181
+ - **Male speakers:** 10
182
+ - **Unknown speakers:** 0
183
+
184
+
185
+ #### ASR-SKDuSC A Scripted Korean Daily-use Speech Corpus
186
+
187
+ - **Source:** https://magichub.com/datasets/korean-scripted-speech-corpus-daily-use-sentence/
188
+ - **License:** [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/)
189
+ - **Dialects:** N/A
190
+ - **Number of hours:** 3.90
191
+ - **Number of utterances:** 5,240
192
+ - **Number of speakers:** 10
193
+ - **Female speakers:** 5
194
+ - **Male speakers:** 5
195
+ - **Unknown speakers:** 0
196
+
197
+
198
+ #### Korean Single Speaker Speech Dataset
199
+
200
+ - **Source:** https://www.kaggle.com/datasets/bryanpark/korean-single-speaker-speech-dataset
201
+ - **License:** [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)
202
+ - **Dialects:** N/A
203
+ - **Number of hours:** 12.86
204
+ - **Number of utterances:** 12,854
205
+ - **Number of speakers:** 1
206
+ - **Female speakers:** 1
207
+ - **Male speakers:** 0
208
+ - **Unknown speakers:** 0
209
+
210
+
211
+ #### Pansori TEDxKR
212
+
213
+ - **Source:** http://www.openslr.org/58/
214
+ - **License:** [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/)
215
+ - **Dialects:** N/A
216
+ - **Number of hours:** 2.69
217
+ - **Number of utterances:** 2,989
218
+ - **Number of speakers:** 41
219
+ - **Female speakers:** 9
220
+ - **Male speakers:** 32
221
+ - **Unknown speakers:** 0
222
+
223
+
224
+ #### Zeroth Korean
225
+
226
+ - **Source:** https://openslr.org/40/
227
+ - **License:** [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
228
+ - **Dialects:** N/A
229
+ - **Number of hours:** 52.87
230
+ - **Number of utterances:** 22,720
231
+ - **Number of speakers:** 115
232
+ - **Female speakers:** 70
233
+ - **Male speakers:** 45
234
+ - **Unknown speakers:** 0
235
+
236
+
237
+ ### Training Procedure
238
+
239
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
240
+
241
+ #### Preprocessing
242
+
243
+ Preprocessing includes fixes and orthographic standardization applied to the various corpora.
244
+
245
+
246
+ #### Training Hyperparameters
247
+
248
+ - **Training regime:** [Training configuration](config.yaml)
249
+
250
+ ## Evaluation
251
+
252
+ <!-- This section describes the evaluation protocols and provides the results. -->
253
+
254
+ ### Testing Data, Factors & Metrics
255
+
256
+ #### Testing Data
257
+
258
+ <!-- This should link to a Dataset Card if possible. -->
259
+
260
+ N/A
261
+
262
+ #### Factors
263
+
264
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
265
+
266
+ N/A
267
+
268
+ #### Metrics
269
+
270
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
271
+
272
+ N/A
273
+
274
+ ### Results
275
+
276
+ N/A
277
+
278
+ #### Summary
279
+
280
+
281
+
282
+ ## Technical Specifications
283
+
284
+ ### Model Architecture and Objective
285
+
286
+ HMM-GMM model
287
+
288
+ #### Software
289
+
290
+ This model was trained via the [Montreal Forced Aligner](https://montreal-forced-aligner.readthedocs.io/).
291
+
292
+ ## Citation
293
+
294
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
295
+
296
+ **BibTeX:**
297
+
298
+ ```
299
+ @techreport{mfa_korean_mfa_acoustic_2026,
300
+ author={McAuliffe, Michael and Sonderegger, Morgan},
301
+ title={Korean MFA acoustic model v3.3.0},
302
+ address={\url{https://huggingface.co/MontrealCorpusTools/korean_mfa}},
303
+ year={2026},
304
+ month={Jun},
305
+ }
306
+ ```
307
+
308
+ **APA:**
309
+
310
+ ```
311
+ McAuliffe, M. & Sonderegger, M. (2026). Korean MFA acoustic model v3.3.0. Available at https://huggingface.co/MontrealCorpusTools/korean_mfa.
312
+ ```
313
+
314
+ ## Contact
315
+
316
+ For questions and issues specific to this model, please open a discussion at https://huggingface.co/MontrealCorpusTools/korean_mfa/discussions. For broader MFA issues, file an issue at https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/issues.
acoustic/final.alimdl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:30f0ae1f6117978348b8d8c0b1130105cf856b6932bcf6e7eee1b9e444e07c84
3
+ size 33868683
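
The three added lines above are a Git LFS pointer, not the model binary itself: they record the spec version, the SHA-256 object id and the size in bytes of the real file. A minimal sketch of reading such a pointer follows. The function name `parse_lfs_pointer` is hypothetical, and the parser assumes exactly the `key value` line layout shown in this diff.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the acoustic/final.alimdl hunk.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:30f0ae1f6117978348b8d8c0b1130105cf856b6932bcf6e7eee1b9e444e07c84
size 33868683
"""

if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER)
    print(info["algorithm"], info["size_bytes"], "bytes")
```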
acoustic/final.mdl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e26c3ffd8b7ff641d4a4026b4f144d5dd5c9b255a0a95b4e4e585f39d0f7c2e5
3
+ size 33868683
acoustic/lda.mat ADDED
Binary file (14.6 kB).
acoustic/meta.json ADDED
@@ -0,0 +1 @@
1
+ {"phones": ["b", "bʲ", "bʷ", "c", "cʰ", "cʰː", "c͈", "d", "dʑ", "dʑʷ", "dʲ", "dʷ", "e", "eː", "h", "i", "iː", "j", "k", "kʰ", "kʰː", "kʷ", "kʷː", "k̚", "k͈", "k͈ʷ", "k͈ː", "m", "mʲ", "mʲː", "mː", "n", "nː", "o", "oː", "p", "pʰ", "pʰː", "pʲ", "pʲː", "pʷ", "p̚", "p͈", "p͈ʲ", "p͈ʷ", "p͈ː", "s", "sʰ", "sʰː", "sʷ", "sː", "s͈", "s͈ʷ", "s͈ː", "t", "tɕ", "tɕʰ", "tɕʰː", "tɕʷ", "tɕʷː", "tɕː", "tɕ͈", "tɕ͈ʷ", "tɕ͈ː", "tʰ", "tʰː", "tʲ", "tʷ", "tʷː", "t̚", "t͈", "t͈ʲ", "t͈ː", "u", "uː", "w", "x", "ç", "ŋ", "ɐ", "ɕʰ", "ɕ͈", "ɛ", "ɛː", "ɟ", "ɡ", "ɡʷ", "ɣ", "ɥ", "ɦ", "ɨ", "ɨː", "ɭ", "ɭː", "ɰ", "ɲ", "ɸʷ", "ɾ", "ɾʲ", "ɾʷ", "ʌ", "ʌː", "ʎ", "ʎː", "ʝ", "β", "βʷ"], "phone_mapping": {"<eps>": 0, "sil": 1, "spn": 2, "b": 3, "bʲ": 4, "bʷ": 5, "c": 6, "cʰ": 7, "cʰː": 8, "c͈": 9, "d": 10, "dʑ": 11, "dʑʷ": 12, "dʲ": 13, "dʷ": 14, "e": 15, "eː": 16, "h": 17, "i": 18, "iː": 19, "j": 20, "k": 21, "kʰ": 22, "kʰː": 23, "kʷ": 24, "kʷː": 25, "k̚": 26, "k͈": 27, "k͈ʷ": 28, "k͈ː": 29, "m": 30, "mʲ": 31, "mʲː": 32, "mː": 33, "n": 34, "nː": 35, "o": 36, "oː": 37, "p": 38, "pʰ": 39, "pʰː": 40, "pʲ": 41, "pʲː": 42, "pʷ": 43, "p̚": 44, "p͈": 45, "p͈ʲ": 46, "p͈ʷ": 47, "p͈ː": 48, "s": 49, "sʰ": 50, "sʰː": 51, "sʷ": 52, "sː": 53, "s͈": 54, "s͈ʷ": 55, "s͈ː": 56, "t": 57, "tɕ": 58, "tɕʰ": 59, "tɕʰː": 60, "tɕʷ": 61, "tɕʷː": 62, "tɕː": 63, "tɕ͈": 64, "tɕ͈ʷ": 65, "tɕ͈ː": 66, "tʰ": 67, "tʰː": 68, "tʲ": 69, "tʷ": 70, "tʷː": 71, "t̚": 72, "t͈": 73, "t͈ʲ": 74, "t͈ː": 75, "u": 76, "uː": 77, "w": 78, "x": 79, "ç": 80, "ŋ": 81, "ɐ": 82, "ɕʰ": 83, "ɕ͈": 84, "ɛ": 85, "ɛː": 86, "ɟ": 87, "ɡ": 88, "ɡʷ": 89, "ɣ": 90, "ɥ": 91, "ɦ": 92, "ɨ": 93, "ɨː": 94, "ɭ": 95, "ɭː": 96, "ɰ": 97, "ɲ": 98, "ɸʷ": 99, "ɾ": 100, "ɾʲ": 101, "ɾʷ": 102, "ʌ": 103, "ʌː": 104, "ʎ": 105, "ʎː": 106, "ʝ": 107, "β": 108, "βʷ": 109}, "phone_groups": {"0": ["c", "cʰ", "cʰː", "c͈", "k", "kʰ", "kʰː", "kʷ", "kʷː", "k̚", "k͈", "k͈ʷ", "k͈ː", "ɟ", "ɡ", "ɡʷ"], "1": ["m", "mʲ", "mʲː", "mː"], "2": ["n", "nː"], "3": ["ŋ"], "4": ["ɲ"], "5": ["b", "bʲ", "bʷ", "p", 
"pʰ", "pʰː", "pʲ", "pʲː", "pʷ", "p̚", "p͈", "p͈ʲ", "p͈ʷ", "p͈ː"], "6": ["s", "sʰ", "sʰː", "sʷ", "sː", "s͈", "s͈ʷ", "s͈ː", "ɕʰ", "ɕ͈"], "7": ["d", "dʲ", "dʷ", "t", "tʰ", "tʰː", "tʲ", "tʷ", "tʷː", "t̚", "t͈", "t͈ʲ", "t͈ː"], "8": ["dʑ", "dʑʷ", "tɕ", "tɕʰ", "tɕʰː", "tɕʷ", "tɕʷː", "tɕː", "tɕ͈", "tɕ͈ʷ", "tɕ͈ː"], "9": ["j"], "10": ["w", "ɥ"], "11": ["h", "x", "ç", "ɣ", "ɦ", "ɸʷ", "ʝ", "β", "βʷ"], "12": ["ɰ"], "13": ["ɭ", "ɭː", "ɾ", "ɾʲ", "ɾʷ", "ʎ", "ʎː"], "14": ["e", "eː"], "15": ["i", "iː"], "16": ["o", "oː"], "17": ["u", "uː"], "18": ["ɐ"], "19": ["ɛ", "ɛː"], "20": ["ɨ", "ɨː"], "21": ["ʌ", "ʌː"]}, "version": "3.3.0", "architecture": "gmm-hmm", "train_date": "2026-05-29 15:42:40.100545", "training": {"audio_duration": 358795.7063812503, "num_speakers": 328, "num_utterances": 58673, "num_oovs": 0, "average_log_likelihood": -53.21066345886001}, "dictionaries": {"names": ["korean_mfa"], "default": "korean_mfa", "silence_word": "<eps>", "oov_word": "<unk>", "bracketed_word": "[bracketed]", "laughter_word": "[laughter]", "clitic_marker": "'", "position_dependent_phones": false}, "language": "korean", "tokenization": "ko", "features": {"type": "mfcc", "use_energy": false, "frame_shift": 10, "frame_length": 25, "snip_edges": false, "low_frequency": 20, "high_frequency": 7800, "sample_frequency": 16000, "dither": 0.0, "energy_floor": 0.0, "num_coefficients": 13, "num_mel_bins": 23, "cepstral_lifter": 22, "preemphasis_coefficient": 0.97, "uses_cmvn": true, "uses_deltas": true, "uses_voiced": false, "uses_splices": false, "uses_speaker_adaptation": true, "use_pitch": false, "use_voicing": false, "min_f0": 50, "max_f0": 800, "delta_pitch": 0.005, "penalty_factor": 0.1, "silence_weight": 0.0, "splice_left_context": 3, "splice_right_context": 3}, "oov_phone": "spn", "optional_silence_phone": "sil", "phone_set_type": "UNKNOWN", "silence_probability": 0.19, "initial_silence_probability": 0.19, "final_silence_correction": 0.99, "final_non_silence_correction": 0.67, 
"duration_information": {"ɐ": [0.0944537167478265, 0.05624239545008239], "b": [0.045465315594578636, 0.017200353185726262], "bʲ": [0.04608616130336301, 0.0204179624755777], "bʷ": [0.044351451566528574, 0.014388719925052905], "c": [0.06867935076794375, 0.023232459440357317], "ç": [0.027030774414688034, 0.04083320565627037], "c͈": [0.08543933938404173, 0.03492565783441358], "ɕ͈": [0.11110341170496002, 0.03449956374581792], "cʰ": [0.08932478845568807, 0.037007782195684326], "ɕʰ": [0.09586567811346058, 0.028249081128633922], "cʰː": [0.09531727662136934, 0.041883415044999504], "i": [0.07826705302827526, 0.046659471147952145], "k͈ː": [0.09818180853670294, 0.04875951612104652], "kʰ": [0.09046606580246702, 0.03381925212208435], "kʰː": [0.09226140592693317, 0.04119448169342685], "kʷ": [0.06951336209547695, 0.028298600278185877], "k͈ʷ": [0.07865022648514414, 0.03711956372900321], "kʷː": [0.09920636055961488, 0.05756134985072817], "ɭ": [0.08203608569897247, 0.04334541140635632], "ɭː": [0.09590888991649743, 0.03141607191912898], "m": [0.0729560726020955, 0.034631159114643824], "o": [0.09900500805587517, 0.0620920102428152], "oː": [0.07946007181420488, 0.04155943193648842], "p": [0.060894016544105464, 0.023380399319306452], "p̚": [0.06987341826913612, 0.03752675317882545], "p͈": [0.06609010059084441, 0.0376953296952393], "pʰ": [0.0747211751576758, 0.031926825700980085], "pʰː": [0.10071793825198443, 0.04756435837349148], "pʲ": [0.07522526203354213, 0.028423201397624377], "d": [0.01017631164212778, 0.0037781221201501737], "dʲ": [0.011904091410055146, 0.009123288874256985], "dʷ": [0.010000050067901611, 1.192092895508776e-07], "dʑ": [0.057641291586674255, 0.017810593879277727], "dʑʷ": [0.060849518741218786, 0.015738336333924033], "e": [0.09979085237729697, 0.0596493986928503], "eː": [0.08770729971566765, 0.045740261442416906], "ɛ": [0.09008620782145138, 0.02764668794752714], "ɛː": [0.08906200543833825, 0.04866974070609454], "ɣ": [0.018397060913198135, 0.027367367796218976], "ɡ": 
[0.050642381545299806, 0.018368975860795066], "ɡʷ": [0.056764111132165784, 0.024193185438085754], "h": [0.03257033395074134, 0.04045475538543148], "ɦ": [0.06427389557472125, 0.04139362981009892], "ɥ": [0.07165965575608262, 0.03467807771539542], "ɨ": [0.026330169682997336, 0.03792009919079325], "iː": [0.0780450382953678, 0.04836308037551828], "ɨː": [0.021917517654394543, 0.01269141738901721], "j": [0.04105291593131313, 0.035686453178253315], "ɟ": [0.054495761434496234, 0.018093372547313316], "ʝ": [0.013112597448468337, 0.015817996080474097], "k": [0.06421705267113606, 0.02452775974191083], "k̚": [0.06988320494896039, 0.031981344955316425], "k͈": [0.07110328814396384, 0.030803619500458783], "ɰ": [0.07406577923038739, 0.051344104378341957], "mː": [0.1594552130521612, 0.12326802401990637], "mʲ": [0.06490239861908034, 0.022623000513712924], "mʲː": [0.11466019506593353, 0.025660513010120206], "n": [0.08365191108909147, 0.046970114742603986], "ɲ": [0.07004780962979791, 0.03201699051627825], "nː": [0.11551804782975698, 0.03236190150181309], "ŋ": [0.08125365809697314, 0.03678386933872775], "p͈ʲ": [0.054244301301969895, 0.029290836025396904], "pʲː": [0.09137095078345268, 0.03448426171274466], "pʷ": [0.061922496667614686, 0.028696670556932733], "ɸʷ": [0.06200548321456946, 0.04534951964769658], "ɾ": [0.04712568975536784, 0.029000241153552566], "ɾʲ": [0.04805648569119328, 0.020789691150278577], "ɾʷ": [0.06045454740524292, 0.029516273524186395], "s": [0.10866681674679557, 0.03205544226433253], "s͈": [0.10525017068714292, 0.04032349965042587], "sː": [0.1224999651312828, 0.03201562866531356], "sʰ": [0.08223212593819165, 0.027652994436785962], "sʰː": [0.12400004794964424, 0.034496364948967184], "sʷ": [0.10370360051138212, 0.03213177151108678], "s͈ʷ": [0.0985713005065918, 0.01215007060959388], "t": [0.010079160986214643, 0.0018928840526264703], "t̚": [0.02187951506475963, 0.02891007998275782], "t͈": [0.010010298976927765, 0.0003203801685719552], "t͈ː": [0.010517258068610882, 
0.0022340576258376237], "tɕ": [0.07845359283959281, 0.025667396791969922], "tɕ͈": [0.07273222023727431, 0.024602600801936807], "tɕː": [0.12099995017051697, 0.051521283753670565], "tɕ͈ː": [0.10726561432238668, 0.0490976044082387], "tɕʰ": [0.09057514629506896, 0.03055671934072764], "tɕʰː": [0.11189653398031164, 0.041777732331943145], "tɕʷ": [0.09714214383162183, 0.02620062098284465], "tɕ͈ʷ": [0.08333341280619304, 0.025166153942597885], "tɕʷː": [0.10783580256931817, 0.0308369826878683], "tʰ": [0.010664928242436497, 0.005659367437305361], "tʰː": [0.010087813887078741, 0.0009352197188529931], "tʲ": [0.04789021083368262, 0.017753557267540316], "t͈ʲ": [0.02610194851449654, 0.011183040221802408], "tʷ": [0.010214764725898823, 0.00345002669464492], "u": [0.06830854615978124, 0.03768164592076931], "uː": [0.06961065109572424, 0.03926154283915598], "ʌ": [0.0850075056317423, 0.047978405823394944], "ʌː": [0.06624785185147336, 0.03231563180959204], "w": [0.06914171017706394, 0.03461604461195489], "x": [0.0727416016098656, 0.04127757395048006], "ʎ": [0.0801888655981805, 0.031024323014842328], "ʎː": [0.10377037294306181, 0.033531738476904265], "β": [0.015177283128934353, 0.02066268482302303], "βʷ": [0.017770505425069413, 0.025791821543289108]}}
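
A few of the numbers in `meta.json` are easier to read after some arithmetic. The sketch below assumes that `training.audio_duration` is in seconds and that `frame_shift` and `frame_length` are in milliseconds. The frame-count formula is the Kaldi convention for `snip_edges=false` as I understand it, so treat it as an assumption and not as MFA's exact code path. All function names are hypothetical.

```python
# Values copied from meta.json above.
AUDIO_DURATION_S = 358795.7063812503  # training.audio_duration (assumed seconds)
SAMPLE_RATE_HZ = 16000                # features.sample_frequency
FRAME_SHIFT_MS = 10                   # features.frame_shift (assumed milliseconds)
FRAME_LENGTH_MS = 25                  # features.frame_length (assumed milliseconds)


def training_hours(seconds=AUDIO_DURATION_S):
    """Convert the total training audio duration from seconds to hours."""
    return seconds / 3600


def frame_geometry(sample_rate_hz, shift_ms, length_ms):
    """Return (frame shift, frame length) expressed in samples."""
    return sample_rate_hz * shift_ms // 1000, sample_rate_hz * length_ms // 1000


def num_frames(num_samples, shift_samples):
    """Frame count when edges are not snipped: the sample count divided by
    the frame shift, rounded to the nearest whole frame."""
    return (num_samples + shift_samples // 2) // shift_samples


if __name__ == "__main__":
    shift, length = frame_geometry(SAMPLE_RATE_HZ, FRAME_SHIFT_MS, FRAME_LENGTH_MS)
    print(f"training audio: {training_hours():.2f} h")
    print(f"frame shift: {shift} samples, frame length: {length} samples")
    print(f"frames in one second of audio: {num_frames(SAMPLE_RATE_HZ, shift)}")
```

Under these assumptions the training audio comes to roughly 99.67 hours, which is close to the 99.92 hours obtained by summing the per-corpus figures in the README.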
acoustic/phones.txt ADDED
@@ -0,0 +1,110 @@
1
+ <eps> 0
2
+ sil 1
3
+ spn 2
4
+ b 3
5
+ bʲ 4
6
+ bʷ 5
7
+ c 6
8
+ cʰ 7
9
+ cʰː 8
10
+ c͈ 9
11
+ d 10
12
+ dʑ 11
13
+ dʑʷ 12
14
+ dʲ 13
15
+ dʷ 14
16
+ e 15
17
+ eː 16
18
+ h 17
19
+ i 18
20
+ iː 19
21
+ j 20
22
+ k 21
23
+ kʰ 22
24
+ kʰː 23
25
+ kʷ 24
26
+ kʷː 25
27
+ k̚ 26
28
+ k͈ 27
29
+ k͈ʷ 28
30
+ k͈ː 29
31
+ m 30
32
+ mʲ 31
33
+ mʲː 32
34
+ mː 33
35
+ n 34
36
+ nː 35
37
+ o 36
38
+ oː 37
39
+ p 38
40
+ pʰ 39
41
+ pʰː 40
42
+ pʲ 41
43
+ pʲː 42
44
+ pʷ 43
45
+ p̚ 44
46
+ p͈ 45
47
+ p͈ʲ 46
48
+ p͈ʷ 47
49
+ p͈ː 48
50
+ s 49
51
+ sʰ 50
52
+ sʰː 51
53
+ sʷ 52
54
+ sː 53
55
+ s͈ 54
56
+ s͈ʷ 55
57
+ s͈ː 56
58
+ t 57
59
+ tɕ 58
60
+ tɕʰ 59
61
+ tɕʰː 60
62
+ tɕʷ 61
63
+ tɕʷː 62
64
+ tɕː 63
65
+ tɕ͈ 64
66
+ tɕ͈ʷ 65
67
+ tɕ͈ː 66
68
+ tʰ 67
69
+ tʰː 68
70
+ tʲ 69
71
+ tʷ 70
72
+ tʷː 71
73
+ t̚ 72
74
+ t͈ 73
75
+ t͈ʲ 74
76
+ t͈ː 75
77
+ u 76
78
+ uː 77
79
+ w 78
80
+ x 79
81
+ ç 80
82
+ ŋ 81
83
+ ɐ 82
84
+ ɕʰ 83
85
+ ɕ͈ 84
86
+ ɛ 85
87
+ ɛː 86
88
+ ɟ 87
89
+ ɡ 88
90
+ ɡʷ 89
91
+ ɣ 90
92
+ ɥ 91
93
+ ɦ 92
94
+ ɨ 93
95
+ ɨː 94
96
+ ɭ 95
97
+ ɭː 96
98
+ ɰ 97
99
+ ɲ 98
100
+ ɸʷ 99
101
+ ɾ 100
102
+ ɾʲ 101
103
+ ɾʷ 102
104
+ ʌ 103
105
+ ʌː 104
106
+ ʎ 105
107
+ ʎː 106
108
+ ʝ 107
109
+ β 108
110
+ βʷ 109
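
`phones.txt` is a plain symbol table with one `symbol id` pair per line. Ids 0 to 2 go to `<eps>`, `sil` and `spn`, which `meta.json` lists as the silence word, the optional silence phone and the OOV phone. A minimal sketch of loading such a table follows, with hypothetical function names and a shortened sample in place of the full file.

```python
def parse_symbol_table(text):
    """Map each symbol to its integer id from 'symbol id' lines."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace so the symbol is kept whole.
        symbol, index = line.rsplit(maxsplit=1)
        table[symbol] = int(index)
    return table


def speech_phones(table, special=("<eps>", "sil", "spn")):
    """Return the symbols that are actual phones, in table order."""
    return [symbol for symbol in table if symbol not in special]


# Shortened sample: the first five and the last two entries of phones.txt.
SAMPLE = """\
<eps> 0
sil 1
spn 2
b 3
bʲ 4
β 108
βʷ 109
"""

if __name__ == "__main__":
    table = parse_symbol_table(SAMPLE)
    print(speech_phones(table))
```

Run on the full file, the same filter would leave 107 phones (110 entries minus the three special symbols), which matches the ids 3 to 109 in `phone_mapping`.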
acoustic/tree ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:31e25e8a47cfe489edb9319ad10dd973e8c6afbad97853deb09509302ad11be8
3
+ size 344276
dictionary/korean_mfa.dict ADDED
The diff for this file is too large to render.
dictionary/rules.yaml ADDED
@@ -0,0 +1,107 @@
1
+ dialects:
2
+ us:
3
+ - following_context: '[tpkc].*'
4
+ non_silence_before_correction: 0.07
5
+ preceding_context: ''
6
+ probability: 0.08
7
+ replacement: ''
8
+ segment: k̚
9
+ silence_after_probability: 0.88
10
+ silence_before_correction: -0.29
11
+ - following_context: '[tpkc].*'
12
+ non_silence_before_correction: 0.02
13
+ preceding_context: ''
14
+ probability: 0.06
15
+ replacement: ''
16
+ segment: p̚
17
+ silence_after_probability: 1.44
18
+ silence_before_correction: -0.12
19
+ - following_context: '[tpkc].*'
20
+ non_silence_before_correction: -0.03
21
+ preceding_context: ''
22
+ probability: 0.72
23
+ replacement: ''
24
+ segment: t̚
25
+ silence_after_probability: 1.35
26
+ silence_before_correction: 0.14
27
+ - following_context: $
28
+ non_silence_before_correction: 0.07
29
+ preceding_context: ''
30
+ probability: 0.14
31
+ replacement: ''
32
+ segment: k̚
33
+ silence_after_probability: 1.55
34
+ silence_before_correction: -0.19
35
+ - following_context: $
36
+ non_silence_before_correction: 0.02
37
+ preceding_context: ''
38
+ probability: 0.13
39
+ replacement: ''
40
+ segment: p̚
41
+ silence_after_probability: 1.07
42
+ silence_before_correction: -0.1
43
+ - following_context: $
44
+ non_silence_before_correction: 0.01
45
+ preceding_context: ''
46
+ probability: 0.39
47
+ replacement: ''
48
+ segment: t̚
49
+ silence_after_probability: 1.11
50
+ silence_before_correction: -0.03
51
+ - following_context: ''
52
+ non_silence_before_correction: -0.01
53
+ preceding_context: ''
54
+ probability: 0.67
55
+ replacement: n
56
+ segment: n h
57
+ silence_after_probability: 0.78
58
+ silence_before_correction: 0.05
59
+ - following_context: ''
60
+ non_silence_before_correction: 0.03
61
+ preceding_context: ^
62
+ probability: 0.11
63
+ replacement: d
64
+ segment: n
65
+ silence_after_probability: 1.04
66
+ silence_before_correction: -0.09
67
+ - following_context: ''
68
+ non_silence_before_correction: -0.03
69
+ preceding_context: ^
70
+ probability: 0.03
71
+ replacement: dʲ
72
+ segment: ɲ
73
+ silence_after_probability: 0.89
74
+ silence_before_correction: 0.07
75
+ - following_context: ''
76
+ non_silence_before_correction: 0.08
77
+ preceding_context: ^
78
+ probability: 0.1
79
+ replacement: bʲ
80
+ segment: mʲ
81
+ silence_after_probability: 0.41
82
+ silence_before_correction: -0.19
83
+ - following_context: ''
84
+ non_silence_before_correction: -0.09
85
+ preceding_context: ''
86
+ probability: 0.84
87
+ replacement: ɲ
88
+ segment: n ʝ
89
+ silence_after_probability: 0.75
90
+ silence_before_correction: 0.26
91
+ - following_context: ''
92
+ non_silence_before_correction: 0.07
93
+ preceding_context: ''
94
+ probability: 0.35
95
+ replacement: dʑ
96
+ segment: dʲ
97
+ silence_after_probability: 1.33
98
+ silence_before_correction: -0.26
99
+ - following_context: ''
100
+ non_silence_before_correction: 0.12
101
+ preceding_context: ''
102
+ probability: 0.36
103
+ replacement: tɕʰ
104
+ segment: tʲ
105
+ silence_after_probability: 0.7
106
+ silence_before_correction: -0.33
107
+ rules: []
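
Each entry in `rules.yaml` pairs a `segment` with a `replacement` and optional context patterns. The sketch below shows one plausible reading of those fields on a list of phones. It is an illustration under stated assumptions, not MFA's implementation: it treats `^` and `$` as word edges, matches any other context pattern against the single neighbouring phone, and ignores `probability` and the silence fields entirely. The function names and the example pronunciations are hypothetical.

```python
import re


def _context_ok(pattern, neighbor, at_edge, edge_anchor):
    """Check one side of a rule's context against the neighbouring phone."""
    if pattern == "":
        return True  # no constraint on this side
    if pattern == edge_anchor:
        return at_edge  # "^" or "$": the segment must touch the word edge
    return neighbor is not None and re.fullmatch(pattern, neighbor) is not None


def apply_rule(phones, rule):
    """Return the pronunciation variant with `rule` applied wherever it matches."""
    segment = rule["segment"].split()
    replacement = rule["replacement"].split()
    out, i = [], 0
    while i < len(phones):
        j = i + len(segment)
        if phones[i:j] == segment:
            prev_phone = phones[i - 1] if i > 0 else None
            next_phone = phones[j] if j < len(phones) else None
            if _context_ok(rule["preceding_context"], prev_phone, i == 0, "^") and \
               _context_ok(rule["following_context"], next_phone, j == len(phones), "$"):
                out.extend(replacement)
                i = j
                continue
        out.append(phones[i])
        i += 1
    return out


# Two rules copied from the file (context, segment and replacement fields only).
# "k\u031a" is the unreleased stop "k" + U+031A; "t\u0348" is the tense stop "t" + U+0348.
DROP_UNRELEASED_K = {"segment": "k\u031a", "replacement": "",
                     "preceding_context": "", "following_context": "[tpkc].*"}
MERGE_N_H = {"segment": "n h", "replacement": "n",
             "preceding_context": "", "following_context": ""}

if __name__ == "__main__":
    print(apply_rule(["ɐ", "k\u031a", "t\u0348", "ɐ"], DROP_UNRELEASED_K))
    print(apply_rule(["k", "ɐ", "n", "h", "ɐ"], MERGE_N_H))
```

In this reading, the first rule deletes an unreleased `k̚` only when the next phone starts with `t`, `p`, `k` or `c`, and the second collapses the sequence `n h` to `n` in any context.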
g2p/korean_mfa/graphemes.sym ADDED
@@ -0,0 +1,570 @@
1
+ <eps> 0
2
+ | 1
3
+ _ 2
4
+ ᆫ 3
5
+ ᄃ|ᅡ 4
6
+ ᄃ|ᅦ 5
7
+ ᆫ|ᄌ 6
8
+ ᅵ 7
9
+ ᆯ 8
10
+ ᄀ|ᅦ 9
11
+ ᆯ|ᄌ 10
12
+ ᆸ|ᄂ 11
13
+ ᄁ|ᅡ 12
14
+ ᆸ|ᄉ 13
15
+ ᄀ 14
16
+ ᄂ 15
17
+ ᄃ 16
18
+ ᄉ|ᅮ 17
19
+ ᄋ 18
20
+ ᄅ 19
21
+ ᄌ 20
22
+ ᄉ 21
23
+ ᄏ 22
24
+ ᅳ 23
25
+ ᄒ|ᅢ 24
26
+ ᄐ|ᅳ 25
27
+ ᄀ|ᅡ 26
28
+ ᄅ|ᅵ 27
29
+ ᄋ|ᅮ 28
30
+ ᄉ|ᅳ 29
31
+ ᄋ|ᅥ 30
32
+ ᆷ 31
33
+ ᆼ 32
34
+ ᄅ|ᅣ 33
35
+ ᅣ 34
36
+ ᄉ|ᅵ 35
37
+ ᄌ|ᅥ 36
38
+ ᄀ|ᅥ 37
39
+ ᄃ|ᅳ 38
40
+ ᆫ|ᄋ 39
41
+ ᅭ 40
42
+ ᄂ|ᅳ 41
43
+ ᄋ|ᅦ 42
44
+ ᄉ|ᅥ 43
45
+ ᆻ|ᄃ 44
46
+ ᅡ 45
47
+ ᄀ|ᅩ 46
48
+ ᆻ|ᄉ 47
49
+ ᅳ|ᆸ 48
50
+ ᄂ|ᅵ 49
51
+ ᄀ|ᅧ 50
52
+ ᆨ 51
53
+ ᆨ|ᄃ 52
54
+ ᅢ 53
55
+ ᄅ|ᅩ 54
56
+ ᆨ|ᄋ 55
57
+ ᅳ|ᆫ 56
58
+ ᅳ|ᆯ 57
59
+ ᄋ|ᅴ 58
60
+ ᄑ|ᅩ 59
61
+ ᆨ|ᄅ 60
62
+ ᄅ|ᅳ 61
63
+ ᄀ|ᅨ 62
64
+ ᄃ|ᅢ 63
65
+ ᄎ|ᅮ 64
66
+ ᄇ|ᅮ 65
67
+ ᄎ|ᅢ 66
68
+ ᅭ|ᆼ 67
69
+ ᄐ|ᅩ 68
70
+ ᆫ|ᄇ 69
71
+ ᄉ|ᅡ 70
72
+ ᆼ|ᄑ 71
73
+ ᅮ|ᆷ 72
74
+ ᄒ|ᅡ 73
75
+ ᄀ|ᅪ 74
76
+ ᄀ|ᅮ 75
77
+ ᄌ|ᅮ 76
78
+ ᄀ|ᅳ 77
79
+ ᆸ|ᄌ 78
80
+ ᅥ|ᆨ 79
81
+ ᄀ|ᅵ 80
82
+ ᄃ|ᅩ 81
83
+ ᄋ|ᅪ 82
84
+ ᄋ|ᅯ 83
85
+ ᄌ|ᅧ 84
86
+ ᆻ 85
87
+ ᄋ|ᅵ 86
88
+ ᆸ 87
89
+ ᆸ|ᄀ 88
90
+ ᅩ 89
91
+ ᆸ|ᄃ 90
92
+ ᄁ|ᅩ 91
93
+ ᄁ|ᅮ 92
94
+ ᄁ|ᅯ 93
95
+ ᄁ|ᅳ 94
96
+ ᄂ|ᅡ 95
97
+ ᄒ|ᅧ 96
98
+ ᄆ|ᅡ 97
99
+ ᄒ|ᅵ 98
100
+ ᄂ|ᅦ 99
101
+ ᄂ|ᅮ 100
102
+ ᄅ|ᅡ 101
103
+ ᆫ|ᄀ 102
104
+ ᄋ|ᅭ 103
105
+ ᄌ|ᅱ 104
106
+ ᄌ|ᅡ 105
107
+ ᄋ|ᅳ 106
108
+ ᆼ|ᄌ 107
109
+ ᅮ|ᆼ 108
110
+ ᅮ 109
111
+ ᄆ|ᅧ 110
112
+ ᄌ|ᅭ 111
113
+ ᄌ|ᅵ 112
114
+ ᆺ 113
115
+ ᄃ|ᅥ 114
116
+ ᆨ|ᄉ 115
117
+ ᅥ|ᆼ 116
118
+ ᆯ|ᄅ 117
119
+ ᄂ|ᅲ 118
120
+ ᄃ|ᅬ 119
121
+ ᄃ|ᅮ 120
122
+ ᄃ|ᅯ 121
123
+ ᆨ|ᄀ 122
124
+ ᄎ|ᅡ 123
125
+ ᆨ|ᄒ 124
126
+ ᄄ|ᅡ 125
127
+ ᄄ|ᅳ 126
128
+ ᄋ|ᅡ 127
129
+ ᆬ 128
130
+ ᆬ|ᄃ 129
131
+ ᅳ|ᆺ 130
132
+ ᆬ|ᄒ 131
133
+ ᅧ|ᆻ 132
134
+ ᄋ|ᅩ 133
135
+ ᄏ|ᅦ 134
136
+ ᅩ|ᆼ 135
137
+ ᆨ|ᄌ 136
138
+ ᅥ|ᆷ 137
139
+ ᄈ|ᅡ 138
140
+ ᆼ|ᄇ 139
141
+ ᄅ|ᅧ 140
142
+ ᄂ|ᅢ 141
143
+ ᅧ 142
144
+ ᅥ 143
145
+ ᄅ|ᅮ 144
146
+ ᄎ|ᅧ 145
147
+ ᄋ|ᅣ 146
148
+ ᄌ|ᅯ 147
149
+ ᄎ|ᅵ 148
150
+ ᄏ|ᅵ 149
151
+ ᄇ|ᅵ 150
152
+ ᄏ|ᅧ 151
153
+ ᆷ|ᄆ 152
154
+ ᅡ|ᆨ 153
155
+ ᄉ|ᅩ 154
156
+ ᇀ 155
157
+ ᇂ|ᄃ 156
158
+ ᄆ|ᅢ 157
159
+ ᄆ|ᅥ 158
160
+ ᅳ|ᆨ 159
161
+ ᄆ|ᅩ 160
162
+ ᄆ|ᅮ 161
163
+ ᆯ|ᄎ 162
164
+ ᆺ|ᄀ 163
165
+ ᄆ|ᅵ 164
166
+ ᄏ|ᅡ 165
167
+ ᄌ|ᅦ 166
168
+ ᄇ|ᅡ 167
169
+ ᄇ|ᅥ 168
170
+ ᄇ|ᅧ 169
171
+ ᅦ 170
172
+ ᄇ|ᅩ 171
173
+ ᄇ|ᅳ 172
174
+ ᆯ|ᄋ 173
175
+ ᄉ|ᅦ 174
176
+ ᄉ|ᅧ 175
177
+ ᅵ|ᆫ 176
178
+ ᄅ|ᅦ 177
179
+ ᄑ|ᅡ 178
180
+ ᄎ|ᅩ 179
181
+ ᄑ|ᅳ 180
182
+ ᄈ|ᅧ 181
183
+ ᆷ|ᄋ 182
184
+ ᄀ|ᅲ 183
185
+ ᄎ|ᅥ 184
186
+ ᄅ|ᅥ 185
187
+ ᄒ|ᅪ 186
188
+ ᄊ|ᅳ 187
189
+ ᄋ|ᅢ 188
190
+ ᄋ|ᅧ 189
191
+ ᆹ|ᄃ 190
192
+ ᆹ 191
193
+ ᄉ|ᅲ 192
194
+ ᄋ|ᅬ 193
195
+ ᄋ|ᅱ 194
196
+ ᄀ|ᅢ 195
197
+ ᄐ|ᅡ 196
198
+ ᆸ|ᄒ 197
199
+ ᅢ|ᆻ 198
200
+ ᄎ|ᅦ 199
201
+ ᄑ|ᅮ 200
202
+ ᆼ|ᄃ 201
203
+ ᅥ|ᆸ 202
204
+ ᅵ|ᆸ 203
205
+ ᆼ|ᄐ 204
206
+ ᆻ|ᄋ 205
207
+ ᄌ|ᅩ 206
208
+ ᅪ 207
209
+ ᅡ|ᆼ 208
210
+ ᅢ|ᆼ 209
211
+ ᆨ|ᄁ 210
212
+ ᆼ|ᄎ 211
213
+ ᄌ|ᅳ 212
214
+ ᅳ|ᆷ 213
215
+ ᄈ|ᅮ 214
216
+ ᄍ|ᅡ 215
217
+ ᆨ|ᄇ 216
218
+ ᄏ|ᅳ 217
219
+ ᄐ|ᅢ 218
220
+ ᄒ|ᅬ 219
221
+ ᄑ|ᅧ 220
222
+ ᄒ|ᅩ 221
223
+ ᅨ 222
224
+ ᅮ|ᆨ 223
225
+ ᄀ|ᅱ 224
226
+ ᅥ|ᆫ 225
227
+ ᅣ|ᆼ 226
228
+ ᅢ|ᆨ 227
229
+ ᄎ|ᅳ 228
230
+ ᅩ|ᆨ 229
231
+ ᆫ|ᄂ 230
232
+ ᄃ|ᅵ 231
233
+ ᆫ|ᄅ 232
234
+ ᄊ|ᅮ 233
235
+ ᅲ 234
236
+ ᅥ|ᆯ 235
237
+ ᆫ|ᄎ 236
238
+ ᆫ|ᄑ 237
239
+ ᄒ|ᅥ 238
240
+ ᆮ|ᄒ 239
241
+ ᅲ|ᆷ 240
242
+ ᅵ|ᆷ 241
243
+ ᇀ|ᄌ 242
244
+ ᅱ 243
245
+ ᄍ|ᅵ 244
246
+ ᄉ|ᅢ 245
247
+ ᄄ|ᅮ 246
248
+ ᅩ|ᆺ 247
249
+ ᅳ|ᆼ 248
250
+ ᄏ|ᅱ 249
251
+ ᆰ|ᄃ 250
252
+ ᄁ|ᅦ 251
253
+ ᅯ|ᆫ 252
254
+ ᆷ|ᄅ 253
255
+ ᄀ|ᅭ 254
256
+ ᅧ|ᆫ 255
257
+ ᅧ|ᆼ 256
258
+ ᇁ 257
259
+ ᄅ|ᅭ 258
260
+ ᄆ|ᅦ 259
261
+ ᄊ|ᅡ 260
262
+ ᄃ|ᅫ 261
263
+ ᄐ|ᅱ 262
264
+ ᄂ|ᅩ 263
265
+ ᅮ|ᆯ 264
266
+ ᄎ|ᅯ 265
267
+ ᄐ|ᅬ 266
268
+ ᆼ|ᄒ 267
269
+ ᆸ|ᄋ 268
270
+ ᆸ|ᄑ 269
271
+ ᆹ|ᄋ 270
272
+ ᅥ|ᆻ 271
273
+ ᄄ|ᅢ 272
274
+ ᆻ|ᄌ 273
275
+ ᄀ|ᅯ 274
276
+ ᅧ|ᆨ 275
277
+ ᆼ|ᄅ 276
278
+ ᄒ|ᅣ 277
279
+ ᄊ|ᅵ 278
280
+ ᄌ|ᅢ 279
281
+ ᄒ|ᅮ 280
282
+ ᆮ 281
283
+ ᄋ|ᅲ 282
284
+ ᅣ|ᆨ 283
285
+ ᄒ|ᅴ 284
286
+ ᆽ|ᄀ 285
287
+ ᆽ 286
288
+ ᆽ|ᄃ 287
289
+ ᆽ|ᄎ 288
290
+ ᅯ 289
291
+ ᅯ|ᆻ 290
292
+ ᇀ|ᄀ 291
293
+ ᇀ|ᄃ 292
294
+ ᇀ|ᄉ 293
295
+ ᇀ|ᄋ 294
296
+ ᅡ|ᆻ 295
297
+ ᆭ|ᄃ 296
298
+ ᇁ|ᄃ 297
299
+ ᇁ|ᄋ 298
300
+ ᆾ 299
301
+ ᄂ|ᅧ 300
302
+ ᄄ|ᅩ 301
303
+ ᄐ|ᅮ 302
304
+ ᄎ|ᅬ 303
305
+ ᄏ|ᅩ 304
306
+ ᄑ|ᅭ 305
307
+ ᆺ|ᄇ 306
308
+ ᄂ|ᅣ 307
309
+ ᆸ|ᄊ 308
310
+ ᄅ|ᅢ 309
311
+ ᄅ|ᅲ 310
312
+ ᄁ|ᅵ 311
313
+ ᆺ|ᄆ 312
314
+ ᄐ|ᅦ 313
315
+ ᄐ|ᅵ 314
316
+ ᄇ|ᅦ 315
317
+ ᄋ|ᅨ 316
318
+ ᆭ|ᄋ 317
319
+ ᆩ 318
320
+ ᆺ|ᄋ 319
321
+ ᄎ|ᅱ 320
322
+ ᄍ|ᅥ 321
323
+ ᆮ|ᄀ 322
324
+ ᆮ|ᄃ 323
325
+ ᆮ|ᄌ 324
326
+ ᅡ|ᆸ 325
327
+ ᆯ|ᄍ 326
328
+ ᄑ|ᅵ 327
329
+ ᆰ 328
330
+ ᆰ|ᄋ 329
331
+ ᅧ|ᆯ 330
332
+ ᆸ|ᄆ 331
333
+ ᆸ|ᄇ 332
334
+ ᆺ|ᄃ 333
335
+ ᆺ|ᄌ 334
336
+ ᆴ 335
337
+ ᆴ|ᄃ 336
338
+ ᄉ|ᅣ 337
339
+ ᆽ|ᄋ 338
340
+ ᄇ|ᅢ 339
341
+ ᆯ|ᄉ 340
342
+ ᅧ|ᆸ 341
343
+ ᄒ|ᅳ 342
344
+ ᄐ|ᅥ 343
345
+ ᆼ|ᄋ 344
346
+ ᅡ|ᆯ 345
347
+ ᄌ|ᅪ 346
348
+ ᄏ|ᅫ 347
349
+ ᆫ|ᄍ 348
350
+ ᄄ|ᅥ 349
351
+ ᇂ|ᄀ 350
352
+ ᄂ|ᅭ 351
353
+ ᄈ|ᅵ 352
354
+ ᄂ|ᅥ 353
355
+ ᄍ|ᅩ 354
356
+ ᄒ 355
357
+ ᅫ 356
358
+ ᆮ|ᄇ 357
359
+ ᆮ|ᄋ 358
360
+ ᆱ 359
361
+ ᆼ|ᄂ 360
362
+ ᅬ|ᆫ 361
363
+ ᄆ|ᅭ 362
364
+ ᄆ|ᅳ 363
365
+ ᄒ|ᅭ 364
366
+ ᄌ|ᅬ 365
367
+ ᄒ|ᅲ 366
368
+ ᄈ|ᅳ 367
369
+ ᄄ|ᅦ 368
370
+ ᆯ|ᄂ 369
371
+ ᅫ|ᄊ 370
372
+ ᅫ|ᆫ 371
373
+ ᄊ|ᅥ 372
374
+ ᅡ|ᆭ 373
375
+ ᄀ|ᅬ 374
376
+ ᅬ 375
377
+ ᄉ|ᅬ 376
378
+ ᅲ|ᆼ 377
379
+ ᄈ|ᅩ 378
380
+ ᆱ|ᄀ 379
381
+ ᆸ|ᄅ 380
382
+ ᅰ 381
383
+ ᄄ|ᅴ 382
384
+ ᆭ 383
385
+ ᆺ|ᄉ 384
386
+ ᅵ|ᇁ 385
387
+ ᆺ|ᄒ 386
388
+ ᅥ|ᇂ 387
389
+ ᇂ|ᄉ 388
390
+ ᇂ|ᄌ 389
391
+ ᅩ|ᇂ 390
392
+ ᄍ|ᅳ 391
393
+ ᄎ 392
394
+ ᅲ|ᆨ 393
395
+ ᄒ|ᅨ 394
396
+ ᆮ|ᄎ 395
397
+ ᄊ|ᅦ 396
398
+ ᅭ|ᆨ 397
399
+ ᄁ|ᅥ 398
400
+ ᄅ|ᅨ 399
401
+ ᄄ|ᅵ 400
402
+ ᄈ|ᅥ 401
403
+ ᄑ|ᅥ 402
404
+ ᆾ|ᄃ 403
405
+ ᄁ|ᅱ 404
406
+ ᄑ|ᅦ 405
407
+ ᆩ|ᄋ 406
408
+ ᄍ 407
409
+ ᄁ|ᅢ 408
410
+ ᇂ 409
411
+ ᄁ 410
412
+ ᆩ|ᄃ 411
413
+ ᄁ|ᅧ 412
414
+ ᆾ|ᄀ 413
415
+ ᆾ|ᄇ 414
416
+ ᆾ|ᄉ 415
417
+ ᆾ|ᄋ 416
418
+ ᄁ|ᅪ 417
419
+ ᄁ|ᅭ 418
420
+ ᆶ|ᄃ 419
421
+ ᆶ 420
422
+ ᆭ|ᄀ 421
423
+ ᆶ|ᄋ 422
424
+ ᇀ|ᄂ 423
425
+ ᆬ|ᄋ 424
426
+ ᄇ|ᅪ 425
427
+ ᄇ|ᅲ 426
428
+ ᅪ|ᆫ 427
429
+ ᆩ|ᄉ 428
430
+ ᅵ|ᆼ 429
431
+ ᆯ|ᄄ 430
432
+ ᆿ 431
433
+ ᆽ|ᄇ 432
434
+ ᅳ|ᄅ 433
435
+ ᆽ|ᄌ 434
436
+ ᅡ|ᇀ 435
437
+ ᇂ|ᄂ 436
438
+ ᇂ|ᄋ 437
439
+ ᄉ|ᅱ 438
440
+ ᄑ|ᅢ 439
441
+ ᅤ 440
442
+ ᆨ|ᄂ 441
443
+ ᆪ|ᄃ 442
444
+ ᆲ 443
445
+ ᆲ|ᄌ 444
446
+ ᆲ|ᄒ 445
447
+ ᄍ|ᅮ 446
448
+ ᄆ|ᅲ 447
449
+ ᄍ|ᅢ 448
450
+ ᄋ|ᅰ 449
451
+ ᆺ|ᄂ 450
452
+ ᄉ|ᅭ 451
453
+ ᅧ|ᄂ 452
454
+ ᇂ|ᄎ 453
455
+ ᄏ|ᅮ 454
456
+ ᅮ|ᆫ 455
457
+ ᅩ|ᇁ 456
458
+ ᄏ|ᅥ 457
459
+ ᅥ|ᆹ 458
460
+ ᄊ 459
461
+ ᄐ|ᅲ 460
462
+ ᆺ|ᄍ 461
463
+ ᄏ|ᅲ 462
464
+ ᄐ 463
465
+ ᆯ|ᄀ 464
466
+ ᆰ|ᄀ 465
467
+ ᆰ|ᄊ 466
468
+ ᆱ|ᄋ 467
469
+ ᅥ|ᆺ 468
470
+ ᅬ|ᆼ 469
471
+ ᄏ|ᅢ 470
472
+ ᅮ|ᇀ 471
473
+ ᄊ|ᅴ 472
474
+ ᄇ 473
475
+ ᄃ|ᅱ 474
476
+ ᆯ|ᄃ 475
477
+ ᄃ|ᅲ 476
478
+ ᆮ|ᄂ 477
479
+ ᅵ|ᄇ 478
480
+ ᅡ|ᆮ 479
481
+ ᆫ|ᄃ 480
482
+ ᆲ|ᄃ 481
483
+ ᆺ|ᄄ 482
484
+ ᄄ 483
485
+ ᆶ|ᄅ 484
486
+ ᄄ|ᅱ 485
487
+ ᄂ|ᅴ 486
488
+ ᅢ|ᆫ 487
489
+ ᅰ|ᆯ 488
490
+ ᅭ|ᄀ 489
491
+ ᅦ|ᆺ 490
492
+ ᅲ|ᄂ 491
493
+ ᅲ|ᄒ 492
494
+ ᆼ|ᄀ 493
495
+ ᆭ|ᄌ 494
496
+ ᆽ|ᄂ 495
497
+ ᆽ|ᄆ 496
498
+ ᆽ|ᄉ 497
499
+ ᅯ|ᄌ 498
500
+ ᆽ|ᄒ 499
501
+ ᄆ 500
502
+ ᄊ|ᅩ 501
503
+ ᆾ|ᄂ 502
504
+ ᆪ|ᄆ 503
505
+ ᆪ 504
506
+ ᅫ|ᆻ 505
507
+ ᅩ|ᆯ 506
508
+ ᄆ|ᅯ 507
509
+ ᄒ|ᅦ 508
510
+ ᅩ|ᄉ 509
511
+ ᄑ|ᅨ 510
512
+ ᇀ|ᄇ 511
513
+ ᇀ|ᄎ 512
514
+ ᆨ|ᄆ 513
515
+ ᆭ|ᄂ 514
516
+ ᅦ|ᆻ 515
517
+ ᅲ|ᆯ 516
518
+ ᄈ|ᅢ 517
519
+ ᆯ|ᄒ 518
520
+ ᆰ|ᄒ 519
521
+ ᆽ|ᄁ 520
522
+ ᅩ|ᆾ 521
523
+ ᅦ|ᄒ 522
524
+ ᅱ|ᄒ 523
525
+ ᄉ|ᅫ 524
526
+ ᅬ|ᄃ 525
527
+ ᄈ|ᅣ 526
528
+ ᆫ|ᄁ 527
529
+ ᅩ|ᄋ 528
530
+ ᄈ 529
531
+ ᄈ|ᅭ 530
532
+ ᅵ|ᆨ 531
533
+ ᅪ|ᆼ 532
534
+ ᅨ|ᄃ 533
535
+ ᅢ|ᆺ 534
536
+ ᄏ|ᅯ 535
537
+ ᅯ|ᆯ 536
538
+ ᇁ|ᄌ 537
539
+ ᅡ|ᄂ 538
540
+ ᅥ|ᆮ 539
541
+ ᄊ|ᅧ 540
542
+ ᄎ|ᅪ 541
543
+ ᆫ|ᄏ 542
544
+ ᅳ|ᄇ 543
545
+ ᇁ|ᄆ 544
546
+ ᇁ|ᄉ 545
547
+ ᄋ|ᅤ 546
548
+ ᅥ|ᄂ 547
549
+ ᆯ|ᄐ 548
550
+ ᅧ|ᄀ 549
551
+ ᅰ|ᆫ 550
552
+ ᄑ|ᅲ 551
553
+ ᆵ|ᄃ 552
554
+ ᆵ|ᄌ 553
555
+ ᅴ 554
556
+ ᆫ|ᄉ 555
557
+ ᅮ|ᆺ 556
558
+ ᅡ|ᆺ 557
559
+ ᅨ|ᄋ 558
560
+ ᆯ|ᄇ 559
561
+ ᅦ|ᄇ 560
562
+ ᆼ|ᄆ 561
563
+ ᅥ|ᄇ 562
564
+ ᄑ 563
565
+ ᅪ|ᆨ 564
566
+ ᅥ|ᄒ 565
567
+ ᅨ|ᄌ 566
568
+ ᆺ|ᄅ 567
569
+ ᄒ|ᅱ 568
570
+ ᆵ 569
g2p/korean_mfa/meta.json ADDED
@@ -0,0 +1 @@
1
+ {"version": "3.2.0", "architecture": "phonetisaurus", "train_date": "2026-02-18 13:58:19.082128", "phones": ["t\u0255\u02b0", "u", "\u028e\u02d0", "d\u02b2", "\u0261", "m\u02b2", "m\u02b2\u02d0", "t\u0255\u02b7\u02d0", "\u03b2\u02b7", "p\u02b2", "\u0255\u02b0", "\u0266", "t\u02b7", "\u025b\u02d0", "p\u02b0\u02d0", "\u014b", "c\u02b0", "\u0263", "k\u0348\u02b7", "p\u0348\u02b7", "m", "p\u0348", "p\u031a", "\u029d", "\u0250", "t\u02b0\u02d0", "\u026d", "k\u0348", "\u0272", "t\u02b2", "k\u02b7", "k\u02b7\u02d0", "t\u02b0", "\u00e7", "s\u0348\u02d0", "d\u0291", "t\u0255\u0348\u02b7", "s", "c", "b\u02b2", "\u028c", "k\u02b0", "s\u0348\u02b7", "\u0268\u02d0", "\u027e\u02b2", "\u026d\u02d0", "k\u02b0\u02d0", "\u0278\u02b7", "\u027e\u02b7", "n", "p\u02b2\u02d0", "s\u02b0", "t\u0255\u02b0\u02d0", "o\u02d0", "i", "c\u02b0\u02d0", "j", "\u028c\u02d0", "k", "o", "\u0268", "p\u02b7", "t\u0255\u02d0", "t\u02b7\u02d0", "t\u0255\u0348\u02d0", "p\u0348\u02d0", "e", "\u0270", "t\u0255\u0348", "t\u0348", "t", "u\u02d0", "k\u031a", "e\u02d0", "t\u0255", "\u0261\u02b7", "\u025f", "t\u0348\u02b2", "p\u0348\u02b2", "t\u031a", "\u028e", "t\u0255\u02b7", "\u0265", "\u03b2", "b\u02b7", "x", "i\u02d0", "p", "m\u02d0", "w", "\u025b", "d", "s\u02b7", "s\u02d0", "\u0255\u0348", "b", "n\u02d0", "p\u02b0", "s\u0348", "\u027e", "s\u02b0\u02d0", "h", "d\u0291\u02b7", "t\u0348\u02d0", "k\u0348\u02d0", "d\u02b7", "c\u0348"], "graphemes": ["\u116c", "\u11ac", "\u1110", "\u1171", "\u1163", "\u1166", "\u11c1", "\u11ad", "\u11c2", "\u11ba", "\u116d", "\u110b", "\u116a", "\u11b4", "\u1174", "\u11b2", "\u1106", "\u1169", "\u11b5", "\u1107", "\u110a", "\u110d", "\u1167", "\u1101", "\u110f", "\u1109", "\u1168", "\u11bc", "\u116b", "\u1173", "\u11b9", "\u11bd", "\u11be", "\u11af", "\u1165", "\u11ae", "\u1164", "\u11a8", "\u1100", "\u11ab", "\u1162", "\u1103", "\u11bb", "\u1104", "\u11b0", "\u1102", "\u11b7", "\u116e", "\u11a9", "\u11bf", "\u110e", "\u1161", "\u1170", "\u11b1", "\u1108", "\u1172", "\u11c0", 
"\u11aa", "\u1111", "\u110c", "\u1112", "\u11b8", "\u1105", "\u11b6", "\u116f", "\u1175"], "grapheme_order": 2, "phone_order": 2, "sequence_separator": "|", "unicode_decomposition": true, "evaluation": {"num_words": 5348, "word_error_rate": null, "phone_error_rate": null}, "training": {"num_words": 48032, "num_graphemes": 66, "num_phones": 107}}
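The metadata above sets `unicode_decomposition` to true, `sequence_separator` to "|", and `grapheme_order` to 2, and the grapheme symbols in this commit are conjoining jamo, some joined in pairs by that separator. The following is a minimal Python sketch of what those settings plausibly imply for input text. It assumes the decomposition corresponds to Unicode NFD normalization and that multi-grapheme symbols are contiguous runs of up to `grapheme_order` jamo. The helper names `to_jamo` and `chunk_symbols` are hypothetical and are not part of MFA.

```python
import unicodedata

SEPARATOR = "|"      # "sequence_separator" in meta.json
GRAPHEME_ORDER = 2   # "grapheme_order" in meta.json


def to_jamo(word: str) -> list[str]:
    """Split precomposed Hangul syllables into conjoining jamo via NFD."""
    return list(unicodedata.normalize("NFD", word))


def chunk_symbols(jamo: list[str], max_len: int = GRAPHEME_ORDER) -> list[str]:
    """List every contiguous run of 1..max_len jamo, joined by the separator.

    Assumption: this only enumerates candidate symbols; which of them a
    trained model actually uses is decided by its alignment, not by this code.
    """
    symbols = []
    for size in range(1, max_len + 1):
        for start in range(len(jamo) - size + 1):
            symbols.append(SEPARATOR.join(jamo[start:start + size]))
    return symbols


jamo = to_jamo("한국")
print([f"U+{ord(ch):04X}" for ch in jamo])
print(chunk_symbols(jamo))
```

With this input, 한국 decomposes into six jamo (U+1112, U+1161, U+11AB, U+1100, U+116E, U+11A8), all of which appear in the `graphemes` list above.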
g2p/korean_mfa/model.fst ADDED

Git LFS Details

  • SHA256: f9c7ad663bae63a324aa22a5cec2344b0fd29b71f06d246b4126851aac24434c
  • Pointer size: 133 Bytes
  • Size of remote file: 17.5 MB
g2p/korean_mfa/phones.sym ADDED
@@ -0,0 +1,106 @@
1
+ <eps> 0
2
+ n 1
3
+ d 2
4
+ ɐ 3
5
+ e 4
6
+ ɲ 5
7
+ dʑ 6
8
+ i 7
9
+ ɭ 8
10
+ k 9
11
+ ʎ 10
12
+ tɕ 11
13
+ m 12
14
+ k͈ 13
15
+ p 14
16
+ ɕ͈ 15
17
+ sʰ 16
18
+ u 17
19
+ o 18
20
+ ɡ 19
21
+ ʌ 20
22
+ ɾ 21
23
+ ɨ 22
24
+ ŋ 23
25
+ tɕʷ 24
26
+ kʰ 25
27
+ ɦ 26
28
+ ɛː 27
29
+ tʰ 28
30
+ ɾʲ 29
31
+ ɟ 30
32
+ ɕʰ 31
33
+ tʰː 32
34
+ sː 33
35
+ k̚ 34
36
+ ɰ 35
37
+ pʰ 36
38
+ tɕʰ 37
39
+ bʲ 38
40
+ h 39
41
+ ɡʷ 40
42
+ p̚ 41
43
+ w 42
44
+ t̚ 43
45
+ k͈ʷ 44
46
+ ʝ 45
47
+ j 46
48
+ dʑʷ 47
49
+ s 48
50
+ mʲ 49
51
+ s͈ 50
52
+ ʎː 51
53
+ dʷ 52
54
+ kʰː 53
55
+ t͈ 54
56
+ t 55
57
+ p͈ 56
58
+ cʰ 57
59
+ ɭː 58
60
+ ʌː 59
61
+ mː 60
62
+ b 61
63
+ cʰː 62
64
+ p͈ʲ 63
65
+ βʷ 64
66
+ ɛ 65
67
+ tʷ 66
68
+ ɥ 67
69
+ ɨː 68
70
+ kʷː 69
71
+ k͈ː 70
72
+ tɕ͈ 71
73
+ pʲ 72
74
+ β 73
75
+ c 74
76
+ nː 75
77
+ dʲ 76
78
+ pʷ 77
79
+ tɕʷː 78
80
+ kʷ 79
81
+ mʲː 80
82
+ eː 81
83
+ pʰː 82
84
+ tɕʰː 83
85
+ oː 84
86
+ c͈ 85
87
+ iː 86
88
+ tʲ 87
89
+ tɕː 88
90
+ p͈ː 89
91
+ ɣ 90
92
+ ɾʷ 91
93
+ bʷ 92
94
+ uː 93
95
+ sʷ 94
96
+ t͈ʲ 95
97
+ sʰː 96
98
+ pʲː 97
99
+ tʷː 98
100
+ tɕ͈ː 99
101
+ t͈ː 100
102
+ ɸʷ 101
103
+ s͈ː 102
104
+ s͈ʷ 103
105
+ ç 104
106
+ x 105
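The `phones.sym` listing above appears to follow a plain-text symbol-table layout of the kind used by OpenFst tools: one symbol and its integer id per line, with `<eps>` at id 0. Below is a small Python sketch for reading such a table, assuming whitespace-separated columns and input without the diff's `+` markers and line numbers. `parse_symbol_table` is a hypothetical helper, not an MFA or OpenFst API.

```python
def parse_symbol_table(text: str) -> dict[str, int]:
    """Parse lines of the form '<symbol> <integer id>' into a dict."""
    table: dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so the id is always the final field.
        symbol, index = line.rsplit(maxsplit=1)
        table[symbol] = int(index)
    return table


# First four entries of the listing above, used as sample input.
sample = "<eps> 0\nn 1\nd 2\nɐ 3\n"
table = parse_symbol_table(sample)
print(table)
```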