CLTMPSE / readme.txt
--------------------------------------------------------------------------------------------------------------------------------------------
#step1. preprocess
Process the data, then build the IPA (phonetic) dictionaries and the word-replacement rules.
1. Open utils/IPA_sim_statistic_analysis.py.
2. Set statistic_conclusion_exist=False (line 96).
3. Run the script to generate IPA_lo_dict and IPA_th_dict.
4. Set statistic_conclusion_exist=True.
5. Run the script again to generate same_list.
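The flag in steps 2 and 4 can be toggled from a script instead of by hand. This is only a minimal sketch: it assumes the flag is assigned on a single line of the form statistic_conclusion_exist=False (spaces around "=" are allowed), and the helper name set_flag is hypothetical, not part of the repository.

```python
import re

# Assumption: the script assigns the flag on one line, e.g.
#   statistic_conclusion_exist=False
# Group 1 keeps the indentation and the left-hand side untouched.
_FLAG_RE = re.compile(
    r"^([ \t]*statistic_conclusion_exist[ \t]*=[ \t]*)(True|False)\b",
    re.MULTILINE,
)


def set_flag(source: str, value: bool) -> str:
    """Return the script text with statistic_conclusion_exist set to value."""
    new_source, count = _FLAG_RE.subn(lambda m: m.group(1) + str(value), source)
    if count == 0:
        # Fail loudly rather than silently running with the wrong setting.
        raise ValueError("statistic_conclusion_exist assignment not found")
    return new_source
```

To use it, read utils/IPA_sim_statistic_analysis.py as text, write back set_flag(text, False), run the script, then repeat with True.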
--------------------------------------------------------------------------------------------------------------------------------------------
#step2. train
Start training.
1. Set the parameters listed below.
2. Run main.py with them:
--do_train
--evaluate_during_training
--train_data_file train.json
--eval_data_file valid.json
--test_data_file test_lo.json
--data_dir data
--lang replace_word_level
--mode 73
--output_dir ./models
--num_train_epochs 20
--max_seq_length 200
--gradient_accumulation_steps 1
--warmup_steps 500
--train_batch_size 32
--eval_batch_size 16
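The training parameters can also be kept in a small launcher script rather than typed into a run configuration. The sketch below only assembles the command line from the values listed in this step. The names TRAIN_ARGS, build_command and run are hypothetical, and it assumes main.py sits in the current working directory.

```python
import subprocess
import sys

# Flag/value pairs taken from the training step of this readme.
# A value of None marks a switch that takes no argument.
TRAIN_ARGS = [
    ("--do_train", None),
    ("--evaluate_during_training", None),
    ("--train_data_file", "train.json"),
    ("--eval_data_file", "valid.json"),
    ("--test_data_file", "test_lo.json"),
    ("--data_dir", "data"),
    ("--lang", "replace_word_level"),
    ("--mode", 73),
    ("--output_dir", "./models"),
    ("--num_train_epochs", 20),
    ("--max_seq_length", 200),
    ("--gradient_accumulation_steps", 1),
    ("--warmup_steps", 500),
    ("--train_batch_size", 32),
    ("--eval_batch_size", 16),
]


def build_command(args, script="main.py"):
    """Flatten (flag, value) pairs into an argument list for subprocess."""
    cmd = [sys.executable, script]
    for flag, value in args:
        cmd.append(flag)
        if value is not None:
            cmd.append(str(value))
    return cmd


def run(args):
    """Launch main.py and raise if it exits with a non-zero status."""
    subprocess.run(build_command(args), check=True)
```

Calling run(TRAIN_ARGS) starts training. Passing a list rather than a single string avoids shell quoting problems.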
--------------------------------------------------------------------------------------------------------------------------------------------
#step3. test
Test.
1. Set the parameters listed below.
2. Run main.py once per test set:
# test lo
--do_test
--test_data_file test_lo.json
--data_dir data
--lang replace_word_level
--mode 73
--output_dir ./models
--max_seq_length 200
--eval_batch_size 16
# test th
--do_test
--test_data_file test_th.json
--data_dir data
--lang replace_word_level
--mode 73
--output_dir ./models
--max_seq_length 200
--eval_batch_size 16
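The two test runs differ only in the test file, so they can be generated from one template. This is a sketch under the same assumption that main.py is in the working directory. The names build_test_command and COMMANDS are hypothetical.

```python
import sys


def build_test_command(test_file, script="main.py"):
    """Build the argument list for one evaluation run of main.py."""
    return [
        sys.executable, script,
        "--do_test",
        "--test_data_file", test_file,
        "--data_dir", "data",
        "--lang", "replace_word_level",
        "--mode", "73",
        "--output_dir", "./models",
        "--max_seq_length", "200",
        "--eval_batch_size", "16",
    ]


# One command per test set named in this readme (lo and th).
COMMANDS = {name: build_test_command(f"test_{name}.json") for name in ("lo", "th")}
```

Each entry of COMMANDS can be passed to subprocess.run(..., check=True) to run the tests one after the other.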
--------------------------------------------------------------------------------------------------------------------------------------------