| -------------------------------------------------------------------------------------------------------------------------------------------- | |
| #step1. preprocess | |
| Process the data and build the IPA dictionaries and word-replacement rules | |
| 1. open utils/IPA_sim_statistic_analysis.py | |
| 2. set statistic_conclusion_exist=False (line 96) | |
| 3. run the script to generate IPA_lo_dict and IPA_th_dict | |
| 4. set statistic_conclusion_exist=True | |
| 5. run the script again to generate same_list | |
| -------------------------------------------------------------------------------------------------------------------------------------------- | |
| #step2. train | |
| Start training | |
| 1. set params | |
| 2. run main.py with the following arguments | |
| --do_train | |
| --evaluate_during_training | |
| --train_data_file | |
| train.json | |
| --eval_data_file | |
| valid.json | |
| --test_data_file | |
| test_lo.json | |
| --data_dir | |
| data | |
| --lang | |
| replace_word_level | |
| --mode | |
| 73 | |
| --output_dir | |
| ./models | |
| --num_train_epochs | |
| 20 | |
| --max_seq_length | |
| 200 | |
| --gradient_accumulation_steps | |
| 1 | |
| --warmup_steps | |
| 500 | |
| --train_batch_size | |
| 32 | |
| --eval_batch_size | |
| 16 | |
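
The training arguments above can also be assembled in a small script instead of being typed by hand. This is only a sketch: the `build_command` helper and the `python` interpreter name are assumptions for illustration, and the flag values are copied from the list above. It prints the command rather than running it.

```python
import shlex

# Boolean switches and key/value options copied from the training list above.
TRAIN_FLAGS = ["--do_train", "--evaluate_during_training"]
TRAIN_OPTIONS = {
    "--train_data_file": "train.json",
    "--eval_data_file": "valid.json",
    "--test_data_file": "test_lo.json",
    "--data_dir": "data",
    "--lang": "replace_word_level",
    "--mode": 73,
    "--output_dir": "./models",
    "--num_train_epochs": 20,
    "--max_seq_length": 200,
    "--gradient_accumulation_steps": 1,
    "--warmup_steps": 500,
    "--train_batch_size": 32,
    "--eval_batch_size": 16,
}


def build_command(flags, options, script="main.py"):
    """Hypothetical helper: return the argument list for one run of the script."""
    cmd = ["python", script, *flags]
    for name, value in options.items():
        cmd += [name, str(value)]
    return cmd


train_cmd = build_command(TRAIN_FLAGS, TRAIN_OPTIONS)
# Print the command so it can be inspected or pasted into a shell.
print(shlex.join(train_cmd))
```

To launch training directly, the same list could be passed to `subprocess.run(train_cmd, check=True)` from the repository root.
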
| -------------------------------------------------------------------------------------------------------------------------------------------- | |
| #step3. test | |
| Run the tests | |
| 1. set params | |
| 2. run main.py with one of the following argument sets | |
| # test lo | |
| --do_test | |
| --test_data_file | |
| test_lo.json | |
| --data_dir | |
| data | |
| --lang | |
| replace_word_level | |
| --mode | |
| 73 | |
| --output_dir | |
| ./models | |
| --max_seq_length | |
| 200 | |
| --eval_batch_size | |
| 16 | |
| # test th | |
| --do_test | |
| --test_data_file | |
| test_th.json | |
| --data_dir | |
| data | |
| --lang | |
| replace_word_level | |
| --mode | |
| 73 | |
| --output_dir | |
| ./models | |
| --max_seq_length | |
| 200 | |
| --eval_batch_size | |
| 16 | |
| -------------------------------------------------------------------------------------------------------------------------------------------- |