Experience on finetuning for specific language

#13
by Huy227 - opened

Hi, I've tested the model and it performs well in most of my cases. However, there is still a small issue with tone marks. My language is Vietnamese, which is based on the Latin alphabet. The model sometimes confuses words, for example "Phường" -> "Phòng" or "Tỉnh" -> "Tinh", so I want to improve performance on Vietnamese while also preventing catastrophic forgetting. I've checked the finetuning notebook, and I think I should freeze the vision weights and unfreeze the language weights for my case; maybe I should use LoRA instead of full fine-tuning to learn the correct tone marks? Also, how many samples would be a good starting point? Should I collect PDF files with diverse structures to preserve the model's generality?

LightOn AI org

Hi,
Yes, I think finetuning makes a lot of sense in this case!

Yes! Freeze the vision weights and train only the language weights with LoRA; it works like magic. Tested.
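The recipe above (freeze the vision tower, LoRA on the language side) can be sketched in plain PyTorch. Note this is a minimal illustration, not LightOn's actual fine-tuning code: the `LoRALinear` wrapper and the `vision_encoder`/`language_head` names are hypothetical stand-ins for the real model's submodules; in practice you would apply `peft`'s `LoraConfig` to the actual checkpoint.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base Linear plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight
        # A is small random, B starts at zero, so the adapter is a no-op at init
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

# Hypothetical stand-in for a vision-language OCR model
model = nn.ModuleDict({
    "vision_encoder": nn.Linear(32, 32),
    "language_head": nn.Linear(32, 32),
})

# 1) Freeze the vision weights entirely
for p in model["vision_encoder"].parameters():
    p.requires_grad = False

# 2) Wrap the language layer with LoRA: base stays frozen, only A/B train
model["language_head"] = LoRALinear(model["language_head"], r=8, alpha=16)

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the LoRA A and B matrices remain trainable
```

Because `lora_B` is initialized to zero, the wrapped layer reproduces the pretrained outputs exactly at step 0, which is what limits catastrophic forgetting: the optimizer only moves the small A/B matrices away from the base behavior.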
