---
pipeline_tag: text-generation
tags:
- phi3
- LLM
library_name: transformers
---

# Phi-3 Model with Extended Vocabulary and Fine-Tuning for Japanese

## Overview

This project is a proof of concept that extends the base vocabulary of the Phi-3 model and then applies supervised fine-tuning to teach it a new language (Japanese). Although the custom dataset is very small, the improvement in Japanese language understanding is substantial.

## Model Details

- **Base Model**: Phi 3
- **Objective**: Extend the base vocabulary and fine-tune for Japanese language understanding.
- **Dataset**: Custom dataset of 1,000 entries generated using ChatGPT-4.
- **Language**: Japanese

## Dataset

The dataset for this project was generated with the assistance of ChatGPT-4. It comprises 1,000 entries, curated to cover a diverse range of topics and linguistic structures.

## Training

### Vocabulary Extension

The base vocabulary of the Phi 3 model was extended with new Japanese tokens. This step was crucial for enabling the model to comprehend and generate Japanese text more effectively.
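
The source does not say how the vocabulary was extended, so the following is only a minimal sketch of the general idea, written in plain Python with a toy vocabulary. It assumes that token ids are contiguous and that each new token receives an embedding row initialized to the mean of the existing rows, which is a common heuristic rather than a confirmed detail of this project. The function name `extend_vocab` and all values are hypothetical. With the `transformers` library, the equivalent step is usually done with `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`.

```python
# Toy illustration of vocabulary extension; names and values are hypothetical.

def extend_vocab(vocab, embeddings, new_tokens):
    """Add unseen tokens to `vocab` and append one embedding row per new token.

    Assumes ids are contiguous (0..n-1) and len(vocab) == len(embeddings).
    New rows are initialized to the mean of the existing rows; this is an
    assumed heuristic, not a detail confirmed by the project.
    """
    dim = len(embeddings[0])
    count = len(embeddings)
    mean_row = [sum(row[d] for row in embeddings) / count for d in range(dim)]

    vocab = dict(vocab)                             # copy so the inputs stay untouched
    embeddings = [list(row) for row in embeddings]
    added = []
    for token in new_tokens:
        if token in vocab:                          # already covered by the base vocabulary
            continue
        vocab[token] = len(vocab)                   # next free id
        embeddings.append(list(mean_row))
        added.append(token)
    return vocab, embeddings, added


base_vocab = {"<unk>": 0, "hello": 1, "world": 2}
base_embeddings = [[0.0, 0.0], [1.0, 3.0], [2.0, 3.0]]

vocab, embeddings, added = extend_vocab(
    base_vocab, base_embeddings, ["こんにちは", "世界", "hello"]
)
print(added)            # ['こんにちは', '世界']
print(len(embeddings))  # 5
```

Because the new rows start from an average rather than from learned values, the added tokens carry no Japanese-specific meaning until the model is fine-tuned on text that uses them.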

### Fine-Tuning

Supervised fine-tuning was then performed on the vocabulary-extended model using the custom dataset. Despite the small dataset size, the model's ability to understand and generate Japanese text improved significantly.
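
To make the supervised fine-tuning step concrete, here is a minimal sketch of how a single training example is commonly prepared. It assumes each dataset entry is a prompt/response pair and that the loss is computed only on the response tokens; neither detail is stated in the source. The helper `build_sft_example` and the token ids are hypothetical. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss, so positions carrying that label do not contribute to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_sft_example(prompt_ids, response_ids, eos_id):
    """Concatenate prompt and response ids and mask the prompt in the labels.

    Masking the prompt is an assumed design choice: the model is then
    trained only to produce the response (plus the end-of-sequence token).
    """
    input_ids = list(prompt_ids) + list(response_ids) + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids) + [eos_id]
    return {"input_ids": input_ids, "labels": labels}


# Hypothetical token ids standing in for a tokenized Japanese prompt/response pair.
example = build_sft_example(prompt_ids=[11, 12, 13], response_ids=[21, 22], eos_id=2)
print(example["input_ids"])  # [11, 12, 13, 21, 22, 2]
print(example["labels"])     # [-100, -100, -100, 21, 22, 2]
```

With only 1,000 entries, masking the prompt focuses the limited training signal on the Japanese responses. Training on the full sequence is also a valid choice and would simply copy `input_ids` into `labels`.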

## Results

Even with the limited dataset and vocabulary size, the fine-tuned model substantially improved on the base model in Japanese language understanding and generation.

## Future Work

1. **Dataset Expansion**: Increase the size and diversity of the dataset to further enhance model performance.
2. **Evaluation**: Conduct comprehensive evaluation and benchmarking against standard Japanese language tasks.
3. **Optimization**: Optimize the model for better performance and efficiency.