Frinkles/Phi3JPInitial

	---
	pipeline_tag: text-generation
	tags:
	- phi3
	- LLM
	library_name: transformers
	---
	# Phi 3 Model with Extended Vocabulary and Fine-Tuning for Japanese

	## Overview

	This project is a proof of concept: it extends the base vocabulary of the Phi 3 model with Japanese tokens, then applies supervised fine-tuning to teach the model a new language (Japanese). Although the custom dataset is very small (1,000 entries), the fine-tuned model handles Japanese substantially better than the base model. This improvement has not yet been formally benchmarked (see Future Work).

	## Model Details

	- Base Model: Phi 3
	- Objective: Extend the base vocabulary and fine-tune for Japanese language understanding.
	- Dataset: Custom dataset of 1,000 entries generated using ChatGPT-4.
	- Language: Japanese

	## Dataset

	The dataset used for this project was generated with the assistance of ChatGPT-4. It comprises 1,000 entries, carefully curated to cover a diverse range of topics and linguistic structures.
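
A dataset of this kind can be sanity-checked before training. The sketch below is only an illustration: the `prompt` and `response` field names and the 0.5 threshold are assumptions, since the card does not describe the dataset schema or any filtering that was applied. It drops exact duplicates and entries whose response is not mostly Japanese script.

```python
# Unicode blocks used here as a rough definition of "Japanese script".
JAPANESE_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (kanji)
)


def japanese_ratio(text):
    """Return the fraction of non-whitespace characters in Japanese script."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(
        1 for c in chars
        if any(lo <= ord(c) <= hi for lo, hi in JAPANESE_RANGES)
    )
    return hits / len(chars)


def filter_entries(entries, min_ratio=0.5):
    """Keep entries with a mostly Japanese response; drop exact duplicates.

    The "prompt"/"response" keys are hypothetical field names.
    """
    seen = set()
    kept = []
    for entry in entries:
        key = (entry["prompt"], entry["response"])
        if key in seen or japanese_ratio(entry["response"]) < min_ratio:
            continue
        seen.add(key)
        kept.append(entry)
    return kept
```

Punctuation and Latin characters count against the ratio, so a lower threshold may suit entries that mix Japanese with English terms.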

	## Training

	### Vocabulary Extension

	The base vocabulary of the Phi 3 model was extended to include new Japanese tokens. This was a crucial step to enable the model to comprehend and generate Japanese text more effectively.
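
Mechanically, extending a vocabulary means adding new token ids and growing the embedding table so that every id has a row. The toy sketch below shows that idea in plain Python. It is not the code used for this model: the tiny vocabulary, the 4-dimensional embeddings, and the mean initialisation of new rows are all assumptions made for illustration. With the `transformers` library, the analogous steps are usually `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`.

```python
import random


def extend_vocab(vocab, new_tokens):
    """Append unseen tokens to a token -> id mapping; return the tokens added."""
    added = []
    for token in new_tokens:
        if token not in vocab:
            vocab[token] = len(vocab)  # ids are assumed contiguous from 0
            added.append(token)
    return added


def resize_embeddings(embeddings, new_size):
    """Grow the embedding table to new_size rows.

    New rows are set to the mean of the existing rows. This is one common
    heuristic; the model card does not say how new rows were initialised.
    """
    old_size = len(embeddings)
    dim = len(embeddings[0])
    mean_row = [sum(row[j] for row in embeddings) / old_size for j in range(dim)]
    return embeddings + [list(mean_row) for _ in range(new_size - old_size)]


# Toy starting point: three tokens with random 4-dimensional embeddings.
vocab = {"<unk>": 0, "hello": 1, "world": 2}
rng = random.Random(0)
embeddings = [[rng.uniform(-1, 1) for _ in range(4)] for _ in vocab]

# "hello" already exists, so only the two Japanese tokens are added.
added = extend_vocab(vocab, ["こんにちは", "世界", "hello"])
embeddings = resize_embeddings(embeddings, len(vocab))
```

The new rows start out identical and carry no meaning of their own, which is why the fine-tuning step that follows matters.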

	### Fine-Tuning

	Supervised fine-tuning was then performed on the vocabulary-extended model using the custom dataset. Despite the small dataset size, the model showed a marked improvement in understanding and generating Japanese text.
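
In supervised fine-tuning, each training example is typically a prompt followed by the desired response, and the loss is often computed only on the response tokens. The sketch below shows that label-masking convention. The card does not say whether prompts were masked during training, so treat this as an assumption; `build_example` and the token ids are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def build_example(prompt_ids, response_ids, eos_id):
    """Concatenate prompt and response, masking the prompt out of the loss."""
    input_ids = list(prompt_ids) + list(response_ids) + [eos_id]
    # Prompt positions get IGNORE_INDEX so they contribute no loss;
    # the response and end-of-sequence tokens are the training targets.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids) + [eos_id]
    return {"input_ids": input_ids, "labels": labels}


# Made-up token ids, for illustration only.
example = build_example(prompt_ids=[5, 6, 7], response_ids=[8, 9], eos_id=2)
```

Training on all tokens, prompt included, is also a valid choice; masking simply focuses the small dataset on producing Japanese responses.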

	## Results

	Even with the limited dataset and vocabulary size, the fine-tuned model improved substantially over the base model in both Japanese language understanding and generation. No quantitative benchmark results are available yet; formal evaluation is listed under Future Work.

	## Future Work

	1. Dataset Expansion: Increase the size and diversity of the dataset to further enhance model performance.
	2. Evaluation: Conduct comprehensive evaluation and benchmarking against standard Japanese language tasks.
	3. Optimization: Optimize the model for better performance and efficiency.
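
As a starting point for the evaluation item above, a simple exact-match metric could be computed over question-answer pairs. This is a minimal sketch under stated assumptions: the card names no benchmark or metric, and both `normalize` and `exact_match_accuracy` are hypothetical helpers. NFKC normalisation is applied so that full-width and half-width variants of the same character compare equal.

```python
import unicodedata


def normalize(text):
    """NFKC-normalise and remove all whitespace before comparison."""
    return "".join(unicodedata.normalize("NFKC", text).split())


def exact_match_accuracy(predictions, references):
    """Return the fraction of predictions equal to their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return matches / len(references)
```

Exact match suits only short-answer tasks; open-ended generation would need a different metric or human review.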