Spaces:
Runtime error
Runtime error
| # Finetune TinyLLaVA with Custom Datasets | |
| Given the needs of finetuning with custom datasets, we provide a tutorial on how to custom finetune on our trained model, e.g. tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B (HF path). | |
| ## Dataset Format | |
| Convert your data to a JSON file of a List of all samples. Sample metadata should contain `id` (a unique identifier), `image` (the path to the image), and `conversations` (the conversation data between human and AI). | |
| Here's an example of the [pokemon dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) turned into the data format: | |
| ```json | |
| [ | |
| { | |
| "id": "meiKqU2auAVK2vrtLhKGoJ", | |
| "image": "pokemon/image/meiKqU2auAVK2vrtLhKGoJ.jpg", | |
| "conversations": [ | |
| { | |
| "from": "human", | |
| "value": "<image>\nProvide a brief description of the given image." | |
| }, | |
| { | |
| "from": "gpt", | |
| "value": "a drawing of a green pokemon with red eyes" | |
| } | |
| ] | |
| } | |
| ] | |
| ``` | |
| <details> | |
| You can use the following scripts to convert the Pokemon dataset to the above data format. | |
| <summary>converting data format</summary> | |
| ```python | |
| import shortuuid | |
| from datasets import load_dataset | |
| from PIL import Image | |
| import random | |
| import json | |
| import tqdm | |
| import os | |
| ds = load_dataset('lambdalabs/pokemon-blip-captions') | |
| pokemon_data = [] | |
| pokemon_image_path = '/path/to/your/data/pokemon/image' | |
| pokemon_data_path = '/path/to/your/pokemon_blip_captions.json' | |
| description_list = [ | |
| "Describe the image concisely.", | |
| "Provide a brief description of the given image.", | |
| "Offer a succinct explanation of the picture presented.", | |
| "Summarize the visual content of the image.", | |
| "Give a short and clear explanation of the subsequent image.", | |
| "Share a concise interpretation of the image provided.", | |
| "Present a compact description of the photo's key features.", | |
| "Relay a brief, clear account of the picture shown.", | |
| "Render a clear and concise summary of the photo.", | |
| "Write a terse but informative summary of the picture.", | |
| "Create a compact narrative representing the image presented." | |
| ] | |
| for sample in tqdm.tqdm(ds['train']): | |
| uuid = shortuuid.uuid() | |
| sample_dict = dict() | |
| sample_dict['id'] = uuid | |
| sample_dict['image'] = 'pokemon/image/' + uuid + '.jpg' | |
| sample['image'].save(os.path.join(pokemon_image_path, uuid + '.jpg')) | |
| conversations = [ | |
| {"from": "human", "value": "<image>\n" + random.choice(description_list)}, | |
| {"from": "gpt", "value": sample['text']} | |
| ] | |
| sample_dict['conversations'] = conversations | |
| pokemon_data.append(sample_dict) | |
| with open(pokemon_data_path, 'w') as f: | |
| json.dump(pokemon_data, f, indent=4) | |
| ``` | |
| </details> | |
| ## Custom Finetune | |
| After acquiring the dataset following the above data format, you can finetune our trained model TinyLLaVA-Phi-2-SigLIP-3.1B checkpoint by using lora. | |
| - Replace data paths and `output_dir` with yours in `scripts/train/custom_finetune.sh` | |
| - Adjust your GPU ids (localhost) and `per_device_train_batch_size` in `scripts/train/custom_finetune.sh`. | |
| ```bash | |
| bash scripts/train/custom_finetune.sh | |
| ``` | |
| ## Evaluation with Custom Finetuned Model | |
| All of the models trained by TinyLLaVA Factory have the same evaluation procedure, no matter it is trained through custom finetune or through normal training. Please see the [Evaluation](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html) section in our Doc. | |