# MiniGPTv2 Project

### Overview
MiniGPTv2 is a multimodal large language model that combines vision and language capabilities. This repository contains the implementation of MiniGPTv2 fine-tuned on facial emotion recognition and detailed image understanding tasks.

### Model Architecture
- Base Architecture: MiniGPTv2
- LLM Backbone: Llama-2-7b-chat
- Image Size: 448×448
- Max Text Length: 3072 tokens
- LoRA Configuration: r=64, alpha=16
- Gradient Checkpointing: enabled for the vision encoder, disabled for the LLM
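To make the LoRA numbers above concrete, here is a minimal sketch of a LoRA-adapted linear layer in plain NumPy (not the repository's code — the actual adapters wrap the LLM's attention projections). With r=64 and alpha=16, the low-rank update is scaled by alpha / r = 0.25:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 128, 64, 16            # toy dimension; r=64, alpha=16 as configured
W = rng.normal(size=(d, d))          # frozen base weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized
x = rng.normal(size=d)

scaling = alpha / r                  # 16 / 64 = 0.25
y = W @ x + scaling * (B @ (A @ x))  # LoRA forward pass

# With B initialized to zero, the adapter contributes nothing at step 0,
# so the adapted layer starts out identical to the frozen base layer.
assert np.allclose(y, W @ x)
```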
### Training Configuration
- Training Checkpoint: epoch 88 (56,320 steps)
- Steps per Epoch: 640
- Batch Size: 1 (with gradient accumulation)
- Gradient Accumulation Steps: 16
- Learning Rate:
  - Initial: 3e-5
  - Minimum: 1e-6
  - Warmup: 1e-6
- LR Schedule: linear warmup with cosine decay
- Warmup Steps: 1000
- Weight Decay: 0.05
- Mixed Precision Training: enabled
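The schedule above (linear warmup from the warmup LR to the initial LR over 1,000 steps, then cosine decay down to the minimum LR) can be sketched as a small function; the exact formula in the training code may differ slightly:

```python
import math

# Constants taken from the configuration listed above
INIT_LR, MIN_LR, WARMUP_LR = 3e-5, 1e-6, 1e-6
WARMUP_STEPS = 1000
TOTAL_STEPS = 88 * 640  # 56,320 steps at 640 steps per epoch

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        # Linear warmup from WARMUP_LR up to INIT_LR
        return WARMUP_LR + (INIT_LR - WARMUP_LR) * step / WARMUP_STEPS
    # Cosine decay from INIT_LR down to MIN_LR over the remaining steps
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (INIT_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```

Note that the checkpoint step count is consistent with the per-epoch figure: 88 epochs × 640 steps/epoch = 56,320 steps.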
### Dataset Composition
The model was trained on a mixture of datasets with the following sampling ratios:

- ShareGPT Detail (30%): general visual conversation data
- GPT4Vision Face Detail (10%): facial analysis and description data
- Realistic Emotions Detail (20%): emotion recognition and interpretation data
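An illustrative sketch of ratio-based dataset mixing (not the repository's sampler — the dataset keys are hypothetical). `random.choices` accepts unnormalized weights, so the listed percentages can be used directly even though the three shown here do not sum to 100:

```python
import random

# Sampling weights from the ratios listed above (hypothetical dataset names)
datasets = {
    "sharegpt_detail": 30,
    "gpt4vision_face_detail": 10,
    "realistic_emotions_detail": 20,
}

def sample_dataset(rng: random.Random) -> str:
    """Pick a dataset for the next training example, weighted by ratio."""
    names, weights = zip(*datasets.items())
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)
counts = {name: 0 for name in datasets}
for _ in range(6000):
    counts[sample_dataset(rng)] += 1
# Expected proportions among these three sources: roughly 3:1:2
```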
### Usage

#### Requirements

```text
torch>=2.0.0
transformers>=4.28.0
timm
fairscale
accelerate
```
#### Loading the Model

```python
from minigptv2.model import MiniGPTv2

# Initialize the model
model = MiniGPTv2.from_pretrained(
    llama_model_path="/path/to/Llama-2-7b-chat-hf",
    checkpoint_path="/path/to/minigptv2_checkpoint.pth",
    image_size=448,
    max_txt_len=3072
)

# Set to evaluation mode
model.eval()
```
#### Inference

```python
from PIL import Image
import torch

# Load image
image = Image.open("example.jpg").convert("RGB")

# Generate a response for the image and prompt
response = model.generate(
    image=image,
    prompt="What emotions is this person expressing?",
    max_new_tokens=512
)

print(response)
```
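If you need to inspect or replicate the preprocessing for the 448×448 input size, a hypothetical sketch follows; the normalization constants shown are the CLIP-style values commonly used by ViT vision encoders and are an assumption, not confirmed by this repository:

```python
from PIL import Image
import numpy as np

# Assumed CLIP-style normalization constants (verify against the actual config)
MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def preprocess(image: Image.Image, size: int = 448) -> np.ndarray:
    """Resize to size x size and normalize to a CHW float32 array."""
    image = image.convert("RGB").resize((size, size), Image.BICUBIC)
    arr = np.asarray(image, dtype=np.float32) / 255.0   # HWC in [0, 1]
    arr = (arr - MEAN) / STD                            # per-channel normalize
    return arr.transpose(2, 0, 1)                       # CHW layout for the model
```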
#### Training
To continue training from the epoch-88 checkpoint:

```bash
python train.py --config /path/to/config.yaml --resume_ckpt_path /path/to/epoch88_checkpoint.pth
```
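With batch size 1 and 16 accumulation steps, each optimizer update averages gradients over 16 micro-batches (an effective batch size of 16). A toy sketch of that accumulation pattern, using a stand-in gradient rather than the repository's training loop:

```python
import numpy as np

ACCUM_STEPS = 16                     # gradient accumulation steps, as configured
rng = np.random.default_rng(0)

w = np.zeros(4)                      # toy parameter vector
lr = 3e-5
grad_buffer = np.zeros_like(w)
updates = 0

# One "epoch": 640 optimizer steps x 16 micro-batches each
for micro_step in range(640 * ACCUM_STEPS):
    grad = rng.normal(size=w.shape)        # stand-in for a per-sample gradient
    grad_buffer += grad / ACCUM_STEPS      # accumulate the running mean gradient
    if (micro_step + 1) % ACCUM_STEPS == 0:
        w -= lr * grad_buffer              # one optimizer update per 16 samples
        grad_buffer[:] = 0.0
        updates += 1

# updates == 640, matching the "Steps per Epoch" figure above
```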
#### Evaluation

```bash
python evaluate.py --config /path/to/eval_config.yaml --checkpoint /path/to/epoch88_checkpoint.pth
```
### License
This project is released under the Apache-2.0 license (per the model card metadata in this file).

### Citation

[Citation information for MiniGPTv2 and any relevant papers]
### Acknowledgements
This project builds on the MiniGPTv2 architecture and uses the Llama-2-7b-chat model. We thank the original authors for their contributions to the field.

---
license: apache-2.0
---