# MiniGPTv2 Project
### Overview
MiniGPTv2 is a multimodal large language model that combines vision and language capabilities. This repository contains the implementation of MiniGPTv2 fine-tuned on facial emotion recognition and detailed image understanding tasks.
### Model Architecture

- **Base Architecture:** MiniGPTv2
- **LLM Backbone:** Llama-2-7b-chat
- **Image Size:** 448×448
- **Max Text Length:** 3072 tokens
- **LoRA Configuration:** r=64, alpha=16
- **Gradient Checkpointing:** enabled for the vision encoder, disabled for the LLM
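As a quick sanity check on the LoRA setting above, here is a sketch of the extra parameter count a rank-64 adapter adds per weight matrix; the 4096-dimensional projection size is an assumption based on Llama-2-7b, not something stated in this card.

```python
# Sketch: parameters added by a LoRA adapter of rank r on a d_out x d_in weight.
# LoRA factorizes the update as B (d_out x r) @ A (r x d_in), so it adds
# r * (d_out + d_in) trainable parameters per adapted matrix.
def lora_param_count(d_in: int, d_out: int, r: int = 64) -> int:
    return r * (d_in + d_out)

# Assumption: Llama-2-7b attention projections are 4096 x 4096.
per_matrix = lora_param_count(4096, 4096, r=64)
print(per_matrix)  # 524288 extra parameters per adapted projection
```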
### Training Configuration

- **Training Checkpoint:** epoch 88 (56,320 steps)
- **Steps per Epoch:** 640
- **Batch Size:** 1 (with gradient accumulation)
- **Gradient Accumulation Steps:** 16
- **Learning Rate:**
  - Initial: 3e-5
  - Minimum: 1e-6
  - Warmup: 1e-6
- **LR Schedule:** linear warmup with cosine decay
- **Warmup Steps:** 1000
- **Weight Decay:** 0.05
- **Mixed Precision Training:** enabled
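The schedule above (linear warmup to the initial rate, then cosine decay to the minimum) can be sketched as follows. The total-step count comes from 88 epochs × 640 steps per epoch; the function is an illustrative reimplementation, not the actual training code.

```python
import math

INIT_LR, MIN_LR, WARMUP_LR = 3e-5, 1e-6, 1e-6
WARMUP_STEPS = 1000
TOTAL_STEPS = 88 * 640  # 56,320 steps at the epoch-88 checkpoint

def lr_at(step: int) -> float:
    """Linear warmup to INIT_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        # linear ramp from WARMUP_LR up to INIT_LR
        return WARMUP_LR + (INIT_LR - WARMUP_LR) * step / WARMUP_STEPS
    # cosine decay from INIT_LR down to MIN_LR over the remaining steps
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (INIT_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```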
### Dataset Composition

The model was trained on a mixture of datasets with the following sampling ratios:

- **ShareGPT Detail (30%):** general visual conversation data
- **GPT4Vision Face Detail (10%):** facial analysis and description data
- **Realistic Emotions Detail (20%):** emotion recognition and interpretation data
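The ratio-weighted mixing above can be sketched as a simple weighted draw. The dataset keys are illustrative names, not identifiers from the training code, and since the listed ratios sum to 0.6 they are renormalized here for sampling.

```python
import random

# Illustrative keys; ratios taken from the list above (renormalized when sampling).
RATIOS = {
    "sharegpt_detail": 0.30,
    "gpt4vision_face_detail": 0.10,
    "realistic_emotions_detail": 0.20,
}

def sample_dataset(rng: random.Random) -> str:
    """Pick a dataset name with probability proportional to its ratio."""
    pick = rng.random() * sum(RATIOS.values())
    for name, weight in RATIOS.items():
        pick -= weight
        if pick <= 0:
            return name
    return name  # guard against floating-point edge cases
```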
### Usage

#### Requirements

```text
torch>=2.0.0
transformers>=4.28.0
timm
fairscale
accelerate
```
#### Loading the Model

```python
from minigptv2.model import MiniGPTv2

# Initialize the model from the Llama-2 backbone and fine-tuned checkpoint
model = MiniGPTv2.from_pretrained(
    llama_model_path="/path/to/Llama-2-7b-chat-hf",
    checkpoint_path="/path/to/minigptv2_checkpoint.pth",
    image_size=448,
    max_txt_len=3072,
)

# Set to evaluation mode
model.eval()
```
#### Inference

```python
from PIL import Image

# Load image
image = Image.open("example.jpg").convert("RGB")

# Generate a response conditioned on the image and prompt
response = model.generate(
    image=image,
    prompt="What emotions is this person expressing?",
    max_new_tokens=512,
)
print(response)
```
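Before generation, the image should match the model's 448×448 input size. A minimal sketch of that resize step, assuming the model handles pixel normalization internally:

```python
from PIL import Image

TARGET_SIZE = (448, 448)  # model input size from the configuration above

def prepare_image(img: Image.Image) -> Image.Image:
    """Convert to RGB and resize to the model's expected input resolution."""
    return img.convert("RGB").resize(TARGET_SIZE, Image.BICUBIC)

# Stand-in image; in practice, pass the PIL image loaded from example.jpg
resized = prepare_image(Image.new("RGB", (640, 480)))
```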
### Training

To continue training from the epoch 88 checkpoint:

```bash
python train.py --config /path/to/config.yaml --resume_ckpt_path /path/to/epoch88_checkpoint.pth
```
### Evaluation

```bash
python evaluate.py --config /path/to/eval_config.yaml --checkpoint /path/to/epoch88_checkpoint.pth
```
### License

This project is released under the Apache License 2.0 (see the `license` field in the model card metadata).
### Citation

[Citation information for MiniGPTv2 and any relevant papers]
### Acknowledgements

This project builds upon the MiniGPTv2 architecture and utilizes the Llama-2-7b-chat model. We thank the original authors for their contributions to the field.
---
license: apache-2.0
---