ValerianFourel
/

FaceVLM

Model card Files Files and versions

FaceVLM / README.md

ValerianFourel's picture

Upload folder using huggingface_hub

0f8467e verified about 1 year ago

|

history blame contribute delete

2.43 kB

	# MiniGPTv2 Project

	### Overview
	MiniGPTv2 is a multimodal large language model that combines vision and language capabilities. This repository contains the implementation of MiniGPTv2 fine-tuned on facial emotion recognition and detailed image understanding tasks.

	Model Architecture
	Base Architecture: MiniGPTv2
	LLM Backbone: Llama-2-7b-chat
	Image Size: 448×448
	Max Text Length: 3072 tokens
	LoRA Configuration: r=64, alpha=16
	Gradient Checkpointing: Enabled for the vision encoder, disabled for LLM
	Training Configuration
	Training Checkpoint: Epoch 88 (56,320 steps)
	Steps per Epoch: 640
	Batch Size: 1 (with gradient accumulation)
	Gradient Accumulation Steps: 16
	Learning Rate:
	Initial: 3e-5
	Minimum: 1e-6
	Warmup: 1e-6
	LR Schedule: Linear warmup with cosine decay
	Warmup Steps: 1000
	Weight Decay: 0.05
	Mixed Precision Training: Enabled
	Dataset Composition
	The model was trained on a mixture of datasets with the following sampling ratios:

	ShareGPT Detail: 30%
	General visual conversation data
	GPT4Vision Face Detail: 10%
	Facial analysis and description data
	Realistic Emotions Detail: 20%
	Emotion recognition and interpretation data
	Usage
	Requirements


	Text Only
	torch>=2.0.0
	transformers>=4.28.0
	timm
	fairscale
	accelerate
	Loading the Model


	Python
	from minigptv2.model import MiniGPTv2

	# Initialize the model
	model = MiniGPTv2.from_pretrained(
	llama_model_path="/path/to/Llama-2-7b-chat-hf",
	checkpoint_path="/path/to/minigptv2_checkpoint.pth",
	image_size=448,
	max_txt_len=3072
	)

	# Set to evaluation mode
	model.eval()
	Inference


	Python
	from PIL import Image
	import torch

	# Load image
	image = Image.open("example.jpg").convert("RGB")

	# Process input
	response = model.generate(
	image=image,
	prompt="What emotions is this person expressing?",
	max_new_tokens=512
	)

	print(response)
	Training
	To continue training from the epoch 88 checkpoint:



	Bash
	python train.py --config /path/to/config.yaml --resume_ckpt_path /path/to/epoch88_checkpoint.pth
	Evaluation


	Bash
	python evaluate.py --config /path/to/eval_config.yaml --checkpoint /path/to/epoch88_checkpoint.pth
	License
	[Specify license information]

	Citation


	Text Only
	[Citation information for MiniGPTv2 and any relevant papers]
	Acknowledgements
	This project builds upon the MiniGPTv2 architecture and utilizes the Llama-2-7b-chat model. We thank the original authors for their contributions to the field.

	---
	license: apache-2.0
	---