Update README.md

README.md

[arXiv](https://arxiv.org/abs/2504.07198)
[Hugging Face](https://huggingface.co/chaubeyG/FaceLLaVA)
[License](LICENSE.rst)
[Python](https://www.python.org/)

These are the officially released weights of the **WACV 2026 Round 1** Early Accept paper (6.4% acceptance rate), Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning. Please refer to the [official GitHub repository](https://github.com/ihp-lab/face-llava) for instructions to run inference.

---

## 📣 News

- [Oct. 2025] Initial release of the official codebase and model weights. Stay tuned for more details and the dataset.
- [Sept. 2025] FaceLLaVA was accepted in the first round of WACV 2026 (6.4% acceptance rate). See you in Tucson!

## 📦 Repository Structure

```bash
...
```

## 🎯 Inference

1. Download the model weights from [Hugging Face](https://huggingface.co/chaubeyG/FaceLLaVA) into the `checkpoints/` folder so that the resulting path is `./checkpoints/FaceLLaVA`.
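
If you prefer to script the download, here is a minimal sketch using the `huggingface_hub` package (an assumption; any Hugging Face client works, and the repo id is taken from the badge above):

```python
# Minimal download sketch -- assumes the `huggingface_hub` package is
# installed (pip install huggingface_hub).
from huggingface_hub import snapshot_download

# Mirror the released weights into ./checkpoints/FaceLLaVA.
snapshot_download(
    repo_id="chaubeyG/FaceLLaVA",
    local_dir="./checkpoints/FaceLLaVA",
)
```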

2. Crop the input image/video using `tools/crop_face.py` before further processing.

Use the following command to crop an image:

```bash
python crop_face.py \
    --mode image \
    --image_path "/path/to/input.jpg" \
    --output_image_path "/path/to/output_cropped.jpg"
```

Use the following command to crop a video:

```bash
python crop_face.py \
    --mode video \
    --video_path "/path/to/input/video.mp4" \
    --output_video_path "/path/to/output/cropped_video.mp4" \
    --temp_dir "/path/to/temp"
```
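
To crop a whole folder of images, one option is a small wrapper that shells out to the same script once per file. A sketch, assuming the hypothetical folder paths below and using only the flags documented above:

```python
# Hypothetical batch wrapper around crop_face.py -- the input/output folders
# are placeholders; only the documented --mode/--image_path/--output_image_path
# flags are used.
import subprocess
from pathlib import Path

in_dir = Path("/path/to/raw_images")       # placeholder input folder
out_dir = Path("/path/to/cropped_images")  # placeholder output folder
out_dir.mkdir(parents=True, exist_ok=True)

for img in sorted(in_dir.glob("*.jpg")):
    subprocess.run(
        [
            "python", "crop_face.py",
            "--mode", "image",
            "--image_path", str(img),
            "--output_image_path", str(out_dir / img.name),
        ],
        check=True,  # raise if cropping any file fails
    )
```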

3. Run the following command for inference:

```bash
CUDA_VISIBLE_DEVICES=0 python inference.py --model_path="./checkpoints/FaceLLaVA" \
    --file_path="./assets/demo_inputs/face_attr_example_1.png" --prompt="What are the facial attributes in the given image?"
```

4. **Currently, the following face perception tasks are supported, along with the modality best suited to each: Emotion (video), Age (image), Facial Attributes (image), Facial Action Units (image).**

5. A list of prompts that work well for different tasks is present in `./assets/good_prompts`; the sketch below loops over one example call per task.
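
A minimal sketch of such a loop, with placeholder input paths and illustrative prompts (prefer the prompts shipped in `./assets/good_prompts`):

```python
# Hypothetical per-task driver: one inference.py call per supported task.
# File paths and prompts are placeholders -- substitute your own cropped
# inputs and prompts from ./assets/good_prompts.
import subprocess

examples = [
    # (task, cropped input, illustrative prompt)
    ("emotion", "/path/to/cropped_video.mp4", "What emotion is the person expressing?"),
    ("age", "/path/to/cropped_face.jpg", "What is the age of the person in the image?"),
    ("attributes", "/path/to/cropped_face.jpg", "What are the facial attributes in the given image?"),
    ("action units", "/path/to/cropped_face.jpg", "Which facial action units are activated in the image?"),
]

for task, file_path, prompt in examples:
    print(f"=== {task} ===")
    subprocess.run(
        [
            "python", "inference.py",
            "--model_path=./checkpoints/FaceLLaVA",
            f"--file_path={file_path}",
            f"--prompt={prompt}",
        ],
        check=True,
    )
```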

### ✅ Repository Progress

- [ ] Dataset Release
- [x] Training Script
- [x] Inference Code
- [x] Model Weights

## ⚖️ License

This code is distributed under the USC Research License. See [LICENSE.rst](LICENSE.rst) for more details.

## 🪶 Citation

```latex
@article{chaubey2025face,
  title={Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning},
  author={Chaubey, Ashutosh and Guan, Xulang and Soleymani, Mohammad},
  journal={arXiv preprint arXiv:2504.07198},
  year={2025}
}
```