## Model Performance Evaluation (`evaluate.py`)

This script evaluates the model's performance on a test set. It can operate in two modes:

- **`pair`**: Calculates pairwise accuracy.
- **`ranking`**: Calculates ranking accuracy.

**Pair-wise Sample**

For simplicity, the convention is that the image at `path1` is better than the image at `path2`.
```json
[
  {
    "prompt": ".....",
    "path1": ".....",
    "path2": "....."
  },
  {
    "prompt": ".....",
    "path1": ".....",
    "path2": "....."
  },
  ...
]
```
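For example, a file in this format can be assembled with the standard-library `json` module. This is only an illustrative sketch; the prompt and paths below are placeholders, not real data:

```python
import json

# Each entry pairs a prompt with a preferred image (path1)
# and a less-preferred image (path2), per the convention above.
pairs = [
    {
        "prompt": "a photo of an astronaut riding a horse",
        "path1": "/data/images/preferred_0001.png",       # better image
        "path2": "/data/images/less_preferred_0001.png",  # worse image
    },
]

with open("test_data.json", "w") as f:
    json.dump(pairs, f, indent=2)
```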
**Rank-wise Sample**

```json
[
  {
    "id": "005658-0040",
    "prompt": ".....",
    "generations": [
      "path to image1",
      "path to image2",
      "path to image3",
      "path to image4"
    ],
    "ranking": [
      1,
      2,
      5,
      3
    ]
  },
  ...
]
```
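The exact metric is computed inside `evaluate.py`, but one common way to define ranking accuracy, sketched here purely for illustration, is the fraction of image pairs whose order under the model's scores agrees with the ground-truth `ranking` (lower rank value = better):

```python
from itertools import combinations

def pairwise_ranking_accuracy(scores, ranking):
    """Fraction of pairs ordered consistently with the ground truth.

    scores:  model scores, one per generation (higher = better).
    ranking: ground-truth ranks, one per generation (lower = better).
    Pairs tied in the ground-truth ranking are skipped.
    """
    correct, total = 0, 0
    for i, j in combinations(range(len(scores)), 2):
        if ranking[i] == ranking[j]:
            continue  # no ground-truth preference for this pair
        total += 1
        # Ground truth prefers i over j iff ranking[i] < ranking[j];
        # the model agrees iff it assigns i the higher score.
        if (ranking[i] < ranking[j]) == (scores[i] > scores[j]):
            correct += 1
    return correct / total if total else 0.0

# With the sample ranking above, [1, 2, 5, 3], these scores agree on all pairs:
print(pairwise_ranking_accuracy([0.9, 0.7, 0.2, 0.5], [1, 2, 5, 3]))  # 1.0
```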
### Usage

```bash
python evaluate/evaluate.py \
    --test_json /path/to/your/test_data.json \
    --config_path config/HPSv3_7B.yaml \
    --checkpoint_path checkpoints/HPSv3_7B/model.pth \
    --mode pair \
    --batch_size 8 \
    --num_processes 8
```

**Arguments:**

- `--test_json`: (Required) Path to the JSON file containing evaluation data.
- `--config_path`: (Required) Path to the model's configuration file.
- `--checkpoint_path`: (Required) Path to the model checkpoint.
- `--mode`: The evaluation mode. Can be `pair` or `ranking`. (Default: `pair`)
- `--batch_size`: Batch size for inference. (Default: 8)
- `--num_processes`: Number of parallel processes to use. (Default: 8)
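Under the pair convention stated above (the `path1` image is the preferred one), pairwise accuracy reduces to checking how often the model scores `path1` above `path2`. A minimal sketch, not the exact implementation in `evaluate.py`:

```python
def pairwise_accuracy(scores1, scores2):
    """scores1/scores2: model scores for the path1/path2 images.

    A prediction is correct when the preferred image (path1)
    receives the higher score.
    """
    correct = sum(s1 > s2 for s1, s2 in zip(scores1, scores2))
    return correct / len(scores1) if scores1 else 0.0
```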
---
## Reward Benchmarking (`benchmark.py`)

This script runs inference with a reward model over one or more folders of images. It computes a reward score for each image based on its corresponding text prompt (expected in a `.txt` file with the same basename), prints statistics (mean, std, min, max) for each folder, and saves the detailed results to a JSON file.

It supports multiple reward models through the `--model_type` argument.
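Concretely, each folder is expected to pair every image with a prompt file sharing its basename (e.g. `0001.png` next to `0001.txt`). A minimal sketch of walking such a folder; the accepted image extensions are an assumption:

```python
from pathlib import Path

def collect_samples(folder):
    """Yield (image_path, prompt) pairs from a benchmark folder.

    Assumes each image (e.g. 0001.png) has a sibling prompt file
    with the same stem (0001.txt); images without one are skipped.
    """
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
            continue
        prompt_file = image.with_suffix(".txt")
        if prompt_file.exists():
            yield image, prompt_file.read_text().strip()
```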
### Usage

The script accepts command-line arguments via `argparse`. Below is an example invocation:
```bash
python evaluate/benchmark.py \
    --config_path config/HPSv3_7B.yaml \
    --checkpoint_path checkpoints/HPSv3_7B/model.pth \
    --model_type hpsv3 \
    --image_folders /path/to/images/folder1 /path/to/images/folder2 \
    --output_path ./benchmark_results.json \
    --batch_size 16 \
    --num_processes 8
```
**Arguments:**

- `--config_path`: (Required) Path to the model's configuration file.
- `--checkpoint_path`: (Required) Path to the model checkpoint.
- `--model_type`: The reward model to use. Choices: `hpsv3`, `hpsv2`, `imagereward`. (Default: `hpsv3`)
- `--image_folders`: (Required) One or more paths to folders containing the images to benchmark.
- `--output_path`: (Required) Path to save the output JSON file with results.
- `--batch_size`: Batch size for processing. (Default: 16)
- `--num_processes`: Number of parallel processes to use. (Default: 8)
- `--num_machines`: For distributed inference, the total number of machines. (Default: 1)
- `--machine_id`: For distributed inference, the ID of the current machine. (Default: 0)
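The `--num_machines` and `--machine_id` flags imply a static split of the workload across machines. The actual sharding logic in `benchmark.py` may differ; one plausible round-robin sketch:

```python
def shard(items, num_machines, machine_id):
    """Assign every num_machines-th item to this machine.

    A round-robin split keeps shards roughly balanced even when
    the items (e.g. images) vary in number across folders.
    """
    return [x for i, x in enumerate(items) if i % num_machines == machine_id]

# Example: machine 1 of 2 processes the odd-indexed images.
images = ["a.png", "b.png", "c.png", "d.png"]
print(shard(images, num_machines=2, machine_id=1))  # ['b.png', 'd.png']
```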