Model Performance Evaluation (evaluate.py)
This script is used to evaluate the model's performance on a test set. It can operate in two modes:
pair: Calculates pairwise accuracy.
ranking: Calculates ranking accuracy.
Pair-wise Sample
For simplicity, each pair is ordered so that the image at path1 is the preferred one (better than the image at path2).
[
{
"prompt": ".....",
"path1": ".....",
"path2": "....."
},
{
"prompt": ".....",
"path1": ".....",
"path2": "....."
},
...
]
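As a rough illustration of what pairwise accuracy means for this format, the sketch below counts a pair as correct when the image at path1 receives a strictly higher score than the image at path2. The names `pairwise_accuracy`, `score_fn`, and `toy_score` are hypothetical and are not taken from evaluate.py; a real run would score images with the reward model instead of the toy scorer.

```python
from typing import Callable, Dict, List


def pairwise_accuracy(
    samples: List[Dict[str, str]],
    score_fn: Callable[[str, str], float],
) -> float:
    """Fraction of pairs in which path1 scores strictly higher than path2."""
    if not samples:
        raise ValueError("no samples to evaluate")
    correct = 0
    for sample in samples:
        preferred = score_fn(sample["prompt"], sample["path1"])
        rejected = score_fn(sample["prompt"], sample["path2"])
        if preferred > rejected:
            correct += 1
    return correct / len(samples)


def toy_score(prompt: str, path: str) -> float:
    """Hypothetical stand-in for a reward model: scores by path length."""
    return float(len(path))


demo = [
    {"prompt": "a cat", "path1": "img/long_name.png", "path2": "img/a.png"},
    {"prompt": "a dog", "path1": "img/b.png", "path2": "img/longer.png"},
]
print(pairwise_accuracy(demo, toy_score))  # prints 0.5
```

Ties are counted as errors here, which is one possible convention; the script may handle them differently.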
Rank-wise Sample
[
{
"id": "005658-0040",
"prompt": ".....",
"generations": [
"path to image1",
"path to image2",
"path to image3",
"path to image4"
],
"ranking": [
1,
2,
5,
3
]
},
...
]
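The README does not define ranking accuracy precisely, so the following is only one plausible reading: within a sample, compare every pair of images and check whether the model's scores order them the same way as the human ranking. The sketch assumes that a smaller ranking value means a better image and that `scores[i]` belongs to `generations[i]`; both assumptions, and the function name `ranking_pair_agreement`, are illustrative rather than taken from evaluate.py.

```python
from itertools import combinations
from typing import Sequence


def ranking_pair_agreement(scores: Sequence[float], ranking: Sequence[int]) -> float:
    """Share of image pairs whose score order matches the human ranking.

    Assumption: a smaller ranking value means a better image.
    """
    if len(scores) != len(ranking):
        raise ValueError("scores and ranking must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(ranking)), 2):
        if ranking[i] == ranking[j]:
            continue  # no human preference between these two images
        total += 1
        better, worse = (i, j) if ranking[i] < ranking[j] else (j, i)
        # Equal scores count as a disagreement in this sketch.
        if scores[better] > scores[worse]:
            agree += 1
    return agree / total if total else 0.0


# Ranking values from the sample above, with made-up model scores.
human_ranking = [1, 2, 5, 3]
model_scores = [0.9, 0.8, 0.5, 0.1]
agreement = ranking_pair_agreement(model_scores, human_ranking)
```

With these made-up scores, five of the six pairs agree; the only mismatch is between the third and fourth images, where the model prefers the image the ranking places lower.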
Usage
python evaluate/evaluate.py \
--test_json /path/to/your/test_data.json \
--config_path config/HPSv3_7B.yaml \
--checkpoint_path checkpoints/HPSv3_7B/model.pth \
--mode pair \
--batch_size 8 \
--num_processes 8
Arguments:
--test_json: (Required) Path to the JSON file containing evaluation data.
--config_path: (Required) Path to the model's configuration file.
--checkpoint_path: (Required) Path to the model checkpoint.
--mode: The evaluation mode. Can be pair or ranking. (Default: pair)
--batch_size: Batch size for inference. (Default: 8)
--num_processes: Number of parallel processes to use. (Default: 8)
Reward Benchmarking (benchmark.py)
This script is used to run inference with a reward model over one or more folders of images. It calculates a reward score for each image based on its corresponding text prompt (expected in a .txt file with the same name). The script then outputs statistics (mean, std, min, max) for each folder and saves the detailed results to a JSON file.
It supports multiple reward models through the --model_type argument.
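The folder layout and summary statistics described above can be sketched as follows. This is a minimal illustration, not the code in benchmark.py: the helper names `collect_pairs` and `summarize`, the set of accepted image extensions, and the use of the population standard deviation are all assumptions.

```python
import statistics
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

# Assumed set of image extensions; the script may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def collect_pairs(folder: Path) -> List[Tuple[Path, str]]:
    """Pair each image with the prompt in the same-named .txt file."""
    pairs = []
    for image_path in sorted(folder.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        prompt_path = image_path.with_suffix(".txt")
        if prompt_path.is_file():  # images without a prompt are skipped
            prompt = prompt_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, prompt))
    return pairs


def summarize(scores: List[float]) -> Dict[str, float]:
    """Per-folder statistics: mean, std (population), min, max."""
    return {
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),
        "min": min(scores),
        "max": max(scores),
    }


# Small self-contained demo using a temporary folder and made-up scores.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.png").write_bytes(b"")
    (folder / "a.txt").write_text("a red fox", encoding="utf-8")
    (folder / "b.png").write_bytes(b"")  # no matching prompt file
    demo_pairs = [(path.name, prompt) for path, prompt in collect_pairs(folder)]

demo_stats = summarize([1.0, 2.0, 3.0])
```

In the demo, only a.png is paired with a prompt, because b.png has no b.txt next to it.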
Usage
The script takes its options as command-line arguments (parsed with argparse). For example:
python evaluate/benchmark.py \
--config_path config/HPSv3_7B.yaml \
--checkpoint_path checkpoints/HPSv3_7B/model.pth \
--model_type hpsv3 \
--image_folders /path/to/images/folder1 /path/to/images/folder2 \
--output_path ./benchmark_results.json \
--batch_size 16 \
--num_processes 8
Arguments:
--config_path: (Required) Path to the model's configuration file.
--checkpoint_path: (Required) Path to the model checkpoint.
--model_type: The reward model to use. Choices: hpsv3, hpsv2, imagereward. (Default: hpsv3)
--image_folders: (Required) One or more paths to folders containing the images to benchmark.
--output_path: (Required) Path to save the output JSON file with results.
--batch_size: Batch size for processing. (Default: 16)
--num_processes: Number of parallel processes to use. (Default: 8)
--num_machines: For distributed inference, the total number of machines. (Default: 1)
--machine_id: For distributed inference, the ID of the current machine. (Default: 0)
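To give an intuition for how --num_machines and --machine_id could divide the work, the sketch below assigns items to machines in round-robin order. How benchmark.py actually partitions its inputs is not documented here, so treat `shard_for_machine` as a hypothetical helper showing one common scheme.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_for_machine(items: Sequence[T], num_machines: int, machine_id: int) -> List[T]:
    """Return the items one machine would handle under round-robin sharding."""
    if num_machines < 1:
        raise ValueError("num_machines must be at least 1")
    if not 0 <= machine_id < num_machines:
        raise ValueError("machine_id must be in [0, num_machines)")
    # Machine k takes items k, k + num_machines, k + 2 * num_machines, ...
    return list(items[machine_id::num_machines])


work = list(range(7))
shards = [shard_for_machine(work, 3, machine_id) for machine_id in range(3)]
```

Under this scheme every item goes to exactly one machine, and with the defaults (one machine, ID 0) the single shard is the full list.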