Model Performance Evaluation (evaluate.py)

This script evaluates the model's performance on a test set. It runs in one of two modes:

  • pair: Calculates pairwise accuracy.
  • ranking: Calculates ranking accuracy.

Pair-wise Sample

For simplicity, the image at path1 is always taken to be the better image of the pair (path1 is preferred over path2).

[
    {
        "prompt": ".....",
        "path1": ".....",
        "path2": "....."
    },
    {
        "prompt": ".....",
        "path1": ".....",
        "path2": "....."
    },
  ...
]
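
Given the convention above, pairwise accuracy can be read as the fraction of pairs in which the image at path1 receives a higher score than the image at path2. The following is a minimal sketch of that computation, not the code in evaluate.py. The names `pairwise_accuracy` and `score_fn` are hypothetical, the scorer is a stand-in lookup table rather than a real reward model, and counting ties as incorrect is an assumption.

```python
from typing import Callable


def pairwise_accuracy(pairs: list[dict], score_fn: Callable[[str, str], float]) -> float:
    """Fraction of pairs where the path1 image scores strictly higher than the path2 image.

    `score_fn(prompt, image_path)` is a hypothetical stand-in for a reward model.
    Ties are counted as incorrect (an assumption of this sketch).
    """
    if not pairs:
        raise ValueError("no pairs to evaluate")
    correct = sum(
        score_fn(p["prompt"], p["path1"]) > score_fn(p["prompt"], p["path2"])
        for p in pairs
    )
    return correct / len(pairs)


# Toy data in the pair-wise format, scored by a lookup table (NOT a real model).
toy_scores = {"a.png": 0.9, "b.png": 0.2, "c.png": 0.1, "d.png": 0.4}
toy_pairs = [
    {"prompt": "a cat", "path1": "a.png", "path2": "b.png"},  # path1 wins: correct
    {"prompt": "a dog", "path1": "c.png", "path2": "d.png"},  # path2 wins: incorrect
]
acc = pairwise_accuracy(toy_pairs, lambda prompt, path: toy_scores[path])
print(acc)  # 0.5
```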

Rank-wise Sample

[
    {
        "id": "005658-0040",
        "prompt": ".....",
        "generations": [
            "path to image1",
            "path to image2",
            "path to image3",
            "path to image4"
        ],
        "ranking": [
            1,
            2,
            5,
            3
        ]
    },
  ...
]
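
Before running in ranking mode, it can help to sanity-check the test file. The sketch below checks that each entry has the four keys shown above and that generations and ranking have the same length. The function name `validate_ranking_entry` is hypothetical, and the equal-length requirement is an assumption drawn from the sample, not a rule confirmed by evaluate.py.

```python
import json


def validate_ranking_entry(entry: dict) -> None:
    """Raise ValueError if an entry does not match the rank-wise sample layout."""
    for key in ("id", "prompt", "generations", "ranking"):
        if key not in entry:
            raise ValueError(f"entry is missing key: {key!r}")
    # Assumption: one ranking value per generated image, as in the sample.
    if len(entry["generations"]) != len(entry["ranking"]):
        raise ValueError(
            f"entry {entry['id']!r}: {len(entry['generations'])} generations "
            f"but {len(entry['ranking'])} ranking values"
        )


# Toy entry mirroring the sample structure (paths are placeholders).
sample_text = """
[
    {
        "id": "005658-0040",
        "prompt": "a prompt",
        "generations": ["img1.png", "img2.png", "img3.png", "img4.png"],
        "ranking": [1, 2, 5, 3]
    }
]
"""
entries = json.loads(sample_text)
for entry in entries:
    validate_ranking_entry(entry)
print(f"{len(entries)} entries look well-formed")
```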

Usage

python evaluate/evaluate.py \
  --test_json /path/to/your/test_data.json \
  --config_path config/HPSv3_7B.yaml \
  --checkpoint_path checkpoints/HPSv3_7B/model.pth \
  --mode pair \
  --batch_size 8 \
  --num_processes 8

Arguments:

  • --test_json: (Required) Path to the JSON file containing evaluation data.
  • --config_path: (Required) Path to the model's configuration file.
  • --checkpoint_path: (Required) Path to the model checkpoint.
  • --mode: The evaluation mode. Can be pair or ranking. (Default: pair)
  • --batch_size: Batch size for inference. (Default: 8)
  • --num_processes: Number of parallel processes to use. (Default: 8)

Reward Benchmarking (benchmark.py)

This script runs inference with a reward model over one or more folders of images. It computes a reward score for each image from its text prompt, which is expected in a .txt file with the same base name as the image. It then prints statistics (mean, std, min, max) for each folder and saves the detailed results to a JSON file.

It supports multiple reward models through the --model_type argument.
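
The folder layout and per-folder statistics described above can be sketched as follows. This is an illustration under stated assumptions, not the code in benchmark.py: the helper names `collect_image_prompt_pairs` and `summarize` are hypothetical, the set of image extensions is a guess, and the population standard deviation is used because the source does not say which variant the script reports.

```python
import statistics
from pathlib import Path

# Assumption: which file extensions count as images is not stated in this README.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def collect_image_prompt_pairs(folder: str) -> list[tuple[Path, str]]:
    """Pair each image with the prompt stored in the same-named .txt file."""
    pairs = []
    for image_path in sorted(Path(folder).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        prompt_path = image_path.with_suffix(".txt")
        if prompt_path.is_file():  # images without a prompt file are skipped here
            prompt = prompt_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, prompt))
    return pairs


def summarize(scores: list[float]) -> dict[str, float]:
    """Per-folder statistics: mean, std (population), min, max."""
    return {
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),
        "min": min(scores),
        "max": max(scores),
    }


print(summarize([1.0, 2.0, 3.0]))
```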

Usage

The script takes its options from the command line (parsed with argparse). An example invocation:

python evaluate/benchmark.py \
  --config_path config/HPSv3_7B.yaml \
  --checkpoint_path checkpoints/HPSv3_7B/model.pth \
  --model_type hpsv3 \
  --image_folders /path/to/images/folder1 /path/to/images/folder2 \
  --output_path ./benchmark_results.json \
  --batch_size 16 \
  --num_processes 8

Arguments:

  • --config_path: (Required) Path to the model's configuration file.
  • --checkpoint_path: (Required) Path to the model checkpoint.
  • --model_type: The reward model to use. Choices: hpsv3, hpsv2, imagereward. (Default: hpsv3)
  • --image_folders: (Required) One or more paths to folders containing the images to benchmark.
  • --output_path: (Required) Path to save the output JSON file with results.
  • --batch_size: Batch size for processing. (Default: 16)
  • --num_processes: Number of parallel processes to use. (Default: 8)
  • --num_machines: For distributed inference, the total number of machines. (Default: 1)
  • --machine_id: For distributed inference, the ID of the current machine. (Default: 0)
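
For distributed inference, each machine needs to work on a disjoint part of the data. How benchmark.py divides the work is not stated here. The sketch below shows one common approach, a strided split indexed by machine ID, purely as an assumption. The function name `shard_for_machine` is hypothetical.

```python
def shard_for_machine(items: list, num_machines: int = 1, machine_id: int = 0) -> list:
    """Return the items this machine would handle under a strided split.

    Assumption: machine i takes items i, i + num_machines, i + 2 * num_machines, ...
    The real script may partition the data differently.
    """
    if num_machines < 1:
        raise ValueError("num_machines must be at least 1")
    if not 0 <= machine_id < num_machines:
        raise ValueError("machine_id must be in [0, num_machines)")
    return items[machine_id::num_machines]


images = [f"img{i}.png" for i in range(7)]
print(shard_for_machine(images, num_machines=3, machine_id=1))
# ['img1.png', 'img4.png']
```

With the defaults (one machine, ID 0), the function returns the full list, which matches single-machine use.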