## Model Performance Evaluation (`evaluate.py`)
This script evaluates the model's performance on a test set. It supports two modes:
- **`pair`**: Calculates pairwise accuracy.
- **`ranking`**: Calculates ranking accuracy.
**Pair-wise Sample**
For simplicity, the image at `path1` is always assumed to be better than the image at `path2`.
```json
[
  {
    "prompt": ".....",
    "path1": ".....",
    "path2": "....."
  },
  {
    "prompt": ".....",
    "path1": ".....",
    "path2": "....."
  },
  ...
]
```
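To make the `path1`-is-better convention concrete, here is a minimal sketch of how a pairwise accuracy could be computed once each image has a score. It assumes that accuracy means the fraction of pairs in which the `path1` image scores strictly higher than the `path2` image. The function name and the example scores are hypothetical and are not taken from `evaluate.py`.

```python
from typing import Sequence


def pairwise_accuracy(scores_path1: Sequence[float], scores_path2: Sequence[float]) -> float:
    """Fraction of pairs in which the path1 image outscores the path2 image.

    Assumption: path1 is always the preferred image, as stated above.
    """
    if len(scores_path1) != len(scores_path2):
        raise ValueError("score lists must have the same length")
    if not scores_path1:
        raise ValueError("no pairs to evaluate")
    correct = sum(s1 > s2 for s1, s2 in zip(scores_path1, scores_path2))
    return correct / len(scores_path1)


# Hypothetical scores for three pairs; path1 wins in the first and third pair.
acc = pairwise_accuracy([0.9, 0.4, 0.7], [0.2, 0.6, 0.1])
```

Under this definition a tie counts as an error, since the model failed to prefer `path1`.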
**Rank-wise Sample**
```json
[
  {
    "id": "005658-0040",
    "prompt": ".....",
    "generations": [
      "path to image1",
      "path to image2",
      "path to image3",
      "path to image4"
    ],
    "ranking": [
      1,
      2,
      5,
      3
    ]
  },
  ...
]
```
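This README does not define how ranking accuracy is computed, so the following is only one plausible reading, sketched under stated assumptions: a smaller value in `ranking` means a better image, and agreement is measured over all image pairs within one sample. The function name and the scores are hypothetical, and `evaluate.py` may use a different metric.

```python
from itertools import combinations
from typing import Sequence


def ranking_pair_agreement(scores: Sequence[float], ranking: Sequence[int]) -> float:
    """Share of image pairs whose score order matches the annotated order.

    Assumptions: a smaller rank value means a better image, and pairs with
    equal rank are skipped because they express no preference.
    """
    if len(scores) != len(ranking):
        raise ValueError("scores and ranking must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(scores)), 2):
        if ranking[i] == ranking[j]:
            continue
        total += 1
        human_prefers_i = ranking[i] < ranking[j]
        model_prefers_i = scores[i] > scores[j]
        agree += human_prefers_i == model_prefers_i
    if total == 0:
        raise ValueError("no comparable pairs in this sample")
    return agree / total


# Hypothetical model scores for the four generations in the sample above.
agreement = ranking_pair_agreement([3.1, 2.5, 0.4, 2.9], [1, 2, 5, 3])
```

With these made-up scores, the model disagrees with the annotation on one pair (images 2 and 4) out of six.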
### Usage
```bash
python evaluate/evaluate.py \
    --test_json /path/to/your/test_data.json \
    --config_path config/HPSv3_7B.yaml \
    --checkpoint_path checkpoints/HPSv3_7B/model.pth \
    --mode pair \
    --batch_size 8 \
    --num_processes 8
```
**Arguments:**
- `--test_json`: (Required) Path to the JSON file containing evaluation data.
- `--config_path`: (Required) Path to the model's configuration file.
- `--checkpoint_path`: (Required) Path to the model checkpoint.
- `--mode`: The evaluation mode. Can be `pair` or `ranking`. (Default: `pair`)
- `--batch_size`: Batch size for inference. (Default: 8)
- `--num_processes`: Number of parallel processes to use. (Default: 8)
---
## Reward Benchmarking (`benchmark.py`)
This script runs inference with a reward model over one or more folders of images. It computes a reward score for each image from its text prompt, which is expected in a `.txt` file with the same base name as the image. It then reports statistics (mean, std, min, max) for each folder and saves the detailed results to a JSON file.
It supports multiple reward models through the `--model_type` argument.
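The folder layout and the per-folder statistics described above can be sketched as follows. This is an illustration, not the code in `benchmark.py`: the helper names, the set of accepted image extensions, and the choice of population standard deviation are all assumptions.

```python
import statistics
from pathlib import Path
from typing import Dict, List, Sequence, Tuple

# Assumed set of image extensions; the real script may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def collect_image_prompt_pairs(folder: Path) -> List[Tuple[Path, str]]:
    """Pair each image with the prompt stored in its same-named .txt file."""
    pairs = []
    for image_path in sorted(folder.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        prompt_path = image_path.with_suffix(".txt")
        if prompt_path.is_file():  # images without a prompt file are skipped
            prompt = prompt_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, prompt))
    return pairs


def summarize(scores: Sequence[float]) -> Dict[str, float]:
    """Mean, std, min, and max of one folder's scores (population std assumed)."""
    return {
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),
        "min": min(scores),
        "max": max(scores),
    }
```

For example, a folder containing `cat.png` and `cat.txt` yields one `(image_path, prompt)` pair, while an image with no matching `.txt` file is ignored in this sketch.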
### Usage
The script is configured through command-line arguments (parsed with `argparse`). For example:
```bash
python evaluate/benchmark.py \
    --config_path config/HPSv3_7B.yaml \
    --checkpoint_path checkpoints/HPSv3_7B/model.pth \
    --model_type hpsv3 \
    --image_folders /path/to/images/folder1 /path/to/images/folder2 \
    --output_path ./benchmark_results.json \
    --batch_size 16 \
    --num_processes 8
```
**Arguments:**
- `--config_path`: (Required) Path to the model's configuration file.
- `--checkpoint_path`: (Required) Path to the model checkpoint.
- `--model_type`: The reward model to use. Choices: `hpsv3`, `hpsv2`, `imagereward`. (Default: `hpsv3`)
- `--image_folders`: (Required) One or more paths to folders containing the images to benchmark.
- `--output_path`: (Required) Path to save the output JSON file with results.
- `--batch_size`: Batch size for processing. (Default: 16)
- `--num_processes`: Number of parallel processes to use. (Default: 8)
- `--num_machines`: For distributed inference, the total number of machines. (Default: 1)
- `--machine_id`: For distributed inference, the ID of the current machine. (Default: 0)
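This README does not say how `--num_machines` and `--machine_id` divide the work. One common scheme, shown here purely as an assumption, is strided sharding, in which machine `k` takes every `num_machines`-th item starting at index `k`. The helper name `shard_for_machine` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_for_machine(items: Sequence[T], num_machines: int, machine_id: int) -> List[T]:
    """Return the strided slice of items assigned to one machine.

    Assumption: machine IDs are zero-based, matching the default of 0.
    """
    if num_machines < 1:
        raise ValueError("num_machines must be at least 1")
    if not 0 <= machine_id < num_machines:
        raise ValueError("machine_id must be in [0, num_machines)")
    return list(items[machine_id::num_machines])


# With 3 machines and 7 items, machine 1 would process items 1 and 4.
shard = shard_for_machine(list(range(7)), num_machines=3, machine_id=1)
```

With the defaults (`num_machines=1`, `machine_id=0`), this scheme assigns every item to the single machine.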