# LLM auto annotation for HICO-DET dataset (Pose from [Halpe](https://github.com/Fang-Haoshu/Halpe-FullBody), Part State from [HAKE](https://github.com/DirtyHarryLYL/HAKE)).

## Environment
The code is developed using Python 3.11.11 on Ubuntu 21.xx with torch==2.6.0+cu124 and transformers==4.57.3 (with the Qwen3 series).

## Quick start
### Installation
1. Install the required packages and dependencies.
2. Clone this repo. We'll refer to the directory you cloned into as ${ROOT}.
3. Create the necessary directories:
   ```
   mkdir outputs
   mkdir model_weights
   ```
4. Download the LLM's weights from Hugging Face into model_weights.

### Prepare Dataset
5. Install COCO API:
   ```
   pip install pycocotools
   ```
6. Download the [dataset](https://huggingface.co/datasets/ayh015/HICO-Det_Halpe_HAKE).
7. Organize the dataset. Your dataset directory tree should look like this (there may be extra files):
   ```
   {DATA_ROOT}
   |-- Annotation
   |   |--hico-det-instance-level
   |   |   |--hico-det-training-set-instance-level.json
   |   `--hico-fullbody-pose
   |       |--halpe_train_v1.json
   |       `--halpe_val_v1.json
   |-- Configs
   |   |--hico_hoi_list.txt
   |   `--Part_State_76.txt
   |-- Images
   |   |--images
   |       |--test2015
   |       |   |--HICO_test2015_00000001.jpg
   |       |   |--HICO_test2015_00000002.jpg
   |       |   ...
   |       `--train2015
   |           |--HICO_train2015_00000001.jpg
   |           |--HICO_train2015_00000002.jpg
   |           ...
   `-- Logic_Rules
       |--gather_rule.pkl
       `--read_rules.py
   ```

### Start annotation
#### Modify data_path, model_path, and output_dir='outputs' in "{ROOT}/scripts/annotate.sh" to match your configuration.
```
# GPUs to expose to the job, e.g. a comma-separated list of device ids
IDX={YOUR_GPU_IDS}
export PYTHONPATH=$PYTHONPATH:./

data_path={DATA_ROOT}
model_path={ROOT}/model_weights/{YOUR_MODEL_NAME}
output_dir={ROOT}/outputs

# Create the output directory if it does not exist yet
if [ -d "${output_dir}" ]; then
    echo "dir already exists"
else
    mkdir "${output_dir}"
fi

CUDA_VISIBLE_DEVICES=$IDX OMP_NUM_THREADS=1 torchrun --nnodes=1 --nproc_per_node={NUM_YOUR_GPUs} --master_port=25005 \
    tools/annotate.py \
    --model-path "${model_path}" \
    --data-path "${data_path}" \
    --output-dir "${output_dir}"
```

#### Start auto-annotation
```
bash scripts/annotate.sh
```

## Annotation format
The output is a list of dicts, each containing the following keys:
```
{
    'file_name': 'HICO_train2015_00009511.jpg',
    'image_id': 0,
    'keypoints': a 51-element list (17x3 keypoints with x, y, v),
    'vis': a 51-element list (17 keypoints, each with 3 visibility flags),
    'instance_id': 0,
    'action_labels': [{'human_part': part_id, 'partstate': state_id}, ...],
    'height': 640,
    'width': 480,
    'human_bbox': [126, 258, 150, 305],
    'object_bbox': [128, 276, 144, 313],
    'description': "The person is riding a bicycle, supported by visible evidence of their body interacting with the bike.\n\n- The right hand is holding the right handlebar.\n- The left hand is holding the left handlebar.\n- The right hip is positioned over the seat, indicating the person is sitting on the bicycle.\n- The right foot is on the right pedal.\n- The left foot is on the left pedal."
}
```
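
As a rough illustration of working with one record in the format above, the sketch below groups the flat 51-element `keypoints` list into 17 `(x, y, v)` triples and pulls out a few summary fields. The helper names `group_triples` and `summarize` are hypothetical and not part of this repo, and the record used here is dummy data. How the records are stored on disk is not covered, so the sketch starts from a dict that is already in memory.

```python
from typing import Dict, List, Tuple

NUM_KEYPOINTS = 17  # per the format description: 17 keypoints x 3 values


def group_triples(flat: List[float], n: int = NUM_KEYPOINTS) -> List[Tuple[float, ...]]:
    """Group a flat [x0, y0, v0, x1, y1, v1, ...] list into n triples.

    Hypothetical helper for illustration only.
    """
    if len(flat) != n * 3:
        raise ValueError(f"expected {n * 3} values, got {len(flat)}")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def summarize(record: Dict) -> Dict:
    """Collect a few fields from one annotation record (hypothetical helper)."""
    keypoints = group_triples(record["keypoints"])
    return {
        "file_name": record["file_name"],
        "num_keypoints": len(keypoints),
        "num_part_states": len(record["action_labels"]),
        # The first line of the description is the overall interaction sentence
        # in the example record; other records may be laid out differently.
        "first_line": record["description"].split("\n", 1)[0],
    }


if __name__ == "__main__":
    # Dummy record with placeholder values, shaped like the example above.
    record = {
        "file_name": "HICO_train2015_00009511.jpg",
        "keypoints": [0.0] * 51,
        "action_labels": [{"human_part": 0, "partstate": 0}],
        "description": "The person is riding a bicycle.\n\n- The right hand is holding the right handlebar.",
    }
    print(summarize(record))
```

The length check is there because a silently misaligned keypoint list would shift every coordinate after the error, which is harder to notice than an exception.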