## Regex to remove IoU scores in `infer.json`
The inference dump stores `logits` objects, each holding an `iou_scores` array; the regex below removes each object together with its leading comma so the remaining JSON stays valid. A sample fragment:
```json
},
"logits": {
    "iou_scores": [
        0.95166015625,
        0.94873046875,
        0.82177734375
    ]
}
```
The matching pattern (it assumes exactly three scores per block, as in the sample):
```re
,\n\s*"logits": \{\n\s*"iou_scores":\s*\[\n\s*([\d.]+)\s*,\n\s*([\d.]+)\s*,\n\s*([\d.]+)\n\s*\]\n\s*\}
```
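To apply it, a minimal sketch in Python (the input and output file names are assumptions):
```python
import re

# The pattern from above; the leading comma is included so the JSON stays valid after removal.
PATTERN = re.compile(
    r',\n\s*"logits": \{\n\s*"iou_scores":\s*\[\n\s*([\d.]+)\s*,\n\s*([\d.]+)\s*,\n\s*([\d.]+)\n\s*\]\n\s*\}'
)

with open("infer.json") as f:
    text = f.read()

# Drop every matched "logits" block and write the stripped copy.
with open("infer.stripped.json", "w") as f:
    f.write(PATTERN.sub("", text))
```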
## List of captioner models
- Salesforce/blip-image-captioning-large
- Salesforce/blip-image-captioning-base
- Salesforce/blip2-opt-2.7b
- Salesforce/blip2-opt-6.7b-coco
- Salesforce/blip2-opt-6.7b
- Salesforce/blip2-opt-2.7b-coco
<!-- Need prompts -->
<!-- Salesforce/instructblip-vicuna-7b -->
<!-- Salesforce/instructblip-vicuna-13b -->
- microsoft/git-large-coco
- microsoft/git-large-textcaps
- microsoft/git-base
- microsoft/git-base-coco
- microsoft/git-base-textcaps
- microsoft/git-large
- microsoft/git-large-r
- microsoft/git-large-r-coco
- microsoft/git-large-r-textcaps
<!-- No official code -->
<!-- laion/mscoco_finetuned_CoCa-ViT-L-14-laion2B-s13B-b90k -->
<!-- laion/CoCa-ViT-B-32-laion2B-s13B-b90k -->
<!-- laion/CoCa-ViT-L-14-laion2B-s13B-b90k -->
<!-- laion/mscoco_finetuned_CoCa-ViT-B-32-laion2B-s13B-b90k -->
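To run inference with each of the BLIP-2 variants: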
```shell
for model in \
    Salesforce/blip2-opt-2.7b \
    Salesforce/blip2-opt-2.7b-coco \
    Salesforce/blip2-opt-6.7b \
    Salesforce/blip2-opt-6.7b-coco
do
    python \
        -m src.train \
        train_data='[vg-densecap-local]' eval_data='[vg-densecap-local]' \
        +model=base_sam_captioner \
        training.do_train=False \
        training.do_eval=False \
        training.do_inference=True \
        +data.streaming=False \
        training.fp16=True \
        training.output_dir=tmp/sam_captioner/$model \
        training.dataloader_num_workers=4 \
        model.captioner_model_name_or_path=$model
done
```
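Each run writes under `tmp/sam_captioner/$model`; the slash in the model name simply creates a nested per-organization directory.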
## The batch generation process of language models
The entry point is `transformers/generation/utils.py:GenerationMixin:generate`.
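As a reminder of how batched generation is driven through that entry point, a minimal sketch (the model name and prompts are placeholders; decoder-only models need left padding so generated tokens follow the prompt):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token
tokenizer.padding_side = "left"  # decoder-only models must pad on the left for batching
model = AutoModelForCausalLM.from_pretrained("gpt2")

batch = tokenizer(["a photo of", "two dogs running"], return_tensors="pt", padding=True)
# model.generate dispatches into GenerationMixin.generate
outputs = model.generate(**batch, max_new_tokens=10)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```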
## Chunkified inference
The region chunk size is set to 16 unless noted otherwise (one row below uses 4 for comparison).
| SAM Model | Captioner | fp16 | Region chunk size | Memory (GB) | Speed (s/it) |
| --------- | -------------------------------------- | ---- | ----------------- | ----------- | ------------ |
| ViT-huge | Salesforce/blip-image-captioning-base | Yes | 16 | ~9 | ~5.02 |
| ViT-huge | Salesforce/blip-image-captioning-base | No | 16 | ~8 | ~8.29 |
| ViT-huge | Salesforce/blip-image-captioning-large | Yes | 16 | ~10 | ~6.28 |
| ViT-huge | Salesforce/blip-image-captioning-large | No | 16 | ~9.7 | ~14.99 |
| ViT-huge | Salesforce/blip2-opt-2.7b | Yes | 16 | ~34 | ~5.82 |
| ViT-huge | Salesforce/blip2-opt-2.7b | No | 16 | ~32 | ~18.19 |
| ViT-huge | Salesforce/blip2-opt-2.7b | Yes | 4 | ~34 | ~11.56 |
| ViT-huge | microsoft/git-large-coco | Yes | 16 | ~14 | ~7.06 |
| ViT-huge | microsoft/git-base-coco | Yes | 16 | ~12 | ~3.26 |
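The chunking itself just splits the region prompts along the batch axis before captioning; a minimal sketch, with `captioner` as a hypothetical callable rather than the project's actual API:
```python
import torch

def caption_in_chunks(region_features: torch.Tensor, captioner, chunk_size: int = 16) -> list:
    """Caption regions chunk by chunk to bound peak GPU memory."""
    captions = []
    for chunk in torch.split(region_features, chunk_size, dim=0):
        captions.extend(captioner(chunk))  # hypothetical: returns one caption per region
    return captions
```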
## Bugs in SAM batch inference with transformers<=4.30.2
Remember to update `requirements.txt` accordingly; otherwise we must always set `batch_size=1`. The fix was merged after version 4.30.2: https://github.com/huggingface/transformers/pull/25074
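For example (verify first which release actually picked up the PR):
```shell
pip install "transformers>4.30.2"  # any release containing PR #25074
```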
## Debug distributed training
Inside the trainer, we can break into the debugger on the main process only:
```python
if args.local_process_index == 0:
    breakpoint()  # only the main process drops into pdb
torch.distributed.barrier()  # the other ranks wait here in the meantime
# the problematic line
labels_host = labels if labels_host is None else nested_concat(labels_host, labels, padding_index=-100)
```
Wrapping the line in `try`/`except` does not trigger the pdb interface (note that `Error` below is not a built-in exception class, so the handler can never match and the exception propagates straight to `finally`):
```python
try:
    # the problematic line
    labels_host = labels if labels_host is None else nested_concat(labels_host, labels, padding_index=-100)
except Error as e:  # BUG: `Error` is undefined; `Exception` would be needed to actually catch
    if args.local_process_index == 0:
        breakpoint()
finally:
    torch.distributed.barrier()
```
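Also keep in mind that while the main process sits in pdb, the other ranks block at `torch.distributed.barrier()`, so a long session can run into the distributed collective timeout.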
## Amulet T4 instances carry wrong GPU-count information
Singularity maintains wrong T4 instance information: `Standard_NC{4,8,16,32}as_T4_v3` actually have 1, 1, 1, and 2 GPUs respectively, but they are shown as having 1, 2, 4, and 4 GPUs.
In `amlt/helpers/sing_instances.py`, we add the code below:
```python
# add at 377, in amlt/helpers/sing_instances.py:fetch_instances_for_series
# NOTE(xiaoke): Fix T4 wrong number of GPU
# `accelerator`, `instance_name`, `description`, and `match` come from the enclosing function
if accelerator == "T4":
    instance_name_to_num_gpu = {
        "NC8as_T4_v3": ["2", "1"],
        "NC16as_T4_v3": ["4", "1"],
        "NC32as_T4_v3": ["4", "2"],
    }
    if instance_name in instance_name_to_num_gpu:
        # rewrite the advertised GPU count, then re-parse the description
        description = description.replace(
            f"GPU x {instance_name_to_num_gpu[instance_name][0]}",
            f"GPU x {instance_name_to_num_gpu[instance_name][1]}",
        )
        info = re.search(match, description)
```
Note that we need to print the instance out explicitly; sometimes we ask for 4 cards but only get 1.
```python
# add at 422, amlt/client/sing_client.py:_setup_script_run_config
print(f"instance: {job.sku.instance}, sku: {job.sku}")
```
How to check:
```shell
amlt cache instance-types
amlt cache instance-types -s NCast4v3
```
## Debug the commands generated by amlt
```shell
amlt show EXP JOB
```
(Deprecated) An older approach: patch amlt to copy the packaged job directory somewhere inspectable:
```python
# /anaconda/envs/sca-v2/lib/python3.9/site-packages/amlt/client/aml_client.py:create_context
# At the end of this function
inspect_amlt_job_dir = "tmp/amlt_job/"
try:
    print(f"Copying code from {temp_dir} to {inspect_amlt_job_dir}.")
    if os.path.exists(inspect_amlt_job_dir):
        shutil.rmtree(inspect_amlt_job_dir, ignore_errors=True)
    shutil.copytree(temp_dir, inspect_amlt_job_dir)
except Exception as e:
    print(f"Cannot copy code from {temp_dir} to {inspect_amlt_job_dir} due to {e}")
yield temp_dir
```
## Test tokenizer
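The point is to compare how the two tokenizers split cased variants of the same words: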
```python
from transformers import AutoProcessor

# GPT-2 uses a byte-level BPE tokenizer
gpt2_large_tokenizer_cfg = dict(
    pretrained_model_name_or_path="gpt2-large",
    use_fast=True)
# OpenLLaMA uses a SentencePiece tokenizer
openllama_tokenizer_cfg = dict(
    pretrained_model_name_or_path="openlm-research/open_llama_3b_v2",
    use_fast=False)

def print_func(tokenizer, list_of_str):
    print(f"{list_of_str}: {tokenizer(list_of_str)['input_ids']}")

tokenizer = AutoProcessor.from_pretrained(**gpt2_large_tokenizer_cfg)
print_func(tokenizer, ["car", "Car", "CAR"])
print_func(tokenizer, ["tokenizer", "Tokenizer", "TOKENIZER"])

tokenizer = AutoProcessor.from_pretrained(**openllama_tokenizer_cfg)
print_func(tokenizer, ["car", "Car", "CAR"])
print_func(tokenizer, ["tokenizer", "Tokenizer", "TOKENIZER"])
```