## Regex to remove IoU scores in `infer.json`

```json
        },
        "logits": {
            "iou_scores": [
                0.95166015625,
                0.94873046875,
                0.82177734375
            ]
        }
```

```re
,\n\s*"logits": \{\n\s*"iou_scores":\s*\[\n\s*([\d.]+)\s*,\n\s*([\d.]+)\s*,\n\s*([\d.]+)\n\s*\]\n\s*\}
```
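
As a minimal sketch, the pattern can be applied from Python with `re.sub` to strip the `logits` block. The sample record and its `caption` key are made up for illustration; real `infer.json` entries may hold other fields. The pattern assumes pretty-printed JSON with one score per line and exactly three scores.

```python
import json
import re

# Same pattern as above. The leading comma is part of the match, so removing
# the block leaves valid JSON behind.
LOGITS_PATTERN = re.compile(
    r',\n\s*"logits": \{\n\s*"iou_scores":\s*\[\n\s*([\d.]+)\s*,'
    r'\n\s*([\d.]+)\s*,\n\s*([\d.]+)\n\s*\]\n\s*\}'
)


def strip_iou_scores(text: str) -> str:
    """Remove every "logits" block that holds three IoU scores."""
    return LOGITS_PATTERN.sub("", text)


# Hypothetical record: only the "logits" layout mirrors the snippet above.
sample = json.dumps(
    {
        "caption": "a dog",
        "logits": {"iou_scores": [0.95166015625, 0.94873046875, 0.82177734375]},
    },
    indent=4,
)
cleaned = strip_iou_scores(sample)
print(cleaned)
```

Parsing `cleaned` with `json.loads` is a cheap way to confirm that the removal did not break the file.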

## List of captioner models

Salesforce/blip-image-captioning-large
Salesforce/blip-image-captioning-base

Salesforce/blip2-opt-2.7b
Salesforce/blip2-opt-6.7b-coco
Salesforce/blip2-opt-6.7b
Salesforce/blip2-opt-2.7b-coco

<!-- Need prompts -->
<!-- Salesforce/instructblip-vicuna-7b -->
<!-- Salesforce/instructblip-vicuna-13b -->

microsoft/git-large-coco
microsoft/git-large-textcaps
microsoft/git-base
microsoft/git-base-coco
microsoft/git-base-textcaps
microsoft/git-large
microsoft/git-large-r
microsoft/git-large-r-coco
microsoft/git-large-r-textcaps

<!-- No official code -->
<!-- laion/mscoco_finetuned_CoCa-ViT-L-14-laion2B-s13B-b90k -->
<!-- laion/CoCa-ViT-B-32-laion2B-s13B-b90k -->
<!-- laion/CoCa-ViT-L-14-laion2B-s13B-b90k -->
<!-- laion/mscoco_finetuned_CoCa-ViT-B-32-laion2B-s13B-b90k -->

```shell
for model in \
Salesforce/blip2-opt-2.7b \
Salesforce/blip2-opt-2.7b-coco \
Salesforce/blip2-opt-6.7b \
Salesforce/blip2-opt-6.7b-coco 
do
python \
    -m src.train \
    train_data='[vg-densecap-local]' eval_data='[vg-densecap-local]' \
    +model=base_sam_captioner \
    training.do_train=False \
    training.do_eval=False \
    training.do_inference=True \
    +data.streaming=False \
    training.fp16=True \
    training.output_dir=tmp/sam_captioner/$model \
    training.dataloader_num_workers=4 \
    model.captioner_model_name_or_path=$model
done
```

## How batch generation works in the language model

See `transformers/generation/utils.py:GenerationMixin:generate`.



## Chunkified inference

The region chunk size is set to 16 unless the table states otherwise.

| SAM Model | Captioner                              | fp16 | region chunk size | Memory (GB) | Speed (s/it) |
| --------- | -------------------------------------- | ---- | ----------------- | ----------- | ------------ |
| ViT-huge  | Salesforce/blip-image-captioning-base  | Yes  | 16                | ~ 9         | ~ 5.02       |
| ViT-huge  | Salesforce/blip-image-captioning-base  | No   | 16                | ~ 8         | ~ 8.29       |
| ViT-huge  | Salesforce/blip-image-captioning-large | Yes  | 16                | ~ 10        | ~ 6.28       |
| ViT-huge  | Salesforce/blip-image-captioning-large | No   | 16                | ~ 9.7       | ~ 14.99      |
| ViT-huge  | Salesforce/blip2-opt-2.7b              | Yes  | 16                | ~ 34        | ~ 5.82       |
| ViT-huge  | Salesforce/blip2-opt-2.7b              | No   | 16                | ~ 32        | ~ 18.19      |
| ViT-huge  | Salesforce/blip2-opt-2.7b              | Yes  | 4                 | ~ 34        | ~ 11.56      |
| ViT-huge  | microsoft/git-large-coco               | Yes  | 16                | ~ 14        | ~ 7.06       |
| ViT-huge  | microsoft/git-base-coco                | Yes  | 16                | ~ 12        | ~ 3.26       |

## Bugs in SAM batch inference when transformers<=4.30.2

Remember to require a newer `transformers` version in `requirements.txt`. Otherwise, always set `batch_size=1`.

The fix was merged after version 4.30.2: https://github.com/huggingface/transformers/pull/25074
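
A guard along these lines could enforce the `batch_size=1` fallback at runtime. It is only a sketch: `release_tuple`, `is_known_affected`, and `safe_batch_size` are hypothetical helpers, and the only fact encoded is the bound stated above (versions up to 4.30.2 are affected). A version above that bound is not proof that the installed build includes the fix.

```python
import re

# Last release reported as affected by the SAM batch-inference bug.
LAST_AFFECTED = (4, 30, 2)


def release_tuple(version: str) -> tuple:
    """Turn '4.30.2' or '4.31.0.dev0' into (major, minor, patch)."""
    numbers = [int(n) for n in re.findall(r"\d+", version)[:3]]
    numbers += [0] * (3 - len(numbers))  # pad short versions such as '4.30'
    return tuple(numbers)


def is_known_affected(version: str) -> bool:
    """True if the version is at or below the last affected release."""
    return release_tuple(version) <= LAST_AFFECTED


def safe_batch_size(version: str, requested: int) -> int:
    """Fall back to batch size 1 on versions known to be affected."""
    return 1 if is_known_affected(version) else requested


# In practice the first argument would be `transformers.__version__`.
print(safe_batch_size("4.30.2", 8))
```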

## Debug the distributed training

Inside the trainer, we can open the debugger on the main process only, while the other processes wait at a barrier:

```python
if args.local_process_index == 0:
    breakpoint()
torch.distributed.barrier()
# the problematic line
labels_host = labels if labels_host is None else nested_concat(labels_host, labels, padding_index=-100)
```
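
The same pattern can be wrapped in a small helper so it is easy to move between suspect lines. This is a sketch: `breakpoint_on_main` is a hypothetical name, and the barrier and debugger are passed in as callables so the helper can be exercised without `torch` or an interactive session.

```python
from typing import Callable


def breakpoint_on_main(
    local_process_index: int,
    barrier: Callable[[], None],
    debugger: Callable[[], None] = breakpoint,
) -> None:
    """Open the debugger on local rank 0, then make every rank wait at the barrier."""
    if local_process_index == 0:
        debugger()
    barrier()


# Dry run with stand-ins for two ranks. In the trainer the call would be
# breakpoint_on_main(args.local_process_index, torch.distributed.barrier).
events = []
for rank in range(2):
    breakpoint_on_main(
        rank,
        barrier=lambda rank=rank: events.append(("barrier", rank)),
        debugger=lambda rank=rank: events.append(("debug", rank)),
    )
print(events)
```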

Wrapping the line in `try`/`except` does not trigger the pdb interface:

```python
try:
    # the problematic line
    labels_host = labels if labels_host is None else nested_concat(labels_host, labels, padding_index=-100)
except Exception as e:
    if args.local_process_index == 0:
        breakpoint()
finally:
    torch.distributed.barrier()
```

## Amulet reports the wrong number of GPUs for T4 instances

Singularity maintains wrong T4 instance information: `Standard_NC{4,8,16,32}as_T4_v3` have only 1, 1, 1, and 2 GPUs respectively, but they are shown as having 1, 2, 4, and 4 GPUs.

In `amlt/helpers/sing_instances.py`, we add the code below:

```python
# add at 377, in amlt/helpers/sing_instances.py:fetch_instances_for_series
              # NOTE(xiaoke): Fix T4 wrong number of GPU
              if accelerator == "T4":
                instance_name_to_num_gpu = {
                  "NC8as_T4_v3": ["2", "1"],
                  "NC16as_T4_v3": ["4", "1"],
                  "NC32as_T4_v3": ["4", "2"],
                }
                if instance_name in instance_name_to_num_gpu:
                  description = description.replace(f"GPU x {instance_name_to_num_gpu[instance_name][0]}", f"GPU x {instance_name_to_num_gpu[instance_name][1]}")
                  info = re.search(match, description)
```
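
The rewrite performed by the patch can be tried in isolation with a sketch like the one below. `fix_t4_description` is a hypothetical standalone helper, and the description strings are invented for illustration. Only the `GPU x N` fragment and the GPU counts come from the patch.

```python
# GPU counts copied from the patch: instance name -> (reported, actual).
T4_GPU_COUNT_FIX = {
    "NC8as_T4_v3": ("2", "1"),
    "NC16as_T4_v3": ("4", "1"),
    "NC32as_T4_v3": ("4", "2"),
}


def fix_t4_description(instance_name: str, description: str) -> str:
    """Replace the reported GPU count with the actual one for known T4 instances."""
    if instance_name not in T4_GPU_COUNT_FIX:
        return description
    reported, actual = T4_GPU_COUNT_FIX[instance_name]
    return description.replace(f"GPU x {reported}", f"GPU x {actual}")


# Hypothetical description strings.
fixed = fix_t4_description("NC32as_T4_v3", "T4 GPU x 4, 32 vCPU")
untouched = fix_t4_description("NC4as_T4_v3", "T4 GPU x 1, 4 vCPU")
print(fixed)
print(untouched)
```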

Note that we need to print the instance explicitly, because sometimes we fail to get 4 cards and get only 1.

```python
# add at 422, amlt/client/sing_client.py:_setup_script_run_config
print(f"instance: {job.sku.instance}, sku: {job.sku}")
```

How to check:

```shell
amlt cache instance-types
amlt cache instance-types -s NCast4v3
```

## Debug the commands generated by amlt

```shell
amlt show EXP JOB
```

(deprecated) Patch `create_context` to copy the generated job directory to `tmp/amlt_job/` for inspection:
```python
# /anaconda/envs/sca-v2/lib/python3.9/site-packages/amlt/client/aml_client.py:create_context
# At the end of this function
      inspect_amlt_job_dir = "tmp/amlt_job/"
      try:
        print(f"Copy code from {code_resource.remote_dir} to {temp_dir}.")
        if os.path.exists(inspect_amlt_job_dir):
            shutil.rmtree(inspect_amlt_job_dir, ignore_errors=True)
        shutil.copytree(temp_dir, inspect_amlt_job_dir)
      except Exception as e:
        print(f"Cannot copy code from {temp_dir} to {inspect_amlt_job_dir} due to {e}")
      yield temp_dir
```

## Test tokenizer


```python
from transformers import AutoProcessor

gpt2_large_tokenizer_cfg = dict(
    pretrained_model_name_or_path="gpt2-large",
    use_fast=True)

openllama_tokenizer_cfg = dict(
    pretrained_model_name_or_path='openlm-research/open_llama_3b_v2',
    use_fast=False)

def print_func(tokenizer, list_of_str):
    print(f"{list_of_str}: {tokenizer(list_of_str)['input_ids']}")

tokenizer = AutoProcessor.from_pretrained(**gpt2_large_tokenizer_cfg)
print_func(tokenizer, ["car", "Car", "CAR"])
print_func(tokenizer, ["tokenizer", "Tokenizer", "TOKENIZER"])

tokenizer = AutoProcessor.from_pretrained(**openllama_tokenizer_cfg)
print_func(tokenizer, ["car", "Car", "CAR"])
print_func(tokenizer, ["tokenizer", "Tokenizer", "TOKENIZER"])
```