## Regex to remove IoU scores in `infer.json`
The inference dump stores `logits` objects, each holding an `iou_scores` array; the regex below removes each object together with its leading comma so the remaining JSON stays valid. A sample fragment:
```json
},
"logits": {
    "iou_scores": [
        0.95166015625,
        0.94873046875,
        0.82177734375
    ]
}
```
The matching pattern (it assumes exactly three scores per block, as in the sample):
```re
,\n\s*"logits": \{\n\s*"iou_scores":\s*\[\n\s*([\d.]+)\s*,\n\s*([\d.]+)\s*,\n\s*([\d.]+)\n\s*\]\n\s*\}
```
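To apply it, a minimal sketch in Python (the input and output file names are assumptions):
```python
import re

# The pattern from above; the leading comma is included so the JSON stays valid after removal.
PATTERN = re.compile(
    r',\n\s*"logits": \{\n\s*"iou_scores":\s*\[\n\s*([\d.]+)\s*,\n\s*([\d.]+)\s*,\n\s*([\d.]+)\n\s*\]\n\s*\}'
)

with open("infer.json") as f:
    text = f.read()

# Drop every matched "logits" block and write the stripped copy.
with open("infer.stripped.json", "w") as f:
    f.write(PATTERN.sub("", text))
```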
## List of captioner models
- Salesforce/blip-image-captioning-large
- Salesforce/blip-image-captioning-base
- Salesforce/blip2-opt-2.7b
- Salesforce/blip2-opt-6.7b-coco
- Salesforce/blip2-opt-6.7b
- Salesforce/blip2-opt-2.7b-coco
<!-- Need prompts -->
<!-- Salesforce/instructblip-vicuna-7b -->
<!-- Salesforce/instructblip-vicuna-13b -->
- microsoft/git-large-coco
- microsoft/git-large-textcaps
- microsoft/git-base
- microsoft/git-base-coco
- microsoft/git-base-textcaps
- microsoft/git-large
- microsoft/git-large-r
- microsoft/git-large-r-coco
- microsoft/git-large-r-textcaps
<!-- No official code -->
<!-- laion/mscoco_finetuned_CoCa-ViT-L-14-laion2B-s13B-b90k -->
<!-- laion/CoCa-ViT-B-32-laion2B-s13B-b90k -->
<!-- laion/CoCa-ViT-L-14-laion2B-s13B-b90k -->
<!-- laion/mscoco_finetuned_CoCa-ViT-B-32-laion2B-s13B-b90k -->
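To run inference with each of the BLIP-2 variants: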
```shell
for model in \
    Salesforce/blip2-opt-2.7b \
    Salesforce/blip2-opt-2.7b-coco \
    Salesforce/blip2-opt-6.7b \
    Salesforce/blip2-opt-6.7b-coco
do
    python \
        -m src.train \
        train_data='[vg-densecap-local]' eval_data='[vg-densecap-local]' \
        +model=base_sam_captioner \
        training.do_train=False \
        training.do_eval=False \
        training.do_inference=True \
        +data.streaming=False \
        training.fp16=True \
        training.output_dir=tmp/sam_captioner/$model \
        training.dataloader_num_workers=4 \
        model.captioner_model_name_or_path=$model
done
```
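Each run writes under `tmp/sam_captioner/$model`; the slash in the model name simply creates a nested per-organization directory.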
## The batch generation process of language models
The entry point is `transformers/generation/utils.py:GenerationMixin:generate`.
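As a reminder of how batched generation is driven through that entry point, a minimal sketch (the model name and prompts are placeholders; decoder-only models need left padding so generated tokens follow the prompt):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token
tokenizer.padding_side = "left"  # decoder-only models must pad on the left for batching
model = AutoModelForCausalLM.from_pretrained("gpt2")

batch = tokenizer(["a photo of", "two dogs running"], return_tensors="pt", padding=True)
# model.generate dispatches into GenerationMixin.generate
outputs = model.generate(**batch, max_new_tokens=10)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```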
## Chunkified inference
The region chunk size is set to 16 unless noted otherwise (one row below uses 4 for comparison).
| SAM Model | Captioner | fp16 | Region chunk size | Memory (GB) | Speed (s/it) |
| --------- | -------------------------------------- | ---- | ----------------- | ----------- | ------------ |
| ViT-huge | Salesforce/blip-image-captioning-base | Yes | 16 | ~9 | ~5.02 |
| ViT-huge | Salesforce/blip-image-captioning-base | No | 16 | ~8 | ~8.29 |
| ViT-huge | Salesforce/blip-image-captioning-large | Yes | 16 | ~10 | ~6.28 |
| ViT-huge | Salesforce/blip-image-captioning-large | No | 16 | ~9.7 | ~14.99 |
| ViT-huge | Salesforce/blip2-opt-2.7b | Yes | 16 | ~34 | ~5.82 |
| ViT-huge | Salesforce/blip2-opt-2.7b | No | 16 | ~32 | ~18.19 |
| ViT-huge | Salesforce/blip2-opt-2.7b | Yes | 4 | ~34 | ~11.56 |
| ViT-huge | microsoft/git-large-coco | Yes | 16 | ~14 | ~7.06 |
| ViT-huge | microsoft/git-base-coco | Yes | 16 | ~12 | ~3.26 |
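The chunking itself just splits the region prompts along the batch axis before captioning; a minimal sketch, with `captioner` as a hypothetical callable rather than the project's actual API:
```python
import torch

def caption_in_chunks(region_features: torch.Tensor, captioner, chunk_size: int = 16) -> list:
    """Caption regions chunk by chunk to bound peak GPU memory."""
    captions = []
    for chunk in torch.split(region_features, chunk_size, dim=0):
        captions.extend(captioner(chunk))  # hypothetical: returns one caption per region
    return captions
```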
## Bugs in SAM batch inference with transformers<=4.30.2
Remember to update `requirements.txt` accordingly; otherwise we must always set `batch_size=1`. The fix was merged after version 4.30.2: https://github.com/huggingface/transformers/pull/25074
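For example (verify first which release actually picked up the PR):
```shell
pip install "transformers>4.30.2"  # any release containing PR #25074
```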
## Debug distributed training
Inside the trainer, we can break into the debugger on the main process only:
```python
if args.local_process_index == 0:
    breakpoint()  # only the main process drops into pdb
torch.distributed.barrier()  # the other ranks wait here in the meantime
# the problematic line
labels_host = labels if labels_host is None else nested_concat(labels_host, labels, padding_index=-100)
```
Wrapping the line in `try`/`except` does not trigger the pdb interface (note that `Error` below is not a built-in exception class, so the handler can never match and the exception propagates straight to `finally`):
```python
try:
    # the problematic line
    labels_host = labels if labels_host is None else nested_concat(labels_host, labels, padding_index=-100)
except Error as e:  # BUG: `Error` is undefined; `Exception` would be needed to actually catch
    if args.local_process_index == 0:
        breakpoint()
finally:
    torch.distributed.barrier()
```
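Also keep in mind that while the main process sits in pdb, the other ranks block at `torch.distributed.barrier()`, so a long session can run into the distributed collective timeout.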
## Amulet T4 instances carry wrong GPU-count information
Singularity maintains wrong T4 instance information: `Standard_NC{4,8,16,32}as_T4_v3` actually have 1, 1, 1, and 2 GPUs respectively, but they are shown as having 1, 2, 4, and 4 GPUs.
In `amlt/helpers/sing_instances.py`, we add the code below:
```python
# add at 377, in amlt/helpers/sing_instances.py:fetch_instances_for_series
# NOTE(xiaoke): Fix T4 wrong number of GPU
# `accelerator`, `instance_name`, `description`, and `match` come from the enclosing function
if accelerator == "T4":
    instance_name_to_num_gpu = {
        "NC8as_T4_v3": ["2", "1"],
        "NC16as_T4_v3": ["4", "1"],
        "NC32as_T4_v3": ["4", "2"],
    }
    if instance_name in instance_name_to_num_gpu:
        # rewrite the advertised GPU count, then re-parse the description
        description = description.replace(
            f"GPU x {instance_name_to_num_gpu[instance_name][0]}",
            f"GPU x {instance_name_to_num_gpu[instance_name][1]}",
        )
        info = re.search(match, description)
```
Note that we need to print the instance out explicitly; sometimes we ask for 4 cards but only get 1.
```python
# add at 422, amlt/client/sing_client.py:_setup_script_run_config
print(f"instance: {job.sku.instance}, sku: {job.sku}")
```
How to check:
```shell
amlt cache instance-types
amlt cache instance-types -s NCast4v3
```
## Debug the commands generated by amlt
```shell
amlt show EXP JOB
```
(Deprecated) An older approach: patch amlt to copy the packaged job directory somewhere inspectable:
```python
# /anaconda/envs/sca-v2/lib/python3.9/site-packages/amlt/client/aml_client.py:create_context
# At the end of this function
inspect_amlt_job_dir = "tmp/amlt_job/"
try:
    print(f"Copying code from {temp_dir} to {inspect_amlt_job_dir}.")
    if os.path.exists(inspect_amlt_job_dir):
        shutil.rmtree(inspect_amlt_job_dir, ignore_errors=True)
    shutil.copytree(temp_dir, inspect_amlt_job_dir)
except Exception as e:
    print(f"Cannot copy code from {temp_dir} to {inspect_amlt_job_dir} due to {e}")
yield temp_dir
```
## Test tokenizer
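The point is to compare how the two tokenizers split cased variants of the same words: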
```python
from transformers import AutoProcessor

# GPT-2 uses a byte-level BPE tokenizer
gpt2_large_tokenizer_cfg = dict(
    pretrained_model_name_or_path="gpt2-large",
    use_fast=True)
# OpenLLaMA uses a SentencePiece tokenizer
openllama_tokenizer_cfg = dict(
    pretrained_model_name_or_path="openlm-research/open_llama_3b_v2",
    use_fast=False)

def print_func(tokenizer, list_of_str):
    print(f"{list_of_str}: {tokenizer(list_of_str)['input_ids']}")

tokenizer = AutoProcessor.from_pretrained(**gpt2_large_tokenizer_cfg)
print_func(tokenizer, ["car", "Car", "CAR"])
print_func(tokenizer, ["tokenizer", "Tokenizer", "TOKENIZER"])

tokenizer = AutoProcessor.from_pretrained(**openllama_tokenizer_cfg)
print_func(tokenizer, ["car", "Car", "CAR"])
print_func(tokenizer, ["tokenizer", "Tokenizer", "TOKENIZER"])
```