# Model

To add a new model, you need to modify the following parts of the code:

1. Modify `configuration_*.py` file, especially the `from_*` static method.
2. Add a `modeling_*.py` file. Note its `from_*` static method, which calls the `from_*` static method from `configuration_*.py`.
3. Add an argument class in `arguments.py` accordingly.
4. Add the new model in `__init__.py`.
5. Import the new model and argument class in the main script like `train.py`, and call `from_*` according to its parameters.
6. You may also need to add a processor.
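The steps above can be sketched as follows. The class and method names here (`MyScaConfig`, `MyScaModel`, `from_sam_text_pretrained`) are illustrative stand-ins, not the actual classes in this repository:

```python
# Hypothetical sketch of the configuration/model split described above.
# Names are illustrative only.

class MyScaConfig:
    """Would live in a configuration_*.py file."""

    def __init__(self, sam_name: str, lm_name: str):
        self.sam_name = sam_name
        self.lm_name = lm_name

    @staticmethod
    def from_sam_text_pretrained(sam_name: str, lm_name: str) -> "MyScaConfig":
        # The from_* static method: build a composite config from checkpoint names.
        return MyScaConfig(sam_name=sam_name, lm_name=lm_name)


class MyScaModel:
    """Would live in a modeling_*.py file."""

    def __init__(self, config: MyScaConfig):
        self.config = config

    @staticmethod
    def from_sam_text_pretrained(sam_name: str, lm_name: str) -> "MyScaModel":
        # Calls the from_* static method from configuration_*.py first,
        # then builds the model from the resulting config.
        config = MyScaConfig.from_sam_text_pretrained(sam_name, lm_name)
        return MyScaModel(config)


# In the main script (e.g. train.py), the argument class would supply these names.
model = MyScaModel.from_sam_text_pretrained("facebook/sam-vit-base", "gpt2-large")
```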

## Architecture

### Multitaskv2

`base_sca_multitask_v2`

It uses `task_type` to activate different task tokens, which are `recognition` and `caption`.
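A minimal sketch of how a `task_type` string could select a learned task token. The token names follow the doc (`recognition`, `caption`); the indices and embedding values are made up for illustration:

```python
# Illustrative only: map task_type to a learned task-token embedding.
TASK_TOKEN_INDEX = {"recognition": 0, "caption": 1}

# Stand-in for learned embeddings, one row per task token.
task_token_embeddings = [[0.1, 0.2], [0.3, 0.4]]

def select_task_token(task_type: str):
    """Return the embedding row activated by this task_type."""
    if task_type not in TASK_TOKEN_INDEX:
        raise ValueError(f"unknown task_type: {task_type!r}")
    return task_token_embeddings[TASK_TOKEN_INDEX[task_type]]

caption_token = select_task_token("caption")
```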

### DirectDecodingv2 (MultitaskV2)

`base_sca_direct_decoding_v2`

Like Multitaskv2, but the caption tokens are the query tokens of SAM.

### SplitMixer (Multitaskv2)

`base_sca_multitask_split_mixer`

Like Multitaskv2, but it is not based on the fused tokens from SAM's feature mixer.

### ROI Pooler (Multitaskv2)


### Other Image features (Multitaskv2)


## Inputs and Outputs

The SCA trainer requires that every item in `logits` is not `None`.
When it gathers the results across devices during inference, it calls `self._pad_across_processes`, which recursively pads nested tensors to a common size so they can be concatenated.
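The padding step is why `None` items break gathering: a `None` has no shape to pad, so the recursion fails. A pure-Python stand-in (lists instead of tensors, a hypothetical `pad_across_processes` mimicking the trainer's behavior):

```python
# Sketch of the pad-then-concatenate gather step. Pure-Python stand-in for
# tensors; the real trainer works on (nested) torch tensors.

def pad_across_processes(batches, pad_value=-100):
    """Pad every 1-D 'tensor' (list) to the longest length, as the trainer
    does before concatenating results from all devices."""
    # len(None) raises TypeError, which is why no item may be None.
    max_len = max(len(b) for b in batches)
    return [b + [pad_value] * (max_len - len(b)) for b in batches]

per_device_logits = [[1, 2, 3], [4, 5]]
padded = pad_across_processes(per_device_logits)
# padded == [[1, 2, 3], [4, 5, -100]]
```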

## Attributes and Methods

TBD

## HF Trainer Adaption

TBD

## SAM Models

https://huggingface.co/facebook/sam-vit-base
- facebook/sam-vit-base

https://huggingface.co/facebook/sam-vit-large
- facebook/sam-vit-large

https://huggingface.co/facebook/sam-vit-huge
- facebook/sam-vit-huge


## Language Models

https://huggingface.co/gpt2-large
- gpt2-large

https://huggingface.co/openlm-research/open_llama_3b_v2
- openlm-research/open_llama_3b_v2

https://huggingface.co/stabilityai/stablelm-3b-4e1t
- stabilityai/stablelm-3b-4e1t

https://huggingface.co/stabilityai/stablelm-zephyr-3b
- stabilityai/stablelm-zephyr-3b
- model trained with SFT and RLAIF
- the tokenizer is updated from `GPTNeoXTokenizer`
- requires the latest version of `transformers`

https://huggingface.co/microsoft/phi-2
- microsoft/phi-2
- d3186761bf5c4409f7679359284066c25ab668ee

https://huggingface.co/HuggingFaceH4/zephyr-7b-beta
- HuggingFaceH4/zephyr-7b-beta

https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha
- HuggingFaceH4/zephyr-7b-alpha

https://huggingface.co/mistralai/Mistral-7B-v0.1
- mistralai/Mistral-7B-v0.1

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1
- mistralai/Mistral-7B-Instruct-v0.1

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
- mistralai/Mistral-7B-Instruct-v0.2