# Result Analysis

## 1. Tokenizer

`AutoTokenizer` converts text into the integer IDs that the model consumes.
Running it produced `input_ids`, `token_type_ids`, and `attention_mask`.

- `input_ids`: the vocabulary index of each token
- `token_type_ids`: segment IDs that distinguish sentence A from sentence B in BERT-family models
- `attention_mask`: 1 for real tokens, 0 for padding tokens

A tokenizer that was saved and then reloaded processed the input in the same way.
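To make the three fields concrete, here is a minimal toy sketch. It is not `AutoTokenizer`: the vocabulary, the whitespace splitting, and the `encode_pair` helper are hypothetical and exist only to show how the fields line up for a padded sentence pair.

```python
# Toy illustration only: a hypothetical whitespace tokenizer, not AutoTokenizer.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "hello": 4, "world": 5, "again": 6}


def encode_pair(text_a, text_b, max_length):
    """Encode a sentence pair BERT-style and right-pad it to max_length."""
    ids_a = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in text_a.lower().split()]
    ids_b = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in text_b.lower().split()]

    # "[CLS] A [SEP]" belongs to segment 0, "B [SEP]" to segment 1.
    input_ids = [VOCAB["[CLS]"]] + ids_a + [VOCAB["[SEP]"]] + ids_b + [VOCAB["[SEP]"]]
    token_type_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    attention_mask = [1] * len(input_ids)

    # Pad all three lists to the same length; padded positions get mask 0.
    pad = max_length - len(input_ids)
    if pad < 0:
        raise ValueError("max_length is too small for this pair")
    return {
        "input_ids": input_ids + [VOCAB["[PAD]"]] * pad,
        "token_type_ids": token_type_ids + [0] * pad,
        "attention_mask": attention_mask + [0] * pad,
    }


encoded = encode_pair("hello world", "hello again", max_length=10)
for key, value in encoded.items():
    print(key, value)
```

The last three positions are padding, so they carry the `[PAD]` ID in `input_ids` and 0 in `attention_mask`.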

## 2. ImageProcessor

`AutoImageProcessor`, or the fallback `ViTImageProcessor`, converts a PIL image into a tensor suitable as model input.
In this run, `pixel_values` had shape `(1, 3, 224, 224)`.

The dimensions mean the following.

- batch size: 1
- channel: 3 (RGB image)
- height, width: 224 x 224

Saving creates `preprocessor_config.json`, and the same preprocessing can be restored from this config file.
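The sketch below shows why a JSON config is enough to restore preprocessing. It uses only the standard library and does not call the library's own `save_pretrained()`. The keys and values in `config` are illustrative assumptions and may differ from the file this run produced; `output_shape` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path

# Illustrative settings only; the keys in the file the run produced may differ.
config = {
    "do_resize": True,
    "size": {"height": 224, "width": 224},
    "do_rescale": True,
    "rescale_factor": 1 / 255,
    "do_normalize": True,
    "image_mean": [0.5, 0.5, 0.5],
    "image_std": [0.5, 0.5, 0.5],
}


def output_shape(cfg, batch_size=1, channels=3):
    """Shape of pixel_values implied by the config: (B, C, H, W)."""
    return (batch_size, channels, cfg["size"]["height"], cfg["size"]["width"])


# Write the settings to disk and read them back, as a save/load cycle would.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "preprocessor_config.json"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    restored = json.loads(path.read_text(encoding="utf-8"))

print(restored == config)
print(output_shape(restored))
```

Because every preprocessing parameter lives in the config, a processor rebuilt from the file applies the same resize, rescale, and normalize settings.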

## 3. Processor / CLIP

`AutoProcessor` or `CLIPProcessor` is an object that bundles a tokenizer and an image processor.
It therefore accepts text and images together and produces the following keys.

- `pixel_values`: image input
- `input_ids`: text token IDs
- `attention_mask`: text padding mask
- `token_type_ids`: sentence-segment IDs that may be produced depending on the tokenizer type

The processor could also be saved and restored with `save_pretrained()` and `from_pretrained()`.
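The bundling idea can be sketched with toy stand-ins. This is not the actual `CLIPProcessor` implementation: `toy_tokenizer`, `toy_image_processor`, and `ToyProcessor` are hypothetical names. They only show how one call can route text and images to separate components and merge the resulting keys.

```python
def toy_tokenizer(texts):
    """Hypothetical tokenizer: one fake ID per word, right-padded per batch."""
    rows = [[len(word) for word in text.split()] for text in texts]
    width = max(len(row) for row in rows)
    return {
        "input_ids": [row + [0] * (width - len(row)) for row in rows],
        "attention_mask": [[1] * len(row) + [0] * (width - len(row)) for row in rows],
    }


def toy_image_processor(images):
    """Hypothetical image processor: passes the images through unchanged."""
    return {"pixel_values": list(images)}


class ToyProcessor:
    """Bundles a tokenizer and an image processor behind one call."""

    def __init__(self, tokenizer, image_processor):
        self.tokenizer = tokenizer
        self.image_processor = image_processor

    def __call__(self, text=None, images=None):
        if text is None and images is None:
            raise ValueError("provide text, images, or both")
        batch = {}
        if text is not None:
            batch.update(self.tokenizer(text))  # adds input_ids, attention_mask
        if images is not None:
            batch.update(self.image_processor(images))  # adds pixel_values
        return batch


processor = ToyProcessor(toy_tokenizer, toy_image_processor)
batch = processor(text=["a photo of a cat", "a dog"], images=["img0", "img1"])
print(sorted(batch))
```

Whether `token_type_ids` also appears depends on the tokenizer component, which matches the note in the list above.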

## 4. Custom ImageProcessor

`SimpleVisionImageProcessor` was implemented from scratch by subclassing `ImageProcessingMixin`.
This processor performs the following steps.

1. Accept a PIL image or numpy array as input
2. Convert to RGB
3. resize
4. rescale to the `[0, 1]` range
5. mean/std normalize
6. Produce `pixel_values` in `(B, C, H, W)` format

Comparing the output of the processor before saving with that of the processor reloaded after saving gave `max diff = 0.0`.
That is, preprocessing produced identical results after the save/restore round trip.
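The steps and the round-trip check can be sketched with NumPy alone. This is not the `SimpleVisionImageProcessor` from the project: `SketchImageProcessor` is a hypothetical stand-in that does not subclass `ImageProcessingMixin`, and it accepts only numpy arrays. The nearest-neighbour resize and the mean/std of 0.5 are assumptions, and a JSON string replaces the on-disk config.

```python
import json

import numpy as np


class SketchImageProcessor:
    """Plain-Python stand-in; it does not subclass ImageProcessingMixin."""

    def __init__(self, size=224, rescale_factor=1 / 255,
                 mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
        self.size = size
        self.rescale_factor = rescale_factor
        self.mean = list(mean)
        self.std = list(std)

    def to_json(self):
        # All settings are plain attributes, so vars() is the full config.
        return json.dumps(vars(self))

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))

    def _prepare(self, image):
        image = np.asarray(image)
        if image.ndim == 2:  # grayscale -> RGB by repeating the channel
            image = np.stack([image] * 3, axis=-1)
        image = image[..., :3]  # drop an alpha channel if present
        # Nearest-neighbour resize (an assumption; the original filter is not stated).
        rows = np.arange(self.size) * image.shape[0] // self.size
        cols = np.arange(self.size) * image.shape[1] // self.size
        image = image[rows][:, cols]
        image = image.astype(np.float32) * self.rescale_factor  # rescale to [0, 1]
        mean = np.array(self.mean, dtype=np.float32)
        std = np.array(self.std, dtype=np.float32)
        image = (image - mean) / std  # per-channel normalize
        return image.transpose(2, 0, 1)  # (H, W, C) -> (C, H, W)

    def __call__(self, images):
        # A single 2-D or 3-D array is treated as a batch of one image.
        if isinstance(images, np.ndarray) and images.ndim <= 3:
            images = [images]
        return {"pixel_values": np.stack([self._prepare(img) for img in images])}


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)

processor = SketchImageProcessor()
reloaded = SketchImageProcessor.from_json(processor.to_json())

before = processor(image)["pixel_values"]
after = reloaded(image)["pixel_values"]
print(before.shape)
print(float(np.abs(before - after).max()))
```

The difference is exactly zero because the reloaded instance holds the same parameters and the pipeline is deterministic.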

## 5. Execution Verification

- All Python files passed the syntax check
- `python scripts/run_all.py` ran successfully in fallback mode in an environment without internet access
- The execution log was saved to `outputs/logs/run_all_test.log`
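The report does not say which tool performed the syntax check. As one possible approach, the sketch below parses every `.py` file under a directory with the standard-library `ast` module. `find_syntax_errors` is a hypothetical helper, and the demo runs on throwaway files rather than the project's sources.

```python
import ast
import tempfile
from pathlib import Path


def find_syntax_errors(root):
    """Return {path: message} for every .py file under root that fails to parse."""
    errors = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            errors[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return errors


# Demo on throwaway files so the sketch runs without the project checkout.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good.py").write_text("print('ok')\n", encoding="utf-8")
    Path(tmp, "bad.py").write_text("def broken(:\n", encoding="utf-8")
    report = find_syntax_errors(tmp)
    failing = sorted(Path(p).name for p in report)

print(failing)
```

An empty result from `find_syntax_errors` on the project root would correspond to the "all files passed" outcome listed above.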