
PaliGemma[[paligemma]]

Overview[[overview]]

The PaliGemma model was introduced by Google in PaliGemma – Google's Cutting-Edge Open Vision Language Model. PaliGemma is a 3B vision-language model composed of a SigLIP vision encoder and a Gemma language decoder, linked by a multimodal linear projection. It splits an image into a fixed number of ViT tokens and prepends them to an optional prompt. One distinctive feature is that the model uses full block attention over all the image tokens plus the input text tokens.
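The attention pattern described above can be illustrated with a small mask builder. This is a minimal sketch, not the library's implementation: the function name `prefix_lm_mask` is hypothetical, and treating the generated (suffix) tokens as causal is an assumption that the text above does not state explicitly.

```python
def prefix_lm_mask(num_image_tokens, num_prompt_tokens, num_suffix_tokens):
    """Return mask[i][j] == True when position i may attend to position j.

    Assumption: image + prompt tokens form a prefix with full (bidirectional)
    attention, and suffix tokens attend causally. Hypothetical helper.
    """
    prefix_len = num_image_tokens + num_prompt_tokens
    total = prefix_len + num_suffix_tokens
    mask = []
    for i in range(total):
        row = []
        for j in range(total):
            if j < prefix_len:
                # Every position can see the whole prefix (image + prompt).
                row.append(True)
            else:
                # Suffix positions are visible only to themselves and later positions.
                row.append(j <= i)
        mask.append(row)
    return mask


# 2 image tokens, 1 prompt token, 2 suffix tokens
for row in prefix_lm_mask(2, 1, 2):
    print("".join("x" if allowed else "." for allowed in row))
```

Each printed row is one query position; an `x` marks a key position it may attend to. The first three rows (the prefix) are identical because every prefix token sees the full prefix.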

PaliGemma comes in three resolutions (224x224, 448x448, and 896x896), with 3 base models, 55 versions fine-tuned for different tasks, and 2 mix models.
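To get a feel for how the resolution affects sequence length, the number of image tokens can be estimated from the patch grid. This is a rough sketch: the patch size of 14 is an assumption (it is not stated in this document), and `num_image_tokens` is a hypothetical helper, so check the checkpoint's vision config before relying on these numbers.

```python
PATCH_SIZE = 14  # assumption: square 14x14 patches; not stated in this document


def num_image_tokens(resolution, patch_size=PATCH_SIZE):
    """Number of non-overlapping square patches in a resolution x resolution image."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be a multiple of patch_size")
    patches_per_side = resolution // patch_size
    return patches_per_side * patches_per_side


for resolution in (224, 448, 896):
    print(resolution, num_image_tokens(resolution))
# 224 256
# 448 1024
# 896 4096
```

Under that assumption, doubling the resolution quadruples the number of image tokens, which is worth keeping in mind when choosing a checkpoint.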


PaliGemma architecture, from the blog post.

This model was contributed by Molbap.

Usage tips[[usage-tips]]

Inference with PaliGemma can be performed as follows:

import requests
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

prompt = "What is on the flower?"
image_file = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg?download=true"
# Download the example image and open it with PIL
raw_image = Image.open(requests.get(image_file, stream=True).raw)
# Keyword arguments avoid depending on the positional order of images and text
inputs = processor(images=raw_image, text=prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)

# The decoded text begins with the prompt, so slice it off to keep only the answer
print(processor.decode(output[0], skip_special_tokens=True)[len(prompt):])
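Slicing the decoded string by `len(prompt)` only works when the decoded text starts with the prompt verbatim. An alternative is to drop the input positions at the token level before decoding. The sketch below shows the idea on plain lists; `strip_prompt_tokens` is a hypothetical helper, the ids are made up, and it assumes that the output of `generate` contains the input ids followed by the newly generated ids.

```python
def strip_prompt_tokens(generated_ids, input_len):
    """Keep only the token ids that come after the first `input_len` positions."""
    return list(generated_ids[input_len:])


# Toy stand-ins for real tensors (hypothetical ids, for illustration only)
input_ids = [5, 5, 5, 101, 102]        # image placeholders + prompt tokens
generated = input_ids + [201, 202, 1]  # assumed layout: inputs, then new tokens

new_tokens = strip_prompt_tokens(generated, len(input_ids))
print(new_tokens)  # [201, 202, 1]
```

With the real model, the same idea would read `input_len = inputs["input_ids"].shape[-1]` followed by `processor.decode(output[0][input_len:], skip_special_tokens=True)`.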
  • PaliGemma is not meant for conversational use, and it works best when fine-tuned to a specific use case. Some downstream tasks on which PaliGemma can be fine-tuned include image captioning, visual question answering (VQA), object detection, referring expression segmentation, and document understanding.
  • PaliGemmaProcessor can be used to prepare images, text, and optional labels for the model. When fine-tuning a PaliGemma model, you can pass the suffix argument to the processor to create the model's labels, as follows:
prompt = "What is on the flower?"
answer = "a bee"
inputs = processor(images=raw_image, text=prompt, suffix=answer, return_tensors="pt")
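To make the role of `suffix` more concrete, the sketch below shows one way labels can be derived once prefix and suffix positions are known: prefix positions (image and prompt tokens) are replaced with an ignore index so that only the suffix contributes to the loss. This is an illustration with made-up token ids, not the processor's actual code; `build_labels` is a hypothetical helper, and the convention that `token_type_ids` is 0 for the prefix and 1 for the suffix is an assumption to verify against your installed version.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_labels(input_ids, token_type_ids):
    """Copy suffix token ids into the labels and mask every prefix position.

    Assumption: token_type_ids is 0 for prefix (image + prompt) positions
    and 1 for suffix (answer) positions.
    """
    return [
        token if token_type == 1 else IGNORE_INDEX
        for token, token_type in zip(input_ids, token_type_ids)
    ]


# Hypothetical ids: 2 image placeholders, 3 prompt tokens, then "a bee" + end token
input_ids = [900, 900, 11, 12, 13, 21, 22, 1]
token_type_ids = [0, 0, 0, 0, 0, 1, 1, 1]

print(build_labels(input_ids, token_type_ids))
# [-100, -100, -100, -100, -100, 21, 22, 1]
```

Masking the prefix this way means the model is trained to produce the answer given the image and prompt, rather than to reproduce the prompt itself.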

Resources[[resources]]

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with PaliGemma. If you would like to submit a resource to be included here, please open a Pull Request and we will review it! A resource should demonstrate something new rather than duplicate an existing one.

  • A blog post introducing all the features of PaliGemma can be found here. 🌎
  • Demo notebooks on how to fine-tune PaliGemma for VQA (Visual Question Answering) with the Trainer API, along with inference, can be found here. 🌎
  • Demo notebooks on how to fine-tune PaliGemma on a custom dataset (receipt image -> JSON), along with inference, can be found here. 🌎

PaliGemmaConfig[[transformers.PaliGemmaConfig]]

[[autodoc]] PaliGemmaConfig

PaliGemmaProcessor[[transformers.PaliGemmaProcessor]]

[[autodoc]] PaliGemmaProcessor

PaliGemmaForConditionalGeneration[[transformers.PaliGemmaForConditionalGeneration]]

[[autodoc]] PaliGemmaForConditionalGeneration
    - forward