
Gemma 3 [[gemma3]]

Gemma 3 is a multimodal model with pretrained and instruction-tuned variants, available in 1B, 4B, 12B, and 27B parameter sizes. The architecture is mostly the same as in previous Gemma versions. The key differences are an interleaved layout of five local sliding-window self-attention layers for every global self-attention layer, support for a longer context length of 128K tokens, and a SigLip encoder that can "pan & scan" high-resolution images, which prevents information from being lost in high-resolution images or images with non-square aspect ratios.
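
To make the 5:1 interleaving concrete, here is a minimal pure-Python sketch. It assumes a layer is global when its 1-based index is a multiple of six. The helper names `layer_types` and `causal_mask` are hypothetical, and the `window` value is an arbitrary small number chosen for display, not Gemma 3's real sliding-window size.

```python
from typing import List, Optional

# Hypothetical helpers: they illustrate the 5:1 local/global pattern only
# and are not part of the transformers API.


def layer_types(num_layers: int, local_per_global: int = 5) -> List[str]:
    """Label layers so that every (local_per_global + 1)-th layer is global."""
    period = local_per_global + 1
    return [
        "full_attention" if (i + 1) % period == 0 else "sliding_attention"
        for i in range(num_layers)
    ]


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    window=None gives the plain causal mask of a global layer; an integer
    window limits query i to the `window` most recent positions (itself
    included), as in a sliding-window layer.
    """
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    print(layer_types(12))
    for name, window in [("global", None), ("sliding, window=3", 3)]:
        print(name)
        for row in causal_mask(6, window):
            print("".join("x" if allowed else "." for allowed in row))
```

Because only one layer in six sees the full context, the other five only need to keep the most recent `window` keys and values in their cache, which is the usual motivation for this layout at long context lengths.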

The instruction-tuned variant was post-trained with knowledge distillation and reinforcement learning.

You can find all the original Gemma 3 checkpoints under the Gemma 3 release.

[!TIP] Click on the Gemma 3 models in the right sidebar for more examples of how to apply Gemma to different vision and language tasks.

The examples below demonstrate how to generate text based on an image with [Pipeline] or the [AutoModel] class, and how to generate text from the command line with the transformers CLI.

import torch
from transformers import pipeline

pipe = pipeline(
    task="image-text-to-text",
    model="google/gemma-3-4b-pt",
    device=0,
    dtype=torch.bfloat16
)
pipe(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg",
    text="<start_of_image> What is shown in this image?"
)

import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model = Gemma3ForConditionalGeneration.from_pretrained(
    "google/gemma-3-4b-it",
    dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="sdpa"
)
processor = AutoProcessor.from_pretrained(
    "google/gemma-3-4b-it",
    padding_side="left"
)

messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a helpful assistant."}
        ]
    },
    {
        "role": "user", "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"},
            {"type": "text", "text": "What is shown in this image?"},
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

output = model.generate(**inputs, max_new_tokens=50, cache_implementation="static")
print(processor.decode(output[0], skip_special_tokens=True))

echo -e "Plants create energy through a process known as" | transformers run --task text-generation --model google/gemma-3-1b-pt --device 0

Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the Quantization overview for more details on the available quantization backends.
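
As a rough back-of-the-envelope illustration of that saving, the sketch below counts only the bytes needed to store the weights. It assumes a round 27 billion parameters and, for the grouped int4 case, a hypothetical 32 bits of metadata (for example a 16-bit scale and a 16-bit zero point) per group of 128 weights. Real checkpoints also keep some tensors unquantized, so actual memory use will differ.

```python
def weight_memory_bytes(
    num_params: int,
    bits_per_weight: int,
    group_size: int = 0,
    overhead_bits_per_group: int = 0,
) -> int:
    """Bytes needed to store the weights alone (no activations, no KV cache)."""
    total_bits = num_params * bits_per_weight
    if group_size > 0:
        num_groups = -(-num_params // group_size)  # ceiling division
        total_bits += num_groups * overhead_bits_per_group
    return -(-total_bits // 8)  # round up to whole bytes


if __name__ == "__main__":
    n = 27_000_000_000  # assumption: round parameter count for the 27B model
    bf16 = weight_memory_bytes(n, 16)
    # assumption: 32 bits of scale/zero-point metadata per group of 128 weights
    int4 = weight_memory_bytes(n, 4, group_size=128, overhead_bits_per_group=32)
    print(f"bf16: {bf16 / 2**30:.1f} GiB")
    print(f"int4: {int4 / 2**30:.1f} GiB")
```

Under these assumptions the int4 weights take a bit less than a quarter of the bf16 footprint; the per-group metadata is what keeps the ratio from being exactly 4x.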

The example below uses torchao to quantize only the weights to int4.

# pip install torchao
import torch
from transformers import TorchAoConfig, Gemma3ForConditionalGeneration, AutoProcessor

quantization_config = TorchAoConfig("int4_weight_only", group_size=128)
model = Gemma3ForConditionalGeneration.from_pretrained(
    "google/gemma-3-27b-it",
    dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config
)
processor = AutoProcessor.from_pretrained(
    "google/gemma-3-27b-it",
    padding_side="left"
)

messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a helpful assistant."}
        ]
    },
    {
        "role": "user", "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"},
            {"type": "text", "text": "What is shown in this image?"},
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

output = model.generate(**inputs, max_new_tokens=50, cache_implementation="static")
print(processor.decode(output[0], skip_special_tokens=True))
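
The `group_size=128` argument means the weights are quantized in groups of 128 values, each group with its own scale. The following simplified sketch shows symmetric group-wise int4 quantization in pure Python. It is not torchao's actual kernel or storage format (which may, for instance, use zero points and packed layouts), and the function names are hypothetical.

```python
import math
from typing import List, Tuple

# Simplified, hypothetical sketch of symmetric group-wise int4 quantization.
# torchao's real int4 weight-only scheme differs (zero points, packed storage).

INT4_MIN, INT4_MAX = -8, 7


def quantize_int4_groupwise(
    weights: List[float], group_size: int = 128
) -> Tuple[List[int], List[float]]:
    """Return int4 codes plus one float scale per group of `group_size` weights."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; avoid dividing by zero.
        scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4_groupwise(
    codes: List[int], scales: List[float], group_size: int = 128
) -> List[float]:
    """Recover approximate weights by rescaling each code with its group's scale."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]


if __name__ == "__main__":
    weights = [math.sin(i) for i in range(300)]  # toy data, not real model weights
    codes, scales = quantize_int4_groupwise(weights)
    restored = dequantize_int4_groupwise(codes, scales)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{len(scales)} groups, max abs error {max_err:.4f}")
```

Smaller groups track the local range of the weights more closely and reduce rounding error, at the cost of storing more scales.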

Use the AttentionMaskVisualizer to better understand which tokens the model can and cannot attend to.

from transformers.utils.attention_visualizer import AttentionMaskVisualizer

visualizer = AttentionMaskVisualizer("google/gemma-3-4b-it")
visualizer("<img>What is shown in this image?")
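
The visualizer reflects the model's own masking rules. As a rough approximation of what it shows for Gemma 3, the sketch below assumes that text tokens attend causally while tokens belonging to the same image also attend to each other bidirectionally. The function `image_aware_mask` is hypothetical and ignores sliding windows.

```python
from typing import List

# Hypothetical sketch: causal attention for text, plus bidirectional attention
# among tokens of the same image. Sliding windows are ignored here.


def image_aware_mask(is_image: List[bool]) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j."""
    # Assign an id to each contiguous run of image tokens; text tokens get -1.
    block_ids: List[int] = []
    next_id = 0
    for k, img in enumerate(is_image):
        if not img:
            block_ids.append(-1)
            continue
        if k > 0 and is_image[k - 1]:
            block_ids.append(block_ids[-1])  # continue the current image run
        else:
            block_ids.append(next_id)  # start a new image run
            next_id += 1
    n = len(is_image)
    return [
        [j <= i or (block_ids[i] >= 0 and block_ids[i] == block_ids[j]) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # "T" = text token, "I" = image token (two separate images here)
    tokens = "TIITII"
    mask = image_aware_mask([t == "I" for t in tokens])
    for t, row in zip(tokens, mask):
        print(t, "".join("x" if allowed else "." for allowed in row))
```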

Notes [[notes]]

  • Use [Gemma3ForConditionalGeneration] for image-and-text and image-only inputs.

  • Gemma 3 supports multiple input images, but make sure the images are correctly batched before passing them to the processor. Each batch should be a list of one or more images.

    url_cow = "https://media.istockphoto.com/id/1192867753/photo/cow-in-berchida-beach-siniscola.jpg?s=612x612&w=0&k=20&c=v0hjjniwsMNfJSuKWZuIn8pssmD5h5bSN1peBd1CmH4="
    url_cat = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
    
    messages = [
        {
            "role": "system",
            "content": [
                {"type": "text", "text": "You are a helpful assistant."}
            ]
        },
        {
            "role": "user",
            "content": [
                {"type": "image", "url": url_cow},
                {"type": "image", "url": url_cat},
                {"type": "text", "text": "Which image is cuter?"},
            ]
        },
    ]
    
  • Text passed to the processor should contain a <start_of_image> token wherever an image should be inserted.

  • The processor has its own [~ProcessorMixin.apply_chat_template] method to convert chat messages into model inputs.

  • By default, images aren't cropped and only the base image is forwarded to the model. In high-resolution images or images with non-square aspect ratios, artifacts can appear because the vision encoder uses a fixed resolution of 896x896. To prevent these artifacts and improve performance during inference, set do_pan_and_scan=True to crop the image into multiple smaller patches and concatenate them with the base image embedding. You can disable pan and scan for faster inference.

    inputs = processor.apply_chat_template(
        messages,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
        add_generation_prompt=True,
        do_pan_and_scan=True,  # enable pan and scan
    ).to(model.device)
    
  • For the Gemma-3 1B checkpoint, which was trained in text-only mode, use [AutoModelForCausalLM] instead.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    tokenizer = AutoTokenizer.from_pretrained(
        "google/gemma-3-1b-pt",
    )
    model = AutoModelForCausalLM.from_pretrained(
        "google/gemma-3-1b-pt",
        dtype=torch.bfloat16,
        device_map="auto",
        attn_implementation="sdpa"
    )
    inputs = tokenizer("Plants create energy through a process known as", return_tensors="pt").to(model.device)
    
    output = model.generate(**inputs, cache_implementation="static")
    print(tokenizer.decode(output[0], skip_special_tokens=True))
    
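
The batching and `<start_of_image>` rules from the notes above can be checked before calling the processor. The helper below is a hypothetical sketch: it assumes `images` holds one list of images per prompt and that each prompt contains exactly one `<start_of_image>` token per image in its list. It only inspects shapes and counts, not the images themselves.

```python
from typing import Sequence

IMAGE_TOKEN = "<start_of_image>"


def check_batch(prompts: Sequence[str], images: Sequence[Sequence[object]]) -> None:
    """Raise ValueError unless `images` is one list per prompt with matching counts."""
    if len(prompts) != len(images):
        raise ValueError(f"got {len(prompts)} prompts but {len(images)} image lists")
    for idx, (prompt, sample_images) in enumerate(zip(prompts, images)):
        if not isinstance(sample_images, (list, tuple)):
            raise ValueError(
                f"sample {idx}: expected a list of images, got {type(sample_images).__name__}"
            )
        n_tokens = prompt.count(IMAGE_TOKEN)
        if n_tokens != len(sample_images):
            raise ValueError(
                f"sample {idx}: {n_tokens} {IMAGE_TOKEN} tokens but {len(sample_images)} images"
            )


if __name__ == "__main__":
    cow, cat = "cow.jpg", "cat.jpg"  # placeholders standing in for loaded images
    prompts = [
        f"{IMAGE_TOKEN}{IMAGE_TOKEN} Which image is cuter?",
        f"{IMAGE_TOKEN} What is shown in this image?",
    ]
    check_batch(prompts, [[cow, cat], [cat]])  # nested: one list per prompt
    try:
        check_batch(prompts, [cow, cat])  # flat list: wrong shape
    except ValueError as err:
        print("rejected:", err)
```

A batch that passes this check would then presumably be handed to the processor as `processor(text=prompts, images=images, return_tensors="pt")`.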

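To illustrate what `do_pan_and_scan=True` does to a wide or tall image, here is a simplified sketch of how such an image could be split into crops. The thresholds (at most 4 crops, an aspect ratio of at least 1.2 before cropping kicks in, a 256-pixel minimum crop side) and the function names are assumptions for illustration only; refer to the Gemma3ImageProcessor reference below for the actual parameters and defaults.

```python
import math
from typing import List, Tuple

# Simplified, hypothetical sketch of pan-and-scan cropping. The thresholds are
# illustrative assumptions, not the processor's documented defaults.


def pan_and_scan_grid(
    width: int,
    height: int,
    max_crops: int = 4,
    min_ratio: float = 1.2,
    min_crop_size: int = 256,
) -> Tuple[int, int]:
    """Return (columns, rows) of crops; (1, 1) means only the base image is used."""
    ratio = max(width, height) / min(width, height)
    if ratio < min_ratio:
        return (1, 1)  # close enough to square: no extra crops
    # Roughly one crop per "square" along the long side, between 2 and max_crops.
    n = min(max_crops, max(2, int(ratio + 0.5)))
    cols, rows = (n, 1) if width >= height else (1, n)
    if min(math.ceil(width / cols), math.ceil(height / rows)) < min_crop_size:
        return (1, 1)  # crops would be too small to be useful
    return (cols, rows)


def crop_boxes(width: int, height: int, cols: int, rows: int) -> List[Tuple[int, int, int, int]]:
    """(left, top, right, bottom) boxes tiling the image in a cols x rows grid."""
    crop_w, crop_h = math.ceil(width / cols), math.ceil(height / rows)
    return [
        (c * crop_w, r * crop_h, min((c + 1) * crop_w, width), min((r + 1) * crop_h, height))
        for r in range(rows)
        for c in range(cols)
    ]


if __name__ == "__main__":
    for size in [(896, 896), (1792, 896), (3000, 500)]:
        cols, rows = pan_and_scan_grid(*size)
        # Each crop, plus the base image, would then be resized to 896x896.
        print(size, "->", crop_boxes(*size, cols, rows))
```

Every extra crop is encoded as another 896x896 image alongside the base image, which is why turning pan and scan off makes inference faster.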
Gemma3ImageProcessor

[[autodoc]] Gemma3ImageProcessor

Gemma3ImageProcessorFast

[[autodoc]] Gemma3ImageProcessorFast

Gemma3Processor

[[autodoc]] Gemma3Processor

Gemma3TextConfig

[[autodoc]] Gemma3TextConfig

Gemma3Config

[[autodoc]] Gemma3Config

Gemma3TextModel

[[autodoc]] Gemma3TextModel - forward

Gemma3Model

[[autodoc]] Gemma3Model

Gemma3ForCausalLM

[[autodoc]] Gemma3ForCausalLM - forward

Gemma3ForConditionalGeneration

[[autodoc]] Gemma3ForConditionalGeneration - forward

Gemma3ForSequenceClassification

[[autodoc]] Gemma3ForSequenceClassification - forward