# Model Card for chyungwon/police-report-analysis-model-1b

## Model Details

### Model Description

์ด ๋ชจ๋ธ์€ Google์˜ ๊ฐ•๋ ฅํ•œ ์†Œํ˜• ์–ธ์–ด ๋ชจ๋ธ์ธ Gemma-3-1B-it์„ ๊ธฐ๋ฐ˜์œผ๋กœ, ํ•œ๊ตญ์–ด ๋ฒ”์ฃ„ ์‚ฌ๊ฑด ๋ณด๊ณ ์„œ ๋ถ„์„ ํƒœ์Šคํฌ์— ๋งž๊ฒŒ ๋ฏธ์„ธ ์กฐ์ •(Fine-tuning)๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

#### Key Features
- Incident reconstruction (Context Generation): analyzes the contents and factual relations of an incident report to logically reconstruct the situation at the time.
- Incident type classification (Kind Classification): classifies the type of incident based on the reconstructed context.

#### Training Strategy
The model was trained with QLoRA (Quantized Low-Rank Adaptation), which keeps memory usage low and training fast.

- Quantization: 4-bit NF4 (BitsAndBytes)
- PEFT: LoRA (rank $r=16$, $\alpha=32$)
- Optimization: adamw_torch, learning rate $2 \times 10^{-4}$, cosine scheduler

ํ•„์ˆ˜ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ๋ฐ ๋ฒ„์ „์ด ๋ชจ๋ธ์˜ ํ•™์Šต ๋ฐ ์‚ฌ์šฉ์„ ์œ„ํ•ด์„œ๋Š” ๋‹ค์Œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ๋ฒ„์ „์ด ๊ถŒ์žฅ๋ฉ๋‹ˆ๋‹ค.
๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ๋ฒ„์ „
transformers : 4.57.3
accelerate : 1.12.0
bitsandbytes : 0.48.2
peft : 0.15.2
torch : 2.9.0
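A matching install, assuming a CUDA-enabled build of PyTorch is available for your platform:

```bash
pip install transformers==4.57.3 accelerate==1.12.0 bitsandbytes==0.48.2 peft==0.15.2 torch==2.9.0
```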

## Uses

The snippet below loads the base model in bfloat16 and attaches the fine-tuned LoRA adapter for inference:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "google/gemma-3-1b-it"

tokenizer = AutoTokenizer.from_pretrained(base_model)
# Register the Gemma chat-turn markers as special tokens (no-op if already present)
tokenizer.add_special_tokens({
    "additional_special_tokens": ["<start_of_turn>", "<end_of_turn>"]
})

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model.config.pad_token_id = tokenizer.pad_token_id
model.config.bos_token_id = tokenizer.bos_token_id
model.config.eos_token_id = tokenizer.eos_token_id

# Keep the embedding matrix sized to the tokenizer vocabulary
model.resize_token_embeddings(len(tokenizer))

# Attach the fine-tuned LoRA adapter (adjust the path as needed)
model = PeftModel.from_pretrained(model, "/lora_adapter")
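# Optional (standard PEFT API, shown as a commented-out suggestion): merging
# the adapter into the base weights makes inference slightly faster, at the
# cost of no longer being able to detach or swap adapters afterwards.
# model = model.merge_and_unload()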


question = """๋‹ค์Œ ์‚ฌ๊ฑด ๋ณด๊ณ ์„œ๋ฅผ ํ†ตํ•ด์„œ ๋‹น์‹œ ๋ฒ”์ฃ„ ์‚ฌ๊ฑด์„ ์žฌ๊ตฌ์„ฑํ•ด์ฃผ๊ณ , ์‚ฌ๊ฑด ์œ ํ˜•์„ ๋ถ„๋ฅ˜ํ•ด์ค˜.

[์‚ฌ๊ฑด ๋ณด๊ณ ์„œ]
2024๋…„ 4์›” 25์ผ, ์„œ์šธ ์ˆญ์‹ค๋Œ€์ž…๊ตฌ์—ญ ์ธ๊ทผ์—์„œ ์ด์‚ฟ์ง ํ™”๋ฌผ์ฐจ๊ฐ€ ์ธ๋„๋กœ ๋Œ์ง„ํ•ด 60๋Œ€ ๋‚จ์„ฑ์„ ๋ถ€์ƒ์‹œํ‚จ ์‚ฌ๊ฑด์ด ๋ฐœ์ƒํ•˜์˜€๋‹ค.
ํ˜„์žฅ ์กฐ์‚ฌ ๊ฒฐ๊ณผ, ํ™”๋ฌผ์ฐจ ์šด์ „์ž ๊น€๋ฏผ์ˆ˜(35)๋Š” ์‚ฌ์ „์— ๋ธŒ๋ ˆ์ดํฌ ํŒจ๋“œ๋ฅผ ๋งˆ๋ชจ์‹œํ‚ค๊ณ  ๋ธŒ๋ ˆ์ดํฌ์•ก์— ๋ฌผ์„ ์„ž์–ด ๊ณ ์žฅ์„ ์œ ๋ฐœํ•œ ๊ฒƒ์œผ๋กœ ํ™•์ธ๋˜์—ˆ๋‹ค.
์‚ฌ๊ณ  ์งํ›„ ๊น€์€ ์ฐจ ๋‚ด๋ถ€์— ์ˆจ๊ฒจ์ง„ ๊ธˆ๊ณ ์—์„œ ํ˜„๊ธˆ 200๋งŒ ์›๊ณผ ๊ฐ€์ฃฝ ๊ฐ€๋ฐฉ, ์Šค๋งˆํŠธํฐ์„ ๊บผ๋‚ด ์ƒ๊ฐ€ ์•ˆ๊ฒฝ์› ์ง์›์—๊ฒŒ ์ „๋‹ฌํ•˜๊ณ , ์€ํ–‰ ๊ณ„์ขŒ๋กœ ์†ก๊ธˆํ•˜์˜€๋‹ค.
๊ฒฝ์ฐฐ์€ ์ฐจ๋Ÿ‰ GPS ๊ธฐ๋ก, ๋ธŒ๋ ˆ์ดํฌ ํŒจ๋“œ ๋งˆ๋ชจ ์ƒํƒœ, CCTV ์˜์ƒ ์กฐ์ž‘ ์—ฌ๋ถ€๋ฅผ ์กฐ์‚ฌ ์ค‘์ด๋ฉฐ, ๊น€์€ ๊ตํ†ต์‚ฌ๊ณ ์ฒ˜๋ฆฌํŠน๋ก€๋ฒ•์— ๋”ฐ๋ผ ์น˜์ƒ ํ˜์˜๋กœ ์ž…๊ฑด๋˜์—ˆ๋‹ค.
ํ˜„์žฌ ์ˆ˜์‚ฌ๋Š” ๊น€์˜ ์€ํ–‰ ์†ก๊ธˆ ๊ธฐ๋ก๊ณผ ๊ธˆ๊ณ  ๋‚ด์šฉ๋ฌผ ํ™•๋ณด๋ฅผ ํ†ตํ•ด ๋ฒ”์ฃ„ ๋™๊ธฐ์™€ ๋ฒ”ํ–‰ ๋ฐฉ๋ฒ•์„ ํŒŒ์•… ์ค‘์ด๋‹ค. ์‚ฌ๊ฑด์€ ์•„์ง ๊ฒฐ๋ง์ด ๋‚ด๋ ค์ง€์ง€ ์•Š์•˜์œผ๋ฉฐ, ์ถ”๊ฐ€ ์ฆ๊ฑฐ ํ™•๋ณด๊ฐ€ ํ•„์š”ํ•˜๋‹ค."""


messages = [
    {"role": "system", "content": """๋‹น์‹ ์€ ๋ฒ”ํ–‰๋ถ„์„ AI ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. ์งˆ๋ฌธ์— ์ •ํ™•ํ•˜๊ฒŒ ๋‹ต๋ณ€ํ•˜์„ธ์š”."""},
    {"role": "user", "content": question}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

model.eval()
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.1,
    do_sample=True,
)

# Decode only the newly generated tokens (the prompt is echoed in outputs)
generated = outputs[0][inputs["input_ids"].shape[-1]:]
answer = tokenizer.decode(generated, skip_special_tokens=True)
print('์‚ฌ๊ฑด ์žฌ๊ตฌ์„ฑ : ', answer)
```
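For interactive use, the answer can also be streamed token by token instead of decoded at the end; a minimal sketch using transformers' `TextStreamer` with the same `inputs` as above:

```python
from transformers import TextStreamer

# Prints tokens to stdout as they are generated; skip_prompt hides the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.1,
    do_sample=True,
    streamer=streamer,
)
```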


Example output (verbatim model generation, in Korean):

```text
์‚ฌ๊ฑด ์žฌ๊ตฌ์„ฑ : ์„œ์šธ ์„œ๋ถ€์˜ ํ•œ ๋Œ€ํ˜• ์ด์‚ฟ์ง ํ™”๋ฌผ์ฐจ๊ฐ€ ์ธ๋„๋กœ ๋Œ์ง„ํ•˜๋ฉฐ 60๋Œ€ ๋‚จ์„ฑ์„ ์น˜๋ช…์ ์œผ๋กœ ๋ถ€์ƒ์‹œํ‚จ ์‚ฌ๊ฑด์€ ๋‹จ์ˆœํ•œ ๊ตํ†ต์‚ฌ๊ณ ๊ฐ€ ์•„๋‹ˆ๋ผ, ์ •๊ตํ•˜๊ฒŒ ๊ณ„ํš๋œ โ€˜์‚ฌ๊ณ โ€™์˜€๋‹ค. ์‚ฌ๊ฑด์˜ ๋ฐฐ๊ฒฝ์€ ๋ฐ”๋กœ ์ด์‚ฟ์ง์ด ์„œ์šธ ์ˆญ์‹ค๋Œ€์ž…๊ตฌ์—ญ ์ธ๋„๋กœ ์ง„์ž…ํ•˜๊ธฐ ์ง์ „, ํ™”๋ฌผ์ฐจ ์šด์ „์ž ๊น€๋ฏผ์ˆ˜(35)๊ฐ€ ์‚ฌ์ „์— ๋ธŒ๋ ˆ์ดํฌ ํŒจ๋“œ๋ฅผ ๋งˆ๋ชจ์‹œํ‚ค๊ณ  ๋ธŒ๋ ˆ์ดํฌ์•ก์— ๋ฌผ์„ ์„ž์–ด ๋‘์—ˆ๋‹ค๋Š” ์ ์ด๋‹ค. ๊น€์€ ์ด์ „์— ์ด์‚ฟ์ง์„ ์šดํ–‰ํ•˜๋ฉด์„œ ๋ธŒ๋ ˆ์ดํฌ ํŒจ๋“œ๊ฐ€ ๋งˆ๋ชจ๋˜๋Š” ๊ฒƒ์„ ์ž์ฃผ ๊ด€์ฐฐํ–ˆ๊ณ , ์ด๋ฅผ ์ด์šฉํ•ด โ€˜๋ธŒ๋ ˆ์ดํฌ ๊ณ ์žฅโ€™์ด๋ผ๋Š” ๊ฐ€์งœ ์‚ฌ์œ ๋ฅผ ๋งŒ๋“ค ๊ณ„ํš์„ ์„ธ์› ๋‹ค.

๊น€์€ ์‚ฌ์ „์— โ€˜๋ธŒ๋ ˆ์ดํฌ ๊ณ ์žฅโ€™์ด๋ผ๋Š” ๊ฐ€์งœ ์‚ฌ์œ ๋ฅผ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด ๋ธŒ๋ ˆ์ดํฌ ํŒจ๋“œ์— ๋ฏธ์„ธํ•œ ๊ท ์—ด์„ ๋งŒ๋“ค๊ณ , ๋ธŒ๋ ˆ์ดํฌ์•ก์— ๋ฌผ์„ ๋„ฃ์–ด ๋ธŒ๋ ˆ์ดํฌ๊ฐ€ ์ž‘๋™ํ•˜์ง€ ์•Š๋„๋ก ์กฐ์ž‘ํ–ˆ๋‹ค. ์ด ๊ณผ์ •์€ 30๋ถ„ ๋‚ด์— ์™„์ˆ˜๋˜์—ˆ์œผ๋ฉฐ, ๋ธŒ๋ ˆ์ดํฌ๊ฐ€ ์ž‘๋™ํ•˜์ง€ ์•Š์ž ์ฐจ๊ฐ€ ๊ธ‰์†ํžˆ ๊ฐ€์†ํ•ด ์ธ๋„๋กœ ๋Œ์ง„ํ–ˆ๋‹ค. ์ด๋•Œ ๊น€์€ ์ฐจ ๋‚ด๋ถ€์— ์ˆจ๊ฒจ์ง„ ์ž‘์€ ๊ธˆ๊ณ ์— ํ˜„๊ธˆ 200๋งŒ ์›๊ณผ ๊ฐ€์ฃฝ ๊ฐ€๋ฐฉ, ๊ทธ๋ฆฌ๊ณ  ์Šค๋งˆํŠธํฐ์„ ์ˆจ๊ฒจ ๋‘์—ˆ๋‹ค.  

์‚ฌ๊ณ  ์งํ›„ ๊น€์€ ์ฐจ๋ฅผ ์ฆ‰์‹œ ์ •์ง€์‹œ์ผฐ๊ณ , ์ฃผ๋ณ€์— ์žˆ๋˜ ์ด์›ƒ์ง‘ ์ƒ๊ฐ€ ์•ˆ๊ฒฝ์› ์ง์›์—๊ฒŒ โ€˜์ฐจ๋Ÿ‰์ด ๊ณ ์žฅ๋‚ฌ์Šต๋‹ˆ๋‹คโ€™๋ผ๊ณ  ์•Œ๋ฆฌ๋ฉฐ ํ˜„์žฅ์„ ๋– ๋‚ฌ๋‹ค. ์ดํ›„ ๊น€์€ ์ฐจ๋ฅผ ๋‹ค์‹œ ์ถœ๋ฐœํ•ด ์ธ๋„๋กœ ์ง„์ž…ํ•œ ๋’ค, ์ฐจ๊ฐ€ ๋ฉˆ์ถ”์ž ๋ฐ”๋กœ ํ˜„๊ธˆ๊ณผ ๋ฌผ๊ฑด์„ ๊บผ๋‚ด ์ƒ๊ฐ€ ์•ˆ๊ฒฝ์› ์ง์›์—๊ฒŒ ์ „๋‹ฌํ–ˆ๋‹ค. ์ด ๊ณผ์ •์—์„œ ๊น€์€ ์ฐจ๋ฅผ ๋‹ค์‹œ ์ถœ๋ฐœํ•ด ์ธ๋„๋กœ ์ง„์ž…ํ•œ ๋’ค, ์ฐจ๊ฐ€ ์ •์ง€ํ•œ ์ƒํƒœ์—์„œ ์ฐจ๋ฅผ ๋Œ์–ด๋‚ด์–ด ๋ฌผ๊ฑด์„ ์ˆจ๊ฒผ๋‹ค.  

์ด ์‚ฌ๊ฑด์€ ๊ตํ†ต์‚ฌ๊ณ ์ฒ˜๋ฆฌํŠน๋ก€๋ฒ•์— ๋”ฐ๋ผ โ€˜์น˜์ƒโ€™์œผ๋กœ ๊ธฐ์†Œ๋  ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๊น€์€ ์‚ฌ๊ฑด์„ ์€ํํ•˜๊ธฐ ์œ„ํ•ด ์ฐจ๋ฅผ ์ •์ง€์‹œํ‚จ ๋’ค, ์ฐจ ๋‚ด๋ถ€์— ์ˆจ๊ฒจ์ง„ ๋ฌผ๊ฑด์„ ์ˆจ๊ธฐ๊ณ , ์ฐจ๋ฅผ ๋‹ค์‹œ ์ถœ๋ฐœํ•ด ์ธ๋„๋กœ ์ง„์ž…ํ•œ ๋’ค ์ฐจ๋ฅผ ๋Œ์–ด๋‚ด์–ด ๋ฌผ๊ฑด์„ ์ˆจ๊ฒผ๋‹ค. ์ด ๊ณผ์ •์—์„œ ๊น€์€ ์ฐจ๋ฅผ ๋‹ค์‹œ ์ถœ๋ฐœํ•ด ์ธ๋„๋กœ ์ง„์ž…ํ•œ ๋’ค, ์ฐจ๋ฅผ ๋Œ์–ด๋‚ด์–ด ๋ฌผ๊ฑด์„ ์ˆจ๊ฒผ๋‹ค.  

์ˆ˜์‚ฌ ๊ณผ์ •์—์„œ ๊ฒฝ์ฐฐ์€ ์ฐจ์˜ GPS ๊ธฐ๋ก๊ณผ ๋ธŒ๋ ˆ์ดํฌ ํŒจ๋“œ ๋งˆ๋ชจ ์ƒํƒœ๋ฅผ ์กฐ์‚ฌํ•ด ๊น€์ด ์‚ฌ์ „์— ๋ธŒ๋ ˆ์ดํฌ ํŒจ๋“œ๋ฅผ ๋งˆ๋ชจ์‹œํ‚จ ์‚ฌ์‹ค์„ ํ™•์ธํ–ˆ๋‹ค. ๋˜ํ•œ, ๊น€์ด ์ฐจ๋ฅผ ๋‹ค์‹œ ์ถœ๋ฐœํ•ด ์ธ๋„๋กœ ์ง„์ž…ํ•œ ๋’ค ์ฐจ๋ฅผ ๋Œ์–ด๋‚ด์–ด ๋ฌผ๊ฑด์„ ์ˆจ๊ฒผ๋‹ค๋Š” ์ฆ๊ฑฐ๊ฐ€ ํ™•๋ณด๋˜์—ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๊น€์€ ์‚ฌ๊ฑด ์งํ›„ ์ฐจ๋ฅผ ๋‹ค์‹œ ์ถœ๋ฐœํ•ด ์ธ๋„๋กœ ์ง„์ž…ํ•œ ๋’ค ์ฐจ๋ฅผ ๋Œ์–ด๋‚ด์–ด ๋ฌผ๊ฑด์„ ์ˆจ๊ฒผ๋‹ค.  

์ด ์‚ฌ๊ฑด์€ ๊ตํ†ต์‚ฌ๊ณ ๊ฐ€ ์•„๋‹Œ, ์‚ฌ์ „์— ๊ณ„ํš๋œ โ€˜์‚ฌ๊ณ โ€™๋กœ ๋ณด์ด๋„๋ก ์กฐ์ž‘๋œ ๊ฒƒ์ด์—ˆ๋‹ค๋Š” ์ ์ด ๊ฐ€์žฅ ํฐ ํŠน์ง•์ด๋‹ค. ๊น€์€ ์‚ฌ๊ฑด์„ ์€ํํ•˜๊ธฐ ์œ„ํ•ด ์ฐจ๋ฅผ ๋‹ค์‹œ ์ถœ๋ฐœํ•ด ์ธ๋„๋กœ ์ง„์ž…ํ•œ ๋’ค, ์ฐจ๋ฅผ ๋Œ์–ด๋‚ด์–ด ๋ฌผ๊ฑด์„ ์ˆจ๊ฒผ๋‹ค. ์ด ์‚ฌ๊ฑด์€ ์•„์ง ์ˆ˜์‚ฌ ์ค‘์ด๋ฉฐ, ๊น€์˜ ์€ํ–‰ ๊ณ„์ขŒ์™€ ๊ธˆ๊ณ  ๋‚ด๋ถ€ ๋ฌผ๊ฑด์ด ์•„์ง ํ™•๋ณด๋˜์ง€ ์•Š์•˜๋‹ค. ์‚ฌ๊ฑด์˜ ๊ฒฐ๋ง์€ ์•„์ง ๋ฏธ์ •์ด๋ฉฐ, ๊น€์ด ์‹ค์ œ๋กœ ๋ฌผ๊ฑด์„ ์ˆจ๊ฒผ๋Š”์ง€ ์—ฌ๋ถ€๊ฐ€ ์ˆ˜์‚ฌ์˜ ํ•ต์‹ฌ์ด๋‹ค. 
์‚ฌ๊ฑด ์œ ํ˜• : ์‚ด์ธ

## Training Details

The base model is loaded with 4-bit NF4 quantization (BitsAndBytes) and wrapped with a LoRA adapter:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, Trainer
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

base_model = "./gemma-3-1b-it"
tokenizer = AutoTokenizer.from_pretrained(base_model)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

# Standard QLoRA preparation, then attach the LoRA adapter
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)
```
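PEFT can confirm how small the trainable footprint is; with the configuration above, only the LoRA matrices are trainable:

```python
# Print trainable vs. total parameter counts (only the LoRA matrices train)
model.print_trainable_parameters()
```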

### Preprocessing

Each training example is rendered into the Gemma chat format; the prompt tokens are masked out of the loss with `-100`, and every sequence is padded to a fixed length:

```python
def format_prompt(ex, max_length=1775):
    title = '๋‹ค์Œ ์‚ฌ๊ฑด ๋ณด๊ณ ์„œ๋ฅผ ํ†ตํ•ด์„œ ๋‹น์‹œ ์‚ฌ๊ฑด์„ ์žฌ๊ตฌ์„ฑํ•ด์ฃผ๊ณ , ์‚ฌ๊ฑด ์œ ํ˜•์„ ๋ถ„๋ฅ˜ํ•ด์ค˜\n'
    question = ex["report"]
    answer = ex["context"]
    kind = f'\n์‚ฌ๊ฑด ์œ ํ˜• : {ex["kind"]}'

    prompt = f"""<start_of_turn>system
๋‹น์‹ ์€ ๋ฒ”ํ–‰๋ถ„์„ AI ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.
์งˆ๋ฌธ์— ์ •ํ™•ํ•˜๊ฒŒ ๋‹ต๋ณ€ํ•˜์„ธ์š”.
<end_of_turn>
<start_of_turn>user
{title}

[์‚ฌ๊ฑด ๋ณด๊ณ ์„œ]
{question}\n<end_of_turn>\n<start_of_turn>model\n"""

    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]

    model_part = f"""{answer} {kind}<end_of_turn>"""
    answer_ids = tokenizer(model_part, add_special_tokens=False)["input_ids"]

    input_ids = prompt_ids + answer_ids
    labels = [-100] * len(prompt_ids) + answer_ids
    attention_mask = [1] * len(input_ids)

    # Truncate over-long examples, then right-pad to the fixed length
    input_ids = input_ids[:max_length]
    attention_mask = attention_mask[:max_length]
    labels = labels[:max_length]

    pad_len = max_length - len(input_ids)
    input_ids += [tokenizer.pad_token_id] * pad_len
    attention_mask += [0] * pad_len
    labels += [-100] * pad_len

    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels,
    }
```
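A sketch of applying this over a 🤗 `datasets` dataset; the column names `report`, `context`, and `kind` match the fields read above, but the file name is a placeholder since the training data is not published:

```python
from datasets import load_dataset

# Placeholder path; expects JSONL rows with "report", "context", "kind" fields
dataset = load_dataset("json", data_files="incident_reports.jsonl", split="train")
tokenized = dataset.map(format_prompt, remove_columns=dataset.column_names)
```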

### Training Hyperparameters

With a per-device batch size of 16 and gradient accumulation of 2, the effective batch size is 32.

```python
training_args = TrainingArguments(
    output_dir=model_path,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    num_train_epochs=4,
    learning_rate=2e-4,
    bf16=True,
    fp16=False,
    gradient_checkpointing=False,
    logging_steps=5,
    eval_strategy="steps",   # required for eval_steps to take effect
    eval_steps=300,
    save_strategy="steps",
    save_steps=300,
    save_total_limit=2,
    report_to="none",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="adamw_torch",
)
```

### Speeds, Sizes, Times

Training ran for 4,844 optimizer steps (4 epochs) in roughly 5 h 28 m. Training loss by step, as logged:

```text
Step	Training Loss
10	2.668900
20	2.386700
30	2.226000
40	2.026500
50	1.896800
60	1.778100
70	1.733700
80	1.678000
90	1.622100
100	1.583400
200	1.399400
300	1.343400
400	1.290700
500	1.286000
600	1.229200
700	1.226100
800	1.216800
900	1.204500
1000	1.158700
1100	1.153700
1200	1.157400
1300	1.106100
1400	1.114300
1500	1.091600
1600	1.087400
1700	1.087200
1800	1.093100
1900	1.082200
2000	1.084200
2100	1.090000
2200	1.084300
2300	1.058100
2400	1.062100
2500	1.021800
2600	1.028800
2700	1.034700
2800	1.040300
2900	1.033000
3000	1.016200
3100	1.015800
3200	0.983400
3300	1.010900
3400	1.015900
3500	1.028600
3600	1.019400
3700	0.971100
3800	0.947100
3900	0.984500
4000	0.965500
4100	0.973800
4200	0.968100
4300	0.969000
4400	0.953500
4500	0.988400
4505	0.942400
4510	0.931300
4515	0.973200
4520	0.989900
4525	0.979900
4530	0.963000
4535	0.981700
4540	0.953700
4545	0.963700
4550	0.961500
4555	0.972000
4560	0.958600
4565	0.970500
4570	0.987800
4575	0.974500
4580	0.968700
4585	0.982000
4590	0.961300
4595	0.968400
4600	0.966000
4700	0.998300
4800	0.966200
4805	0.973900
4810	0.975200
4815	0.974400
4820	0.973400
4825	0.972000
4830	0.960200
4835	0.961700
4840	0.974000
```

Evaluation result:

```text
{'eval_loss': 1.062408447265625, 'eval_runtime': 15.7428, 'eval_samples_per_second': 24.9, 'eval_steps_per_second': 3.113, 'epoch': 4.0}
```

Best checkpoint: step 4510 (training loss 0.9313).

## Model Card Authors

(주)인정보
Homepage: http://www.ijbinfo.com

This work was supported by the National IT Industry Promotion Agency (NIPA, 정보통신산업진흥원).

## Model Card Contact

(주)인정보
Address: 서울시 금천구 가산동 60-5 갑을그레이트밸리A동 805호
Tel: 02-3397-7765 / Fax: 02-3397-7769 / E-mail: sales@injungbo.co.kr
Contact: 장형원 (chyungwon@ijbinfo.com)

### Framework versions

- PEFT 0.15.2