license: apache-2.0
language:
- ko
- en
pipeline_tag: visual-question-answering
tags:
- text2text-generation
base_model: google/deplot
Ko-Deplot
Ko-Deplot is a Korean Visual-QA model based on Google's Pix2Struct architecture. It was fine-tuned from DePlot on a dataset of Korean chart image-text pairs.
- Developed by: NUUA
- Model type: Visual Question Answering
- License: apache-2.0
- Finetuned from model: google/deplot
Model Usage
You can run inference by passing an input image together with a question, as follows:
from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration
from PIL import Image
# Load the processor and the fine-tuned model from the Hugging Face Hub
processor = Pix2StructProcessor.from_pretrained('nuua/Ko-Deplot')
model = Pix2StructForConditionalGeneration.from_pretrained('nuua/Ko-Deplot')
# Replace with the path to a local chart image
IMAGE_PATH = "LOCAL_PATH_TO_IMAGE"
image = Image.open(IMAGE_PATH)
# The text prompt asks the model for the chart's underlying data table
inputs = processor(images=image, text="Generate underlying data table of the figure below:", return_tensors="pt")
predictions = model.generate(**inputs, max_new_tokens=512)
# Decode the generated token IDs into a string
print(processor.decode(predictions[0], skip_special_tokens=True))
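The decoded prediction is a single string, so some post-processing is usually needed before the table can be used. The sketch below assumes that Ko-Deplot keeps the linearized table format of the base google/deplot model, where rows are separated by the literal token `<0x0A>` and cells by `|`. That format is an assumption here, not something this card confirms, and both the helper name `parse_deplot_table` and the sample string are hypothetical.

```python
from typing import List


def parse_deplot_table(
    text: str, row_sep: str = "<0x0A>", cell_sep: str = "|"
) -> List[List[str]]:
    """Split a linearized table string into rows of stripped cell strings.

    Assumes rows are separated by `row_sep` and cells by `cell_sep`,
    as in the output of the base google/deplot model.
    """
    rows = []
    for line in text.split(row_sep):
        line = line.strip()
        if not line:
            continue  # skip empty fragments, e.g. after a trailing separator
        rows.append([cell.strip() for cell in line.split(cell_sep)])
    return rows


# Hypothetical decoded output, for illustration only
sample = "TITLE | Monthly sales <0x0A> Month | Sales <0x0A> Jan | 120 <0x0A> Feb | 135"
for row in parse_deplot_table(sample):
    print(row)
```

If the model emits different separators, pass them through `row_sep` and `cell_sep` rather than changing the function body.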
Training Details
Training Data
Synthetic chart data generated with three libraries was used:
Training Procedure
Following the original paper, the model first went through a short warmup stage, in which it learned to read Korean text (Hangul). It was then trained on the chart data for 50,000 steps.
Technical Specifications
Hardware
Ko-Deplot was trained on A100 80G GPU hardware.
Contact
For questions and suggestions, please use the discussion tab. To contact us directly, email robin@nuua.ai.