Improve language tag

#1
by lbourdois - opened
Files changed (1)
  1. README.md +142 -130
README.md CHANGED
@@ -1,131 +1,143 @@
- ---
- tags:
- - finance
- - accounting
- - stock
- - quant
- - economics
- language:
- - ko
- license: apache-2.0
- datasets:
- - aiqwe/FinShibainu
- base_model:
- - Qwen/Qwen2.5-7B-Instruct
- pipeline_tag: question-answering
- library_name: transformers
- ---
-
- # FinShibainu Model Card
-
- + github: [https://github.com/aiqwe/FinShibainu](https://github.com/aiqwe/FinShibainu)
- + dataset: [https://huggingface.co/datasets/aiqwe/FinShibainu](https://huggingface.co/datasets/aiqwe/FinShibainu)
-
- This is the shibainu24 model, which won an Excellence Award on the [KRX LLM Competition leaderboard](https://krxbench.koscom.co.kr/). It provides text generation for finance-related knowledge, including finance and accounting.
-
- + Vanilla model: [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
-
- The code for dataset collection and training is published in detail at [https://github.com/aiqwe/FinShibainu](https://github.com/aiqwe/FinShibainu).
-
- # Usage
- The examples in [https://github.com/aiqwe/FinShibainu](https://github.com/aiqwe/FinShibainu) make it easy to try inference.
- Most inference runs on a single GPU of RTX 3090 class or higher.
-
- ```shell
- pip install vllm
- ```
-
- ```python
- # Offline batch inference with vLLM
- from vllm import LLM, SamplingParams
-
- inputs = [
-     "외환시장에서 일본 엔화와 미국 달러의 환율이 두 시장에서 약간의 차이를 보이고 있다. 이때 무위험 이익을 얻기 위한 적절한 거래 전략은 무엇인가?",  # JPY/USD quotes differ slightly between two FX markets: what trade earns a riskless profit?
-     "신주인수권부사채(BW)에서 채권자가 신주인수권을 행사하지 않을 경우 어떤 일이 발생하는가?",  # What happens if a bond-with-warrant (BW) holder does not exercise the warrant?
-     "공매도(Short Selling)에 대한 설명으로 옳지 않은 것은 무엇입니까?",  # Which statement about short selling is NOT correct?
- ]
-
- llm = LLM(model="aiqwe/krx-llm-competition", tensor_parallel_size=1)
- sampling_params = SamplingParams(temperature=0.7, max_tokens=128)
- outputs = llm.generate(inputs, sampling_params)
- for o in outputs:
-     print(o.prompt)
-     print(o.outputs[0].text)
-     print("*" * 100)
- ```
-
- # Model Card
- | Contents | Spec |
- |--------------------------------|-------------------------------------|
- | Base model | Qwen2.5-7B-Instruct |
- | dtype | bfloat16 |
- | PEFT | LoRA (r=8, alpha=64) |
- | Learning Rate | 1e-5 (varies by further training) |
- | LRScheduler | Cosine (warm-up: 0.05%) |
- | Optimizer | AdamW |
- | Distributed / Efficient Tuning | DeepSpeed v3, Flash Attention |
-
- # Dataset Card
- Because some of the reference sources are copyrighted, the reference datasets are provided as links.
- The MCQA and QA datasets are released at [https://huggingface.co/datasets/aiqwe/FinShibainu](https://huggingface.co/datasets/aiqwe/FinShibainu).
- The [https://github.com/aiqwe/FinShibainu](https://github.com/aiqwe/FinShibainu) repository also provides various utilities and documents the data-sourcing pipeline for reference.
-
- ## References
- | Dataset | URL |
- |-----------------------------------|------------------------------------------------------------------------------------------|
- | Bank of Korea, 700 Economic and Financial Terms | [Link](https://www.bok.or.kr/portal/bbs/B0000249/view.do?nttId=235017&menuNo=200765) |
- | Financial accounting synthetic data | Created in-house |
- | Financial Supervisory Terms Dictionary | [Link](https://terms.naver.com/list.naver?cid=42088&categoryId=42088) |
- | web-text.synthetic.dataset-50k | [Link](https://huggingface.co/datasets/Cartinoe5930/web_text_synthetic_dataset_50k) |
- | Knowledge Economy Terms Dictionary | [Link](https://terms.naver.com/list.naver?cid=43668&categoryId=43668) |
- | Korea Exchange (KRX) non-periodic publications | [Link](http://open.krx.co.kr/contents/OPN04/04020000/OPN04020000.jsp#b8943a5f87282cde0d653d1ae73431c9=1) |
- | Korea Exchange (KRX) regulations | [Link](https://law.krx.co.kr/las/TopFrame.jsp&KRX) |
- | Securities Catch-Up for Beginner Investors | [Link](https://main.krxverse.co.kr/_contents/ACA/02010200/file/220104_beginner.pdf) |
- | Securities Investing for Teens | [Link](https://main.krxverse.co.kr/_contents/ACA/02010200/file/220104_teen.pdf) |
- | Corporate business report disclosures | [Link](https://opendart.fss.or.kr/) |
- | Current Affairs Economic Terms Dictionary | [Link](https://terms.naver.com/list.naver?cid=43668&categoryId=43668) |
-
- ## MCQA
- The MCQA data consists of multiple-choice questions generated from the references. In addition to the questions and answers, reasoning text was also generated and added to the training data.
- The training data comprises about 45,000 examples, about 20 million tokens in total as counted with tiktoken's o200k_base encoding (the GPT-4o / GPT-4o-mini tokenizer).
- | Dataset | Examples | Tokens |
- |--------------------------------------|-----------|--------------|
- | Bank of Korea, 700 Economic and Financial Terms | 1,203 | 277,114 |
- | Synthetic data built from a financial accounting table of contents | 451 | 99,770 |
- | Financial Supervisory Terms Dictionary | 827 | 214,297 |
- | hf_web_text_synthetic_dataset_50k | 25,461 | 7,563,529 |
- | Knowledge Economy Terms Dictionary | 2,314 | 589,763 |
- | Korea Exchange (KRX) non-periodic publications | 1,183 | 230,148 |
- | Korea Exchange (KRX) regulations | 3,015 | 580,556 |
- | Securities Catch-Up for Beginner Investors | 599 | 116,472 |
- | Securities Investing for Teens | 408 | 77,037 |
- | Corporate business report disclosures | 3,574 | 629,807 |
- | Current Affairs Economic Terms Dictionary | 7,410 | 1,545,842 |
- | **Total** | **46,445**| **19,998,931**|
-
- ## QA
- The QA data contains two kinds of answers: answers generated with both the reference and the question as input, and answers generated from the question alone, without a reference.
- With a reference, the model answers more accurately, but it draws less on its own knowledge, so its answers tend to be shorter or less diverse.
- In total, about 48,000 examples and 200 million tokens were used for training.
- | Dataset | Examples | Tokens |
- |--------------------------------------|-----------|--------------|
- | Bank of Korea, 700 Economic and Financial Terms | 1,023 | 846,970 |
- | Financial Supervisory Terms Dictionary | 4,128 | 3,181,831 |
- | Knowledge Economy Terms Dictionary | 6,526 | 5,311,890 |
- | Korea Exchange (KRX) non-periodic publications | 1,510 | 1,089,342 |
- | Korea Exchange (KRX) regulations | 4,858 | 3,587,059 |
- | Corporate business report disclosures | 3,574 | 629,807 |
- | Current Affairs Economic Terms Dictionary | 29,920 | 5,981,839 |
- | **Total** | **47,965**| **199,998,931**|
-
- # Citation
- ```bibtex
- @misc{jaylee2024finshibainu,
-   author = {Jay Lee},
-   title = {FinShibainu: Korean specified finance model},
-   year = {2024},
-   publisher = {GitHub},
-   journal = {GitHub repository},
-   url = {https://github.com/aiqwe/FinShibainu}
- }
+ ---
+ tags:
+ - finance
+ - accounting
+ - stock
+ - quant
+ - economics
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ license: apache-2.0
+ datasets:
+ - aiqwe/FinShibainu
+ base_model:
+ - Qwen/Qwen2.5-7B-Instruct
+ pipeline_tag: question-answering
+ library_name: transformers
+ ---
+
+ # FinShibainu Model Card
+
+ + github: [https://github.com/aiqwe/FinShibainu](https://github.com/aiqwe/FinShibainu)
+ + dataset: [https://huggingface.co/datasets/aiqwe/FinShibainu](https://huggingface.co/datasets/aiqwe/FinShibainu)
+
+ This is the shibainu24 model, which won an Excellence Award on the [KRX LLM Competition leaderboard](https://krxbench.koscom.co.kr/). It provides text generation for finance-related knowledge, including finance and accounting.
+
+ + Vanilla model: [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
+
+ The code for dataset collection and training is published in detail at [https://github.com/aiqwe/FinShibainu](https://github.com/aiqwe/FinShibainu).
+
+ # Usage
+ The examples in [https://github.com/aiqwe/FinShibainu](https://github.com/aiqwe/FinShibainu) make it easy to try inference.
+ Most inference runs on a single GPU of RTX 3090 class or higher.
+
+ ```shell
+ pip install vllm
+ ```
+
+ ```python
+ # Offline batch inference with vLLM
+ from vllm import LLM, SamplingParams
+
+ inputs = [
+     "외환시장에서 일본 엔화와 미국 달러의 환율이 두 시장에서 약간의 차이를 보이고 있다. 이때 무위험 이익을 얻기 위한 적절한 거래 전략은 무엇인가?",  # JPY/USD quotes differ slightly between two FX markets: what trade earns a riskless profit?
+     "신주인수권부사채(BW)에서 채권자가 신주인수권을 행사하지 않을 경우 어떤 일이 발생하는가?",  # What happens if a bond-with-warrant (BW) holder does not exercise the warrant?
+     "공매도(Short Selling)에 대한 설명으로 옳지 않은 것은 무엇입니까?",  # Which statement about short selling is NOT correct?
+ ]
+
+ llm = LLM(model="aiqwe/krx-llm-competition", tensor_parallel_size=1)
+ sampling_params = SamplingParams(temperature=0.7, max_tokens=128)
+ outputs = llm.generate(inputs, sampling_params)
+ for o in outputs:
+     print(o.prompt)
+     print(o.outputs[0].text)
+     print("*" * 100)
+ ```
+
+ # Model Card
+ | Contents | Spec |
+ |--------------------------------|-------------------------------------|
+ | Base model | Qwen2.5-7B-Instruct |
+ | dtype | bfloat16 |
+ | PEFT | LoRA (r=8, alpha=64) |
+ | Learning Rate | 1e-5 (varies by further training) |
+ | LRScheduler | Cosine (warm-up: 0.05%) |
+ | Optimizer | AdamW |
+ | Distributed / Efficient Tuning | DeepSpeed v3, Flash Attention |
+
+ # Dataset Card
+ Because some of the reference sources are copyrighted, the reference datasets are provided as links.
+ The MCQA and QA datasets are released at [https://huggingface.co/datasets/aiqwe/FinShibainu](https://huggingface.co/datasets/aiqwe/FinShibainu).
+ The [https://github.com/aiqwe/FinShibainu](https://github.com/aiqwe/FinShibainu) repository also provides various utilities and documents the data-sourcing pipeline for reference.
+
+ ## References
+ | Dataset | URL |
+ |-----------------------------------|------------------------------------------------------------------------------------------|
+ | Bank of Korea, 700 Economic and Financial Terms | [Link](https://www.bok.or.kr/portal/bbs/B0000249/view.do?nttId=235017&menuNo=200765) |
+ | Financial accounting synthetic data | Created in-house |
+ | Financial Supervisory Terms Dictionary | [Link](https://terms.naver.com/list.naver?cid=42088&categoryId=42088) |
+ | web-text.synthetic.dataset-50k | [Link](https://huggingface.co/datasets/Cartinoe5930/web_text_synthetic_dataset_50k) |
+ | Knowledge Economy Terms Dictionary | [Link](https://terms.naver.com/list.naver?cid=43668&categoryId=43668) |
+ | Korea Exchange (KRX) non-periodic publications | [Link](http://open.krx.co.kr/contents/OPN04/04020000/OPN04020000.jsp#b8943a5f87282cde0d653d1ae73431c9=1) |
+ | Korea Exchange (KRX) regulations | [Link](https://law.krx.co.kr/las/TopFrame.jsp&KRX) |
+ | Securities Catch-Up for Beginner Investors | [Link](https://main.krxverse.co.kr/_contents/ACA/02010200/file/220104_beginner.pdf) |
+ | Securities Investing for Teens | [Link](https://main.krxverse.co.kr/_contents/ACA/02010200/file/220104_teen.pdf) |
+ | Corporate business report disclosures | [Link](https://opendart.fss.or.kr/) |
+ | Current Affairs Economic Terms Dictionary | [Link](https://terms.naver.com/list.naver?cid=43668&categoryId=43668) |
+
+ ## MCQA
+ The MCQA data consists of multiple-choice questions generated from the references. In addition to the questions and answers, reasoning text was also generated and added to the training data.
+ The training data comprises about 45,000 examples, about 20 million tokens in total as counted with tiktoken's o200k_base encoding (the GPT-4o / GPT-4o-mini tokenizer).
+ | Dataset | Examples | Tokens |
+ |--------------------------------------|-----------|--------------|
+ | Bank of Korea, 700 Economic and Financial Terms | 1,203 | 277,114 |
+ | Synthetic data built from a financial accounting table of contents | 451 | 99,770 |
+ | Financial Supervisory Terms Dictionary | 827 | 214,297 |
+ | hf_web_text_synthetic_dataset_50k | 25,461 | 7,563,529 |
+ | Knowledge Economy Terms Dictionary | 2,314 | 589,763 |
+ | Korea Exchange (KRX) non-periodic publications | 1,183 | 230,148 |
+ | Korea Exchange (KRX) regulations | 3,015 | 580,556 |
+ | Securities Catch-Up for Beginner Investors | 599 | 116,472 |
+ | Securities Investing for Teens | 408 | 77,037 |
+ | Corporate business report disclosures | 3,574 | 629,807 |
+ | Current Affairs Economic Terms Dictionary | 7,410 | 1,545,842 |
+ | **Total** | **46,445**| **19,998,931**|
+
+ ## QA
+ The QA data contains two kinds of answers: answers generated with both the reference and the question as input, and answers generated from the question alone, without a reference.
+ With a reference, the model answers more accurately, but it draws less on its own knowledge, so its answers tend to be shorter or less diverse.
+ In total, about 48,000 examples and 200 million tokens were used for training.
+ | Dataset | Examples | Tokens |
+ |--------------------------------------|-----------|--------------|
+ | Bank of Korea, 700 Economic and Financial Terms | 1,023 | 846,970 |
+ | Financial Supervisory Terms Dictionary | 4,128 | 3,181,831 |
+ | Knowledge Economy Terms Dictionary | 6,526 | 5,311,890 |
+ | Korea Exchange (KRX) non-periodic publications | 1,510 | 1,089,342 |
+ | Korea Exchange (KRX) regulations | 4,858 | 3,587,059 |
+ | Corporate business report disclosures | 3,574 | 629,807 |
+ | Current Affairs Economic Terms Dictionary | 29,920 | 5,981,839 |
+ | **Total** | **47,965**| **199,998,931**|
+
+ # Citation
+ ```bibtex
+ @misc{jaylee2024finshibainu,
+   author = {Jay Lee},
+   title = {FinShibainu: Korean specified finance model},
+   year = {2024},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   url = {https://github.com/aiqwe/FinShibainu}
+ }
  ```
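The visible content change in this diff is the `language` list: the single ISO 639-1 tag `ko` is replaced by thirteen ISO 639-3 tags. As a quick sanity check, the sketch below normalizes both lists to ISO 639-3 and compares them. It assumes a small hand-written lookup table (`ISO_639_1_TO_3`, hypothetical and limited to the languages in this diff) rather than any library API.

```python
# Hypothetical lookup, ISO 639-1 -> ISO 639-3, covering only the languages in this diff.
ISO_639_1_TO_3 = {
    "zh": "zho", "en": "eng", "fr": "fra", "es": "spa", "pt": "por",
    "de": "deu", "it": "ita", "ru": "rus", "ja": "jpn", "ko": "kor",
    "vi": "vie", "th": "tha", "ar": "ara",
}


def normalize(tags):
    """Return the tags as a set of ISO 639-3 codes (codes not in the table pass through unchanged)."""
    return {ISO_639_1_TO_3.get(tag, tag) for tag in tags}


old_tags = ["ko"]  # `language` before the PR
new_tags = ["zho", "eng", "fra", "spa", "por", "deu", "ita",
            "rus", "jpn", "kor", "vie", "tha", "ara"]  # `language` after the PR

old_set = normalize(old_tags)
new_set = normalize(new_tags)

kept = sorted(old_set & new_set)     # languages tagged both before and after
added = sorted(new_set - old_set)    # languages introduced by the PR
removed = sorted(old_set - new_set)  # languages dropped by the PR

print("kept:", kept)
print("added:", added)
print("removed:", removed)
```

Running it prints `kept: ['kor']`, twelve added codes, and an empty `removed` list: Korean stays tagged (only its code form changes from `ko` to `kor`), and no previously tagged language is dropped.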