---
language: ko
license: apache-2.0
tags:
- sql
- text-to-sql
- nl2sql
- financial-domain
- pytorch
datasets:
- custom
metrics:
- accuracy
- f1
---
## Colab Notebook


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1vaGZTZ7y0SYLarCX0QemkUernLyohswz?usp=sharing)


## Training Dataset
AI Hub: [Natural-Language-Based Query (NL2SQL) Search Generation Data](https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&searchKeyword=%EC%9E%90%EC%97%B0%EC%96%B4%20%EA%B8%B0%EB%B0%98%20%EC%A7%88%EC%9D%98(NL2SQL)%20%EA%B2%80%EC%83%89%20%EC%83%9D%EC%84%B1%20%EB%8D%B0%EC%9D%B4%ED%84%B0&aihubDataSe=data&dataSetSn=71351)

- https://huggingface.co/combe4259/NHSQLNL/blob/main/TEXT_NL2SQL_label_nh_consultation.json
- https://huggingface.co/combe4259/NHSQLNL/blob/main/nh_consultation_db_annotation.json

# NHSQLNL: Financial Natural Language → SQL Conversion Model

`NHSQLNL` is a **Text-to-SQL (NL2SQL)** model that converts Korean-language financial questions into SQL queries.
By automatically turning banking and financial-sector questions into database queries (SQL), it can be used in customer Q&A systems and in financial data analysis.

---

## Key Features

- Converts Korean financial-domain natural-language input into SQL queries
- Generates safe SQL that conforms to a predefined schema
- Built on PyTorch and Hugging Face `transformers`
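The card does not say how schema conformance or safety is enforced. As a hedged sketch, a caller could check generated SQL before running it. The function below (`validate_sql` and the `accounts` schema are hypothetical, not part of NHSQLNL) accepts only a single SELECT/WITH query and executes it against an empty in-memory SQLite copy of the schema with `PRAGMA query_only` enabled, so unknown tables or columns, multiple statements, and write statements are rejected without touching real data.

```python
import sqlite3

# HYPOTHETICAL schema for illustration only; the real NHSQLNL schema
# is not published in this card.
SCHEMA_DDL = """
CREATE TABLE accounts (
    account_id   INTEGER PRIMARY KEY,
    account_type TEXT,
    opened_date  TEXT,
    balance      REAL
);
"""


def validate_sql(sql: str, schema_ddl: str = SCHEMA_DDL) -> tuple[bool, str]:
    """Check that `sql` is one read-only query that runs against `schema_ddl`.

    The query is executed against an EMPTY in-memory copy of the schema,
    so unknown tables/columns and syntax errors surface as exceptions
    without touching any real data.
    """
    stmt = sql.strip().rstrip(";").strip()
    # Only plain queries are allowed (SELECT or WITH ... SELECT).
    if not stmt.lower().startswith(("select", "with")):
        return False, "only SELECT queries are allowed"

    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        # Block any write that slips past the prefix check (e.g. WITH ... DELETE).
        conn.execute("PRAGMA query_only = ON")
        conn.execute(stmt).fetchall()
    except (sqlite3.Error, sqlite3.Warning) as exc:
        # sqlite3.Warning covers "one statement at a time" on older Pythons.
        return False, str(exc)
    finally:
        conn.close()
    return True, "ok"


if __name__ == "__main__":
    candidates = [
        "SELECT COUNT(*) FROM accounts "
        "WHERE account_type = 'deposit' AND strftime('%Y', opened_date) = '2023';",
        "SELECT name FROM customers",       # unknown table
        "DROP TABLE accounts",              # not a query
        "SELECT 1; DELETE FROM accounts",   # two statements
    ]
    for candidate in candidates:
        print(validate_sql(candidate))
```

Because the query runs against empty tables, this checks names and syntax only, and SQLite's dialect may differ from the production database's.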

---

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the model and tokenizer
MODEL_PATH = "combe4259/NHSQLNL"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_PATH)

# Input question (Korean): "Tell me the number of deposit accounts opened in 2023"
query = "2023년에 개설된 예금 계좌 수를 알려줘"

inputs = tokenizer(query, return_tensors="pt")

# Generate the SQL prediction
outputs = model.generate(**inputs, max_length=128)
sql = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Input:", query)
print("Generated SQL:", sql)
```

---

## Training Data

- Uses a custom-built financial-domain **natural language ↔ SQL mapping dataset**
- Preprocessing: SQL schema normalization and tokenizer-based input conversion
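The exact normalization rules are not documented. A minimal sketch, assuming normalization means canonicalizing SQL text (upper-case keywords, lower-case identifiers, string literals kept verbatim, uniform spacing), might look like the following. The same function can back an exact-match comparison, which is one common reading of the `accuracy` metric in the metadata (also an assumption). The keyword set and the names `normalize_sql`, `exact_match`, and `exact_match_accuracy` are illustrative.

```python
import re

# ILLUSTRATIVE keyword set; the real preprocessing rules are not documented.
SQL_KEYWORDS = {
    "select", "from", "where", "and", "or", "not", "in", "is", "null",
    "like", "between", "group", "by", "order", "having", "limit", "as",
    "join", "left", "inner", "on", "distinct", "asc", "desc",
    "count", "sum", "avg", "min", "max",
}

# Tokens: quoted string, quoted identifier, (dotted) word or number,
# two-character comparison operator, or any other single non-space char.
_TOKEN = re.compile(
    r"'(?:[^']|'')*'" r'|"[^"]*"' r"|\w+(?:\.\w+)*|<>|[<>!=]=|\S"
)


def normalize_sql(sql: str) -> str:
    """Canonicalize SQL text so equivalent spellings compare equal."""
    out = []
    for tok in _TOKEN.findall(sql.strip().rstrip(";")):
        if tok[0] in "'\"":
            out.append(tok)          # literals and quoted names kept verbatim
        elif tok.lower() in SQL_KEYWORDS:
            out.append(tok.upper())  # keywords in upper case
        else:
            out.append(tok.lower())  # identifiers and numbers in lower case
    return " ".join(out)


def exact_match(pred: str, gold: str) -> bool:
    return normalize_sql(pred) == normalize_sql(gold)


def exact_match_accuracy(preds: list[str], golds: list[str]) -> float:
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        return 0.0
    return sum(exact_match(p, g) for p, g in zip(preds, golds)) / len(golds)


if __name__ == "__main__":
    print(normalize_sql("select count(*)  from Accounts where opened_date >= '2023-01-01';"))
```

Lower-casing identifiers matches SQLite's case-insensitive names, but quoted identifiers can be case-sensitive, which is why quoted tokens are left untouched.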

---

## Applications

- Chatbots and consultation automation in the financial sector
- Natural-language data lookup and report generation
- SQL learning and practice tool for non-experts
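For the data-lookup and reporting use case, a generated query still has to be executed and presented. The sketch below assumes an SQLite database, a hypothetical `accounts` table filled with synthetic rows, and a hand-written `generated_sql` string standing in for model output (it is not actual NHSQLNL output). `run_report` is an illustrative helper that renders the result as a plain-text table.

```python
import sqlite3


def run_report(conn: sqlite3.Connection, sql: str) -> str:
    """Run a (previously validated) SELECT and render rows as a text table."""
    cur = conn.execute(sql)
    headers = [col[0] for col in cur.description]
    rows = [[str(value) for value in row] for row in cur.fetchall()]
    # Column width = longest of the header and every cell in that column.
    widths = [
        max([len(header)] + [len(row[i]) for row in rows])
        for i, header in enumerate(headers)
    ]
    lines = [" | ".join(h.ljust(w) for h, w in zip(headers, widths))]
    lines.append("-+-".join("-" * w for w in widths))
    for row in rows:
        lines.append(" | ".join(v.ljust(w) for v, w in zip(row, widths)))
    return "\n".join(lines)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # HYPOTHETICAL table with synthetic rows, for demonstration only.
    conn.executescript("""
        CREATE TABLE accounts (
            account_id INTEGER PRIMARY KEY, account_type TEXT, opened_date TEXT
        );
        INSERT INTO accounts (account_type, opened_date) VALUES
            ('deposit', '2023-03-02'), ('deposit', '2023-11-20'),
            ('loan', '2023-05-14'), ('deposit', '2022-12-31');
    """)
    # Stand-in for model output; NOT produced by NHSQLNL.
    generated_sql = (
        "SELECT account_type, COUNT(*) AS n FROM accounts "
        "WHERE strftime('%Y', opened_date) = '2023' "
        "GROUP BY account_type ORDER BY account_type"
    )
    print(run_report(conn, generated_sql))
```

In practice the query would be checked (for example, read-only and against the expected schema) before it reaches a function like this.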