---
language: ko
license: apache-2.0
tags:
- sql
- text-to-sql
- nl2sql
- financial-domain
- pytorch
datasets:
- custom
metrics:
- accuracy
- f1
---

## Colab Notebook

[Open the notebook in Colab](https://colab.research.google.com/drive/1vaGZTZ7y0SYLarCX0QemkUernLyohswz?usp=sharing)

## Training Dataset

[AI Hub: Natural Language-Based Query (NL2SQL) Search and Generation Data](https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&searchKeyword=%EC%9E%90%EC%97%B0%EC%96%B4%20%EA%B8%B0%EB%B0%98%20%EC%A7%88%EC%9D%98(NL2SQL)%20%EA%B2%80%EC%83%89%20%EC%83%9D%EC%84%B1%20%EB%8D%B0%EC%9D%B4%ED%84%B0&aihubDataSe=data&dataSetSn=71351)

https://huggingface.co/combe4259/NHSQLNL/blob/main/TEXT_NL2SQL_label_nh_consultation.json

https://huggingface.co/combe4259/NHSQLNL/blob/main/nh_consultation_db_annotation.json

# NHSQLNL: Financial Natural Language → SQL Conversion Model

`NHSQLNL` is a **Text-to-SQL (NL2SQL)** model that converts Korean financial natural-language questions into SQL queries.
It automatically translates banking and financial-domain questions into database queries (SQL), so it can be used in customer question-answering systems and financial data analysis.

---

## Key Features

- Converts Korean financial-domain natural-language input into SQL queries
- Generates safe SQL that conforms to a predefined schema
- Built on PyTorch and Hugging Face `transformers`
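
The card does not describe how schema conformance is enforced. As a minimal sketch, assuming a downstream guard is wanted, the check below accepts only a single read-only `SELECT` whose tables appear on an allowlist. The function name `is_safe_select`, the table names, and the regex-based parsing are illustrative assumptions, not part of NHSQLNL.

```python
import re

# Hypothetical allowlist; the real NHSQLNL schema is not documented in this card.
ALLOWED_TABLES = {"accounts", "customers"}

# Keywords that modify data or schema; a read-only guard rejects them.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke|attach|pragma)\b",
    re.IGNORECASE,
)
# Table names that directly follow FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def is_safe_select(sql: str, allowed_tables=ALLOWED_TABLES) -> bool:
    """Naive check: one SELECT statement, no write keywords, known tables only."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input or more than one statement
    if not statement.lower().startswith("select"):
        return False
    if FORBIDDEN.search(statement):
        return False
    tables = {name.lower() for name in TABLE_REF.findall(statement)}
    return bool(tables) and tables <= {t.lower() for t in allowed_tables}
```

This check is deliberately conservative: it also rejects harmless queries whose string literals happen to contain a forbidden keyword. A production system would more likely rely on a real SQL parser and database-level read-only permissions.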

---

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the model and tokenizer from the Hugging Face Hub
MODEL_PATH = "combe4259/NHSQLNL"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_PATH)

# Input question (Korean): "Tell me the number of deposit accounts opened in 2023"
query = "2023년에 개설된 예금 계좌 수를 알려줘"

inputs = tokenizer(query, return_tensors="pt")

# Predict the SQL
outputs = model.generate(**inputs, max_length=128)
sql = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Input:", query)
print("Generated SQL:", sql)
```
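
To show how the generated string might be consumed, here is a minimal sketch that runs a query against an in-memory SQLite database. The `accounts` table, its columns, the helper `run_query`, and the example SQL are hypothetical stand-ins; the actual schema and the model's real output may differ.

```python
import sqlite3


def run_query(conn: sqlite3.Connection, sql: str):
    """Execute one SQL statement and return all rows."""
    cursor = conn.execute(sql)  # execute() refuses multi-statement strings
    return cursor.fetchall()


# Hypothetical schema and rows standing in for a real banking database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, product TEXT, opened_year INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "deposit", 2023), (2, "deposit", 2022), (3, "deposit", 2023)],
)

# Stand-in for the `sql` string returned by the usage example.
generated_sql = (
    "SELECT COUNT(*) FROM accounts "
    "WHERE product = 'deposit' AND opened_year = 2023"
)
rows = run_query(conn, generated_sql)
print(rows)  # prints [(2,)]
```

Against a real database, it would be safer to connect with a read-only account than to trust the generated SQL directly.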

---

## Training Data

- Uses an in-house financial-domain **natural language ↔ SQL mapping dataset**
- Preprocessing: SQL schema normalization and tokenizer-based input conversion
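
The card does not spell out the normalization rules. As one possible reading, the sketch below canonicalizes a SQL target string before tokenization by collapsing whitespace, dropping a trailing semicolon, and upper-casing a small set of keywords. The function name `normalize_sql` and the keyword list are assumptions for illustration, not the author's actual preprocessing.

```python
import re

# Small illustrative keyword set (an assumption); a real pipeline would likely cover more.
KEYWORDS = ("select", "from", "where", "and", "or", "group by", "order by", "count", "as")
KEYWORD_RE = re.compile(
    r"\b(" + "|".join(k.replace(" ", r"\s+") for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def normalize_sql(sql: str) -> str:
    """Collapse whitespace, drop a trailing semicolon, and upper-case known keywords."""
    collapsed = re.sub(r"\s+", " ", sql).strip().rstrip(";").strip()
    return KEYWORD_RE.sub(lambda m: m.group(1).upper(), collapsed)


print(normalize_sql("select  count(*)\nfrom accounts  where opened_year = 2023 ;"))
```

Note that this naive version would also upper-case keywords that appear inside string literals, so it only suits targets where that cannot happen.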

---

## Applications

- Financial-sector chatbots and consultation automation
- Natural-language data lookup and report generation
- SQL learning and practice tool for non-experts