dan-text2sql commited on
Commit
d8bdce3
·
verified ·
1 Parent(s): 6ed66a3

Initial commit: Create v2 model card (Gemma-3)

Browse files
Files changed (1) hide show
  1. README.md +57 -0
README.md ADDED
@@ -0,0 +1,57 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ base_model: unsloth/gemma-3-27b-it-bnb-4bit
4
+ tags:
5
+ - text-to-sql
6
+ - gemma-3
7
+ - unsloth
8
+ - trl
9
+ - finance
10
+ - real-estate
11
+ - nlp
12
+ datasets:
13
+ - dan-text2sql/seoul-realestate-sql-v1
14
+ library_name: transformers
15
+ ---
16
+
17
+ # seoul-realestate-sql-agent-v2
18
+
19
+ **Developed by:** dan-text2sql
20
+ **License:** apache-2.0
21
+ **Finetuned from model:** [unsloth/gemma-3-27b-it-bnb-4bit](https://huggingface.co/unsloth/gemma-3-27b-it-bnb-4bit)
22
+
23
+ This model is a **Text-to-SQL agent** specialized in **Korean Real Estate (Seoul)** data.
24
+ It was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
25
+
26
+ ## Model Description (v2)
27
+ This is the **v2** version of the Seoul Real Estate SQL Agent.
28
+ * **Base Model:** Gemma-3 27B (IT)
29
+ * **Improvement:** Unlike v1 (Mistral-7B), this model leverages the massive 27B parameter size of Gemma-3.
30
+ * **Objective:** Translate natural language queries about Seoul apartment real estate data into executable SQL queries.
31
+
32
+ ## Usage Example
33
+
34
+ ```python
35
+ from unsloth import FastLanguageModel
36
+
37
+ # Load the model
38
+ model, tokenizer = FastLanguageModel.from_pretrained(
39
+ model_name = "dan-text2sql/seoul-realestate-sql-agent-v2",
40
+ max_seq_length = 2048,
41
+ dtype = None,
42
+ load_in_4bit = True,
43
+ )
44
+ FastLanguageModel.for_inference(model)
45
+
46
+ # Test Prompt
47
+ prompt = """아래 질문에 대한 올바른 SQL 쿼리를 작성해주세요.
48
+
49
+ ### 질문:
50
+ 서울시 강남구 삼성동의 20억 이하 아파트 매물을 찾아줘.
51
+
52
+ ### SQL:
53
+ """
54
+
55
+ inputs = tokenizer([prompt], return_tensors = "pt").to("cuda")
56
+ outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
57
+ print(tokenizer.batch_decode(outputs)[0])