Manju080 committed on
Commit
0d8581e
·
1 Parent(s): de4b07f

Initial deployment test_to_sql test1

README.md CHANGED
@@ -1,13 +1,100 @@
- ---
- title: Text To Sql Converter
- emoji: 📚
- colorFrom: gray
- colorTo: blue
- sdk: gradio
- sdk_version: 5.35.0
- app_file: app.py
- pinned: false
- license: mit
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Text-to-SQL Converter
+
+ A powerful AI model that converts natural language questions into SQL queries. This model is fine-tuned on CodeT5 and provides an intuitive web interface for easy interaction.
+
+ ## 🚀 Features
+
+ - **Natural Language to SQL**: Convert plain English questions to SQL queries
+ - **Web Interface**: Beautiful ChatGPT-like interface for easy interaction
+ - **Batch Processing**: Handle multiple queries at once
+ - **Real-time Generation**: Fast and accurate SQL generation
+ - **Health Monitoring**: Built-in health checks and monitoring
+
+ ## 🎯 Usage
+
+ ### Web Interface
+ Simply visit the web interface and:
+ 1. Enter your question in natural language
+ 2. Provide the table headers (comma-separated)
+ 3. Click "Generate SQL Query" to get your SQL
+
+ ### API Usage
+
+ #### Single Query
+ ```python
+ import requests
+
+ response = requests.post("https://your-space-url.hf.space/predict", json={
+     "question": "How many employees are older than 30?",
+     "table_headers": ["id", "name", "age", "department", "salary"]
+ })
+
+ sql_query = response.json()["sql_query"]
+ print(sql_query)
+ ```
+
+ #### Batch Queries
+ ```python
+ response = requests.post("https://your-space-url.hf.space/batch", json={
+     "queries": [
+         {
+             "question": "How many employees are older than 30?",
+             "table_headers": ["id", "name", "age", "department", "salary"]
+         },
+         {
+             "question": "Show all employees in IT department",
+             "table_headers": ["id", "name", "age", "department", "salary"]
+         }
+     ]
+ })
+
+ results = response.json()["results"]
+ ```
+
+ ## 📊 Example Queries
+
+ | Question | Table Headers | Generated SQL |
+ |----------|---------------|---------------|
+ | "How many employees are older than 30?" | id, name, age, department, salary | `SELECT COUNT(*) FROM table WHERE age > 30` |
+ | "Show all employees in IT department" | id, name, age, department, salary | `SELECT * FROM table WHERE department = 'IT'` |
+ | "What is the average salary by department?" | id, name, age, department, salary | `SELECT department, AVG(salary) FROM table GROUP BY department` |
+
+ ## 🔧 API Endpoints
+
+ - `GET /` - Web interface
+ - `GET /api` - API information
+ - `POST /predict` - Generate SQL for single question
+ - `POST /batch` - Generate SQL for multiple questions
+ - `GET /health` - Health check
+ - `GET /docs` - Interactive API documentation
+
+ ## 🏗️ Model Architecture
+
+ This model is based on **Salesforce CodeT5** and fine-tuned specifically for text-to-SQL conversion using PEFT (Parameter Efficient Fine-Tuning). The model has been trained on a diverse dataset of natural language questions and their corresponding SQL queries.
+
+ ### Model Details
+ - **Base Model**: Salesforce/codet5-base
+ - **Fine-tuning**: PEFT (LoRA)
+ - **Input Format**: Structured text with table headers and questions
+ - **Output**: SQL queries
+
+ ## 🚀 Deployment
+
+ This application is deployed on Hugging Face Spaces and can be accessed via the provided URL. The deployment includes:
+
+ - FastAPI backend
+ - Modern web interface
+ - Model serving with automatic scaling
+ - Health monitoring
+
+ ## 📝 License
+
+ This project is open source and available under the MIT License.
+
+ ## 🤝 Contributing
+
+ Contributions are welcome! Please feel free to submit a Pull Request.
+
+ ## 📞 Support
+
+ If you encounter any issues or have questions, please open an issue on the repository.
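Reviewer sketch (not part of the commit): the README describes the web form as taking comma-separated table headers, while the `/predict` request body uses a JSON list. The helper below shows one way a client might bridge the two. The name `build_predict_payload` and its validation rules are assumptions for illustration and do not appear in the repository.

```python
import json


def build_predict_payload(question: str, headers_csv: str) -> dict:
    """Build a JSON body for POST /predict from a comma-separated header string."""
    # Split on commas, trim whitespace, and drop empty entries such as "a,,b".
    headers = [h.strip() for h in headers_csv.split(",") if h.strip()]
    if not question.strip():
        raise ValueError("question must not be empty")
    if not headers:
        raise ValueError("at least one table header is required")
    # Field names match the request shown in the README's Single Query example.
    return {"question": question.strip(), "table_headers": headers}


payload = build_predict_payload(
    "How many employees are older than 30?",
    "id, name, age, department, salary",
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be passed as the `json=` argument in the README's `requests.post` call.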
app.py ADDED
@@ -0,0 +1,232 @@
+ from fastapi import FastAPI, HTTPException
+ from fastapi.responses import HTMLResponse
+ from fastapi.staticfiles import StaticFiles
+ from pydantic import BaseModel
+ from typing import List, Optional
+ import uvicorn
+ import logging
+ from model_utils import get_model
+ import time
+ import os
+ from contextlib import asynccontextmanager
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Global model instance
+ model = None
+
+ @asynccontextmanager
+ async def lifespan(app: FastAPI):
+     # Startup
+     global model
+     logger.info("Starting Text-to-SQL API...")
+     try:
+         model = get_model()
+         logger.info("Model loaded successfully!")
+     except Exception as e:
+         logger.error(f"Failed to load model: {str(e)}")
+         raise
+     yield
+     # Shutdown
+     logger.info("Shutting down Text-to-SQL API...")
+
+ # Create FastAPI app
+ app = FastAPI(
+     title="Text-to-SQL API",
+     description="API for converting natural language questions to SQL queries",
+     version="1.0.0",
+     lifespan=lifespan
+ )
+
+ # Pydantic models for request/response
+ class SQLRequest(BaseModel):
+     question: str
+     table_headers: List[str]
+
+ class SQLResponse(BaseModel):
+     question: str
+     table_headers: List[str]
+     sql_query: str
+     processing_time: float
+
+ class BatchRequest(BaseModel):
+     queries: List[SQLRequest]
+
+ class BatchResponse(BaseModel):
+     results: List[SQLResponse]
+     total_queries: int
+     successful_queries: int
+
+ class HealthResponse(BaseModel):
+     status: str
+     model_loaded: bool
+     timestamp: float
+
+
+
+
+
+ @app.get("/", response_class=HTMLResponse)
+ async def root():
+     """Serve the main HTML interface"""
+     try:
+         with open("index.html", "r", encoding="utf-8") as f:
+             return HTMLResponse(content=f.read())
+     except FileNotFoundError:
+         return HTMLResponse(content="""
+         <html>
+             <body>
+                 <h1>Text-to-SQL API</h1>
+                 <p>index.html not found. Please ensure the file exists in the same directory.</p>
+             </body>
+         </html>
+         """)
+
+ @app.get("/api", response_model=dict)
+ async def api_info():
+     """API information endpoint"""
+     return {
+         "message": "Text-to-SQL API",
+         "version": "1.0.0",
+         "endpoints": {
+             "/": "GET - Web interface",
+             "/api": "GET - API information",
+             "/predict": "POST - Generate SQL from single question",
+             "/batch": "POST - Generate SQL from multiple questions",
+             "/health": "GET - Health check",
+             "/docs": "GET - API documentation"
+         }
+     }
+
+ @app.post("/predict", response_model=SQLResponse)
+ async def predict_sql(request: SQLRequest):
+     """
+     Generate SQL query from a natural language question
+
+     Args:
+         request: SQLRequest containing question and table headers
+
+     Returns:
+         SQLResponse with generated SQL query
+     """
+     if model is None:
+         raise HTTPException(status_code=503, detail="Model not loaded")
+
+     start_time = time.time()
+
+     try:
+         sql_query = model.predict(request.question, request.table_headers)
+         processing_time = time.time() - start_time
+
+         return SQLResponse(
+             question=request.question,
+             table_headers=request.table_headers,
+             sql_query=sql_query,
+             processing_time=processing_time
+         )
+
+     except Exception as e:
+         logger.error(f"Error generating SQL: {str(e)}")
+         raise HTTPException(status_code=500, detail=f"Error generating SQL: {str(e)}")
+
+ @app.post("/batch", response_model=BatchResponse)
+ async def batch_predict(request: BatchRequest):
+     """
+     Generate SQL queries from multiple questions
+
+     Args:
+         request: BatchRequest containing list of questions and table headers
+
+     Returns:
+         BatchResponse with generated SQL queries
+     """
+     if model is None:
+         raise HTTPException(status_code=503, detail="Model not loaded")
+
+     start_time = time.time()
+
+     try:
+         # Convert to format expected by model
+         queries = [
+             {"question": q.question, "table_headers": q.table_headers}
+             for q in request.queries
+         ]
+
+         # Get predictions
+         results = model.batch_predict(queries)
+
+         # Convert to response format
+         sql_responses = []
+         successful_count = 0
+
+         for i, result in enumerate(results):
+             if result['status'] == 'success':
+                 successful_count += 1
+                 sql_responses.append(SQLResponse(
+                     question=result['question'],
+                     table_headers=result['table_headers'],
+                     sql_query=result['sql'],
+                     processing_time=time.time() - start_time
+                 ))
+             else:
+                 # For failed queries, return error in SQL field
+                 sql_responses.append(SQLResponse(
+                     question=result['question'],
+                     table_headers=result['table_headers'],
+                     sql_query=f"ERROR: {result.get('error', 'Unknown error')}",
+                     processing_time=time.time() - start_time
+                 ))
+
+         return BatchResponse(
+             results=sql_responses,
+             total_queries=len(request.queries),
+             successful_queries=successful_count
+         )
+
+     except Exception as e:
+         logger.error(f"Error in batch prediction: {str(e)}")
+         raise HTTPException(status_code=500, detail=f"Error in batch prediction: {str(e)}")
+
+ @app.get("/health", response_model=HealthResponse)
+ async def health_check():
+     """
+     Health check endpoint
+
+     Returns:
+         HealthResponse with service status
+     """
+     model_loaded = model is not None and model.health_check()
+
+     return HealthResponse(
+         status="healthy" if model_loaded else "unhealthy",
+         model_loaded=model_loaded,
+         timestamp=time.time()
+     )
+
+ @app.get("/example")
+ async def get_example():
+     """Get example request format"""
+     return {
+         "example_request": {
+             "question": "How many employees are older than 30?",
+             "table_headers": ["id", "name", "age", "department", "salary"]
+         },
+         "example_response": {
+             "question": "How many employees are older than 30?",
+             "table_headers": ["id", "name", "age", "department", "salary"],
+             "sql_query": "SELECT COUNT(*) FROM table WHERE age > 30",
+             "processing_time": 0.123
+         }
+     }
+
+ if __name__ == "__main__":
+     # Run the application
+     uvicorn.run(
+         "app:app",
+         host="0.0.0.0",
+         port=8000,
+         reload=False,
+         log_level="info"
+     )
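Reviewer sketch (not part of the commit): `app.py` relies on three methods of the object returned by `model_utils.get_model()`, namely `predict`, `batch_predict`, and `health_check`, and it reads the keys `status`, `question`, `table_headers`, `sql`, and `error` from each batch result. A minimal stand-in with that interface, useful for exercising the endpoints without model weights, might look like the following. `StubModel` and its canned SQL output are hypothetical; the real `model_utils` implementation is not shown in this part of the diff.

```python
from typing import Dict, List


class StubModel:
    """Hypothetical stand-in exposing the interface app.py calls on the real model."""

    def predict(self, question: str, table_headers: List[str]) -> str:
        if not table_headers:
            raise ValueError("table_headers must not be empty")
        # Canned output; a real model would generate SQL from the question here.
        return f"SELECT {table_headers[0]} FROM table"

    def batch_predict(self, queries: List[Dict]) -> List[Dict]:
        results = []
        for q in queries:
            item = {"question": q["question"], "table_headers": q["table_headers"]}
            try:
                item["sql"] = self.predict(q["question"], q["table_headers"])
                item["status"] = "success"
            except Exception as exc:
                # One failing query is recorded without aborting the whole batch.
                item["status"] = "error"
                item["error"] = str(exc)
            results.append(item)
        return results

    def health_check(self) -> bool:
        return True


results = StubModel().batch_predict([
    {"question": "Show all ids", "table_headers": ["id", "name"]},
    {"question": "No schema given", "table_headers": []},
])
successful = sum(1 for r in results if r["status"] == "success")
print(successful, "of", len(results), "succeeded")  # prints "1 of 2 succeeded"
```

Recording failures per item mirrors how the `/batch` handler counts `successful_queries` and places `ERROR: ...` text in `sql_query` for failed entries.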
final-model/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: Salesforce/codet5-base
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.15.2
final-model/adapter_config.json ADDED
@@ -0,0 +1,38 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "Salesforce/codet5-base",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 16,
+   "lora_bias": false,
+   "lora_dropout": 0.1,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 8,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "wo",
+     "v",
+     "k",
+     "wi",
+     "q",
+     "o"
+   ],
+   "task_type": "SEQ_2_SEQ_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_rslora": false
+ }
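Reviewer sketch (not part of the commit): the adapter configuration fixes the LoRA rank (`r`: 8), the scaling numerator (`lora_alpha`: 16), and the six targeted projection names. The arithmetic below estimates how many trainable weights that implies. The layer dimensions (`D_MODEL`, `D_FF`, and the encoder and decoder depths) are assumptions based on a T5-base-style layout for `Salesforce/codet5-base`; they are not stated anywhere in this diff.

```python
import json

# Values copied from adapter_config.json in this commit.
adapter_config = json.loads(
    '{"r": 8, "lora_alpha": 16, "use_rslora": false,'
    ' "target_modules": ["wo", "v", "k", "wi", "q", "o"]}'
)

# ASSUMPTION: T5-base-style dimensions for the base model (not given in the diff).
D_MODEL, D_FF, ENC_LAYERS, DEC_LAYERS = 768, 3072, 12, 12

r = adapter_config["r"]
targets = set(adapter_config["target_modules"])
# Standard LoRA scales the low-rank update by lora_alpha / r when use_rslora is false.
scaling = adapter_config["lora_alpha"] / r


def lora_params(d_in: int, d_out: int) -> int:
    """A has shape (r, d_in) and B has shape (d_out, r): r * (d_in + d_out) weights."""
    return r * (d_in + d_out)


# Attention projections are d_model x d_model; the feed-forward pair maps d_model <-> d_ff.
attn_block = sum(lora_params(D_MODEL, D_MODEL) for m in ("q", "k", "v", "o") if m in targets)
ff_block = sum(lora_params(D_MODEL, D_FF) for m in ("wi", "wo") if m in targets)

encoder = ENC_LAYERS * (attn_block + ff_block)      # self-attention + feed-forward
decoder = DEC_LAYERS * (2 * attn_block + ff_block)  # self- and cross-attention + feed-forward
total = encoder + decoder

print(f"scaling={scaling}, trainable LoRA parameters={total:,}")
print(f"approx. fp32 size: {total * 4:,} bytes")
```

Under those assumptions the estimate is 3,244,032 parameters, or 12,976,128 bytes in fp32, which is close to the 13029736-byte `adapter_model.safetensors` listed in this commit; the small remainder would be consistent with file metadata.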
final-model/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee148fb67ac91dd2d0d32100873c25c33e5fc2ce98968909249c1507a97f0d18
+ size 13029736
final-model/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
final-model/special_tokens_map.json ADDED
@@ -0,0 +1,753 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ {
4
+ "content": "<extra_id_99>",
5
+ "lstrip": true,
6
+ "normalized": true,
7
+ "rstrip": false,
8
+ "single_word": false
9
+ },
10
+ {
11
+ "content": "<extra_id_98>",
12
+ "lstrip": true,
13
+ "normalized": true,
14
+ "rstrip": false,
15
+ "single_word": false
16
+ },
17
+ {
18
+ "content": "<extra_id_97>",
19
+ "lstrip": true,
20
+ "normalized": true,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ {
25
+ "content": "<extra_id_96>",
26
+ "lstrip": true,
27
+ "normalized": true,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ },
31
+ {
32
+ "content": "<extra_id_95>",
33
+ "lstrip": true,
34
+ "normalized": true,
35
+ "rstrip": false,
36
+ "single_word": false
37
+ },
38
+ {
39
+ "content": "<extra_id_94>",
40
+ "lstrip": true,
41
+ "normalized": true,
42
+ "rstrip": false,
43
+ "single_word": false
44
+ },
45
+ {
46
+ "content": "<extra_id_93>",
47
+ "lstrip": true,
48
+ "normalized": true,
49
+ "rstrip": false,
50
+ "single_word": false
51
+ },
52
+ {
53
+ "content": "<extra_id_92>",
54
+ "lstrip": true,
55
+ "normalized": true,
56
+ "rstrip": false,
57
+ "single_word": false
58
+ },
59
+ {
60
+ "content": "<extra_id_91>",
61
+ "lstrip": true,
62
+ "normalized": true,
63
+ "rstrip": false,
64
+ "single_word": false
65
+ },
66
+ {
67
+ "content": "<extra_id_90>",
68
+ "lstrip": true,
69
+ "normalized": true,
70
+ "rstrip": false,
71
+ "single_word": false
72
+ },
73
+ {
74
+ "content": "<extra_id_89>",
75
+ "lstrip": true,
76
+ "normalized": true,
77
+ "rstrip": false,
78
+ "single_word": false
79
+ },
80
+ {
81
+ "content": "<extra_id_88>",
82
+ "lstrip": true,
83
+ "normalized": true,
84
+ "rstrip": false,
85
+ "single_word": false
86
+ },
87
+ {
88
+ "content": "<extra_id_87>",
89
+ "lstrip": true,
90
+ "normalized": true,
91
+ "rstrip": false,
92
+ "single_word": false
93
+ },
94
+ {
95
+ "content": "<extra_id_86>",
96
+ "lstrip": true,
97
+ "normalized": true,
98
+ "rstrip": false,
99
+ "single_word": false
100
+ },
101
+ {
102
+ "content": "<extra_id_85>",
103
+ "lstrip": true,
104
+ "normalized": true,
105
+ "rstrip": false,
106
+ "single_word": false
107
+ },
108
+ {
109
+ "content": "<extra_id_84>",
110
+ "lstrip": true,
111
+ "normalized": true,
112
+ "rstrip": false,
113
+ "single_word": false
114
+ },
115
+ {
116
+ "content": "<extra_id_83>",
117
+ "lstrip": true,
118
+ "normalized": true,
119
+ "rstrip": false,
120
+ "single_word": false
121
+ },
122
+ {
123
+ "content": "<extra_id_82>",
124
+ "lstrip": true,
125
+ "normalized": true,
126
+ "rstrip": false,
127
+ "single_word": false
128
+ },
129
+ {
130
+ "content": "<extra_id_81>",
131
+ "lstrip": true,
132
+ "normalized": true,
133
+ "rstrip": false,
134
+ "single_word": false
135
+ },
136
+ {
137
+ "content": "<extra_id_80>",
138
+ "lstrip": true,
139
+ "normalized": true,
140
+ "rstrip": false,
141
+ "single_word": false
142
+ },
143
+ {
144
+ "content": "<extra_id_79>",
145
+ "lstrip": true,
146
+ "normalized": true,
147
+ "rstrip": false,
148
+ "single_word": false
149
+ },
150
+ {
151
+ "content": "<extra_id_78>",
152
+ "lstrip": true,
153
+ "normalized": true,
154
+ "rstrip": false,
155
+ "single_word": false
156
+ },
157
+ {
158
+ "content": "<extra_id_77>",
159
+ "lstrip": true,
160
+ "normalized": true,
161
+ "rstrip": false,
162
+ "single_word": false
163
+ },
164
+ {
165
+ "content": "<extra_id_76>",
166
+ "lstrip": true,
167
+ "normalized": true,
168
+ "rstrip": false,
169
+ "single_word": false
170
+ },
171
+ {
172
+ "content": "<extra_id_75>",
173
+ "lstrip": true,
174
+ "normalized": true,
175
+ "rstrip": false,
176
+ "single_word": false
177
+ },
178
+ {
179
+ "content": "<extra_id_74>",
180
+ "lstrip": true,
181
+ "normalized": true,
182
+ "rstrip": false,
183
+ "single_word": false
184
+ },
185
+ {
186
+ "content": "<extra_id_73>",
187
+ "lstrip": true,
188
+ "normalized": true,
189
+ "rstrip": false,
190
+ "single_word": false
191
+ },
192
+ {
193
+ "content": "<extra_id_72>",
194
+ "lstrip": true,
195
+ "normalized": true,
196
+ "rstrip": false,
197
+ "single_word": false
198
+ },
199
+ {
200
+ "content": "<extra_id_71>",
201
+ "lstrip": true,
202
+ "normalized": true,
203
+ "rstrip": false,
204
+ "single_word": false
205
+ },
206
+ {
207
+ "content": "<extra_id_70>",
208
+ "lstrip": true,
209
+ "normalized": true,
210
+ "rstrip": false,
211
+ "single_word": false
212
+ },
213
+ {
214
+ "content": "<extra_id_69>",
215
+ "lstrip": true,
216
+ "normalized": true,
217
+ "rstrip": false,
218
+ "single_word": false
219
+ },
220
+ {
221
+ "content": "<extra_id_68>",
222
+ "lstrip": true,
223
+ "normalized": true,
224
+ "rstrip": false,
225
+ "single_word": false
226
+ },
227
+ {
228
+ "content": "<extra_id_67>",
229
+ "lstrip": true,
230
+ "normalized": true,
231
+ "rstrip": false,
232
+ "single_word": false
233
+ },
234
+ {
235
+ "content": "<extra_id_66>",
236
+ "lstrip": true,
237
+ "normalized": true,
238
+ "rstrip": false,
239
+ "single_word": false
240
+ },
241
+ {
242
+ "content": "<extra_id_65>",
243
+ "lstrip": true,
244
+ "normalized": true,
245
+ "rstrip": false,
246
+ "single_word": false
247
+ },
248
+ {
249
+ "content": "<extra_id_64>",
250
+ "lstrip": true,
251
+ "normalized": true,
252
+ "rstrip": false,
253
+ "single_word": false
254
+ },
255
+ {
256
+ "content": "<extra_id_63>",
257
+ "lstrip": true,
258
+ "normalized": true,
259
+ "rstrip": false,
260
+ "single_word": false
261
+ },
262
+ {
263
+ "content": "<extra_id_62>",
264
+ "lstrip": true,
265
+ "normalized": true,
266
+ "rstrip": false,
267
+ "single_word": false
268
+ },
269
+ {
270
+ "content": "<extra_id_61>",
271
+ "lstrip": true,
272
+ "normalized": true,
273
+ "rstrip": false,
274
+ "single_word": false
275
+ },
276
+ {
277
+ "content": "<extra_id_60>",
278
+ "lstrip": true,
279
+ "normalized": true,
280
+ "rstrip": false,
281
+ "single_word": false
282
+ },
283
+ {
284
+ "content": "<extra_id_59>",
285
+ "lstrip": true,
286
+ "normalized": true,
287
+ "rstrip": false,
288
+ "single_word": false
289
+ },
290
+ {
291
+ "content": "<extra_id_58>",
292
+ "lstrip": true,
293
+ "normalized": true,
294
+ "rstrip": false,
295
+ "single_word": false
296
+ },
297
+ {
298
+ "content": "<extra_id_57>",
299
+ "lstrip": true,
300
+ "normalized": true,
301
+ "rstrip": false,
302
+ "single_word": false
303
+ },
304
+ {
305
+ "content": "<extra_id_56>",
306
+ "lstrip": true,
307
+ "normalized": true,
308
+ "rstrip": false,
309
+ "single_word": false
310
+ },
311
+ {
312
+ "content": "<extra_id_55>",
313
+ "lstrip": true,
314
+ "normalized": true,
315
+ "rstrip": false,
316
+ "single_word": false
317
+ },
318
+ {
319
+ "content": "<extra_id_54>",
320
+ "lstrip": true,
321
+ "normalized": true,
322
+ "rstrip": false,
323
+ "single_word": false
324
+ },
325
+ {
326
+ "content": "<extra_id_53>",
327
+ "lstrip": true,
328
+ "normalized": true,
329
+ "rstrip": false,
330
+ "single_word": false
331
+ },
332
+ {
333
+ "content": "<extra_id_52>",
334
+ "lstrip": true,
335
+ "normalized": true,
336
+ "rstrip": false,
337
+ "single_word": false
338
+ },
339
+ {
340
+ "content": "<extra_id_51>",
341
+ "lstrip": true,
342
+ "normalized": true,
343
+ "rstrip": false,
344
+ "single_word": false
345
+ },
346
+ {
347
+ "content": "<extra_id_50>",
348
+ "lstrip": true,
349
+ "normalized": true,
350
+ "rstrip": false,
351
+ "single_word": false
352
+ },
353
+ {
354
+ "content": "<extra_id_49>",
355
+ "lstrip": true,
356
+ "normalized": true,
357
+ "rstrip": false,
358
+ "single_word": false
359
+ },
360
+ {
361
+ "content": "<extra_id_48>",
362
+ "lstrip": true,
363
+ "normalized": true,
364
+ "rstrip": false,
365
+ "single_word": false
366
+ },
367
+ {
368
+ "content": "<extra_id_47>",
369
+ "lstrip": true,
370
+ "normalized": true,
371
+ "rstrip": false,
372
+ "single_word": false
373
+ },
374
+ {
375
+ "content": "<extra_id_46>",
376
+ "lstrip": true,
377
+ "normalized": true,
378
+ "rstrip": false,
379
+ "single_word": false
380
+ },
381
+ {
382
+ "content": "<extra_id_45>",
383
+ "lstrip": true,
384
+ "normalized": true,
385
+ "rstrip": false,
386
+ "single_word": false
387
+ },
388
+ {
389
+ "content": "<extra_id_44>",
390
+ "lstrip": true,
391
+ "normalized": true,
392
+ "rstrip": false,
393
+ "single_word": false
394
+ },
395
+ {
396
+ "content": "<extra_id_43>",
397
+ "lstrip": true,
398
+ "normalized": true,
399
+ "rstrip": false,
400
+ "single_word": false
401
+ },
402
+ {
403
+ "content": "<extra_id_42>",
404
+ "lstrip": true,
405
+ "normalized": true,
406
+ "rstrip": false,
407
+ "single_word": false
408
+ },
409
+ {
410
+ "content": "<extra_id_41>",
411
+ "lstrip": true,
412
+ "normalized": true,
413
+ "rstrip": false,
414
+ "single_word": false
415
+ },
416
+ {
417
+ "content": "<extra_id_40>",
418
+ "lstrip": true,
419
+ "normalized": true,
420
+ "rstrip": false,
421
+ "single_word": false
422
+ },
423
+ {
424
+ "content": "<extra_id_39>",
425
+ "lstrip": true,
426
+ "normalized": true,
427
+ "rstrip": false,
428
+ "single_word": false
429
+ },
430
+ {
431
+ "content": "<extra_id_38>",
432
+ "lstrip": true,
433
+ "normalized": true,
434
+ "rstrip": false,
435
+ "single_word": false
436
+ },
437
+ {
438
+ "content": "<extra_id_37>",
439
+ "lstrip": true,
440
+ "normalized": true,
441
+ "rstrip": false,
442
+ "single_word": false
443
+ },
444
+ {
445
+ "content": "<extra_id_36>",
446
+ "lstrip": true,
447
+ "normalized": true,
448
+ "rstrip": false,
449
+ "single_word": false
450
+ },
451
+ {
452
+ "content": "<extra_id_35>",
453
+ "lstrip": true,
454
+ "normalized": true,
455
+ "rstrip": false,
456
+ "single_word": false
457
+ },
458
+ {
459
+ "content": "<extra_id_34>",
460
+ "lstrip": true,
461
+ "normalized": true,
462
+ "rstrip": false,
463
+ "single_word": false
464
+ },
465
+ {
466
+ "content": "<extra_id_33>",
467
+ "lstrip": true,
468
+ "normalized": true,
469
+ "rstrip": false,
470
+ "single_word": false
471
+ },
472
+ {
473
+ "content": "<extra_id_32>",
474
+ "lstrip": true,
475
+ "normalized": true,
476
+ "rstrip": false,
477
+ "single_word": false
478
+ },
479
+ {
480
+ "content": "<extra_id_31>",
481
+ "lstrip": true,
482
+ "normalized": true,
483
+ "rstrip": false,
484
+ "single_word": false
485
+ },
486
+ {
487
+ "content": "<extra_id_30>",
488
+ "lstrip": true,
489
+ "normalized": true,
490
+ "rstrip": false,
491
+ "single_word": false
492
+ },
493
+ {
494
+ "content": "<extra_id_29>",
495
+ "lstrip": true,
496
+ "normalized": true,
497
+ "rstrip": false,
498
+ "single_word": false
499
+ },
500
+ {
501
+ "content": "<extra_id_28>",
502
+ "lstrip": true,
503
+ "normalized": true,
504
+ "rstrip": false,
505
+ "single_word": false
506
+ },
507
+ {
508
+ "content": "<extra_id_27>",
509
+ "lstrip": true,
510
+ "normalized": true,
511
+ "rstrip": false,
512
+ "single_word": false
513
+ },
514
+ {
515
+ "content": "<extra_id_26>",
516
+ "lstrip": true,
517
+ "normalized": true,
518
+ "rstrip": false,
519
+ "single_word": false
520
+ },
521
+ {
522
+ "content": "<extra_id_25>",
523
+ "lstrip": true,
524
+ "normalized": true,
525
+ "rstrip": false,
526
+ "single_word": false
527
+ },
528
+ {
529
+ "content": "<extra_id_24>",
530
+ "lstrip": true,
531
+ "normalized": true,
532
+ "rstrip": false,
533
+ "single_word": false
534
+ },
535
+ {
536
+ "content": "<extra_id_23>",
537
+ "lstrip": true,
538
+ "normalized": true,
539
+ "rstrip": false,
540
+ "single_word": false
541
+ },
542
+ {
543
+ "content": "<extra_id_22>",
544
+ "lstrip": true,
545
+ "normalized": true,
546
+ "rstrip": false,
547
+ "single_word": false
548
+ },
549
+ {
550
+ "content": "<extra_id_21>",
551
+ "lstrip": true,
552
+ "normalized": true,
553
+ "rstrip": false,
554
+ "single_word": false
555
+ },
556
+ {
557
+ "content": "<extra_id_20>",
558
+ "lstrip": true,
559
+ "normalized": true,
560
+ "rstrip": false,
561
+ "single_word": false
562
+ },
563
+ {
564
+ "content": "<extra_id_19>",
565
+ "lstrip": true,
566
+ "normalized": true,
567
+ "rstrip": false,
568
+ "single_word": false
569
+ },
570
+ {
571
+ "content": "<extra_id_18>",
572
+ "lstrip": true,
573
+ "normalized": true,
574
+ "rstrip": false,
575
+ "single_word": false
576
+ },
577
+ {
578
+ "content": "<extra_id_17>",
579
+ "lstrip": true,
580
+ "normalized": true,
581
+ "rstrip": false,
582
+ "single_word": false
583
+ },
584
+ {
585
+ "content": "<extra_id_16>",
586
+ "lstrip": true,
587
+ "normalized": true,
588
+ "rstrip": false,
589
+ "single_word": false
590
+ },
591
+ {
592
+ "content": "<extra_id_15>",
593
+ "lstrip": true,
594
+ "normalized": true,
595
+ "rstrip": false,
596
+ "single_word": false
597
+ },
598
+ {
599
+ "content": "<extra_id_14>",
600
+ "lstrip": true,
601
+ "normalized": true,
602
+ "rstrip": false,
603
+ "single_word": false
604
+ },
605
+ {
606
+ "content": "<extra_id_13>",
607
+ "lstrip": true,
608
+ "normalized": true,
609
+ "rstrip": false,
610
+ "single_word": false
611
+ },
612
+ {
613
+ "content": "<extra_id_12>",
614
+ "lstrip": true,
615
+ "normalized": true,
616
+ "rstrip": false,
617
+ "single_word": false
618
+ },
619
+ {
620
+ "content": "<extra_id_11>",
621
+ "lstrip": true,
622
+ "normalized": true,
623
+ "rstrip": false,
624
+ "single_word": false
625
+ },
626
+ {
627
+ "content": "<extra_id_10>",
628
+ "lstrip": true,
629
+ "normalized": true,
630
+ "rstrip": false,
631
+ "single_word": false
632
+ },
633
+ {
634
+ "content": "<extra_id_9>",
635
+ "lstrip": true,
636
+ "normalized": true,
637
+ "rstrip": false,
638
+ "single_word": false
639
+ },
640
+ {
641
+ "content": "<extra_id_8>",
642
+ "lstrip": true,
643
+ "normalized": true,
644
+ "rstrip": false,
645
+ "single_word": false
646
+ },
647
+ {
648
+ "content": "<extra_id_7>",
649
+ "lstrip": true,
650
+ "normalized": true,
651
+ "rstrip": false,
652
+ "single_word": false
653
+ },
654
+ {
655
+ "content": "<extra_id_6>",
656
+ "lstrip": true,
657
+ "normalized": true,
658
+ "rstrip": false,
659
+ "single_word": false
660
+ },
661
+ {
662
+ "content": "<extra_id_5>",
663
+ "lstrip": true,
664
+ "normalized": true,
665
+ "rstrip": false,
666
+ "single_word": false
667
+ },
668
+ {
669
+ "content": "<extra_id_4>",
670
+ "lstrip": true,
671
+ "normalized": true,
672
+ "rstrip": false,
673
+ "single_word": false
674
+ },
675
+ {
676
+ "content": "<extra_id_3>",
677
+ "lstrip": true,
678
+ "normalized": true,
679
+ "rstrip": false,
680
+ "single_word": false
681
+ },
682
+ {
683
+ "content": "<extra_id_2>",
684
+ "lstrip": true,
685
+ "normalized": true,
686
+ "rstrip": false,
687
+ "single_word": false
688
+ },
689
+ {
690
+ "content": "<extra_id_1>",
691
+ "lstrip": true,
692
+ "normalized": true,
693
+ "rstrip": false,
694
+ "single_word": false
695
+ },
696
+ {
697
+ "content": "<extra_id_0>",
698
+ "lstrip": true,
699
+ "normalized": true,
700
+ "rstrip": false,
701
+ "single_word": false
702
+ }
703
+ ],
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
final-model/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
final-model/tokenizer_config.json ADDED
@@ -0,0 +1,960 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<pad>",
6
+ "lstrip": false,
7
+ "normalized": true,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "<s>",
14
+ "lstrip": false,
15
+ "normalized": true,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "</s>",
22
+ "lstrip": false,
23
+ "normalized": true,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "<unk>",
30
+ "lstrip": false,
31
+ "normalized": true,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "4": {
37
+ "content": "<mask>",
38
+ "lstrip": true,
39
+ "normalized": true,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "32000": {
45
+ "content": "<extra_id_99>",
46
+ "lstrip": true,
47
+ "normalized": true,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": true
51
+ },
52
+ "32001": {
53
+ "content": "<extra_id_98>",
54
+ "lstrip": true,
55
+ "normalized": true,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": true
59
+ },
60
+ "32002": {
61
+ "content": "<extra_id_97>",
62
+ "lstrip": true,
63
+ "normalized": true,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": true
67
+ },
68
+ "32003": {
69
+ "content": "<extra_id_96>",
70
+ "lstrip": true,
71
+ "normalized": true,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": true
75
+ },
76
+ "32004": {
77
+ "content": "<extra_id_95>",
78
+ "lstrip": true,
79
+ "normalized": true,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": true
83
+ },
84
+ "32005": {
85
+ "content": "<extra_id_94>",
86
+ "lstrip": true,
87
+ "normalized": true,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": true
91
+ },
92
+ "32006": {
93
+ "content": "<extra_id_93>",
94
+ "lstrip": true,
95
+ "normalized": true,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": true
99
+ },
100
+ "32007": {
101
+ "content": "<extra_id_92>",
102
+ "lstrip": true,
103
+ "normalized": true,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": true
107
+ },
108
+ "32008": {
109
+ "content": "<extra_id_91>",
110
+ "lstrip": true,
111
+ "normalized": true,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": true
115
+ },
116
+ "32009": {
117
+ "content": "<extra_id_90>",
118
+ "lstrip": true,
119
+ "normalized": true,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": true
123
+ },
124
+ "32010": {
125
+ "content": "<extra_id_89>",
126
+ "lstrip": true,
127
+ "normalized": true,
128
+ "rstrip": false,
129
+ "single_word": false,
130
+ "special": true
131
+ },
132
+ "32011": {
133
+ "content": "<extra_id_88>",
134
+ "lstrip": true,
135
+ "normalized": true,
136
+ "rstrip": false,
137
+ "single_word": false,
138
+ "special": true
139
+ },
140
+ "32012": {
141
+ "content": "<extra_id_87>",
142
+ "lstrip": true,
143
+ "normalized": true,
144
+ "rstrip": false,
145
+ "single_word": false,
146
+ "special": true
147
+ },
148
+ "32013": {
149
+ "content": "<extra_id_86>",
150
+ "lstrip": true,
151
+ "normalized": true,
152
+ "rstrip": false,
153
+ "single_word": false,
154
+ "special": true
155
+ },
156
+ "32014": {
157
+ "content": "<extra_id_85>",
158
+ "lstrip": true,
159
+ "normalized": true,
160
+ "rstrip": false,
161
+ "single_word": false,
162
+ "special": true
163
+ },
164
+ "32015": {
165
+ "content": "<extra_id_84>",
166
+ "lstrip": true,
167
+ "normalized": true,
168
+ "rstrip": false,
169
+ "single_word": false,
170
+ "special": true
171
+ },
172
+ "32016": {
173
+ "content": "<extra_id_83>",
174
+ "lstrip": true,
175
+ "normalized": true,
176
+ "rstrip": false,
177
+ "single_word": false,
178
+ "special": true
179
+ },
180
+ "32017": {
181
+ "content": "<extra_id_82>",
182
+ "lstrip": true,
183
+ "normalized": true,
184
+ "rstrip": false,
185
+ "single_word": false,
186
+ "special": true
187
+ },
188
+ "32018": {
189
+ "content": "<extra_id_81>",
190
+ "lstrip": true,
191
+ "normalized": true,
192
+ "rstrip": false,
193
+ "single_word": false,
194
+ "special": true
195
+ },
196
+ "32019": {
197
+ "content": "<extra_id_80>",
198
+ "lstrip": true,
199
+ "normalized": true,
200
+ "rstrip": false,
201
+ "single_word": false,
202
+ "special": true
203
+ },
204
+ "32020": {
205
+ "content": "<extra_id_79>",
206
+ "lstrip": true,
207
+ "normalized": true,
208
+ "rstrip": false,
209
+ "single_word": false,
210
+ "special": true
211
+ },
212
+ "32021": {
213
+ "content": "<extra_id_78>",
214
+ "lstrip": true,
215
+ "normalized": true,
216
+ "rstrip": false,
217
+ "single_word": false,
218
+ "special": true
219
+ },
220
+ "32022": {
221
+ "content": "<extra_id_77>",
222
+ "lstrip": true,
223
+ "normalized": true,
224
+ "rstrip": false,
225
+ "single_word": false,
226
+ "special": true
227
+ },
228
+ "32023": {
229
+ "content": "<extra_id_76>",
230
+ "lstrip": true,
231
+ "normalized": true,
232
+ "rstrip": false,
233
+ "single_word": false,
234
+ "special": true
235
+ },
236
+ "32024": {
237
+ "content": "<extra_id_75>",
238
+ "lstrip": true,
239
+ "normalized": true,
240
+ "rstrip": false,
241
+ "single_word": false,
242
+ "special": true
243
+ },
244
+ "32025": {
245
+ "content": "<extra_id_74>",
246
+ "lstrip": true,
247
+ "normalized": true,
248
+ "rstrip": false,
249
+ "single_word": false,
250
+ "special": true
251
+ },
252
+ "32026": {
253
+ "content": "<extra_id_73>",
254
+ "lstrip": true,
255
+ "normalized": true,
256
+ "rstrip": false,
257
+ "single_word": false,
258
+ "special": true
259
+ },
260
+ "32027": {
261
+ "content": "<extra_id_72>",
262
+ "lstrip": true,
263
+ "normalized": true,
264
+ "rstrip": false,
265
+ "single_word": false,
266
+ "special": true
267
+ },
268
+ "32028": {
269
+ "content": "<extra_id_71>",
270
+ "lstrip": true,
271
+ "normalized": true,
272
+ "rstrip": false,
273
+ "single_word": false,
274
+ "special": true
275
+ },
276
+ "32029": {
277
+ "content": "<extra_id_70>",
278
+ "lstrip": true,
279
+ "normalized": true,
280
+ "rstrip": false,
281
+ "single_word": false,
282
+ "special": true
283
+ },
284
+ "32030": {
285
+ "content": "<extra_id_69>",
286
+ "lstrip": true,
287
+ "normalized": true,
288
+ "rstrip": false,
289
+ "single_word": false,
290
+ "special": true
291
+ },
292
+ "32031": {
293
+ "content": "<extra_id_68>",
294
+ "lstrip": true,
295
+ "normalized": true,
296
+ "rstrip": false,
297
+ "single_word": false,
298
+ "special": true
299
+ },
300
+ "32032": {
301
+ "content": "<extra_id_67>",
302
+ "lstrip": true,
303
+ "normalized": true,
304
+ "rstrip": false,
305
+ "single_word": false,
306
+ "special": true
307
+ },
308
+ "32033": {
309
+ "content": "<extra_id_66>",
310
+ "lstrip": true,
311
+ "normalized": true,
312
+ "rstrip": false,
313
+ "single_word": false,
314
+ "special": true
315
+ },
316
+ "32034": {
317
+ "content": "<extra_id_65>",
318
+ "lstrip": true,
319
+ "normalized": true,
320
+ "rstrip": false,
321
+ "single_word": false,
322
+ "special": true
323
+ },
324
+ "32035": {
325
+ "content": "<extra_id_64>",
326
+ "lstrip": true,
327
+ "normalized": true,
328
+ "rstrip": false,
329
+ "single_word": false,
330
+ "special": true
331
+ },
332
+ "32036": {
333
+ "content": "<extra_id_63>",
334
+ "lstrip": true,
335
+ "normalized": true,
336
+ "rstrip": false,
337
+ "single_word": false,
338
+ "special": true
339
+ },
340
+ "32037": {
341
+ "content": "<extra_id_62>",
342
+ "lstrip": true,
343
+ "normalized": true,
344
+ "rstrip": false,
345
+ "single_word": false,
346
+ "special": true
347
+ },
348
+ "32038": {
349
+ "content": "<extra_id_61>",
350
+ "lstrip": true,
351
+ "normalized": true,
352
+ "rstrip": false,
353
+ "single_word": false,
354
+ "special": true
355
+ },
356
+ "32039": {
357
+ "content": "<extra_id_60>",
358
+ "lstrip": true,
359
+ "normalized": true,
360
+ "rstrip": false,
361
+ "single_word": false,
362
+ "special": true
363
+ },
364
+ "32040": {
365
+ "content": "<extra_id_59>",
366
+ "lstrip": true,
367
+ "normalized": true,
368
+ "rstrip": false,
369
+ "single_word": false,
370
+ "special": true
371
+ },
372
+ "32041": {
373
+ "content": "<extra_id_58>",
374
+ "lstrip": true,
375
+ "normalized": true,
376
+ "rstrip": false,
377
+ "single_word": false,
378
+ "special": true
379
+ },
380
+ "32042": {
381
+ "content": "<extra_id_57>",
382
+ "lstrip": true,
383
+ "normalized": true,
384
+ "rstrip": false,
385
+ "single_word": false,
386
+ "special": true
387
+ },
388
+ "32043": {
389
+ "content": "<extra_id_56>",
390
+ "lstrip": true,
391
+ "normalized": true,
392
+ "rstrip": false,
393
+ "single_word": false,
394
+ "special": true
395
+ },
396
+ "32044": {
397
+ "content": "<extra_id_55>",
398
+ "lstrip": true,
399
+ "normalized": true,
400
+ "rstrip": false,
401
+ "single_word": false,
402
+ "special": true
403
+ },
404
+ "32045": {
405
+ "content": "<extra_id_54>",
406
+ "lstrip": true,
407
+ "normalized": true,
408
+ "rstrip": false,
409
+ "single_word": false,
410
+ "special": true
411
+ },
412
+ "32046": {
413
+ "content": "<extra_id_53>",
414
+ "lstrip": true,
415
+ "normalized": true,
416
+ "rstrip": false,
417
+ "single_word": false,
418
+ "special": true
419
+ },
420
+ "32047": {
421
+ "content": "<extra_id_52>",
422
+ "lstrip": true,
423
+ "normalized": true,
424
+ "rstrip": false,
425
+ "single_word": false,
426
+ "special": true
427
+ },
428
+ "32048": {
429
+ "content": "<extra_id_51>",
430
+ "lstrip": true,
431
+ "normalized": true,
432
+ "rstrip": false,
433
+ "single_word": false,
434
+ "special": true
435
+ },
436
+ "32049": {
437
+ "content": "<extra_id_50>",
438
+ "lstrip": true,
439
+ "normalized": true,
440
+ "rstrip": false,
441
+ "single_word": false,
442
+ "special": true
443
+ },
444
+ "32050": {
445
+ "content": "<extra_id_49>",
446
+ "lstrip": true,
447
+ "normalized": true,
448
+ "rstrip": false,
449
+ "single_word": false,
450
+ "special": true
451
+ },
452
+ "32051": {
453
+ "content": "<extra_id_48>",
454
+ "lstrip": true,
455
+ "normalized": true,
456
+ "rstrip": false,
457
+ "single_word": false,
458
+ "special": true
459
+ },
460
+ "32052": {
461
+ "content": "<extra_id_47>",
462
+ "lstrip": true,
463
+ "normalized": true,
464
+ "rstrip": false,
465
+ "single_word": false,
466
+ "special": true
467
+ },
468
+ "32053": {
469
+ "content": "<extra_id_46>",
470
+ "lstrip": true,
471
+ "normalized": true,
472
+ "rstrip": false,
473
+ "single_word": false,
474
+ "special": true
475
+ },
476
+ "32054": {
477
+ "content": "<extra_id_45>",
478
+ "lstrip": true,
479
+ "normalized": true,
480
+ "rstrip": false,
481
+ "single_word": false,
482
+ "special": true
483
+ },
484
+ "32055": {
485
+ "content": "<extra_id_44>",
486
+ "lstrip": true,
487
+ "normalized": true,
488
+ "rstrip": false,
489
+ "single_word": false,
490
+ "special": true
491
+ },
492
+ "32056": {
493
+ "content": "<extra_id_43>",
494
+ "lstrip": true,
495
+ "normalized": true,
496
+ "rstrip": false,
497
+ "single_word": false,
498
+ "special": true
499
+ },
500
+ "32057": {
501
+ "content": "<extra_id_42>",
502
+ "lstrip": true,
503
+ "normalized": true,
504
+ "rstrip": false,
505
+ "single_word": false,
506
+ "special": true
507
+ },
508
+ "32058": {
509
+ "content": "<extra_id_41>",
510
+ "lstrip": true,
511
+ "normalized": true,
512
+ "rstrip": false,
513
+ "single_word": false,
514
+ "special": true
515
+ },
516
+ "32059": {
517
+ "content": "<extra_id_40>",
518
+ "lstrip": true,
519
+ "normalized": true,
520
+ "rstrip": false,
521
+ "single_word": false,
522
+ "special": true
523
+ },
524
+ "32060": {
525
+ "content": "<extra_id_39>",
526
+ "lstrip": true,
527
+ "normalized": true,
528
+ "rstrip": false,
529
+ "single_word": false,
530
+ "special": true
531
+ },
532
+ "32061": {
533
+ "content": "<extra_id_38>",
534
+ "lstrip": true,
535
+ "normalized": true,
536
+ "rstrip": false,
537
+ "single_word": false,
538
+ "special": true
539
+ },
540
+ "32062": {
541
+ "content": "<extra_id_37>",
542
+ "lstrip": true,
543
+ "normalized": true,
544
+ "rstrip": false,
545
+ "single_word": false,
546
+ "special": true
547
+ },
548
+ "32063": {
549
+ "content": "<extra_id_36>",
550
+ "lstrip": true,
551
+ "normalized": true,
552
+ "rstrip": false,
553
+ "single_word": false,
554
+ "special": true
555
+ },
556
+ "32064": {
557
+ "content": "<extra_id_35>",
558
+ "lstrip": true,
559
+ "normalized": true,
560
+ "rstrip": false,
561
+ "single_word": false,
562
+ "special": true
563
+ },
564
+ "32065": {
565
+ "content": "<extra_id_34>",
566
+ "lstrip": true,
567
+ "normalized": true,
568
+ "rstrip": false,
569
+ "single_word": false,
570
+ "special": true
571
+ },
572
+ "32066": {
573
+ "content": "<extra_id_33>",
574
+ "lstrip": true,
575
+ "normalized": true,
576
+ "rstrip": false,
577
+ "single_word": false,
578
+ "special": true
579
+ },
580
+ "32067": {
581
+ "content": "<extra_id_32>",
582
+ "lstrip": true,
583
+ "normalized": true,
584
+ "rstrip": false,
585
+ "single_word": false,
586
+ "special": true
587
+ },
588
+ "32068": {
589
+ "content": "<extra_id_31>",
590
+ "lstrip": true,
591
+ "normalized": true,
592
+ "rstrip": false,
593
+ "single_word": false,
594
+ "special": true
595
+ },
596
+ "32069": {
597
+ "content": "<extra_id_30>",
598
+ "lstrip": true,
599
+ "normalized": true,
600
+ "rstrip": false,
601
+ "single_word": false,
602
+ "special": true
603
+ },
604
+ "32070": {
605
+ "content": "<extra_id_29>",
606
+ "lstrip": true,
607
+ "normalized": true,
608
+ "rstrip": false,
609
+ "single_word": false,
610
+ "special": true
611
+ },
612
+ "32071": {
613
+ "content": "<extra_id_28>",
614
+ "lstrip": true,
615
+ "normalized": true,
616
+ "rstrip": false,
617
+ "single_word": false,
618
+ "special": true
619
+ },
620
+ "32072": {
621
+ "content": "<extra_id_27>",
622
+ "lstrip": true,
623
+ "normalized": true,
624
+ "rstrip": false,
625
+ "single_word": false,
626
+ "special": true
627
+ },
628
+ "32073": {
629
+ "content": "<extra_id_26>",
630
+ "lstrip": true,
631
+ "normalized": true,
632
+ "rstrip": false,
633
+ "single_word": false,
634
+ "special": true
635
+ },
636
+ "32074": {
637
+ "content": "<extra_id_25>",
638
+ "lstrip": true,
639
+ "normalized": true,
640
+ "rstrip": false,
641
+ "single_word": false,
642
+ "special": true
643
+ },
644
+ "32075": {
645
+ "content": "<extra_id_24>",
646
+ "lstrip": true,
647
+ "normalized": true,
648
+ "rstrip": false,
649
+ "single_word": false,
650
+ "special": true
651
+ },
652
+ "32076": {
653
+ "content": "<extra_id_23>",
654
+ "lstrip": true,
655
+ "normalized": true,
656
+ "rstrip": false,
657
+ "single_word": false,
658
+ "special": true
659
+ },
660
+ "32077": {
661
+ "content": "<extra_id_22>",
662
+ "lstrip": true,
663
+ "normalized": true,
664
+ "rstrip": false,
665
+ "single_word": false,
666
+ "special": true
667
+ },
668
+ "32078": {
669
+ "content": "<extra_id_21>",
670
+ "lstrip": true,
671
+ "normalized": true,
672
+ "rstrip": false,
673
+ "single_word": false,
674
+ "special": true
675
+ },
676
+ "32079": {
677
+ "content": "<extra_id_20>",
678
+ "lstrip": true,
679
+ "normalized": true,
680
+ "rstrip": false,
681
+ "single_word": false,
682
+ "special": true
683
+ },
684
+ "32080": {
685
+ "content": "<extra_id_19>",
686
+ "lstrip": true,
687
+ "normalized": true,
688
+ "rstrip": false,
689
+ "single_word": false,
690
+ "special": true
691
+ },
692
+ "32081": {
693
+ "content": "<extra_id_18>",
694
+ "lstrip": true,
695
+ "normalized": true,
696
+ "rstrip": false,
697
+ "single_word": false,
698
+ "special": true
699
+ },
700
+ "32082": {
701
+ "content": "<extra_id_17>",
702
+ "lstrip": true,
703
+ "normalized": true,
704
+ "rstrip": false,
705
+ "single_word": false,
706
+ "special": true
707
+ },
708
+ "32083": {
709
+ "content": "<extra_id_16>",
710
+ "lstrip": true,
711
+ "normalized": true,
712
+ "rstrip": false,
713
+ "single_word": false,
714
+ "special": true
715
+ },
716
+ "32084": {
717
+ "content": "<extra_id_15>",
718
+ "lstrip": true,
719
+ "normalized": true,
720
+ "rstrip": false,
721
+ "single_word": false,
722
+ "special": true
723
+ },
724
+ "32085": {
725
+ "content": "<extra_id_14>",
726
+ "lstrip": true,
727
+ "normalized": true,
728
+ "rstrip": false,
729
+ "single_word": false,
730
+ "special": true
731
+ },
732
+ "32086": {
733
+ "content": "<extra_id_13>",
734
+ "lstrip": true,
735
+ "normalized": true,
736
+ "rstrip": false,
737
+ "single_word": false,
738
+ "special": true
739
+ },
740
+ "32087": {
741
+ "content": "<extra_id_12>",
742
+ "lstrip": true,
743
+ "normalized": true,
744
+ "rstrip": false,
745
+ "single_word": false,
746
+ "special": true
747
+ },
748
+ "32088": {
749
+ "content": "<extra_id_11>",
750
+ "lstrip": true,
751
+ "normalized": true,
752
+ "rstrip": false,
753
+ "single_word": false,
754
+ "special": true
755
+ },
756
+ "32089": {
757
+ "content": "<extra_id_10>",
758
+ "lstrip": true,
759
+ "normalized": true,
760
+ "rstrip": false,
761
+ "single_word": false,
762
+ "special": true
763
+ },
764
+ "32090": {
765
+ "content": "<extra_id_9>",
766
+ "lstrip": true,
767
+ "normalized": true,
768
+ "rstrip": false,
769
+ "single_word": false,
770
+ "special": true
771
+ },
772
+ "32091": {
773
+ "content": "<extra_id_8>",
774
+ "lstrip": true,
775
+ "normalized": true,
776
+ "rstrip": false,
777
+ "single_word": false,
778
+ "special": true
779
+ },
780
+ "32092": {
781
+ "content": "<extra_id_7>",
782
+ "lstrip": true,
783
+ "normalized": true,
784
+ "rstrip": false,
785
+ "single_word": false,
786
+ "special": true
787
+ },
788
+ "32093": {
789
+ "content": "<extra_id_6>",
790
+ "lstrip": true,
791
+ "normalized": true,
792
+ "rstrip": false,
793
+ "single_word": false,
794
+ "special": true
795
+ },
796
+ "32094": {
797
+ "content": "<extra_id_5>",
798
+ "lstrip": true,
799
+ "normalized": true,
800
+ "rstrip": false,
801
+ "single_word": false,
802
+ "special": true
803
+ },
804
+ "32095": {
805
+ "content": "<extra_id_4>",
806
+ "lstrip": true,
807
+ "normalized": true,
808
+ "rstrip": false,
809
+ "single_word": false,
810
+ "special": true
811
+ },
812
+ "32096": {
813
+ "content": "<extra_id_3>",
814
+ "lstrip": true,
815
+ "normalized": true,
816
+ "rstrip": false,
817
+ "single_word": false,
818
+ "special": true
819
+ },
820
+ "32097": {
821
+ "content": "<extra_id_2>",
822
+ "lstrip": true,
823
+ "normalized": true,
824
+ "rstrip": false,
825
+ "single_word": false,
826
+ "special": true
827
+ },
828
+ "32098": {
829
+ "content": "<extra_id_1>",
830
+ "lstrip": true,
831
+ "normalized": true,
832
+ "rstrip": false,
833
+ "single_word": false,
834
+ "special": true
835
+ },
836
+ "32099": {
837
+ "content": "<extra_id_0>",
838
+ "lstrip": true,
839
+ "normalized": true,
840
+ "rstrip": false,
841
+ "single_word": false,
842
+ "special": true
843
+ }
844
+ },
845
+ "additional_special_tokens": [
846
+ "<extra_id_99>",
847
+ "<extra_id_98>",
848
+ "<extra_id_97>",
849
+ "<extra_id_96>",
850
+ "<extra_id_95>",
851
+ "<extra_id_94>",
852
+ "<extra_id_93>",
853
+ "<extra_id_92>",
854
+ "<extra_id_91>",
855
+ "<extra_id_90>",
856
+ "<extra_id_89>",
857
+ "<extra_id_88>",
858
+ "<extra_id_87>",
859
+ "<extra_id_86>",
860
+ "<extra_id_85>",
861
+ "<extra_id_84>",
862
+ "<extra_id_83>",
863
+ "<extra_id_82>",
864
+ "<extra_id_81>",
865
+ "<extra_id_80>",
866
+ "<extra_id_79>",
867
+ "<extra_id_78>",
868
+ "<extra_id_77>",
869
+ "<extra_id_76>",
870
+ "<extra_id_75>",
871
+ "<extra_id_74>",
872
+ "<extra_id_73>",
873
+ "<extra_id_72>",
874
+ "<extra_id_71>",
875
+ "<extra_id_70>",
876
+ "<extra_id_69>",
877
+ "<extra_id_68>",
878
+ "<extra_id_67>",
879
+ "<extra_id_66>",
880
+ "<extra_id_65>",
881
+ "<extra_id_64>",
882
+ "<extra_id_63>",
883
+ "<extra_id_62>",
884
+ "<extra_id_61>",
885
+ "<extra_id_60>",
886
+ "<extra_id_59>",
887
+ "<extra_id_58>",
888
+ "<extra_id_57>",
889
+ "<extra_id_56>",
890
+ "<extra_id_55>",
891
+ "<extra_id_54>",
892
+ "<extra_id_53>",
893
+ "<extra_id_52>",
894
+ "<extra_id_51>",
895
+ "<extra_id_50>",
896
+ "<extra_id_49>",
897
+ "<extra_id_48>",
898
+ "<extra_id_47>",
899
+ "<extra_id_46>",
900
+ "<extra_id_45>",
901
+ "<extra_id_44>",
902
+ "<extra_id_43>",
903
+ "<extra_id_42>",
904
+ "<extra_id_41>",
905
+ "<extra_id_40>",
906
+ "<extra_id_39>",
907
+ "<extra_id_38>",
908
+ "<extra_id_37>",
909
+ "<extra_id_36>",
910
+ "<extra_id_35>",
911
+ "<extra_id_34>",
912
+ "<extra_id_33>",
913
+ "<extra_id_32>",
914
+ "<extra_id_31>",
915
+ "<extra_id_30>",
916
+ "<extra_id_29>",
917
+ "<extra_id_28>",
918
+ "<extra_id_27>",
919
+ "<extra_id_26>",
920
+ "<extra_id_25>",
921
+ "<extra_id_24>",
922
+ "<extra_id_23>",
923
+ "<extra_id_22>",
924
+ "<extra_id_21>",
925
+ "<extra_id_20>",
926
+ "<extra_id_19>",
927
+ "<extra_id_18>",
928
+ "<extra_id_17>",
929
+ "<extra_id_16>",
930
+ "<extra_id_15>",
931
+ "<extra_id_14>",
932
+ "<extra_id_13>",
933
+ "<extra_id_12>",
934
+ "<extra_id_11>",
935
+ "<extra_id_10>",
936
+ "<extra_id_9>",
937
+ "<extra_id_8>",
938
+ "<extra_id_7>",
939
+ "<extra_id_6>",
940
+ "<extra_id_5>",
941
+ "<extra_id_4>",
942
+ "<extra_id_3>",
943
+ "<extra_id_2>",
944
+ "<extra_id_1>",
945
+ "<extra_id_0>"
946
+ ],
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "mask_token": "<mask>",
+ "model_max_length": 512,
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "tokenizer_class": "RobertaTokenizer",
+ "trim_offsets": true,
+ "unk_token": "<unk>"
+ }
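The `added_tokens_decoder` entries in this file follow a regular pattern: IDs 32000 through 32099 map to `<extra_id_99>` down to `<extra_id_0>`, so the ID rises as the sentinel index falls. A minimal sketch of that mapping, using only the standard library, might look like the following. The helper names `sentinel_token_id` and `sentinel_token_for_id` are hypothetical, introduced here for illustration; they are not part of this repository or of `transformers`.

```python
# Illustrative helpers (hypothetical names). The two constants are read
# directly off the added_tokens_decoder block in tokenizer_config.json.
FIRST_SENTINEL_ID = 32000  # ID assigned to <extra_id_99>
NUM_SENTINELS = 100        # <extra_id_0> through <extra_id_99>


def sentinel_token_id(n: int) -> int:
    """Return the vocabulary ID of <extra_id_n> under the mapping in the diff."""
    if not 0 <= n < NUM_SENTINELS:
        raise ValueError(f"sentinel index out of range: {n}")
    # IDs count up while the sentinel index counts down.
    return FIRST_SENTINEL_ID + (NUM_SENTINELS - 1 - n)


def sentinel_token_for_id(token_id: int) -> str:
    """Return the sentinel token string for an ID in the 32000-32099 range."""
    n = FIRST_SENTINEL_ID + (NUM_SENTINELS - 1) - token_id
    if not 0 <= n < NUM_SENTINELS:
        raise ValueError(f"not a sentinel token ID: {token_id}")
    return f"<extra_id_{n}>"


if __name__ == "__main__":
    print(sentinel_token_id(0))
    print(sentinel_token_for_id(32061))
```

The reversed ordering matters mainly when inspecting raw token IDs during debugging: a high ID near 32099 corresponds to a low-numbered sentinel.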
final-model/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4ffda2899b08f0ccd548da5a53cdf56afec8f0c176a906edcbc595eb1efdbd4b
+ size 5777
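The three lines added for `training_args.bin` are a Git LFS pointer rather than the binary itself: they record the spec version, the SHA-256 object ID, and the size in bytes of the real file. As a rough sketch, such a pointer could be parsed as below. The function name `parse_lfs_pointer` and the returned dictionary keys are assumptions made for illustration, not an API provided by Git LFS or the Hugging Face Hub.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4ffda2899b08f0ccd548da5a53cdf56afec8f0c176a906edcbc595eb1efdbd4b
size 5777
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

Comparing `size` and the digest against a downloaded file is one way to confirm that the actual binary, not the pointer, was fetched.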
final-model/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
index.html ADDED
@@ -0,0 +1,380 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>Text-to-SQL Converter</title>
7
+ <style>
8
+ * {
9
+ margin: 0;
10
+ padding: 0;
11
+ box-sizing: border-box;
12
+ }
13
+
14
+ body {
15
+ font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;
16
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
17
+ min-height: 100vh;
18
+ display: flex;
19
+ align-items: center;
20
+ justify-content: center;
21
+ padding: 20px;
22
+ }
23
+
24
+ .container {
25
+ background: rgba(255, 255, 255, 0.95);
26
+ backdrop-filter: blur(10px);
27
+ border-radius: 20px;
28
+ box-shadow: 0 20px 40px rgba(0, 0, 0, 0.1);
29
+ padding: 40px;
30
+ max-width: 800px;
31
+ width: 100%;
32
+ text-align: center;
33
+ }
34
+
35
+ .header {
36
+ margin-bottom: 40px;
37
+ }
38
+
39
+ .header h1 {
40
+ color: #333;
41
+ font-size: 2.5rem;
42
+ font-weight: 700;
43
+ margin-bottom: 10px;
44
+ background: linear-gradient(135deg, #667eea, #764ba2);
45
+ -webkit-background-clip: text;
46
+ -webkit-text-fill-color: transparent;
47
+ background-clip: text;
48
+ }
49
+
50
+ .header p {
51
+ color: #666;
52
+ font-size: 1.1rem;
53
+ line-height: 1.6;
54
+ }
55
+
56
+ .input-section {
57
+ margin-bottom: 30px;
58
+ }
59
+
60
+ .form-group {
61
+ margin-bottom: 20px;
62
+ text-align: left;
63
+ }
64
+
65
+ .form-group label {
66
+ display: block;
67
+ margin-bottom: 8px;
68
+ color: #333;
69
+ font-weight: 600;
70
+ font-size: 1rem;
71
+ }
+
+         .question-input {
+             width: 100%;
+             padding: 20px;
+             border: 2px solid #e1e5e9;
+             border-radius: 15px;
+             font-size: 1.1rem;
+             font-family: inherit;
+             resize: vertical;
+             min-height: 120px;
+             transition: all 0.3s ease;
+             background: #f8f9fa;
+         }
+
+         .question-input:focus {
+             outline: none;
+             border-color: #667eea;
+             background: white;
+             box-shadow: 0 0 0 3px rgba(102, 126, 234, 0.1);
+         }
+
+         .headers-input {
+             width: 100%;
+             padding: 15px;
+             border: 2px solid #e1e5e9;
+             border-radius: 15px;
+             font-size: 1rem;
+             font-family: inherit;
+             transition: all 0.3s ease;
+             background: #f8f9fa;
+         }
+
+         .headers-input:focus {
+             outline: none;
+             border-color: #667eea;
+             background: white;
+             box-shadow: 0 0 0 3px rgba(102, 126, 234, 0.1);
+         }
+
+         .submit-btn {
+             background: linear-gradient(135deg, #667eea, #764ba2);
+             color: white;
+             border: none;
+             padding: 15px 40px;
+             border-radius: 50px;
+             font-size: 1.1rem;
+             font-weight: 600;
+             cursor: pointer;
+             transition: all 0.3s ease;
+             box-shadow: 0 10px 20px rgba(102, 126, 234, 0.3);
+         }
+
+         .submit-btn:hover {
+             transform: translateY(-2px);
+             box-shadow: 0 15px 30px rgba(102, 126, 234, 0.4);
+         }
+
+         .submit-btn:disabled {
+             opacity: 0.6;
+             cursor: not-allowed;
+             transform: none;
+         }
+
+         .result-section {
+             margin-top: 30px;
+             text-align: left;
+         }
+
+         .result-card {
+             background: #f8f9fa;
+             border-radius: 15px;
+             padding: 25px;
+             border-left: 4px solid #667eea;
+             margin-bottom: 20px;
+         }
+
+         .result-title {
+             font-weight: 600;
+             color: #333;
+             margin-bottom: 15px;
+             font-size: 1.1rem;
+         }
+
+         .sql-query {
+             background: #2d3748;
+             color: #e2e8f0;
+             padding: 20px;
+             border-radius: 10px;
+             font-family: 'Courier New', monospace;
+             font-size: 0.95rem;
+             line-height: 1.5;
+             overflow-x: auto;
+             white-space: pre-wrap;
+         }
+
+         .loading {
+             display: none;
+             text-align: center;
+             margin: 20px 0;
+         }
+
+         .spinner {
+             border: 3px solid #f3f3f3;
+             border-top: 3px solid #667eea;
+             border-radius: 50%;
+             width: 30px;
+             height: 30px;
+             animation: spin 1s linear infinite;
+             margin: 0 auto 10px;
+         }
+
+         @keyframes spin {
+             0% { transform: rotate(0deg); }
+             100% { transform: rotate(360deg); }
+         }
+
+         .error {
+             background: #fed7d7;
+             color: #c53030;
+             padding: 15px;
+             border-radius: 10px;
+             margin-top: 20px;
+             border-left: 4px solid #c53030;
+         }
+
+         .example-section {
+             margin-top: 30px;
+             padding: 20px;
+             background: #f7fafc;
+             border-radius: 15px;
+             border: 1px solid #e2e8f0;
+         }
+
+         .example-title {
+             font-weight: 600;
+             color: #333;
+             margin-bottom: 15px;
+         }
+
+         .example-item {
+             margin-bottom: 10px;
+             padding: 10px;
+             background: white;
+             border-radius: 8px;
+             border-left: 3px solid #667eea;
+         }
+
+         .example-question {
+             font-weight: 500;
+             color: #333;
+         }
+
+         .example-headers {
+             color: #666;
+             font-size: 0.9rem;
+             margin-top: 5px;
+         }
+
+         @media (max-width: 768px) {
+             .container {
+                 padding: 20px;
+                 margin: 10px;
+             }
+
+             .header h1 {
+                 font-size: 2rem;
+             }
+
+             .question-input {
+                 min-height: 100px;
+                 padding: 15px;
+             }
+         }
+     </style>
+ </head>
+ <body>
+     <div class="container">
+         <div class="header">
+             <h1>Text-to-SQL Converter</h1>
+             <p>Transform your natural language questions into SQL queries instantly</p>
+         </div>
+
+         <div class="input-section">
+             <form id="sqlForm">
+                 <div class="form-group">
+                     <label for="question">Your Question:</label>
+                     <textarea
+                         id="question"
+                         class="question-input"
+                         placeholder="e.g., How many employees are older than 30?"
+                         required
+                     ></textarea>
+                 </div>
+
+                 <div class="form-group">
+                     <label for="headers">Table Headers (comma-separated):</label>
+                     <input
+                         type="text"
+                         id="headers"
+                         class="headers-input"
+                         placeholder="e.g., id, name, age, department, salary"
+                         required
+                     >
+                 </div>
+
+                 <button type="submit" class="submit-btn" id="submitBtn">
+                     Generate SQL Query
+                 </button>
+             </form>
+         </div>
+
+         <div class="loading" id="loading">
+             <div class="spinner"></div>
+             <p>Generating SQL query...</p>
+         </div>
+
+         <div class="result-section" id="resultSection" style="display: none;">
+             <div class="result-card">
+                 <div class="result-title">Generated SQL Query:</div>
+                 <div class="sql-query" id="sqlResult"></div>
+             </div>
+         </div>
+
+         <div class="example-section">
+             <div class="example-title">💡 Example Questions:</div>
+             <div class="example-item">
+                 <div class="example-question">"How many employees are older than 30?"</div>
+                 <div class="example-headers">Headers: id, name, age, department, salary</div>
+             </div>
+             <div class="example-item">
+                 <div class="example-question">"Show all employees in the IT department"</div>
+                 <div class="example-headers">Headers: id, name, age, department, salary</div>
+             </div>
+             <div class="example-item">
+                 <div class="example-question">"What is the average salary by department?"</div>
+                 <div class="example-headers">Headers: id, name, age, department, salary</div>
+             </div>
+         </div>
+     </div>
+
+     <script>
+         const form = document.getElementById('sqlForm');
+         const loading = document.getElementById('loading');
+         const resultSection = document.getElementById('resultSection');
+         const sqlResult = document.getElementById('sqlResult');
+         const submitBtn = document.getElementById('submitBtn');
+
+         form.addEventListener('submit', async (e) => {
+             e.preventDefault();
+
+             const question = document.getElementById('question').value.trim();
+             const headers = document.getElementById('headers').value.trim();
+
+             if (!question || !headers) {
+                 alert('Please fill in both question and table headers');
+                 return;
+             }
+
+             // Show loading
+             loading.style.display = 'block';
+             resultSection.style.display = 'none';
+             submitBtn.disabled = true;
+
+             try {
+                 // Split on commas, trim each name, and drop empty entries (e.g. from a trailing comma)
+                 const tableHeaders = headers.split(',').map(h => h.trim()).filter(Boolean);
+
+                 const response = await fetch('/predict', {
+                     method: 'POST',
+                     headers: {
+                         'Content-Type': 'application/json',
+                     },
+                     body: JSON.stringify({
+                         question: question,
+                         table_headers: tableHeaders
+                     })
+                 });
+
+                 const data = await response.json();
+
+                 if (response.ok) {
+                     sqlResult.textContent = data.sql_query;
+                     resultSection.style.display = 'block';
+                 } else {
+                     throw new Error(data.detail || 'Failed to generate SQL query');
+                 }
+
+             } catch (error) {
+                 console.error('Error:', error);
+                 sqlResult.textContent = `Error: ${error.message}`;
+                 resultSection.style.display = 'block';
+             } finally {
+                 loading.style.display = 'none';
+                 submitBtn.disabled = false;
+             }
+         });
+
+         // Add click handlers for examples
+         document.querySelectorAll('.example-item').forEach(item => {
+             item.addEventListener('click', () => {
+                 const question = item.querySelector('.example-question').textContent.replace(/"/g, '');
+                 const headers = item.querySelector('.example-headers').textContent.replace('Headers: ', '');
+
+                 document.getElementById('question').value = question;
+                 document.getElementById('headers').value = headers;
+             });
+         });
+     </script>
+ </body>
+ </html>
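The submit handler above trims the question, splits the comma-separated headers, and posts both as JSON to `/predict`. The same payload can be built outside the browser. The following is a minimal Python sketch under stated assumptions: `build_predict_payload` is a hypothetical helper that is not part of this commit, and it assumes the server accepts the `question` and `table_headers` fields exactly as the page sends them.

```python
import json


def build_predict_payload(question: str, headers_csv: str) -> dict:
    """Build the JSON body for /predict from raw form values (hypothetical helper)."""
    question = question.strip()
    # Split on commas and trim whitespace; empty entries
    # (for example from a trailing comma) are dropped.
    table_headers = [h.strip() for h in headers_csv.split(",") if h.strip()]
    if not question or not table_headers:
        raise ValueError("Both a question and at least one table header are required")
    return {"question": question, "table_headers": table_headers}


payload = build_predict_payload(
    "How many employees are older than 30?",
    "id, name, age, department, salary",
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`, as `test_app.py` does with its hand-written payload.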
model_utils.py ADDED
@@ -0,0 +1,121 @@
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+ from peft import PeftModel
+ import logging
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ class TextToSQLModel:
+     """Text-to-SQL model wrapper for deployment"""
+
+     def __init__(self, model_dir="./final-model", base_model="Salesforce/codet5-base"):
+         self.model_dir = model_dir
+         self.base_model = base_model
+         self.max_length = 128
+         self.model = None
+         self.tokenizer = None
+         self._load_model()
+
+     def _load_model(self):
+         """Load the trained model and tokenizer"""
+         try:
+             logger.info("Loading tokenizer...")
+             self.tokenizer = AutoTokenizer.from_pretrained(self.model_dir)
+
+             logger.info("Loading base model...")
+             base_model = AutoModelForSeq2SeqLM.from_pretrained(self.base_model)
+
+             logger.info("Loading PEFT model...")
+             self.model = PeftModel.from_pretrained(base_model, self.model_dir)
+             self.model.eval()
+
+             logger.info("Model loaded successfully!")
+
+         except Exception as e:
+             logger.error(f"Error loading model: {str(e)}")
+             raise
+
+     def predict(self, question: str, table_headers: list) -> str:
+         """
+         Generate SQL query for a given question and table headers
+
+         Args:
+             question (str): Natural language question
+             table_headers (list): List of table column names
+
+         Returns:
+             str: Generated SQL query
+         """
+         try:
+             # Format input text
+             table_headers_str = ", ".join(table_headers)
+             input_text = f"### Table columns:\n{table_headers_str}\n### Question:\n{question}\n### SQL:"
+
+             # Tokenize input
+             inputs = self.tokenizer(
+                 input_text,
+                 return_tensors="pt",
+                 padding=True,
+                 truncation=True,
+                 max_length=self.max_length
+             )
+
+             # Generate prediction
+             with torch.no_grad():
+                 outputs = self.model.generate(**inputs, max_length=self.max_length)
+
+             # Decode prediction
+             sql_query = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+             return sql_query
+
+         except Exception as e:
+             logger.error(f"Error generating SQL: {str(e)}")
+             raise
+
+     def batch_predict(self, queries: list) -> list:
+         """
+         Generate SQL queries for multiple questions
+
+         Args:
+             queries (list): List of dicts with 'question' and 'table_headers' keys
+
+         Returns:
+             list: List of result dicts with 'question', 'table_headers', 'sql' and 'status' keys
+         """
+         results = []
+         for query in queries:
+             try:
+                 sql = self.predict(query['question'], query['table_headers'])
+                 results.append({
+                     'question': query['question'],
+                     'table_headers': query['table_headers'],
+                     'sql': sql,
+                     'status': 'success'
+                 })
+             except Exception as e:
+                 # Use .get() here so a query with a missing key is reported as an
+                 # error entry instead of raising again and aborting the whole batch
+                 results.append({
+                     'question': query.get('question'),
+                     'table_headers': query.get('table_headers'),
+                     'sql': None,
+                     'status': 'error',
+                     'error': str(e)
+                 })
+
+         return results
+
+     def health_check(self) -> bool:
+         """Check if model is loaded and ready"""
+         return self.model is not None and self.tokenizer is not None
+
+ # Global model instance
+ _model_instance = None
+
+ def get_model():
+     """Get or create global model instance"""
+     global _model_instance
+     if _model_instance is None:
+         _model_instance = TextToSQLModel()
+     return _model_instance
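The prompt that `predict` builds uses the same template as `preprocess` in `train.py`, which matters because the adapter only saw inputs in that shape during fine-tuning. The standalone sketch below makes the format visible without loading any model. `build_prompt` is a hypothetical name introduced here for illustration; the commit inlines this logic instead.

```python
def build_prompt(question: str, table_headers: list) -> str:
    """Format a question and column names into the model's input template."""
    table_headers_str = ", ".join(table_headers)
    return (
        f"### Table columns:\n{table_headers_str}\n"
        f"### Question:\n{question}\n"
        f"### SQL:"
    )


prompt = build_prompt(
    "How many employees are older than 30?",
    ["id", "name", "age"],
)
print(prompt)
```

Keeping the template in one shared function would be one way to make sure training and inference cannot drift apart.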
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ fastapi==0.104.1
+ uvicorn[standard]==0.24.0
+ torch>=2.0.0
+ transformers>=4.35.0
+ peft>=0.6.0
+ accelerate>=0.24.0
+ pydantic>=2.0.0
+ python-multipart>=0.0.6
test_app.py ADDED
@@ -0,0 +1,124 @@
+ #!/usr/bin/env python3
+ """
+ Test script for the Text-to-SQL application
+ """
+
+ import requests
+ import json
+ import time
+
+ def test_health():
+     """Test health endpoint"""
+     try:
+         response = requests.get("http://localhost:8000/health")
+         print(f"Health check: {response.status_code}")
+         if response.status_code == 200:
+             data = response.json()
+             print(f"Status: {data['status']}")
+             print(f"Model loaded: {data['model_loaded']}")
+         return response.status_code == 200
+     except Exception as e:
+         print(f"Health check failed: {e}")
+         return False
+
+ def test_single_prediction():
+     """Test single prediction endpoint"""
+     try:
+         data = {
+             "question": "How many employees are older than 30?",
+             "table_headers": ["id", "name", "age", "department", "salary"]
+         }
+
+         response = requests.post("http://localhost:8000/predict", json=data)
+         print(f"Single prediction: {response.status_code}")
+
+         if response.status_code == 200:
+             result = response.json()
+             print(f"Question: {result['question']}")
+             print(f"SQL: {result['sql_query']}")
+             print(f"Processing time: {result['processing_time']:.3f}s")
+             return True
+         else:
+             print(f"Error: {response.text}")
+             return False
+
+     except Exception as e:
+         print(f"Single prediction failed: {e}")
+         return False
+
+ def test_batch_prediction():
+     """Test batch prediction endpoint"""
+     try:
+         data = {
+             "queries": [
+                 {
+                     "question": "How many employees are older than 30?",
+                     "table_headers": ["id", "name", "age", "department", "salary"]
+                 },
+                 {
+                     "question": "Show all employees in IT department",
+                     "table_headers": ["id", "name", "age", "department", "salary"]
+                 }
+             ]
+         }
+
+         response = requests.post("http://localhost:8000/batch", json=data)
+         print(f"Batch prediction: {response.status_code}")
+
+         if response.status_code == 200:
+             result = response.json()
+             print(f"Total queries: {result['total_queries']}")
+             print(f"Successful queries: {result['successful_queries']}")
+
+             for i, res in enumerate(result['results']):
+                 print(f"\nQuery {i+1}:")
+                 print(f"  Question: {res['question']}")
+                 print(f"  SQL: {res['sql_query']}")
+             return True
+         else:
+             print(f"Error: {response.text}")
+             return False
+
+     except Exception as e:
+         print(f"Batch prediction failed: {e}")
+         return False
+
+ def main():
+     """Run all tests"""
+     print("🧪 Testing Text-to-SQL Application")
+     print("=" * 50)
+
+     # Wait a bit for the server to start
+     print("Waiting for server to be ready...")
+     time.sleep(5)
+
+     # Test health
+     print("\n1. Testing health endpoint...")
+     health_ok = test_health()
+
+     if not health_ok:
+         print("❌ Health check failed. Make sure the server is running.")
+         return
+
+     # Test single prediction
+     print("\n2. Testing single prediction...")
+     single_ok = test_single_prediction()
+
+     # Test batch prediction
+     print("\n3. Testing batch prediction...")
+     batch_ok = test_batch_prediction()
+
+     # Summary
+     print("\n" + "=" * 50)
+     print("📊 Test Results:")
+     print(f"Health check: {'✅' if health_ok else '❌'}")
+     print(f"Single prediction: {'✅' if single_ok else '❌'}")
+     print(f"Batch prediction: {'✅' if batch_ok else '❌'}")
+
+     if all([health_ok, single_ok, batch_ok]):
+         print("\n🎉 All tests passed! Your application is ready for deployment.")
+     else:
+         print("\n⚠️ Some tests failed. Please check the errors above.")
+
+ if __name__ == "__main__":
+     main()
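The test script waits a fixed five seconds before the first health check, which can be too short when the model is still loading and needlessly long when the server is already up. One alternative is to poll until the server answers. The sketch below shows the idea under stated assumptions: `wait_until_ready` and its parameters are hypothetical names, not part of this commit, and the readiness check is passed in as a callable so the example runs without a server.

```python
import time


def wait_until_ready(check, timeout=30.0, interval=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            # Treat errors (e.g. connection refused) as "not ready yet"
            pass
        if clock() >= deadline:
            return False
        sleep(interval)


# Stand-in check that succeeds on the third attempt
attempts = {"count": 0}


def fake_check():
    attempts["count"] += 1
    return attempts["count"] >= 3


ready = wait_until_ready(fake_check, timeout=5.0, interval=0.01)
print(ready, attempts["count"])
```

In the script itself, `check` could be a small function that issues `requests.get("http://localhost:8000/health")` and returns whether the status code is 200.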
train.py ADDED
@@ -0,0 +1,168 @@
+ import torch
+ from transformers import (
+     AutoTokenizer,
+     AutoModelForSeq2SeqLM,
+     Seq2SeqTrainingArguments,
+     Seq2SeqTrainer,
+     DataCollatorForSeq2Seq
+ )
+ from peft import LoraConfig, get_peft_model, TaskType
+ from datasets import load_dataset
+ import os
+
+ # Model Configuration
+ MODEL_NAME = "Salesforce/codet5-base"
+ MAX_LENGTH = 128
+ TRAIN_BATCH_SIZE = 2
+ EVAL_BATCH_SIZE = 2
+ LEARNING_RATE = 1e-4
+ NUM_EPOCHS = 3
+ TRAIN_SIZE = 5000
+ VAL_SIZE = 500
+ CHECKPOINT_DIR = "./codet5-sql-finetuned"
+
+ def preprocess(example):
+     question = example["question"]
+     table_headers = ", ".join(example["table"]["header"])
+     sql_query = example["sql"]["human_readable"]
+
+     return {
+         "input_text": f"### Table columns:\n{table_headers}\n### Question:\n{question}\n### SQL:",
+         "target_text": sql_query
+     }
+
+ def main():
+     # Set up device
+     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+     print(f"Using device: {device}")
+
+     # Load and preprocess dataset
+     print("Loading dataset...")
+     try:
+         dataset = load_dataset("wikisql")
+     except Exception as e:
+         print(f"Error loading dataset: {str(e)}")
+         print("Trying with trust_remote_code=True...")
+         dataset = load_dataset("wikisql", trust_remote_code=True)
+
+     train_dataset = dataset["train"].select(range(TRAIN_SIZE))
+     val_dataset = dataset["validation"].select(range(VAL_SIZE))
+
+     print("Preprocessing datasets...")
+     processed_train = train_dataset.map(preprocess, remove_columns=train_dataset.column_names)
+     processed_val = val_dataset.map(preprocess, remove_columns=val_dataset.column_names)
+
+     # Load model and tokenizer
+     print("Loading model and tokenizer...")
+     tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
+     model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
+
+     # Add LoRA adapters
+     lora_config = LoraConfig(
+         r=8,
+         lora_alpha=16,
+         lora_dropout=0.1,
+         bias="none",
+         task_type=TaskType.SEQ_2_SEQ_LM,
+         target_modules=["q", "v", "k", "o", "wi", "wo"]
+     )
+     model = get_peft_model(model, lora_config)
+
+     def tokenize_function(examples):
+         inputs = tokenizer(
+             examples["input_text"],
+             padding="max_length",
+             truncation=True,
+             max_length=MAX_LENGTH
+         )
+         targets = tokenizer(
+             examples["target_text"],
+             padding="max_length",
+             truncation=True,
+             max_length=MAX_LENGTH
+         )
+         # Replace padding token ids in the labels with -100 so the loss ignores them
+         inputs["labels"] = [
+             [tok if tok != tokenizer.pad_token_id else -100 for tok in seq]
+             for seq in targets["input_ids"]
+         ]
+         return inputs
+
+     print("Tokenizing datasets...")
+     tokenized_train = processed_train.map(
+         tokenize_function,
+         remove_columns=processed_train.column_names,
+         batched=True
+     )
+     tokenized_val = processed_val.map(
+         tokenize_function,
+         remove_columns=processed_val.column_names,
+         batched=True
+     )
+
+     # Training arguments - simplified for stability
+     training_args = Seq2SeqTrainingArguments(
+         output_dir=CHECKPOINT_DIR,
+         per_device_train_batch_size=TRAIN_BATCH_SIZE,
+         per_device_eval_batch_size=EVAL_BATCH_SIZE,
+         num_train_epochs=NUM_EPOCHS,
+         learning_rate=LEARNING_RATE,
+         logging_dir=os.path.join(CHECKPOINT_DIR, "logs"),
+         logging_steps=10,
+         save_total_limit=2,
+         predict_with_generate=True,
+         no_cuda=True,  # Force CPU training
+         fp16=False,  # Disable mixed precision training since we're on CPU
+         report_to="none"  # Disable wandb logging
+     )
+
+     # Data collator
+     data_collator = DataCollatorForSeq2Seq(
+         tokenizer,
+         model=model,
+         padding=True
+     )
+
+     # Initialize trainer
+     trainer = Seq2SeqTrainer(
+         model=model,
+         args=training_args,
+         train_dataset=tokenized_train,
+         eval_dataset=tokenized_val,
+         data_collator=data_collator,
+     )
+
+     try:
+         print("\nStarting training...")
+         print("You can stop training at any time by pressing Ctrl+C")
+         print("Training will automatically save checkpoints after each epoch")
+
+         # Check for existing checkpoints
+         last_checkpoint = None
+         if os.path.exists(CHECKPOINT_DIR):
+             checkpoints = [
+                 d for d in os.listdir(CHECKPOINT_DIR)
+                 # Only numbered checkpoints; skips "checkpoint-error" written by the handler below
+                 if d.startswith('checkpoint-') and d.split('-')[1].isdigit()
+             ]
+             if checkpoints:
+                 last_checkpoint = os.path.join(CHECKPOINT_DIR, sorted(checkpoints, key=lambda x: int(x.split('-')[1]))[-1])
+                 print(f"\nFound checkpoint: {last_checkpoint}")
+                 print("Training will resume from this checkpoint.")
+
+         # Start or resume training
+         trainer.train(resume_from_checkpoint=last_checkpoint)
+
+         # Save the final model
+         trainer.save_model("./final-model")
+         # Save the tokenizer alongside the adapter; model_utils.py loads it from this directory
+         tokenizer.save_pretrained("./final-model")
+         print("\nTraining completed successfully!")
+         print("Final model saved to: ./final-model")
+
+     except KeyboardInterrupt:
+         print("\nTraining interrupted by user!")
+         print("Progress is saved in the latest checkpoint.")
+         print("To resume, just run the script again.")
+
+     except Exception as e:
+         print(f"\nAn error occurred during training: {str(e)}")
+         if os.path.exists(CHECKPOINT_DIR):
+             error_checkpoint = os.path.join(CHECKPOINT_DIR, "checkpoint-error")
+             trainer.save_model(error_checkpoint)
+             print(f"Saved error checkpoint to: {error_checkpoint}")
+
+ if __name__ == "__main__":
+     main()
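The resume logic in `train.py` picks the checkpoint directory with the highest step number. Because the error handler also writes a directory named `checkpoint-error`, any selection that parses the suffix as an integer has to skip non-numeric names. The standalone sketch below isolates that selection step so it can be checked against a temporary directory. `find_last_checkpoint` is a hypothetical helper written for illustration; `transformers.trainer_utils.get_last_checkpoint` offers comparable behaviour and may be preferable in practice.

```python
import os
import tempfile


def find_last_checkpoint(checkpoint_dir):
    """Return the path of the highest-numbered checkpoint-<step> directory, or None."""
    if not os.path.isdir(checkpoint_dir):
        return None
    steps = []
    for name in os.listdir(checkpoint_dir):
        prefix, _, suffix = name.partition("-")
        # Skip names such as "checkpoint-error" whose suffix is not a step number
        if prefix == "checkpoint" and suffix.isdigit():
            steps.append((int(suffix), name))
    if not steps:
        return None
    # max() compares the integer step first, so 2500 beats 500
    return os.path.join(checkpoint_dir, max(steps)[1])


with tempfile.TemporaryDirectory() as root:
    for name in ("checkpoint-500", "checkpoint-2500", "checkpoint-error", "logs"):
        os.makedirs(os.path.join(root, name))
    last = find_last_checkpoint(root)
    print(os.path.basename(last))
```

Sorting by the integer step rather than by the directory name matters: a plain string sort would place `checkpoint-500` after `checkpoint-2500`.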