Update README.md
README.md CHANGED
@@ -16,7 +16,7 @@ tags:
 ## Introduction
 We present an automatic and scalable text-to-SQL data synthesis framework, illustrated below:
 <p align="center">
-<img src="
+<img src="framework.png" alt="Description" style="width: 100%; max-width: 600px;"/>
 </p>

 Based on this framework, we introduce the first million-scale text-to-SQL dataset, **SynSQL-2.5M**, containing over **2.5 million diverse and high-quality data samples**, spanning more than **16,000 databases from various domains**.

@@ -50,7 +50,7 @@ For more statistics and quality evaluations, refer to our paper. As of March 202
 ## Performance Evaluation
 We evaluate OmniSQL on a wide range of datasets, including standard benchmarks (Spider and BIRD), challenging domain-specific benchmarks (Spider2.0-SQLite, ScienceBenchmark, EHRSQL), and three robustness benchmarks (Spider-DK, Spider-Syn, Spider-Realistic). The evaluation results are shown below:
 <p align="center">
-<img src="
+<img src="main_results.png" alt="Description" style="width: 100%; max-width: 800px;"/>
 </p>

 "Gre" refers to greedy decoding, and "Maj" indicates majority voting at 8. Spider (dev), Spider-Syn, and Spider-Realistic are evaluated using the test-suite accuracy (TS) metric, while the remaining datasets are evaluated using the execution accuracy (EX) metric.
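The "Maj at 8" setting mentioned in the evaluation text can be sketched roughly as self-consistency voting: sample 8 candidate SQL queries, execute each, and return the query whose execution result is most common. This is a minimal illustrative sketch, not the OmniSQL evaluation code; the names `majority_vote` and `execute` are hypothetical.

```python
from collections import Counter

def majority_vote(candidates, execute):
    """Pick the candidate SQL whose execution result is most common.

    candidates: list of sampled SQL strings (e.g. 8 generations).
    execute: callable mapping a SQL string to a hashable result,
             raising on execution failure. Both names are
             illustrative assumptions, not a real OmniSQL API.
    """
    results = {}
    for sql in candidates:
        try:
            results[sql] = execute(sql)
        except Exception:
            results[sql] = None  # failed queries never win the vote

    counts = Counter(r for r in results.values() if r is not None)
    if not counts:
        return candidates[0]  # all candidates failed: fall back to the first

    best_result, _ = counts.most_common(1)[0]
    # Return the first candidate that produced the majority result.
    for sql in candidates:
        if results[sql] == best_result:
            return sql
```

Greedy decoding ("Gre"), by contrast, produces a single candidate, so no vote is needed.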
|