@@ -1,18 +1,59 @@
 ---
 library_name: transformers
-tags: []
 ---
 # Prem-1B-SQL
-Prem-1B-SQL is the one of the very first series of fully local Text-to-SQL models developed by Prem AI. Being a 1B parameter model
 it easily fits on low GPU devices (and CPU devices when quantized). We believe that AI assisted data analysis should be a Local first
-approach. Because exposing Databases to third party closed source models can lead to data security breaches. We will be publishing some
-of the public benchmarks results of this model very soon. We will also be iterating on this model for more better results.
 - **Developed by:** [Prem AI](https://www.premai.io/)
 - **License:** [MIT]
 ## How to use Prem-1B-SQL
@@ -142,4 +183,4 @@ Additionally we made error handling datasets on top of these datasets to make th
 ## Evaluation results of Prem-1B-SQL
-The results of Prem-1B-SQL on some public benchmarks will be published soon.

 ---
 library_name: transformers
+datasets:
+- premai-io/spider
+- premai-io/domains
+- premai-io/birdbench
+- gretelai/synthetic_text_to_sql
+metrics:
+- accuracy
+base_model:
+- deepseek-ai/deepseek-coder-1.3b-instruct
+pipeline_tag: text2text-generation
 ---
 # Prem-1B-SQL
+Prem-1B-SQL is one of the very first series of fully local Text-to-SQL models developed by Prem AI. Being a 1B parameter model
 it easily fits on low GPU devices (and CPU devices when quantized). We believe that AI assisted data analysis should be a Local first
+approach, because exposing databases to third-party closed-source models can lead to data security breaches. We will be publishing some
+of the public benchmark results of this model very soon. We will also be iterating on this model for better results.
 - **Developed by:** [Prem AI](https://www.premai.io/)
 - **License:** [MIT]
+## Results
+We evaluated our model on two popular benchmark datasets: BirdBench and Spider. BirdBench consists of a public validation dataset (1,534 data points) and a private test dataset. Spider provides only a public validation dataset. Here are the results:
+| Dataset                  | Execution Accuracy |
+|--------------------------|--------------------|
+| BirdBench (validation)    | 46%                |
+| BirdBench (private test)  | 51.54%             |
+| Spider                   | 85%                |
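The card does not define execution accuracy. A common definition in Text-to-SQL benchmarks counts a prediction as correct when running it returns the same rows as the gold query. The sketch below illustrates that idea on an in-memory SQLite database; the helper names, the toy `users` table, and the set-based row comparison are assumptions for illustration, not the evaluation code used for the numbers above.

```python
import sqlite3

def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same set of rows."""
    try:
        predicted_rows = set(conn.execute(predicted_sql).fetchall())
    except sqlite3.Error:
        return False  # a predicted query that fails to run counts as wrong
    gold_rows = set(conn.execute(gold_sql).fetchall())
    return predicted_rows == gold_rows

def execution_accuracy(conn, pairs):
    """Percentage of (predicted, gold) pairs whose results match."""
    matches = sum(execution_match(conn, p, g) for p, g in pairs)
    return 100.0 * matches / len(pairs)

# Toy database (hypothetical schema, for illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])

pairs = [
    # Different SQL text, same result rows: counted as correct
    ("SELECT name FROM users WHERE id = 1", "SELECT name FROM users WHERE id < 2"),
    # Misspelled column: the query errors, so it is counted as wrong
    ("SELECT nam FROM users", "SELECT name FROM users"),
]
accuracy = execution_accuracy(conn, pairs)
print(accuracy)
```

Comparing rows as sets ignores row order and duplicates, which is a simplification; a stricter check may be needed for queries with `ORDER BY`.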
+The BirdBench dataset is split across difficulty levels. Here is a detailed view of the private test results at each level.
+| Difficulty  | Count |    EX   | Soft F1 |
+|-------------|-------|---------|---------|
+| Simple      |  949  |  60.70  |  61.48  |
+| Moderate    |  555  |  47.39  |  49.06  |
+| Challenging |  285  |  29.12  |  31.83  |
+| Total       | 1789  |  51.54  |  52.90  |
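As a quick consistency check, the Total row's EX should equal the count-weighted average of the per-difficulty EX values. The snippet below recomputes it from the counts and scores in the table above; the variable names are illustrative.

```python
# (count, EX) per difficulty level, copied from the table above
levels = {
    "Simple": (949, 60.70),
    "Moderate": (555, 47.39),
    "Challenging": (285, 29.12),
}

total_count = sum(count for count, _ in levels.values())
# Weight each level's EX by its number of examples
weighted_ex = sum(count * ex for count, ex in levels.values()) / total_count

print(total_count, round(weighted_ex, 2))
```

The result matches the reported totals of 1789 examples and 51.54 EX.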
+Here is a more detailed comparison of popular closed- and open-source models.
+| Model                         | # Params (in Billion) | BirdBench Test Scores |
+|-------------------------------|-----------------------|-----------------------|
+| AskData + GPT-4o (current winner) | NA                    | 72.39                 |
+| DeepSeek coder 236B            | 236                   | 56.68                 |
+| GPT-4 (2023)                   | NA                    | 54.89                 |
+| **PremSQL 1B (ours)**              | 1                     | 51.54                 |
+| Qwen 2.5 7B Instruct           | 7                     | 51.1                  |
+| Claude 2 Base (2023)           | NA                    | 49.02                 |
 ## How to use Prem-1B-SQL
 ## Evaluation results of Prem-1B-SQL
+The results of Prem-1B-SQL on some public benchmarks will be published soon.