prem-research/prem-1B-SQL

@@ -12,8 +12,7 @@ base_model:
 - deepseek-ai/deepseek-coder-1.3b-instruct
 pipeline_tag: text2text-generation
 ---
-# Prem-1B-SQL
 - Read the blogpost [here](https://blog.premai.io/prem-1b-sql-fully-local-performant-slm-for-text-to-sql/)
 - PremSQL Library | [GitHub](https://github.com/premAI-io/premsql)
@@ -21,7 +20,7 @@ pipeline_tag: text2text-generation
 Prem-1B-SQL is one of the first fully local Text-to-SQL models developed by Prem AI. Being a 1B parameter model,
 it easily fits on low-end GPU devices (and on CPU devices when quantized). We believe that AI-assisted data analysis should take a local-first
 approach, because exposing databases to third-party closed-source models can lead to data security breaches. We will be publishing some
-of the public benchmark results of this model very soon. We will also be iterating on this model for better results.
 - **Developed by:** [Prem AI](https://www.premai.io/)
 - **License:** [MIT]
@@ -31,32 +30,30 @@ of the public benchmark results of this model very soon. We will also be iterati
 We evaluated our model on two popular benchmark datasets: BirdBench and Spider. BirdBench consists of a public validation dataset (with 1534 data points) and a private test dataset. Spider comes with only a public validation dataset. Here are the results:
 | Dataset                  | Execution Accuracy |
-|--------------------------|--------------------|
-| BirdBench (validation)    | 46%                |
-| BirdBench (private test)  | 51.54%             |
 | Spider                   | 85%                |
- The BirdBench dataset is distributed across different difficulty levels. Here is a detailed view of the private results across different difficulty levels.
- | Difficulty  | Count |    EX   | Soft F1 |
-|-------------|-------|---------|---------|
-| Simple      |  949  |  60.70  |  61.48  |
-| Moderate    |  555  |  47.39  |  49.06  |
-| Challenging |  285  |  29.12  |  31.83  |
-| Total       | 1789  |  51.54  |  52.90  |
-Here is a more detailed comparison of popular closed- and open-source models.
-| Model                         | # Params (in Billion) | BirdBench Test Scores |
-|-------------------------------|-----------------------|-----------------------|
 | AskData + GPT-4o (current winner) | NA                    | 72.39                 |
-| DeepSeek coder 236B            | 236                   | 56.68                 |
-| GPT-4 (2023)                   | NA                    | 54.89                 |
-| **PremSQL 1B (ours)**              | 1                     | 51.4                  |
-| Qwen 2.5 7B Instruct           | 7                     | 51.1                  |
-| Claude 2 Base (2023)           | NA                    | 49.02                 |
 ## How to use Prem-1B-SQL
@@ -78,44 +75,54 @@ To install PremSQL just create a new environment and type:
 pip install -U premsql
 ```
-Please [check out our documentation](https://docs.premai.io/premsql/introduction) for more details on library usage.
-### Running Prem-1B-SQL using PremSQL Pipelines
-The easiest way to use this model is through PremSQL pipelines. All you need to do is provide the database path (for SQLite databases)
 or the DB connection URI, and then connect it with the model. Here is how you do that:
 ```python
-from premsql.pipelines import SimpleText2SQLAgent
-from premsql.generators import Text2SQLGeneratorHF
 from premsql.executors import SQLiteExecutor
-# Provide a SQLite file here or see documentation for more customization
-dsn_or_db_path = "./data/db/california_schools.sqlite"
-agent = SimpleText2SQLAgent(
-    dsn_or_db_path=dsn_or_db_path,
-    generator=Text2SQLGeneratorHF(
-        model_or_name_or_path="premai-io/prem-1B-SQL",
-        experiment_name="simple_pipeline",
-        device="cuda:0",
-        type="test"
-    ),
 )
-question = "please list the phone numbers of the direct charter-funded schools that are opened after 2000/1/1"
-response = agent.query(question)
-response["table"]
 ```
-Under the hood, it automatically connects to your database and does all the heavy lifting, such as prompt creation and execution, for you.
 ### Running Prem-1B-SQL using PremSQL Generators
-You can also run the model using PremSQL Generators. This is helpful when you want to generate SQL in
-bulk over a dataset. Here is an example:
 ```python
 from premsql.generators import Text2SQLGeneratorHF
@@ -127,7 +134,7 @@ dataset = bird_dataset = Text2SQLDataset(
     dataset_folder="/path/to/dataset"
 ).setup_dataset(num_rows=10, num_fewshot=3)
-# Define a generator
 generator = Text2SQLGeneratorHF(
     model_or_name_or_path="premai-io/prem-1B-SQL",
     experiment_name="test_generators",
@@ -149,7 +156,6 @@ print(responses)
 This strategy executes the generated SQL against the DB and, if it fails, uses the error message for correction, repeating until it gets a valid result or the retries run out.
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/637b0075806b18943e4ba357/_5rdIQZwyaUFb84xKW_AV.png)
 ```python
@@ -166,12 +172,10 @@ response = generator.generate_and_save_results(
 )
 ```
-You can also fine-tune Prem-1B-SQL using HuggingFace Transformers or [PremSQL Tuners](https://docs.premai.io/premsql/tuners).
 Please [check out our documentation](https://docs.premai.io/premsql/introduction) to learn more about PremSQL and all the features
 we provide.
 ## Datasets used to train the model
 Prem-1B-SQL is trained using the following datasets:
@@ -181,8 +185,7 @@ Prem-1B-SQL is trained using the following datasets:
 3. [Domain specialization dataset, gathered and uploaded to PremSQL datasets](https://huggingface.co/datasets/premai-io/domains)
 4. [Gretel AI synthetic dataset](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql?row=0)
-Additionally, we built error-handling datasets on top of these datasets so that the model learns from its errors and self-corrects them.
 ## Evaluation results of Prem-1B-SQL

 - deepseek-ai/deepseek-coder-1.3b-instruct
 pipeline_tag: text2text-generation
 ---
+# Prem-1B-SQL (Ollama)
 - Read the blogpost [here](https://blog.premai.io/prem-1b-sql-fully-local-performant-slm-for-text-to-sql/)
 - PremSQL Library | [GitHub](https://github.com/premAI-io/premsql)
 Prem-1B-SQL is one of the first fully local Text-to-SQL models developed by Prem AI. Being a 1B parameter model,
 it easily fits on low-end GPU devices (and on CPU devices when quantized). We believe that AI-assisted data analysis should take a local-first
 approach, because exposing databases to third-party closed-source models can lead to data security breaches. We will be publishing some
+of the public benchmark results of this model very soon. We will also be iterating on this model for better results.
 - **Developed by:** [Prem AI](https://www.premai.io/)
 - **License:** [MIT]
 We evaluated our model on two popular benchmark datasets: BirdBench and Spider. BirdBench consists of a public validation dataset (with 1534 data points) and a private test dataset. Spider comes with only a public validation dataset. Here are the results:
 | Dataset                  | Execution Accuracy |
+| ------------------------ | ------------------ |
+| BirdBench (validation)   | 46%                |
+| BirdBench (private test) | 51.54%             |
 | Spider                   | 85%                |
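Execution accuracy counts a prediction as correct when running it returns the same rows as the reference query. The sketch below illustrates the idea with Python's built-in `sqlite3`. The helper names `run_query` and `execution_accuracy`, the toy `schools` table, and the order-insensitive row comparison are assumptions for illustration, not the official BirdBench or Spider evaluation code.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the rows of a query as an order-insensitive multiset, or None on error."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose executed results match."""
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold_rows = run_query(conn, gold_sql)
        predicted_rows = run_query(conn, predicted_sql)
        if gold_rows is not None and predicted_rows == gold_rows:
            correct += 1
    return correct / len(pairs)


# Toy database (hypothetical data, only for this example)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schools (name TEXT, opened INTEGER);
    INSERT INTO schools VALUES ('A', 1999), ('B', 2005), ('C', 2010);
""")

gold = "SELECT name FROM schools WHERE opened > 2000"
pairs = [
    ("SELECT name FROM schools WHERE opened >= 2001", gold),  # same rows as gold
    ("SELECT name FROM schools", gold),                       # extra row, counted wrong
    ("SELECT nme FROM schools WHERE opened > 2000", gold),    # invalid column, counted wrong
]
accuracy = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {accuracy:.2%}")
```

Note that a query can differ textually from the reference and still count as correct, which is why this metric is usually preferred over exact string match for Text-to-SQL.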
+The BirdBench dataset is distributed across different difficulty levels. Here is a detailed view of the private results across different difficulty levels.
+| Difficulty  | Count | EX    | Soft F1 |
+| ----------- | ----- | ----- | ------- |
+| Simple      | 949   | 60.70 | 61.48   |
+| Moderate    | 555   | 47.39 | 49.06   |
+| Challenging | 285   | 29.12 | 31.83   |
+| Total       | 1789  | 51.54 | 52.90   |
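As a quick sanity check, the totals in the table above appear to be the count-weighted averages of the per-difficulty scores. The short sketch below recomputes them from the table values; the variable names are illustrative only.

```python
# (count, EX, Soft F1) per difficulty level, copied from the table above
levels = {
    "Simple": (949, 60.70, 61.48),
    "Moderate": (555, 47.39, 49.06),
    "Challenging": (285, 29.12, 31.83),
}

total_count = sum(count for count, _, _ in levels.values())

# Weight each level's score by the number of questions in that level
total_ex = sum(count * ex for count, ex, _ in levels.values()) / total_count
total_f1 = sum(count * f1 for count, _, f1 in levels.values()) / total_count

print(total_count, round(total_ex, 2), round(total_f1, 2))
```

Rounded to two decimals, this reproduces the reported totals of 1789 questions, 51.54 EX, and 52.90 Soft F1.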
+Here is a more detailed comparison of popular closed- and open-source models.
+| Model                             | # Params (in Billion) | BirdBench Test Scores |
+| --------------------------------- | --------------------- | --------------------- |
 | AskData + GPT-4o (current winner) | NA                    | 72.39                 |
+| DeepSeek coder 236B               | 236                   | 56.68                 |
+| GPT-4 (2023)                      | NA                    | 54.89                 |
+| **PremSQL 1B (ours)**             | 1                     | 51.4                  |
+| Qwen 2.5 7B Instruct              | 7                     | 51.1                  |
+| Claude 2 Base (2023)              | NA                    | 49.02                 |
 ## How to use Prem-1B-SQL
 pip install -U premsql
 ```
+Please [check out our documentation](https://docs.premai.io/premsql/introduction) for more details on library usage.
+### Running Prem-1B-SQL using PremSQL BaseLine Agent
+The easiest way to use this model is through PremSQL pipelines. All you need to do is provide the database path (for SQLite databases)
 or the DB connection URI, and then connect it with the model. Here is how you do that:
 ```python
+from premsql.agents import BaseLineAgent
+from premsql.generators import Text2SQLGeneratorHF
+from premsql.agents.tools import SimpleMatplotlibTool
 from premsql.executors import SQLiteExecutor
+text2_sqlmodel = Text2SQLGeneratorHF(
+    model_or_name_or_path="premai-io/prem-1B-SQL",
+    experiment_name="test_generators",
+    device="cuda:0",
+    type="test"
+)
+analyser_and_plotter = Text2SQLGeneratorHF(
+    model_or_name_or_path="meta-llama/Llama-3.2-1B-Instruct",
+    experiment_name="test_generators",
+    device="cuda:0",
+    type="test"
 )
+agent = BaseLineAgent(
+    session_name="testing_hf",
+    db_connection_uri="sqlite:////path/to/your/database.sqlite",
+    specialized_model1=text2_sqlmodel,
+    specialized_model2=analyser_and_plotter,
+    plot_tool=SimpleMatplotlibTool(),
+    executor=SQLiteExecutor()
+)
+response = agent(
+    "/query what all tables are present inside the database"
+)
+response.show_dataframe()
 ```
+Under the hood, it automatically connects to your database and does all the heavy lifting, such as prompt creation and execution, for you.
 ### Running Prem-1B-SQL using PremSQL Generators
+You can also run the model using PremSQL Generators. This is helpful when you want to generate SQL in
+bulk over a dataset. Here is an example:
 ```python
 from premsql.generators import Text2SQLGeneratorHF
     dataset_folder="/path/to/dataset"
 ).setup_dataset(num_rows=10, num_fewshot=3)
+# Define a generator
 generator = Text2SQLGeneratorHF(
     model_or_name_or_path="premai-io/prem-1B-SQL",
     experiment_name="test_generators",
 This strategy executes the generated SQL against the DB and, if it fails, uses the error message for correction, repeating until it gets a valid result or the retries run out.
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/637b0075806b18943e4ba357/_5rdIQZwyaUFb84xKW_AV.png)
 ```python
 )
 ```
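The execute-and-correct loop described above can be sketched in plain Python. This is only an illustration under stated assumptions: `generate_sql` is a hypothetical stand-in for a model call, and the function names, the toy table, and the retry count are not part of the PremSQL API.

```python
import sqlite3


def generate_sql(question, error=None):
    """Hypothetical stand-in for a model call.

    A real generator would prompt the model with the question and,
    on a retry, with the previous error message as well.
    """
    if error is None:
        return "SELECT nme FROM schools"  # first attempt contains a typo
    return "SELECT name FROM schools"     # "corrected" attempt after seeing the error


def execution_guided_decode(conn, question, max_retries=3):
    """Generate SQL, execute it, and feed any error back until it runs or retries run out."""
    error = None
    sql = None
    for _ in range(max_retries):
        sql = generate_sql(question, error)
        try:
            rows = conn.execute(sql).fetchall()
            return sql, rows
        except sqlite3.Error as exc:
            # Keep the database error so the next attempt can use it
            error = str(exc)
    return sql, None


# Toy database (hypothetical data, only for this example)
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE schools (name TEXT); INSERT INTO schools VALUES ('A'), ('B');"
)
sql, rows = execution_guided_decode(conn, "List all school names")
print(sql, rows)
```

Here the first attempt fails with a "no such column" error, and the second attempt succeeds. If every attempt fails, the sketch returns the last SQL with `None` in place of rows.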
+You can also fine-tune Prem-1B-SQL using HuggingFace Transformers or [PremSQL Tuners](https://docs.premai.io/premsql/tuners).
 Please [check out our documentation](https://docs.premai.io/premsql/introduction) to learn more about PremSQL and all the features
 we provide.
 ## Datasets used to train the model
 Prem-1B-SQL is trained using the following datasets:
 3. [Domain specialization dataset, gathered and uploaded to PremSQL datasets](https://huggingface.co/datasets/premai-io/domains)
 4. [Gretel AI synthetic dataset](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql?row=0)
+Additionally, we built error-handling datasets on top of these datasets so that the model learns from its errors and self-corrects them.
 ## Evaluation results of Prem-1B-SQL