|
|
| --- |
| |
| library_name: transformers |
| tags: [] |
|
|
| --- |
| |
|
|
|
|
| # QuantFactory/prem-1B-SQL-GGUF |
| This is a quantized version of [premai-io/prem-1B-SQL](https://huggingface.co/premai-io/prem-1B-SQL) created using llama.cpp. |
|
|
| # Original Model Card |
|
|
|
|
| # Prem-1B-SQL |
|
|
| Prem-1B-SQL is one of the first models in a series of fully local Text-to-SQL models developed by Prem AI. Being a 1B parameter model, |
| it fits easily on low-end GPU devices (and on CPU devices when quantized). We believe that AI-assisted data analysis should take a local-first |
| approach, because exposing databases to third-party closed-source models can lead to data security breaches. We will publish public |
| benchmark results for this model soon, and we will keep iterating on it to improve its results. |
|
|
| - **Developed by:** [Prem AI](https://www.premai.io/) |
| - **License:** MIT |
|
|
|
|
| ## How to use Prem-1B-SQL |
|
|
| Because the model is built on transformers, it can be used directly with the transformers library. However, running Text-to-SQL is not as simple |
| as running a general-purpose LLM, because building the model's input prompt is tightly coupled with the database. So we have developed PremSQL, |
| a fully open-source library that offers: |
|
|
| - **Local-First**: Avoid third-party closed-source providers and keep your data secure. |
| - **Customizable Datasets**: Create, fine-tune, and evaluate models with built-in or custom datasets. |
| - **Robust Executors and Evaluators**: Easily connect to databases and assess model performance. |
| - **Advanced Generators**: Convert natural language prompts into executable SQL queries. |
| - **Error Handling and Self-Correction**: Automatically correct SQL queries during inference. |
| - **Fine-Tuning Support**: Fine-tune models with LoRA, QLoRA, or full fine-tuning strategies. |
| - **End-to-End Pipelines**: Seamlessly integrate all components for autonomous data analysis. |
|
|
| To install PremSQL, create a new environment and run: |
|
|
| ```bash |
| pip install -U premsql |
| ``` |
|
|
| Please [check out our documentation](https://docs.premai.io/premsql/introduction) for more details on using the library. |
|
|
| ### Running Prem-1B-SQL using PremSQL Pipelines |
|
|
| The easiest way to use this model is through PremSQL pipelines. Provide the database path (for SQLite databases) or the DB connection URI, |
| then connect it to the model. Here is how to do that: |
|
|
| ```python |
| from premsql.pipelines import SimpleText2SQLAgent |
| from premsql.generators import Text2SQLGeneratorHF |
| from premsql.executors import SQLiteExecutor |
| |
| # Provide a SQLite file here or see documentation for more customization |
| dsn_or_db_path = "./data/db/california_schools.sqlite" |
| |
| agent = SimpleText2SQLAgent( |
|     dsn_or_db_path=dsn_or_db_path, |
|     generator=Text2SQLGeneratorHF( |
|         model_or_name_or_path="premai-io/prem-1B-SQL", |
|         experiment_name="simple_pipeline", |
|         device="cuda:0", |
|         type="test" |
|     ), |
| ) |
| |
| question = "please list the phone numbers of the direct charter-funded schools that are opened after 2000/1/1" |
| |
| response = agent.query(question) |
| response["table"] |
| ``` |
|
|
| Under the hood, it automatically connects to your database and does all the heavy lifting, such as prompt creation and query execution, for you. |
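To give a rough idea of what "prompt creation" involves, the sketch below reads the table definitions from a SQLite database with the standard `sqlite3` module and places them in front of the question. The helper names (`read_schema`, `build_prompt`) and the prompt layout are assumptions made for illustration; they are not PremSQL's actual template.

```python
import sqlite3


def read_schema(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements stored in the SQLite catalog."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n\n".join(row[0] for row in rows)


def build_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Combine the database schema and the user question (illustrative layout)."""
    return (
        "# Database schema\n"
        f"{read_schema(conn)}\n\n"
        "# Question\n"
        f"{question}\n\n"
        "# SQL\n"
    )


if __name__ == "__main__":
    # In-memory database standing in for a real SQLite file.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE schools (id INTEGER PRIMARY KEY, phone TEXT, open_date TEXT)"
    )
    print(build_prompt(conn, "List the phone numbers of schools opened after 2000-01-01."))
```

Because the prompt depends on the schema of whichever database is connected, the same question yields a different model input for every database, which is why a pipeline that handles this step automatically is convenient.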
|
|
|
|
| ### Running Prem-1B-SQL using PremSQL Generators |
|
|
| You can also run the model using PremSQL Generators. This is helpful when you want to generate SQL in |
| bulk over a dataset. Here is an example: |
|
|
| ```python |
| from premsql.generators import Text2SQLGeneratorHF |
| from premsql.datasets import Text2SQLDataset |
| |
| # Define a dataset |
| bird_dataset = Text2SQLDataset( |
|     dataset_name='bird', split="validation", force_download=False, |
|     dataset_folder="/path/to/dataset" |
| ).setup_dataset(num_rows=10, num_fewshot=3) |
| |
| # Define a generator |
| generator = Text2SQLGeneratorHF( |
|     model_or_name_or_path="premai-io/prem-1B-SQL", |
|     experiment_name="test_generators", |
|     device="cuda:0", |
|     type="test" |
| ) |
| |
| # Generate on the full dataset |
| responses = generator.generate_and_save_results( |
|     dataset=bird_dataset, |
|     temperature=0.1, |
|     max_new_tokens=256 |
| ) |
| |
| print(responses) |
| ``` |
|
|
| ### Using Execution-Guided Decoding |
|
|
| This strategy executes the generated SQL against the DB and, if it fails, uses the error message for correction, repeating until it gets a valid result or the retries run out. |
|
|
|
|
|  |
|
|
| ```python |
| from premsql.executors import SQLiteExecutor |
| |
| # Reuses `generator` and `bird_dataset` from the previous example |
| executor = SQLiteExecutor() |
| response = generator.generate_and_save_results( |
|     dataset=bird_dataset, |
|     temperature=0.1, |
|     max_new_tokens=256, |
|     force=True, |
|     executor=executor, |
|     max_retries=5  # this is optional (default is already set to 5) |
| ) |
| ``` |
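The retry loop behind execution-guided decoding can be sketched with the standard library alone. This is a minimal illustration, not PremSQL's implementation: `generate_sql` is a hypothetical callable standing in for the model, and the function name `execution_guided_decode` is likewise made up for this example.

```python
import sqlite3
from typing import Callable, Optional


def execution_guided_decode(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str, Optional[str], Optional[str]], str],
    max_retries: int = 5,
) -> dict:
    """Generate SQL, execute it, and feed any error back to the generator."""
    sql, error = None, None
    for attempt in range(1, max_retries + 1):
        # On the first attempt, sql and error are None; afterwards they carry
        # the previous query and the database error it raised.
        sql = generate_sql(question, sql, error)
        try:
            rows = conn.execute(sql).fetchall()
            return {"sql": sql, "rows": rows, "error": None, "attempts": attempt}
        except sqlite3.Error as exc:
            error = str(exc)
    # Retries exhausted: report the last query and its error.
    return {"sql": sql, "rows": None, "error": error, "attempts": max_retries}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schools (name TEXT, phone TEXT)")
    conn.execute("INSERT INTO schools VALUES ('Alpha', '555-0100')")

    def fake_model(question, previous_sql, error):
        # Stand-in for the model: the first draft uses a wrong column name,
        # the corrected draft is returned once an error message is available.
        if error is None:
            return "SELECT telephone FROM schools"
        return "SELECT phone FROM schools"

    print(execution_guided_decode(conn, "List all phone numbers.", fake_model))
```

In this toy run the first query fails on an unknown column, the error text is passed back to the stand-in model, and the second query succeeds. A real generator would use the error message to build a correction prompt instead of switching between two fixed strings.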
|
|
|
|
| You can also fine-tune Prem-1B-SQL using HuggingFace Transformers or [PremSQL Tuners](https://docs.premai.io/premsql/tuners). |
| Please [check out our documentation](https://docs.premai.io/premsql/introduction) to learn more about PremSQL and all the features |
| we provide. |
|
|
|
|
| ## Datasets used to train the model |
|
|
| Prem-1B-SQL is trained using the following datasets: |
|
|
| 1. [BirdBench Training dataset](https://bird-bench.github.io/) | Uploaded on [PremSQL datasets on HF](https://huggingface.co/datasets/premai-io/birdbench) |
| 2. [Spider dataset](https://yale-lily.github.io/spider) | Uploaded on [PremSQL datasets on HF](https://huggingface.co/datasets/premai-io/spider) |
| 3. [Domain specialization dataset, gathered and uploaded to PremSQL datasets](https://huggingface.co/datasets/premai-io/domains) |
| 4. [Gretel AI synthetic dataset](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql?row=0) |
|
|
| Additionally, we built error-handling datasets on top of these datasets so that the model learns from its errors and self-corrects them. |
|
|
|
|
| ## Evaluation results of Prem-1B-SQL |
|
|
| The results of Prem-1B-SQL on some public benchmarks will be published soon. |
|
|