aashish1904 committed · Commit fe4a73b · verified · 1 Parent(s): ad9bc26

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +157 -0
README.md ADDED
@@ -0,0 +1,157 @@
+
+ ---
+
+ library_name: transformers
+ tags: []
+
+ ---
+
+ [![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
+
+
+ # QuantFactory/prem-1B-SQL-GGUF
+ This is a quantized version of [premai-io/prem-1B-SQL](https://huggingface.co/premai-io/prem-1B-SQL), created using llama.cpp.
+
+ # Original Model Card
+
+
+ # Prem-1B-SQL
+
+ Prem-1B-SQL is one of the first in a series of fully local Text-to-SQL models developed by Prem AI. As a 1B-parameter model,
+ it fits easily on low-resource GPU devices (and on CPU devices when quantized). We believe AI-assisted data analysis should be
+ local-first, because exposing databases to third-party closed-source models can lead to data security breaches. We will publish
+ public benchmark results for this model soon, and we will keep iterating on the model to improve its results.
+
+ - **Developed by:** [Prem AI](https://www.premai.io/)
+ - **License:** [MIT]
+
+
+ ## How to use Prem-1B-SQL
+
+ Because the model is built on transformers, it can be used directly with that library. However, running Text-to-SQL is not as simple
+ as running a general-purpose LLM: the model's input prompt is tightly coupled to the database being queried. So we developed PremSQL,
+ a fully open-source library that offers:
+
+ - **Local-First**: Avoid third-party closed-source providers and keep your data secure.
+ - **Customizable Datasets**: Create, fine-tune, and evaluate models with built-in or custom datasets.
+ - **Robust Executors and Evaluators**: Easily connect to databases and assess model performance.
+ - **Advanced Generators**: Convert natural language prompts into executable SQL queries.
+ - **Error Handling and Self-Correction**: Automatically correct SQL queries during inference.
+ - **Fine-Tuning Support**: Fine-tune models with LoRA, QLoRA, or full fine-tuning strategies.
+ - **End-to-End Pipelines**: Seamlessly integrate all components for autonomous data analysis.
+
+ To install PremSQL, create a new environment and run:
+
+ ```bash
+ pip install -U premsql
+ ```
+
+ See [our documentation](https://docs.premai.io/premsql/introduction) for more details on using the library.
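
To make the coupling between prompt and database concrete, here is a minimal sketch that uses only Python's standard `sqlite3` module. It is not PremSQL's actual prompt template: the helper name `build_prompt`, the prompt layout, and the toy `schools` table are all assumptions made for illustration. The only point is that a Text-to-SQL prompt has to embed the schema of the specific database being queried.

```python
import sqlite3


def build_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Hypothetical helper: embed the database schema in a Text-to-SQL prompt."""
    # sqlite_master keeps the original CREATE statement of every table.
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    schema = "\n".join(row[0] for row in rows)
    # Assumed layout: schema first, then the question, then a slot for the SQL.
    return (
        "### Database schema\n"
        f"{schema}\n\n"
        "### Question\n"
        f"{question}\n\n"
        "### SQL\n"
    )


# Toy in-memory database standing in for a real SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schools (name TEXT, phone TEXT, open_date TEXT)")

prompt = build_prompt(conn, "List the phone numbers of all schools.")
print(prompt)
```

The same question asked against a different database produces a different prompt, which is why a library that manages the database connection and prompt creation together is convenient.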
+
+ ### Running Prem-1B-SQL using PremSQL Pipelines
+
+ The easiest way to use this model is through PremSQL pipelines. Provide the database path (for SQLite databases)
+ or the DB connection URI, then connect it to the model. Here is how you do that:
+
+ ```python
+ from premsql.pipelines import SimpleText2SQLAgent
+ from premsql.generators import Text2SQLGeneratorHF
+ from premsql.executors import SQLiteExecutor
+
+ # Provide a SQLite file here or see documentation for more customization
+ dsn_or_db_path = "./data/db/california_schools.sqlite"
+
+ agent = SimpleText2SQLAgent(
+     dsn_or_db_path=dsn_or_db_path,
+     generator=Text2SQLGeneratorHF(
+         model_or_name_or_path="premai-io/prem-1B-SQL",
+         experiment_name="simple_pipeline",
+         device="cuda:0",
+         type="test"
+     ),
+ )
+
+ question = "please list the phone numbers of the direct charter-funded schools that are opened after 2000/1/1"
+
+ response = agent.query(question)
+ response["table"]
+ ```
+
+ Under the hood, the agent connects to your database and does the heavy lifting for you, such as prompt creation and query execution.
+
+
+ ### Running Prem-1B-SQL using PremSQL Generators
+
+ You can also run the model using PremSQL Generators. This is helpful when you want to generate SQL in
+ bulk over a dataset. Here is an example:
+
+ ```python
+ from premsql.generators import Text2SQLGeneratorHF
+ from premsql.datasets import Text2SQLDataset
+
+ # Define a dataset
+ bird_dataset = Text2SQLDataset(
+     dataset_name='bird', split="validation", force_download=False,
+     dataset_folder="/path/to/dataset"
+ ).setup_dataset(num_rows=10, num_fewshot=3)
+
+ # Define a generator
+ generator = Text2SQLGeneratorHF(
+     model_or_name_or_path="premai-io/prem-1B-SQL",
+     experiment_name="test_generators",
+     device="cuda:0",
+     type="test"
+ )
+
+ # Generate on the full dataset
+ responses = generator.generate_and_save_results(
+     dataset=bird_dataset,
+     temperature=0.1,
+     max_new_tokens=256
+ )
+
+ print(responses)
+ ```
+
+ ### Using Execution-Guided Decoding
+
+ This strategy executes the generated SQL against the DB and, if it fails, uses the error message for correction, repeating until it gets a valid result or the retries run out.
+
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/637b0075806b18943e4ba357/_5rdIQZwyaUFb84xKW_AV.png)
+
+ ```python
+ from premsql.executors import SQLiteExecutor
+
+ # Reuses `generator` and `bird_dataset` from the previous example
+ executor = SQLiteExecutor()
+ response = generator.generate_and_save_results(
+     dataset=bird_dataset,
+     temperature=0.1,
+     max_new_tokens=256,
+     force=True,
+     executor=executor,
+     max_retries=5  # this is optional (default is already set to 5)
+ )
+ ```
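
The retry loop described above can be sketched in plain Python with the standard `sqlite3` module. This is an illustration of the general idea only, not PremSQL's implementation: the function `execution_guided_decode`, the `generate_sql` callback signature, and the stand-in `fake_generator` are hypothetical names introduced here, and a real setup would call the model instead of the stub.

```python
import sqlite3
from typing import Callable, Optional


def execution_guided_decode(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],
    max_retries: int = 5,
) -> dict:
    """Generate SQL, execute it, and feed any error into the next attempt."""
    error: Optional[str] = None
    sql = ""
    for attempt in range(1, max_retries + 1):
        # The previous error (if any) is passed back to the generator.
        sql = generate_sql(question, error)
        try:
            rows = conn.execute(sql).fetchall()
            return {"sql": sql, "rows": rows, "attempts": attempt, "error": None}
        except sqlite3.Error as exc:
            error = str(exc)
    # Retries exhausted: report the last query and its error.
    return {"sql": sql, "rows": None, "attempts": max_retries, "error": error}


def fake_generator(question: str, error: Optional[str]) -> str:
    """Hypothetical stand-in for the model: wrong table first, fixed after an error."""
    if error is None:
        return "SELECT phone FROM school"  # table name is wrong on purpose
    return "SELECT phone FROM schools"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schools (name TEXT, phone TEXT)")
conn.execute("INSERT INTO schools VALUES ('Alpha', '555-0100')")

result = execution_guided_decode(conn, "List all phone numbers.", fake_generator)
print(result)
```

In this toy run the first query fails against the database, the error message is handed back to the generator, and the second query succeeds, so the loop stops before using all of its retries.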
+
+
+ You can also fine-tune Prem-1B-SQL with Hugging Face Transformers or with [PremSQL Tuners](https://docs.premai.io/premsql/tuners).
+ See [our documentation](https://docs.premai.io/premsql/introduction) to learn more about PremSQL and the features
+ it provides.
+
+
+ ## Datasets used to train the model
+
+ Prem-1B-SQL was trained on the following datasets:
+
+ 1. [BirdBench Training dataset](https://bird-bench.github.io/) | Uploaded on [PremSQL datasets on HF](https://huggingface.co/datasets/premai-io/birdbench)
+ 2. [Spider dataset](https://yale-lily.github.io/spider) | Uploaded on [PremSQL datasets on HF](https://huggingface.co/datasets/premai-io/spider)
+ 3. [Domain specialization dataset, gathered and uploaded to PremSQL datasets](https://huggingface.co/datasets/premai-io/domains)
+ 4. [Gretel AI synthetic dataset](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql?row=0)
+
+ Additionally, we built error-handling datasets on top of these datasets so that the model learns from its errors and self-corrects them.
+
+
+ ## Evaluation results of Prem-1B-SQL
+
+ The results of Prem-1B-SQL on some public benchmarks will be published soon.