
	---

	library_name: transformers
	tags: []

	---

	[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)


	# QuantFactory/prem-1B-SQL-GGUF
	This is a quantized version of [premai-io/prem-1B-SQL](https://huggingface.co/premai-io/prem-1B-SQL), created using llama.cpp.

	# Original Model Card


	# Prem-1B-SQL

	Prem-1B-SQL is part of one of the first series of fully local Text-to-SQL models, developed by Prem AI. As a 1B-parameter model,
	it fits easily on low-end GPUs (and on CPUs when quantized). We believe that AI-assisted data analysis should be local-first,
	because exposing databases to third-party closed-source models can lead to data security breaches. We will publish results
	on public benchmarks soon, and we will keep iterating on this model to improve its results.

	- Developed by: [Prem AI](https://www.premai.io/)
	- License: [MIT]


	## How to use Prem-1B-SQL

	Because the model is built on transformers, it can be used directly with that library. However, running Text-to-SQL is not as simple
	as running an ordinary LLM: the input prompt has to be built around the target database, so prompt construction is tightly coupled to it.
	For this reason we developed PremSQL, a fully open-source library that is:

	- Local-First: Avoid third-party closed-source providers and keep your data secure.
	- Customizable Datasets: Create, fine-tune, and evaluate models with built-in or custom datasets.
	- Robust Executors and Evaluators: Easily connect to databases and assess model performance.
	- Advanced Generators: Convert natural language prompts into executable SQL queries.
	- Error Handling and Self-Correction: Automatically correct SQL queries during inference.
	- Fine-Tuning Support: Fine-tune models with LoRA, QLoRA, or full fine-tuning strategies.
	- End-to-End Pipelines: Seamlessly integrate all components for autonomous data analysis.

	To install PremSQL, create a new environment and run:

	```bash
	pip install -U premsql
	```

	Please [check out our documentation](https://docs.premai.io/premsql/introduction) for more details on using the library.

	### Running Prem-1B-SQL using PremSQL Pipelines

	The easiest way to use this model is through PremSQL pipelines. Provide the database path (for SQLite databases)
	or the DB connection URI, then connect it to the model:

	```python
	from premsql.pipelines import SimpleText2SQLAgent
	from premsql.generators import Text2SQLGeneratorHF
	from premsql.executors import SQLiteExecutor

	# Provide a SQLite file here or see documentation for more customization
	dsn_or_db_path = "./data/db/california_schools.sqlite"

	agent = SimpleText2SQLAgent(
	    dsn_or_db_path=dsn_or_db_path,
	    generator=Text2SQLGeneratorHF(
	        model_or_name_or_path="premai-io/prem-1B-SQL",
	        experiment_name="simple_pipeline",
	        device="cuda:0",
	        type="test"
	    ),
	)

	question = "please list the phone numbers of the direct charter-funded schools that are opened after 2000/1/1"

	response = agent.query(question)
	response["table"]
	```

	Under the hood, the pipeline connects to your database and does the heavy lifting for you, such as prompt creation and query execution.
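
As a rough illustration of the prompt creation step mentioned above, the sketch below reads the schema from a SQLite database and places it in front of the question. It uses only Python's standard `sqlite3` module. The `build_prompt` helper, the toy `schools` table, and the prompt layout are hypothetical and are not PremSQL's actual prompt template.

```python
import sqlite3


def build_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Assemble a text-to-SQL prompt from the database schema and a question.

    Hypothetical helper for illustration only; PremSQL's real template differs.
    """
    # sqlite_master stores the CREATE statement of every table in the database.
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    schema = "\n\n".join(row[0] for row in rows)
    return (
        "# Database schema\n"
        f"{schema}\n\n"
        "# Question\n"
        f"{question}\n\n"
        "# SQL\n"
    )


# A tiny in-memory database standing in for a real SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE schools (id INTEGER PRIMARY KEY, phone TEXT, opened TEXT)"
)

question = "List the phone numbers of schools opened after 2000-01-01"
prompt = build_prompt(conn, question)
print(prompt)
```

Because the prompt depends on the schema of the specific database, the same question produces a different prompt for every database, which is why the generator and the database connection are wired together in a pipeline.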


	### Running Prem-1B-SQL using PremSQL Generators

	You can also run the model using PremSQL Generators. This is helpful when you want to generate SQL in
	bulk over a dataset. Here is an example:

	```python
	from premsql.generators import Text2SQLGeneratorHF
	from premsql.datasets import Text2SQLDataset

	# Define a dataset (only the first 10 rows, with 3 few-shot examples)
	bird_dataset = Text2SQLDataset(
	    dataset_name='bird', split="validation", force_download=False,
	    dataset_folder="/path/to/dataset"
	).setup_dataset(num_rows=10, num_fewshot=3)

	# Define a generator
	generator = Text2SQLGeneratorHF(
	    model_or_name_or_path="premai-io/prem-1B-SQL",
	    experiment_name="test_generators",
	    device="cuda:0",
	    type="test"
	)

	# Generate SQL for every row of the dataset defined above and save the results
	responses = generator.generate_and_save_results(
	    dataset=bird_dataset,
	    temperature=0.1,
	    max_new_tokens=256
	)

	print(responses)
	```

	### Using Execution-Guided Decoding

	This strategy executes the generated SQL against the DB and, if it fails, uses the error message for correction, repeating until it gets a valid result or the retries run out.


	![image/png](https://cdn-uploads.huggingface.co/production/uploads/637b0075806b18943e4ba357/_5rdIQZwyaUFb84xKW_AV.png)

	```python
	from premsql.executors import SQLiteExecutor

	# Reuses `generator` and `bird_dataset` from the previous example
	executor = SQLiteExecutor()
	response = generator.generate_and_save_results(
	    dataset=bird_dataset,
	    temperature=0.1,
	    max_new_tokens=256,
	    force=True,
	    executor=executor,
	    max_retries=5  # optional (the default is already 5)
	)
	```
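
The retry loop behind execution-guided decoding can be sketched without PremSQL, using only the standard `sqlite3` module. This is a minimal illustration under stated assumptions, not PremSQL's implementation: `execution_guided_decode` and `fake_generate` are hypothetical names, the stub generator stands in for the model, and `max_retries` is treated here as the total number of attempts.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical generator signature: (question, previous_sql, error) -> sql
Generator = Callable[[str, Optional[str], Optional[str]], str]


def execution_guided_decode(
    conn: sqlite3.Connection,
    question: str,
    generate: Generator,
    max_retries: int = 5,
) -> dict:
    """Generate SQL, execute it, and feed any error back into the next attempt."""
    previous_sql: Optional[str] = None
    error: Optional[str] = None
    for attempt in range(max_retries):
        sql = generate(question, previous_sql, error)
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Keep the failing query and its error message for the next attempt.
            previous_sql, error = sql, str(exc)
            continue
        return {"sql": sql, "rows": rows, "attempts": attempt + 1, "error": None}
    # All attempts failed: report the last query and its error.
    return {"sql": previous_sql, "rows": None, "attempts": max_retries, "error": error}


def fake_generate(question: str, previous_sql: Optional[str], error: Optional[str]) -> str:
    """Stub standing in for the model: wrong column first, fixed once it sees an error."""
    if error is None:
        return "SELECT phone_number FROM schools"  # column does not exist
    return "SELECT phone FROM schools"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schools (id INTEGER PRIMARY KEY, phone TEXT)")
conn.execute("INSERT INTO schools (phone) VALUES ('555-0100')")

result = execution_guided_decode(conn, "List all school phone numbers", fake_generate)
print(result)
```

In this toy run the first query fails on the unknown column, the error message is passed back to the generator, and the second query succeeds. With a real model, the error message would be included in the correction prompt instead of being checked by a stub.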


	You can also fine-tune Prem-1B-SQL using HuggingFace Transformers or [PremSQL Tuners](https://docs.premai.io/premsql/tuners).
	Please [check out our documentation](https://docs.premai.io/premsql/introduction) to learn more about PremSQL and all the features
	we provide.


	## Datasets used to train the model

	Prem-1B-SQL is trained using the following datasets:

	1. [BirdBench Training dataset](https://bird-bench.github.io/) \| Uploaded on [PremSQL datasets on HF](https://huggingface.co/datasets/premai-io/birdbench)
	2. [Spider dataset](https://yale-lily.github.io/spider) \| Uploaded on [PremSQL datasets on HF](https://huggingface.co/datasets/premai-io/spider)
	3. [Domain specialization dataset, gathered and uploaded to PremSQL datasets](https://huggingface.co/datasets/premai-io/domains)
	4. [Gretel AI synthetic dataset](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql?row=0)

	Additionally, we built error-handling datasets on top of these datasets so that the model learns from its errors and corrects them itself.


	## Evaluation results of Prem-1B-SQL

	The results of Prem-1B-SQL on some public benchmarks will be published soon.