| --- |
| license: mit |
| language: |
| - en |
| metrics: |
| - accuracy |
| pipeline_tag: text-generation |
| tags: |
| - code |
| - sql |
| - text2sql |
| - instruction_tuned |
| - jax |
| - pytorch |
| - 1b |
| - expert |
| datasets: |
| - PipableAI/spider-bird |
| --- |
| # Pipable’s pipSQL |
|
|
Pipable’s pipSQL is a model distilled from Llama 1B to generate SQL queries given a prompt and schema.
We used a unique training pipeline in which the model alternated between two objectives:
1. Maximizing the log probability of all tokens in the sequence (including the prompt tokens).
2. Minimizing the difference between the true value and the predicted maximum value of the output tokens, i.e. the generated tokens for the SQL-query slice of the full sequence.
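The two objectives above can be sketched as a combined loss. The snippet below is a minimal, hypothetical illustration, not the actual training code: it assumes the first objective is a standard next-token cross-entropy over the whole sequence, and interprets the second objective as shrinking the gap between the model's maximum predicted logit and the logit of the true token over the SQL slice. The function name `combined_loss` and the `sql_start` parameter are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, input_ids, sql_start):
    # Shift so position t predicts token t+1 (standard causal LM setup)
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]

    # Objective 1: cross-entropy (negative log prob) over ALL tokens,
    # prompt tokens included
    lm_loss = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )

    # Objective 2: on the SQL slice only, minimize the gap between the
    # maximum predicted logit and the true token's logit
    sql_logits = shift_logits[:, sql_start:, :]
    sql_labels = shift_labels[:, sql_start:]
    true_logit = sql_logits.gather(-1, sql_labels.unsqueeze(-1)).squeeze(-1)
    max_logit = sql_logits.max(dim=-1).values
    gap_loss = (max_logit - true_logit).mean()  # always >= 0

    return lm_loss, gap_loss
```

In practice the two terms would be weighted and alternated between training steps, per the description above.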
|
|
|
|
|
|
|
|
|
|
| ## License |
|
|
The model's new weights, along with all other assets involved, are open sourced under the MIT license.
|
|
| ## How to Use |
|
|
Use the following prompt format, filling in the table schema and the natural-language question:

```python
text = """<schema>{schema}</schema>
<question>{question}</question>
<sql>"""
```
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model = AutoModelForCausalLM.from_pretrained("PipableAI/pipSQL1b").to(device)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pipSQL1b")

inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=200)
# Extract only the generated SQL between the <sql> and </sql> tags
print(tokenizer.decode(outputs[0], skip_special_tokens=True).split('<sql>')[1].split('</sql>')[0])
```
|
|
| ## The PipableAI team |
|
|
Avi Kothari, Pratham Gupta, Ritvik Aryan Kalra, Rohan Bhatial, Soham Acharya, Gyan Ranjan