---
license: apache-2.0
datasets:
- gretelai/synthetic_text_to_sql
language:
- en
base_model:
- codellama/CodeLlama-7b-hf
pipeline_tag: text2text-generation
tags:
- text-to-sql
---
# Model Card for Fine-Tuned CodeLlama 7B for Text-to-SQL Generation

## Model Details

- **Base Model**: codellama/CodeLlama-7b-hf
- **Library Name**: peft

## Model Description

This model is a version of **CodeLlama-7b-hf** fine-tuned to generate SQL queries from natural language descriptions in the **forestry** domain. It translates user questions into SQL commands by combining a pre-trained large language model with a synthetic text-to-SQL dataset.

**Developed by**: Srishti Rai  
**Model Type**: Fine-tuned language model  
**Language(s)**: English  
**Finetuned from model**: codellama/CodeLlama-7b-hf  
**Training Data**: gretelai/synthetic_text_to_sql (forestry-domain examples)

## Uses

### Direct Use

This model can be used to generate SQL queries for database interactions from natural language descriptions. It is particularly fine-tuned for queries related to forestry and environmental data, including timber production, wildlife habitat, and carbon sequestration.
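Text-to-SQL models typically expect the database schema alongside the question. A minimal sketch of building such a prompt is below; the schema, question, and section headers are illustrative assumptions, not the exact template used during fine-tuning.

```python
# Illustrative schema and question for the forestry domain (assumed, not from
# the training set).
schema = "CREATE TABLE forest_plots (plot_id INTEGER, species TEXT, carbon_tons REAL);"
question = "What is the total carbon sequestered per species?"

# Assemble the prompt: schema context, then the question, then a cue for the
# model to continue with SQL.
prompt = (
    "### Schema:\n"
    f"{schema}\n"
    "### Question:\n"
    f"{question}\n"
    "### SQL:\n"
)
print(prompt)
```

Providing the schema in the prompt lets the model ground column and table names instead of guessing them.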

### Downstream Use (optional)

This model can also be used in downstream applications where SQL query generation is required, such as:
- Reporting tools that require SQL query generation from user inputs
- Natural language interfaces for database management

### Out-of-Scope Use

The model is not designed for:
- Tasks outside of SQL query generation, particularly those that require deeper contextual understanding
- Use cases with sensitive or highly regulated data (manual validation of queries is recommended)

## Bias, Risks, and Limitations

This model may exhibit bias due to the nature of the synthetic data it was trained on. Users should be aware that the model might generate incomplete or incorrect SQL queries. Additionally, the model may struggle with queries that deviate from the patterns seen during training.

## Recommendations

Users should ensure that generated queries are manually reviewed, especially in critical or sensitive environments, as the model might not always generate accurate SQL statements. 
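One lightweight way to catch broken output before it reaches a database is to ask SQLite to prepare (but not execute) the query against the target schema. The sketch below uses `EXPLAIN` on an in-memory database; the table and queries are hypothetical examples.

```python
import sqlite3

def validate_sql(query: str, schema_ddl: str) -> bool:
    """Return True if the query parses and binds against the schema.

    EXPLAIN forces SQLite to prepare the statement (catching syntax errors
    and unknown tables/columns) without actually running it.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        conn.execute(f"EXPLAIN {query}")
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

# Hypothetical forestry table, for illustration only.
schema = "CREATE TABLE timber_sales (region TEXT, volume_m3 REAL, year INTEGER);"
print(validate_sql("SELECT region, SUM(volume_m3) FROM timber_sales GROUP BY region", schema))  # True
print(validate_sql("SELECT missing_col FROM timber_sales", schema))  # False: no such column
```

This only checks well-formedness, not semantic correctness, so human review is still needed for critical use.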

## How to Get Started with the Model

To get started with the fine-tuned model, use the following code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path_to_your_model_on_kaggle"

# Load model and tokenizer. If the checkpoint is a PEFT adapter rather than
# merged weights, load the base model first and attach the adapter with
# peft.PeftModel.from_pretrained instead.
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tokenize the natural-language question
input_text = "Your input question here"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate the SQL query (greedy decoding; temperature has no effect
# when do_sample=False, so it is omitted)
outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=256,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)

generated_sql = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_sql)
```