spectrewolf8 committed (verified) · Commit 49bdda7 · Parent: 2f05b93

Update README.md

Files changed (1): README.md (+27 −0)
README.md CHANGED
# Print the generated text by stripping out the prompt portion and displaying only the new generated content.
print(outputs[0]['generated_text'][len(prompt):].strip())
```
## Fine-Tuning Details

Fine-tuning adapted the pre-trained language model microsoft/Phi-3-mini-4k-instruct to generate SQL commands from natural language prompts. The methodology included the following key steps:
### Data Preparation

The synthetic dataset "gretelai/synthetic_text_to_sql" was used, containing natural-language instructions paired with SQL queries. Three fields were extracted from each record: the instruction ("sql_prompt"), the input ("sql_context"), and the output ("sql"). Each data point was then structured as a conversation in which the user's message combined the prompt and context and the assistant's message contained the corresponding SQL output.
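The record-to-conversation mapping can be sketched in plain Python; the helper name and the exact message template below are illustrative assumptions, not the project's recorded code, though the field names (`sql_prompt`, `sql_context`, `sql`) are the dataset's own:

```python
def to_conversation(example: dict) -> dict:
    """Turn one gretelai/synthetic_text_to_sql record into a chat exchange."""
    user_message = (
        f"{example['sql_prompt']}\n\n"          # natural-language instruction
        f"Context:\n{example['sql_context']}"   # table schemas / DDL context
    )
    return {
        "messages": [
            {"role": "user", "content": user_message},
            {"role": "assistant", "content": example["sql"]},  # target query
        ]
    }

# Example record in the dataset's shape (values invented for illustration)
record = {
    "sql_prompt": "List all customers from France.",
    "sql_context": "CREATE TABLE customers (id INT, name TEXT, country TEXT);",
    "sql": "SELECT * FROM customers WHERE country = 'France';",
}
conv = to_conversation(record)
```

A mapping like this is typically applied over the whole dataset with `datasets.Dataset.map` before training.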
### Quantization and Model Preparation

The model was loaded with 4-bit quantization via the bitsandbytes library, reducing memory requirements while largely preserving accuracy. QLoRA (Quantized Low-Rank Adaptation) was then used for fine-tuning: trainable low-rank matrices were injected into selected layers, such as the attention and projection layers, so the model's behavior could be adapted without full retraining.
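A minimal sketch of this setup, assuming the standard transformers/peft APIs; the specific hyperparameters (`r`, `lora_alpha`, dropout, target module names) are assumptions standing in for the project's actual values:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization: weights stored as NF4, computation in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# QLoRA: low-rank adapters on the attention and MLP projection layers
lora_config = LoraConfig(
    r=16,                      # rank of the low-rank update matrices
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
)
model = get_peft_model(model, lora_config)
```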
### Model and Tokenizer Setup

The tokenizer was customized to handle special tokens and padding correctly, in particular using left-side padding. These settings ensured accurate tokenization of the structured input the model expects.
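In code this amounts to a few tokenizer settings; the pad-token fallback below is an assumption for the case where the checkpoint defines none:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
tokenizer.padding_side = "left"              # left-side padding, as noted above
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token
```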
### Training Configuration

Fine-tuning was executed with the SFTTrainer from the Hugging Face TRL library. The configuration used a small per-device batch size with gradient accumulation and a learning rate tuned for the SQL generation task, and it enabled mixed-precision training where beneficial.
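A configuration in this spirit might look as follows; every hyperparameter is an illustrative assumption, and the SFTTrainer constructor arguments vary somewhat across TRL versions:

```python
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir="phi3-sql-finetune",    # hypothetical output path
    per_device_train_batch_size=2,     # small batch size
    gradient_accumulation_steps=8,     # effective batch size of 16
    learning_rate=2e-4,                # tuned for the SQL generation task
    num_train_epochs=3,
    bf16=True,                         # mixed precision where supported
    logging_steps=25,
    report_to="wandb",                 # stream metrics to Weights & Biases
)

trainer = SFTTrainer(
    model=model,                  # quantized, LoRA-wrapped model from above
    args=training_args,
    train_dataset=train_dataset,  # conversation-formatted dataset from above
    tokenizer=tokenizer,
)
```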
### Training Execution

The model was trained for multiple epochs on the processed dataset, optimizing its ability to understand diverse natural-language instructions and generate the corresponding SQL queries. Weights & Biases (wandb) logged and monitored the training metrics, giving robust tracking of the model's performance improvements.
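Launching the run with wandb tracking reduces to a few lines; the project and run names here are hypothetical:

```python
import wandb

wandb.init(project="phi3-text-to-sql", name="qlora-run")  # hypothetical names
trainer.train()   # runs the configured epochs, streaming metrics to wandb
wandb.finish()
```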
### Model Saving and Deployment

After fine-tuning, the updated model and tokenizer were saved locally and uploaded to the Hugging Face Hub, making the refined model available for efficient inference on new prompts.
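The save-and-publish step can be sketched as below; the local path and Hub repository id are hypothetical placeholders, not this model card's actual identifiers:

```python
# Save the fine-tuned adapter weights and tokenizer locally
trainer.save_model("phi3-sql-finetune")
tokenizer.save_pretrained("phi3-sql-finetune")

# Publish both to the Hugging Face Hub (repo id is a placeholder)
model.push_to_hub("your-username/phi3-sql-finetune")
tokenizer.push_to_hub("your-username/phi3-sql-finetune")
```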
182
 
## Training Details