jastorj committed
Commit 9805f8f · verified · 1 Parent(s): 5083519

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +45 -16
README.md CHANGED
@@ -1,21 +1,50 @@
- ---
- base_model: Snowflake/Arctic-Text2SQL-R1-7B
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - qwen2
- license: apache-2.0
- language:
- - en
- ---
-
- # Uploaded finetuned model
-
- - **Developed by:** jastorj
- - **License:** apache-2.0
- - **Finetuned from model :** Snowflake/Arctic-Text2SQL-R1-7B
-
- This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - text-to-sql
+ - code
+ - sql
+ - fine-tuned
+ - unsloth
+ - lora
+ base_model: Snowflake/Arctic-Text2SQL-R1-7B
+ ---
+
+ # Snowflake/Arctic-Text2SQL-R1-7B Fine-tuned for NL2SQL++ v8
+
+ This model is a fine-tuned version of [Snowflake/Arctic-Text2SQL-R1-7B](https://huggingface.co/Snowflake/Arctic-Text2SQL-R1-7B) on the NL2SQL++ v8 dataset with code-with-thought reasoning.
+
+ ## Model Details
+
+ - **Base Model**: Snowflake/Arctic-Text2SQL-R1-7B
+ - **Task**: Text-to-SQL generation
+ - **Dataset**: NL2SQL++ v8 with code-with-thought reasoning
+ - **Fine-tuning Method**: LoRA (Low-Rank Adaptation) with Unsloth
+ - **Quantization**: 16-bit merged weights
+ - **Maximum Sequence Length**: 32768 tokens
+
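+ Since the upload is merged 16-bit weights rather than a bare LoRA adapter, it can be loaded directly with `transformers`. A minimal inference sketch follows; the repo id, example schema, and prompt layout are illustrative assumptions, not the format used in training:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Hypothetical repo id -- substitute the actual id of this upload on the Hub.
+ model_id = "jastorj/Arctic-Text2SQL-R1-7B-nl2sqlpp-v8"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype="auto",  # picks up the 16-bit merged weights
+     device_map="auto",
+ )
+
+ # Illustrative prompt: a schema plus a natural-language question.
+ prompt = (
+     "-- Schema: CREATE TABLE employees (id INT, name TEXT, salary INT);\n"
+     "-- Question: Who are the three highest-paid employees?\n"
+     "SELECT"
+ )
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=256)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+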
+ ## Training Configuration
+
+ ### LoRA Parameters
+ - **LoRA Rank (r)**: 64
+ - **LoRA Alpha**: 128
+ - **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
+
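+ In Unsloth, these settings correspond to a PEFT setup along the following lines (a sketch assuming Unsloth's `FastLanguageModel` API; `lora_dropout` and any argument not listed above are assumptions):
+
+ ```python
+ from unsloth import FastLanguageModel
+
+ # Load the base model at the documented maximum sequence length.
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="Snowflake/Arctic-Text2SQL-R1-7B",
+     max_seq_length=32768,
+ )
+
+ # Attach LoRA adapters with the rank, alpha, and target modules listed above.
+ model = FastLanguageModel.get_peft_model(
+     model,
+     r=64,
+     lora_alpha=128,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
+                     "gate_proj", "up_proj", "down_proj"],
+     lora_dropout=0.0,   # assumption: not stated in this card
+     random_state=3407,  # matches the documented seed
+ )
+ ```
+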
+ ### Training Hyperparameters
+ - **Learning Rate**: 0.0002
+ - **Training Epochs**: N/A (using max_steps)
+ - **Max Steps**: 1
+ - **Train Batch Size**: 64
+ - **Eval Batch Size**: 50
+ - **Gradient Accumulation Steps**: 2
+ - **Effective Batch Size**: 128
+ - **Warmup Steps**: 1
+ - **Warmup Ratio**: 0.1
+ - **Optimizer**: AdamW (torch)
+ - **Learning Rate Scheduler**: Cosine
+ - **Weight Decay**: 0.01
+ - **Max Gradient Norm**: 1.0
+ - **Seed**: 3407
+
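+ Expressed as `transformers` `TrainingArguments` (as typically passed to a TRL trainer), the values above map roughly as follows; note that `warmup_steps` takes precedence over `warmup_ratio` when both are set, and `output_dir` is an assumption:
+
+ ```python
+ from transformers import TrainingArguments
+
+ args = TrainingArguments(
+     output_dir="outputs",            # assumption: not stated in this card
+     per_device_train_batch_size=64,
+     per_device_eval_batch_size=50,
+     gradient_accumulation_steps=2,   # 64 * 2 = effective batch size of 128
+     max_steps=1,                     # step-based run, so no epoch count
+     learning_rate=2e-4,
+     warmup_steps=1,                  # overrides warmup_ratio when > 0
+     warmup_ratio=0.1,
+     optim="adamw_torch",
+     lr_scheduler_type="cosine",
+     weight_decay=0.01,
+     max_grad_norm=1.0,
+     seed=3407,
+ )
+ ```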