jastorj committed
Commit 9805f8f · verified · 1 Parent(s): 5083519

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +45 -16
README.md CHANGED
@@ -1,21 +1,50 @@
- ---
- base_model: Snowflake/Arctic-Text2SQL-R1-7B
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - qwen2
- license: apache-2.0
- language:
- - en
- ---
-
- # Uploaded finetuned model
-
- - **Developed by:** jastorj
- - **License:** apache-2.0
- - **Finetuned from model :** Snowflake/Arctic-Text2SQL-R1-7B
-
- This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - text-to-sql
+ - code
+ - sql
+ - fine-tuned
+ - unsloth
+ - lora
+ base_model: Snowflake/Arctic-Text2SQL-R1-7B
+ ---
+
+ # Snowflake/Arctic-Text2SQL-R1-7B Fine-tuned for NL2SQL++ v8
+
+ This model is a fine-tuned version of [Snowflake/Arctic-Text2SQL-R1-7B](https://huggingface.co/Snowflake/Arctic-Text2SQL-R1-7B) on the NL2SQL++ v8 dataset with code-with-thought reasoning.
+
+ ## Model Details
+
+ - **Base Model**: Snowflake/Arctic-Text2SQL-R1-7B
+ - **Task**: Text-to-SQL generation
+ - **Dataset**: NL2SQL++ v8 with code-with-thought reasoning
+ - **Fine-tuning Method**: LoRA (Low-Rank Adaptation) with Unsloth
+ - **Quantization**: 16-bit merged weights
+ - **Maximum Sequence Length**: 32768 tokens
+
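+ Since the upload is merged 16-bit weights rather than a bare LoRA adapter, it can be loaded directly with `transformers`. A minimal inference sketch follows; the repo id, example schema, and prompt layout are illustrative assumptions, not the format used in training:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Hypothetical repo id -- substitute the actual id of this upload on the Hub.
+ model_id = "jastorj/Arctic-Text2SQL-R1-7B-nl2sqlpp-v8"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype="auto",  # picks up the 16-bit merged weights
+     device_map="auto",
+ )
+
+ # Illustrative prompt: a schema plus a natural-language question.
+ prompt = (
+     "-- Schema: CREATE TABLE employees (id INT, name TEXT, salary INT);\n"
+     "-- Question: Who are the three highest-paid employees?\n"
+     "SELECT"
+ )
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=256)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+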
+ ## Training Configuration
+
+ ### LoRA Parameters
+ - **LoRA Rank (r)**: 64
+ - **LoRA Alpha**: 128
+ - **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
+
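+ In Unsloth, these settings correspond to a PEFT setup along the following lines (a sketch assuming Unsloth's `FastLanguageModel` API; `lora_dropout` and any argument not listed above are assumptions):
+
+ ```python
+ from unsloth import FastLanguageModel
+
+ # Load the base model at the documented maximum sequence length.
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="Snowflake/Arctic-Text2SQL-R1-7B",
+     max_seq_length=32768,
+ )
+
+ # Attach LoRA adapters with the rank, alpha, and target modules listed above.
+ model = FastLanguageModel.get_peft_model(
+     model,
+     r=64,
+     lora_alpha=128,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
+                     "gate_proj", "up_proj", "down_proj"],
+     lora_dropout=0.0,   # assumption: not stated in this card
+     random_state=3407,  # matches the documented seed
+ )
+ ```
+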
+ ### Training Hyperparameters
+ - **Learning Rate**: 0.0002
+ - **Training Epochs**: N/A (using max_steps)
+ - **Max Steps**: 1
+ - **Train Batch Size**: 64
+ - **Eval Batch Size**: 50
+ - **Gradient Accumulation Steps**: 2
+ - **Effective Batch Size**: 128
+ - **Warmup Steps**: 1
+ - **Warmup Ratio**: 0.1
+ - **Optimizer**: AdamW (torch)
+ - **Learning Rate Scheduler**: Cosine
+ - **Weight Decay**: 0.01
+ - **Max Gradient Norm**: 1.0
+ - **Seed**: 3407
+
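+ Expressed as `transformers` `TrainingArguments` (as typically passed to a TRL trainer), the values above map roughly as follows; note that `warmup_steps` takes precedence over `warmup_ratio` when both are set, and `output_dir` is an assumption:
+
+ ```python
+ from transformers import TrainingArguments
+
+ args = TrainingArguments(
+     output_dir="outputs",            # assumption: not stated in this card
+     per_device_train_batch_size=64,
+     per_device_eval_batch_size=50,
+     gradient_accumulation_steps=2,   # 64 * 2 = effective batch size of 128
+     max_steps=1,                     # step-based run, so no epoch count
+     learning_rate=2e-4,
+     warmup_steps=1,                  # overrides warmup_ratio when > 0
+     warmup_ratio=0.1,
+     optim="adamw_torch",
+     lr_scheduler_type="cosine",
+     weight_decay=0.01,
+     max_grad_norm=1.0,
+     seed=3407,
+ )
+ ```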