nnul committed · Commit b976900 · verified · 1 Parent(s): a8bcc87

Update README.md

  1. README.md +102 -22
@@ -1,22 +1,102 @@
- ---
- base_model: unsloth/Qwen3-1.7B-unsloth-bnb-4bit
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - qwen3
- - trl
- license: apache-2.0
- language:
- - en
- ---
-
- # Uploaded model
-
- - **Developed by:** nnul
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/Qwen3-1.7B-unsloth-bnb-4bit
-
- This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+
+ # LoRA Adapters for `sqlchat` Model
+
+ This repository contains the **LoRA (Low-Rank Adaptation) adapters** for the `nnul/sqlchat` model. The adapters hold the fine-tuned weight updates that specialize the base model for Text-to-SQL tasks.
+
+ You can load the adapters on top of the original base model to reproduce the `sqlchat` model, or use them as a starting point for further fine-tuning. Keeping the adapters separate from the base weights makes experimentation cheap, and the merged result can be converted to quantized formats (such as GGUF), typically with little quality loss.
+
+ ## Model Details
+
+ * **Base Model:** `Qwen/Qwen3-1.7B`
+ * **Fine-Tuning Library:** [Unsloth](https://github.com/unslothai/unsloth)
+ * **Technique:** LoRA (Low-Rank Adaptation)
+ * **Rank (`r`):** 32
+ * **Alpha (`lora_alpha`):** 32
+ * **Training Dataset:** `nnul/sql-chat-dataset` (a combination of `b-mc2/sql-create-context` and `gretelai/synthetic_text_to_sql`).
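
For readers unfamiliar with the two LoRA hyperparameters listed above, the following sketch shows how they enter the picture under standard LoRA scaling (`lora_alpha / r`). Only `r = 32` and `lora_alpha = 32` come from the model details; the layer size of 1024 and the helper name `lora_param_count` are assumptions chosen for illustration and do not describe this model's actual layers.

```python
# Values taken from the model details above.
r = 32
lora_alpha = 32

# Standard LoRA scales the low-rank update by lora_alpha / r.
scaling = lora_alpha / r


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one (d_out x d_in) linear layer.

    LoRA trains a (rank x d_in) matrix A and a (d_out x rank) matrix B
    instead of the full (d_out x d_in) weight.
    """
    return rank * d_in + d_out * rank


# Hypothetical layer size, used only to make the arithmetic concrete.
d_in = d_out = 1024

lora_params = lora_param_count(d_in, d_out, r)
full_params = d_in * d_out
ratio = lora_params / full_params

print(f"scaling={scaling}, LoRA params={lora_params}, "
      f"full params={full_params}, ratio={ratio:.4f}")
```

With equal rank and alpha the scaling factor is 1, so the low-rank update is added to the base weight unscaled.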
+
+ ## How to Use These Adapters
+
+ To use these LoRA adapters, load them on top of the original base model. The examples here use the Unsloth library so that its performance optimizations are applied.
+
+ ### Prerequisites
+
+ First, install the required libraries.
+
+ ```bash
+ pip install unsloth
+ pip install "torch>=2.3.1"
+ ```
+
+ ### Running Inference with LoRA Adapters
+
+ The following Python script loads the base model, applies the LoRA adapters, and runs inference.
+
+ ```python
+ import torch
+ from unsloth import FastLanguageModel
+ from transformers import TextStreamer
+
+ # Passing the adapter repository is enough: Unsloth reads the base model name
+ # from the adapter config, loads that base model in 4-bit, then attaches the adapters.
+ print("Loading base model and applying sqlchat-lora adapters...")
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="nnul/sqlchat-lora",  # this LoRA adapter repository
+     max_seq_length=4096,
+     dtype=None,  # None lets Unsloth auto-detect the dtype
+     load_in_4bit=True,
+ )
+ print("Model and adapters loaded successfully.")
+
+ # Switch the model to Unsloth's faster inference mode.
+ FastLanguageModel.for_inference(model)
+
+ def generate_sql(instruction: str, context: str = ""):
+     """
+     Stream a SQL query generated from a natural language instruction
+     and an optional schema context.
+     """
+     prompt = tokenizer.apply_chat_template(
+         [
+             {"role": "system", "content": "You are a helpful assistant that generates SQL queries based on natural language questions and database schemas."},
+             {"role": "user", "content": f"### Instruction:\n{instruction}\n\n### Context:\n{context}"},
+         ],
+         tokenize=False,
+         add_generation_prompt=True,
+         enable_thinking=False,  # Ensures direct SQL output
+     )
+
+     inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
+     text_streamer = TextStreamer(tokenizer, skip_prompt=True, clean_up_tokenization_spaces=True)
+
+     print(f"User Instruction: {instruction}")
+     print("\nModel Output:")
+     print("---------------------------------")
+     _ = model.generate(
+         **inputs,
+         streamer=text_streamer,
+         max_new_tokens=256,
+         do_sample=False,  # Use greedy decoding for deterministic output
+         use_cache=True,
+     )
+     print("---------------------------------\n")
+
+ # --- Example Usage ---
+ generate_sql(
+     instruction="Which department has the most number of employees?",
+     context="CREATE TABLE department (name VARCHAR, num_employees INTEGER)",
+ )
+ ```
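
The prompt layout used by `generate_sql` can be inspected without a GPU or a model download. The sketch below rebuilds only the message list, assuming the same system prompt and `### Instruction` / `### Context` layout as the script above; `build_messages` and `SYSTEM_PROMPT` are hypothetical names introduced here, not part of Unsloth or Transformers.

```python
# System prompt copied from the inference script above.
SYSTEM_PROMPT = (
    "You are a helpful assistant that generates SQL queries based on "
    "natural language questions and database schemas."
)


def build_messages(instruction, context=""):
    """Return the chat messages in the layout used by generate_sql."""
    user_content = f"### Instruction:\n{instruction}\n\n### Context:\n{context}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]


messages = build_messages(
    "Which department has the most number of employees?",
    "CREATE TABLE department (name VARCHAR, num_employees INTEGER)",
)

# Show the user turn exactly as it would be handed to the chat template.
print(messages[1]["content"])
```

The resulting list is what the script passes to `tokenizer.apply_chat_template`, so printing it is a quick way to confirm that a schema and question are formatted as intended before running generation.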
+
+ ## Merging the Adapters
+
+ To create a standalone, merged model from these adapters (as was done for `nnul/sqlchat`), load them as shown above and save the result with one of Unsloth's merged save methods.
+
+ ```python
+ from unsloth import FastLanguageModel
+
+ # Load the model and adapters as shown above.
+ # The `...` is a placeholder for the remaining arguments and is not valid Python as written.
+ model, tokenizer = FastLanguageModel.from_pretrained(model_name="nnul/sqlchat-lora", ...)
+
+ # Merge and save locally
+ model.save_pretrained_merged("sqlchat_merged_4bit", tokenizer, save_method="merged_4bit_forced")
+
+ # Or, push the merged model directly to a new Hub repository
+ # model.push_to_hub_merged("your-username/your-new-merged-repo", tokenizer, save_method="merged_4bit_forced")
+ ```
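
What merging does to a single weight matrix can be illustrated numerically. This is a minimal NumPy sketch assuming standard LoRA, where the merged weight is `W + (lora_alpha / r) * B @ A`. The matrix shapes and random values are hypothetical, and the sketch ignores 4-bit quantization, so a model saved with `merged_4bit_forced` may not match the adapter outputs this exactly.

```python
import numpy as np

# Rank and alpha as listed in the model details.
r, lora_alpha = 32, 32
scaling = lora_alpha / r

# Hypothetical layer shape and random weights, for illustration only.
rng = np.random.default_rng(0)
d_out, d_in = 128, 96
W = rng.normal(size=(d_out, d_in))  # frozen base weight
A = rng.normal(size=(r, d_in))      # LoRA down-projection
B = rng.normal(size=(d_out, r))     # LoRA up-projection

# Merging folds the low-rank update into the base weight once.
W_merged = W + scaling * (B @ A)

x = rng.normal(size=(5, d_in))

# Adapter path: base output plus the scaled low-rank correction.
y_adapter = x @ W.T + scaling * (x @ A.T) @ B.T

# Merged path: a single matrix multiply.
y_merged = x @ W_merged.T

max_abs_diff = float(np.max(np.abs(y_adapter - y_merged)))
print(f"max abs difference: {max_abs_diff:.2e}")
```

In full precision the two paths agree up to floating-point rounding, which is why a merged model can stand in for base model plus adapters.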