walidsobhie-code Claude Opus 4.6 committed on
Commit de15016 · 1 Parent(s): 389f026

refactor: simplify notebook with single ROOT_DIR variable


- Uses ROOT_DIR as single source of truth for all paths
- All paths use os.path.join() with absolute paths
- Fresh clone every time to avoid cached issues
- Searches for data in multiple possible locations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
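
The path handling described in the commit message can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: `build_paths` and `find_training_data` are hypothetical helper names, and the directory names are taken from the diff below. Two deliberate deviations are assumptions of this sketch: the base model directory is placed beside the clone instead of inside it, and a missing dataset raises an error instead of only printing a message.

```python
import os


def build_paths(root_dir):
    """Derive every working path from one absolute root directory."""
    root_dir = os.path.abspath(root_dir)
    output_dir = os.path.join(root_dir, "training_output")
    return {
        "root": root_dir,
        "repo": os.path.join(root_dir, "stack-2.9"),
        # Assumption: kept next to the clone (not inside it) so that
        # deleting and re-cloning the repo does not delete the model.
        "model": os.path.join(root_dir, "base_model_qwen7b"),
        "output": output_dir,
        "lora": os.path.join(output_dir, "lora"),
        "merged": os.path.join(output_dir, "merged"),
    }


def find_training_data(repo_dir):
    """Return the first existing dataset path, or fail loudly."""
    candidates = [
        os.path.join(repo_dir, "data", "final", "train.jsonl"),
        os.path.join(repo_dir, "training-data", "final", "train.jsonl"),
        os.path.join(repo_dir, "data_mini", "train_mini.jsonl"),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    # Stop here so a missing dataset is not written into the config as None.
    raise FileNotFoundError(
        "No training data found; checked:\n" + "\n".join(candidates)
    )
```

In the notebook this would be called as `paths = build_paths("/content/drive/MyDrive/stack-2.9")` followed by `DATA_PATH = find_training_data(paths["repo"])`. In the diff as committed, `MODEL_DIR` sits under the `stack-2.9` clone that STEP 2 removes on every run, so the `config.json` check in STEP 4 would likely never skip the download on a later run; STEP 5 also leaves `DATA_PATH` as `None` when nothing is found, and STEP 6 would then write that value into the config.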

Files changed (1)
  1. colab_train_stack29.ipynb +134 -156
colab_train_stack29.ipynb CHANGED
@@ -8,22 +8,14 @@
8
  "\n",
9
  "**Zero-cost training on Google Colab free tier with T4 GPU**\n",
10
  "\n",
11
- "This notebook trains a LoRA adapter for Stack 2.9 on **Qwen2.5-Coder-7B** using a free T4 GPU.\n",
12
- "\n",
13
  "⏱️ **Expected runtime:** 3-5 hours\n",
14
  "💾 **VRAM needed:** ~12GB (fits in T4's 15GB)\n",
15
- "📦 **Output:** `./training_output/` (on Google Drive)\n",
16
  "\n",
17
  "---\n",
18
  "\n",
19
- "**CRITICAL:** All data saved to **Google Drive** to persist through disconnects.\n",
20
- "\n",
21
- "**Instructions:**\n",
22
- "1. Runtime → Change runtime type → **GPU (T4)**\n",
23
- "2. Run all cells in order\n",
24
- "3. **Allow** Drive access when prompted\n",
25
  "\n",
26
- "---\n"
27
  ]
28
  },
29
  {
@@ -32,19 +24,17 @@
32
  "metadata": {},
33
  "outputs": [],
34
  "source": [
35
- "# Check GPU\n",
36
- "!nvidia-smi"
37
- ]
38
- },
39
- {
40
- "cell_type": "markdown",
41
- "metadata": {},
42
- "source": [
43
- "## 1️⃣ Mount Google Drive (REQUIRED for persistence)\n",
44
  "\n",
45
- "Click the link, allow access, copy the auth code, paste it, and press Enter.\n",
 
 
 
46
  "\n",
47
- "**Without Drive mounting, training will be lost if Colab disconnects!**"
 
48
  ]
49
  },
50
  {
@@ -52,29 +42,19 @@
52
  "execution_count": null,
53
  "metadata": {},
54
  "outputs": [],
55
- "source": "from google.colab import drive\ndrive.mount('/content/drive')\n\n# Set up paths on Drive - ALL OUTPUT GOES HERE\nimport os\nBASE_PATH = \"/content/drive/MyDrive/stack-2.9-colab\"\nos.makedirs(BASE_PATH, exist_ok=True)\nos.chdir(BASE_PATH)\nprint(f\"\\n✅ Working directory: {os.getcwd()}\")\nprint(f\"All outputs will be saved to: {BASE_PATH}\")\nprint(\"\\nCurrent folder contents:\")\n!ls -la"
56
- },
57
- {
58
- "cell_type": "markdown",
59
- "metadata": {},
60
- "source": [
61
- "## 2️⃣ Clone Stack 2.9 Repository"
62
- ]
63
- },
64
- {
65
- "cell_type": "code",
66
- "execution_count": null,
67
- "metadata": {},
68
- "outputs": [],
69
- "source": "# Remove old clone if exists and re-clone fresh\nimport os\nimport shutil\n\nif os.path.exists('stack-2.9'):\n print(\"⚠️ Removing old stack-2.9 directory...\")\n shutil.rmtree('stack-2.9')\n\n!git clone https://github.com/my-ai-stack/stack-2.9.git\n\nos.chdir('stack-2.9')\nprint(f\"Now in: {os.getcwd()}\")\n!ls -la"
70
- },
71
- {
72
- "cell_type": "markdown",
73
- "metadata": {},
74
  "source": [
75
- "## 3️⃣ Install Dependencies\n",
 
76
  "\n",
77
- "Takes 5-10 minutes."
 
 
 
 
 
 
 
 
78
  ]
79
  },
80
  {
@@ -82,15 +62,11 @@
82
  "execution_count": null,
83
  "metadata": {},
84
  "outputs": [],
85
- "source": "!pip install --upgrade pip\n!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\n!pip install transformers==4.40.0 peft==0.10.0 accelerate datasets pyyaml tqdm scipy\n!pip install bitsandbytes==0.43.3\nprint(\"\\n✅ Dependencies installed\")"
86
- },
87
- {
88
- "cell_type": "markdown",
89
- "metadata": {},
90
  "source": [
91
- "## 4️⃣ Download Base Model (Qwen2.5-Coder-7B)\n",
92
- "\n",
93
- "This takes ~15-20 minutes. The model is ~14GB."
 
94
  ]
95
  },
96
  {
@@ -98,15 +74,25 @@
98
  "execution_count": null,
99
  "metadata": {},
100
  "outputs": [],
101
- "source": "MODEL_NAME = \"Qwen/Qwen2.5-Coder-7B\"\n\n# Use absolute path for model (relative to current working directory)\nimport os\nMODEL_DIR = os.path.abspath(\"./base_model_qwen7b\")\n\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nif not os.path.exists(MODEL_DIR):\n print(f\"Downloading {MODEL_NAME} to {MODEL_DIR}...\")\n print(\"This will take 15-20 minutes...\")\n tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)\n tokenizer.save_pretrained(MODEL_DIR)\n model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, trust_remote_code=True)\n model.save_pretrained(MODEL_DIR)\n print(f\"✅ Model saved to {MODEL_DIR}\")\nelse:\n print(f\"✅ Model already exists at {MODEL_DIR}\")\n\nprint(f\"\\n📁 Model absolute path: {MODEL_DIR}\")\n!ls -lh {MODEL_DIR} | head -10"
102
- },
103
- {
104
- "cell_type": "markdown",
105
- "metadata": {},
106
  "source": [
107
- "## 5️⃣ Prepare Training Data\n",
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
108
  "\n",
109
- "Using the bundled mini dataset (5K examples) for quick prototyping."
110
  ]
111
  },
112
  {
@@ -114,13 +100,28 @@
114
  "execution_count": null,
115
  "metadata": {},
116
  "outputs": [],
117
- "source": "# Check if data exists in the repo\nimport os\n\n# First check if data directory exists in repo\nrepo_data_path = os.path.join(os.getcwd(), \"data/final/train.jsonl\")\ndata_alt_path = os.path.join(os.getcwd(), \"training-data/final/train.jsonl\")\n\nif os.path.exists(repo_data_path):\n DATA_PATH = os.path.abspath(repo_data_path)\n print(f\"✅ Training data found at {DATA_PATH}\")\n !wc -l {DATA_PATH}\nelif os.path.exists(data_alt_path):\n DATA_PATH = os.path.abspath(data_alt_path)\n print(f\"✅ Training data found at {DATA_PATH}\")\n !wc -l {DATA_PATH}\nelse:\n print(\"⚠️ Data not found in repo. Checking what's available:\")\n !find . -name \"*.jsonl\" 2>/dev/null | head -10\n \n # If still no data, use a fallback - create small test dataset\n print(\"\\n⚠️ Creating small test dataset (500 examples) for testing...\")\n !python scripts/create_mini_dataset.py --size 500 --output data_mini/train_mini.jsonl --source ./data/final/train.jsonl 2>/dev/null || echo \"Script failed\"\n DATA_PATH = os.path.abspath(\"./data_mini/train_mini.jsonl\")\n if os.path.exists(DATA_PATH):\n !ls -lh {DATA_PATH}\n else:\n raise FileNotFoundError(\"Could not create or find training data\")\n\nprint(f\"\\n📁 Data absolute path: {DATA_PATH}\")"
118
- },
119
- {
120
- "cell_type": "markdown",
121
- "metadata": {},
122
  "source": [
123
- "## 6️⃣ Prepare Training Configuration"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
124
  ]
125
  },
126
  {
@@ -128,35 +129,37 @@
128
  "execution_count": null,
129
  "metadata": {},
130
  "outputs": [],
131
- "source": "# Use Colab config and update paths\nimport yaml\nimport os\n\n# Use absolute path for config\nREPO_DIR = os.getcwd()\nconfig_path = os.path.join(REPO_DIR, \"stack/training/train_config_local.yaml\")\n\n# Check if config exists\nif not os.path.exists(config_path):\n print(f\"❌ Config not found at: {config_path}\")\n print(\"📁 Checking repo structure:\")\n !find . -name \"train_config*.yaml\" | head -10\n raise FileNotFoundError(f\"Config file not found: {config_path}\")\n\nprint(f\"📄 Loading config from: {config_path}\")\nwith open(config_path, 'r') as f:\n config = yaml.safe_load(f)\n\n# Update for Colab/T4 GPU - use absolute paths\nconfig['model']['name'] = MODEL_DIR\nconfig['data']['input_path'] = DATA_PATH\nconfig['output']['lora_dir'] = os.path.abspath(\"./training_output/lora\")\nconfig['output']['merged_dir'] = os.path.abspath(\"./training_output/merged\")\nconfig['hardware']['device'] = \"cuda\" # Use T4 GPU\nconfig['hardware']['num_gpus'] = 1\n\n# Save updated config\nOUTPUT_DIR = os.path.abspath(\"./training_output\")\nos.makedirs(OUTPUT_DIR, exist_ok=True)\nupdated_config_path = f\"{OUTPUT_DIR}/train_config.yaml\"\n\nwith open(updated_config_path, 'w') as f:\n yaml.dump(config, f)\n\nprint(f\"✅ Config saved to: {updated_config_path}\")\nprint(\"\\nConfig summary:\")\nprint(f\" - Model: {config['model']['name']}\")\nprint(f\" - Data: {config['data']['input_path']}\")\nprint(f\" - Device: {config['hardware']['device']}\")\nprint(f\" - Epochs: {config['training']['num_epochs']}\")"
132
- },
133
- {
134
- "cell_type": "markdown",
135
- "metadata": {},
136
  "source": [
137
- "## 7️⃣ Train LoRA Adapter\n",
 
138
  "\n",
139
- "⚠️ **This takes 3-5 hours. DO NOT INTERRUPT.**\n",
140
  "\n",
141
- "If Colab disconnects, reconnect and training will resume from checkpoint automatically.\n",
 
142
  "\n",
143
- "Watch for `Train loss:` decreasing. It should start ~2.0-3.0 and trend downward.\n",
 
144
  "\n",
145
- "Checkpoints saved every 200 steps to `./training_output/lora/` (on Drive)."
146
- ]
147
- },
148
- {
149
- "cell_type": "code",
150
- "execution_count": null,
151
- "metadata": {},
152
- "outputs": [],
153
- "source": "import os\nimport sys\n\n# Add training module to path\nsys.path.insert(0, './stack/training')\n\nprint(\"=\"*60)\nprint(\"STARTING TRAINING\")\nprint(\"=\"*60)\nprint(f\"Working directory: {os.getcwd()}\")\nprint(f\"Config: {updated_config_path}\")\nprint(f\"Output: {OUTPUT_DIR}/lora\")\nprint(\"=\"*60 + \"\\n\")\n\n# Run training\nfrom train_lora import train_lora\n\ntrainer = train_lora(updated_config_path)\n\nprint(\"\\n\" + \"=\"*60)\nprint(\"TRAINING FINISHED OR STOPPED\")\nprint(\"=\"*60)"
154
- },
155
- {
156
- "cell_type": "markdown",
157
- "metadata": {},
158
- "source": [
159
- "## 8️⃣ Verify Training Output"
 
 
 
 
160
  ]
161
  },
162
  {
@@ -165,26 +168,20 @@
165
  "metadata": {},
166
  "outputs": [],
167
  "source": [
168
- "lora_dir = f\"{OUTPUT_DIR}/lora\"\n",
169
- "print(f\"Checking LoRA output: {lora_dir}\")\n",
 
170
  "\n",
171
- "if os.path.exists(lora_dir):\n",
172
- " files = os.listdir(lora_dir)\n",
173
- " print(f\"✅ LoRA adapter found! {len(files)} files:\")\n",
174
- " for f in files:\n",
175
- " size = os.path.getsize(os.path.join(lora_dir, f)) / (1024*1024)\n",
176
- " print(f\" - {f}: {size:.1f} MB\")\n",
177
- "else:\n",
178
- " print(\"⚠️ LoRA directory not found - training may have failed\")"
179
- ]
180
- },
181
- {
182
- "cell_type": "markdown",
183
- "metadata": {},
184
- "source": [
185
- "## 9️⃣ Merge LoRA Adapter with Base Model\n",
186
  "\n",
187
- "Combines the trained adapter with the base model to produce a standalone fine-tuned model."
 
 
 
 
 
188
  ]
189
  },
190
  {
@@ -192,13 +189,14 @@
192
  "execution_count": null,
193
  "metadata": {},
194
  "outputs": [],
195
- "source": "import yaml\nimport sys\nimport os\n\nsys.path.insert(0, './stack/training')\nfrom merge_adapter import merge_adapter\n\nmerged_dir = os.path.abspath(f\"{OUTPUT_DIR}/merged\")\n\nprint(\"=\"*60)\nprint(\"MERGING LORA ADAPTER\")\nprint(\"=\"*60)\nprint(f\"Base model: {MODEL_DIR}\")\nprint(f\"LoRA path: {OUTPUT_DIR}/lora\")\nprint(f\"Output path: {merged_dir}\")\nprint(\"=\"*60)\n\n# Create merge config\nmerge_config = {\n 'model': {'name': MODEL_DIR, 'trust_remote_code': True},\n 'output': {'lora_dir': f'{OUTPUT_DIR}/lora', 'merged_dir': merged_dir},\n 'quantization': {'enabled': False}\n}\n\nmerge_config_path = f\"{OUTPUT_DIR}/merge_config.yaml\"\nwith open(merge_config_path, 'w') as f:\n yaml.dump(merge_config, f)\n\n# Run merge\nmerge_adapter(\n config_path=merge_config_path,\n lora_path=f\"{OUTPUT_DIR}/lora\",\n output_path=merged_dir\n)\n\nprint(f\"\\n✅ Merged model saved to: {merged_dir}\")\n!ls -lh {merged_dir}/"
196
- },
197
- {
198
- "cell_type": "markdown",
199
- "metadata": {},
200
  "source": [
201
- "## 🔟 Test Inference (Quick Check)"
 
 
 
 
 
 
202
  ]
203
  },
204
  {
@@ -207,42 +205,27 @@
207
  "metadata": {},
208
  "outputs": [],
209
  "source": [
210
- "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
211
- "import torch\n",
212
- "\n",
213
- "model_path = f\"{OUTPUT_DIR}/merged\"\n",
214
- "print(f\"Loading model from {model_path}...\")\n",
215
- "\n",
216
- "try:\n",
217
- " tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)\n",
218
- " model = AutoModelForCausalLM.from_pretrained(\n",
219
- " model_path,\n",
220
- " torch_dtype=torch.float16,\n",
221
- " device_map=\"cuda\",\n",
222
- " trust_remote_code=True\n",
223
- " )\n",
224
- " \n",
225
- " prompt = \"Write a Python function to reverse a string:\\n\\n```python\\n\"\n",
226
- " inputs = tokenizer(prompt, return_tensors=\"pt\").to(\"cuda\")\n",
227
- " \n",
228
- " print(\"Generating...\")\n",
229
- " with torch.no_grad():\n",
230
- " outputs = model.generate(\n",
231
- " **inputs,\n",
232
- " max_new_tokens=200,\n",
233
- " temperature=0.2,\n",
234
- " do_sample=True,\n",
235
- " pad_token_id=tokenizer.eos_token_id\n",
236
- " )\n",
237
- " \n",
238
- " response = tokenizer.decode(outputs[0], skip_special_tokens=True)\n",
239
- " print(\"=\"*40)\n",
240
- " print(\"RESPONSE:\")\n",
241
- " print(\"=\"*40)\n",
242
- " print(response[len(prompt):])\n",
243
- "except Exception as e:\n",
244
- " print(f\"❌ Error: {e}\")\n",
245
- " print(\"\\nThis is expected if training hasn't completed yet.\")"
246
  ]
247
  },
248
  {
@@ -251,22 +234,17 @@
251
  "source": [
252
  "## 🔚 Training Complete!\n",
253
  "\n",
254
- "Your model is ready in `./training_output/merged/` and saved to Google Drive.\n",
255
- "\n",
256
- "**Next steps:**\n",
257
- "1. **Download** `training_output/merged/` from Drive to your local machine\n",
258
- "2. **Run evaluation**: Evaluate on HumanEval/MBPP benchmarks\n",
259
- "3. **Upload model** to Hugging Face Hub\n",
260
- "4. **Apply to Together AI**\n",
261
  "\n",
262
- "**Note:** This model was trained on the full/mini dataset. For better performance, consider training on more data or more epochs.\n"
263
  ]
264
  }
265
  ],
266
  "metadata": {
267
  "accelerator": "GPU",
268
  "colab": {
269
- "name": "Stack 2.9 Colab Training (T4 GPU)",
270
  "provenance": []
271
  },
272
  "kernelspec": {
 
8
  "\n",
9
  "**Zero-cost training on Google Colab free tier with T4 GPU**\n",
10
  "\n",
 
 
11
  "⏱️ **Expected runtime:** 3-5 hours\n",
12
  "💾 **VRAM needed:** ~12GB (fits in T4's 15GB)\n",
 
13
  "\n",
14
  "---\n",
15
  "\n",
16
+ "**CRITICAL:** Run cells in order from the top!\n",
 
 
 
 
 
17
  "\n",
18
+ "---"
19
  ]
20
  },
21
  {
 
24
  "metadata": {},
25
  "outputs": [],
26
  "source": [
27
+ "# STEP 1: Setup - Mount Drive and define root directory\n",
28
+ "from google.colab import drive\n",
29
+ "drive.mount('/content/drive')\n",
 
 
 
 
 
 
30
  "\n",
31
+ "import os\n",
32
+ "ROOT_DIR = \"/content/drive/MyDrive/stack-2.9\"\n",
33
+ "os.makedirs(ROOT_DIR, exist_ok=True)\n",
34
+ "os.chdir(ROOT_DIR)\n",
35
  "\n",
36
+ "print(f\"✅ Working directory: {os.getcwd()}\")\n",
37
+ "!ls -la"
38
  ]
39
  },
40
  {
 
42
  "execution_count": null,
43
  "metadata": {},
44
  "outputs": [],
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
45
  "source": [
46
+ "# STEP 2: Clone repo (fresh every time)\n",
47
+ "import shutil\n",
48
  "\n",
49
+ "if os.path.exists('stack-2.9'):\n",
50
+ " print(\"Removing old stack-2.9...\")\n",
51
+ " shutil.rmtree('stack-2.9')\n",
52
+ "\n",
53
+ "!git clone https://github.com/my-ai-stack/stack-2.9.git\n",
54
+ "\n",
55
+ "os.chdir(os.path.join(ROOT_DIR, 'stack-2.9'))\n",
56
+ "print(f\"✅ In: {os.getcwd()}\")\n",
57
+ "!ls -la"
58
  ]
59
  },
60
  {
 
62
  "execution_count": null,
63
  "metadata": {},
64
  "outputs": [],
 
 
 
 
 
65
  "source": [
66
+ "# STEP 3: Install dependencies\n",
67
+ "!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\n",
68
+ "!pip install -q transformers peft accelerate datasets pyyaml tqdm scipy bitsandbytes\n",
69
+ "print(\"✅ Dependencies installed\")"
70
  ]
71
  },
72
  {
 
74
  "execution_count": null,
75
  "metadata": {},
76
  "outputs": [],
 
 
 
 
 
77
  "source": [
78
+ "# STEP 4: Download Base Model (Qwen2.5-Coder-7B)\n",
79
+ "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
80
+ "\n",
81
+ "MODEL_NAME = \"Qwen/Qwen2.5-Coder-7B\"\n",
82
+ "MODEL_DIR = os.path.join(ROOT_DIR, \"stack-2.9/base_model_qwen7b\")\n",
83
+ "\n",
84
+ "if not os.path.exists(os.path.join(MODEL_DIR, \"config.json\")):\n",
85
+ " print(f\"Downloading {MODEL_NAME} to {MODEL_DIR}...\")\n",
86
+ " print(\"This will take 15-20 minutes...\")\n",
87
+ " tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)\n",
88
+ " tokenizer.save_pretrained(MODEL_DIR)\n",
89
+ " model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, trust_remote_code=True)\n",
90
+ " model.save_pretrained(MODEL_DIR)\n",
91
+ " print(f\"✅ Model saved\")\n",
92
+ "else:\n",
93
+ " print(f\"✅ Model already exists\")\n",
94
  "\n",
95
+ "!ls -lh {MODEL_DIR} | head -5"
96
  ]
97
  },
98
  {
 
100
  "execution_count": null,
101
  "metadata": {},
102
  "outputs": [],
 
 
 
 
 
103
  "source": [
104
+ "# STEP 5: Find training data\n",
105
+ "REPO_DIR = os.path.join(ROOT_DIR, \"stack-2.9\")\n",
106
+ "DATA_PATH = None\n",
107
+ "\n",
108
+ "# Check multiple possible locations\n",
109
+ "possible_paths = [\n",
110
+ " os.path.join(REPO_DIR, \"data/final/train.jsonl\"),\n",
111
+ " os.path.join(REPO_DIR, \"training-data/final/train.jsonl\"),\n",
112
+ " os.path.join(REPO_DIR, \"data_mini/train_mini.jsonl\"),\n",
113
+ "]\n",
114
+ "\n",
115
+ "for path in possible_paths:\n",
116
+ " if os.path.exists(path):\n",
117
+ " DATA_PATH = path\n",
118
+ " print(f\"✅ Found data at: {path}\")\n",
119
+ " break\n",
120
+ "\n",
121
+ "if DATA_PATH is None:\n",
122
+ " print(\"❌ No training data found!\")\n",
123
+ " print(\"\\nSearching for jsonl files:\")\n",
124
+ " !find {REPO_DIR} -name \"*.jsonl\" | head -10"
125
  ]
126
  },
127
  {
 
129
  "execution_count": null,
130
  "metadata": {},
131
  "outputs": [],
 
 
 
 
 
132
  "source": [
133
+ "# STEP 6: Prepare Training Configuration\n",
134
+ "import yaml\n",
135
  "\n",
136
+ "config_path = os.path.join(REPO_DIR, \"stack/training/train_config_local.yaml\")\n",
137
  "\n",
138
+ "if not os.path.exists(config_path):\n",
139
+ " raise FileNotFoundError(f\"Config not found at: {config_path}\")\n",
140
  "\n",
141
+ "with open(config_path, 'r') as f:\n",
142
+ " config = yaml.safe_load(f)\n",
143
  "\n",
144
+ "# Update config with absolute paths\n",
145
+ "config['model']['name'] = MODEL_DIR\n",
146
+ "config['data']['input_path'] = DATA_PATH\n",
147
+ "OUTPUT_DIR = os.path.join(ROOT_DIR, \"training_output\")\n",
148
+ "config['output']['lora_dir'] = os.path.join(OUTPUT_DIR, \"lora\")\n",
149
+ "config['output']['merged_dir'] = os.path.join(OUTPUT_DIR, \"merged\")\n",
150
+ "config['hardware']['device'] = \"cuda\"\n",
151
+ "config['hardware']['num_gpus'] = 1\n",
152
+ "\n",
153
+ "os.makedirs(OUTPUT_DIR, exist_ok=True)\n",
154
+ "updated_config_path = os.path.join(OUTPUT_DIR, \"train_config.yaml\")\n",
155
+ "\n",
156
+ "with open(updated_config_path, 'w') as f:\n",
157
+ " yaml.dump(config, f)\n",
158
+ "\n",
159
+ "print(f\"✅ Config saved to: {updated_config_path}\")\n",
160
+ "print(f\" Model: {config['model']['name']}\")\n",
161
+ "print(f\" Data: {config['data']['input_path']}\")\n",
162
+ "print(f\" Device: {config['hardware']['device']}\")"
163
  ]
164
  },
165
  {
 
168
  "metadata": {},
169
  "outputs": [],
170
  "source": [
171
+ "# STEP 7: Train LoRA Adapter\n",
172
+ "import sys\n",
173
+ "sys.path.insert(0, os.path.join(REPO_DIR, \"stack/training\"))\n",
174
  "\n",
175
+ "print(\"=\"*60)\n",
176
+ "print(\"STARTING TRAINING\")\n",
177
+ "print(\"=\"*60)\n",
 
 
 
 
 
 
 
 
 
 
 
 
178
  "\n",
179
+ "from train_lora import train_lora\n",
180
+ "trainer = train_lora(updated_config_path)\n",
181
+ "\n",
182
+ "print(\"=\"*60)\n",
183
+ "print(\"TRAINING COMPLETED\")\n",
184
+ "print(\"=\"*60)"
185
  ]
186
  },
187
  {
 
189
  "execution_count": null,
190
  "metadata": {},
191
  "outputs": [],
 
 
 
 
 
192
  "source": [
193
+ "# STEP 8: Verify and Merge\n",
194
+ "lora_dir = os.path.join(OUTPUT_DIR, \"lora\")\n",
195
+ "print(f\"Checking LoRA: {lora_dir}\")\n",
196
+ "if os.path.exists(lora_dir):\n",
197
+ " !ls -lh {lora_dir}\n",
198
+ "else:\n",
199
+ " print(\"❌ No LoRA output found\")"
200
  ]
201
  },
202
  {
 
205
  "metadata": {},
206
  "outputs": [],
207
  "source": [
208
+ "# STEP 9: Merge LoRA\n",
209
+ "import sys\n",
210
+ "sys.path.insert(0, os.path.join(REPO_DIR, \"stack/training\"))\n",
211
+ "from merge_adapter import merge_adapter\n",
212
+ "\n",
213
+ "merged_dir = os.path.join(OUTPUT_DIR, \"merged\")\n",
214
+ "os.makedirs(merged_dir, exist_ok=True)\n",
215
+ "\n",
216
+ "merge_config = {\n",
217
+ " 'model': {'name': MODEL_DIR, 'trust_remote_code': True},\n",
218
+ " 'output': {'lora_dir': lora_dir, 'merged_dir': merged_dir},\n",
219
+ " 'quantization': {'enabled': False}\n",
220
+ "}\n",
221
+ "\n",
222
+ "merge_config_path = os.path.join(OUTPUT_DIR, \"merge_config.yaml\")\n",
223
+ "with open(merge_config_path, 'w') as f:\n",
224
+ " yaml.dump(merge_config, f)\n",
225
+ "\n",
226
+ "merge_adapter(merge_config_path, lora_dir, merged_dir)\n",
227
+ "print(f\"✅ Merged to: {merged_dir}\")\n",
228
+ "!ls -lh {merged_dir}"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
229
  ]
230
  },
231
  {
 
234
  "source": [
235
  "## 🔚 Training Complete!\n",
236
  "\n",
237
+ "Your model is ready at:\n",
238
+ "`/content/drive/MyDrive/stack-2.9/training_output/merged/`\n",
 
 
 
 
 
239
  "\n",
240
+ "Download it from Google Drive!"
241
  ]
242
  }
243
  ],
244
  "metadata": {
245
  "accelerator": "GPU",
246
  "colab": {
247
+ "name": "Stack 2.9 Training",
248
  "provenance": []
249
  },
250
  "kernelspec": {