Commit b39aef8 by sukritvemula (verified) · 1 Parent(s): dbe334c

Upload WebScrapeAgent_Training.ipynb with huggingface_hub

  1. WebScrapeAgent_Training.ipynb +471 -0
WebScrapeAgent_Training.ipynb ADDED
@@ -0,0 +1,471 @@
1
+ {
2
+ "nbformat": 4,
3
+ "nbformat_minor": 0,
4
+ "metadata": {
5
+ "colab": {
6
+ "provenance": [],
7
+ "gpuType": "T4"
8
+ },
9
+ "kernelspec": {
10
+ "name": "python3",
11
+ "display_name": "Python 3"
12
+ },
13
+ "language_info": {
14
+ "name": "python"
15
+ },
16
+ "accelerator": "GPU"
17
+ },
18
+ "cells": [
19
+ {
20
+ "cell_type": "markdown",
21
+ "metadata": {},
22
+ "source": [
23
+ "# 🕷️ WebScrapeAgent — Fine-tune Qwen2.5-7B for Autonomous Web Scraping\n",
24
+ "\n",
25
+ "This notebook fine-tunes **Qwen2.5-7B-Instruct** with **Unsloth + QLoRA** to create an autonomous web scraping agent that:\n",
26
+ "\n",
27
+ "1. **Reads HTML** and understands page structure (tables, lists, forms, nested elements)\n",
28
+ "2. **Decides action sequences** to extract data (navigate, click, scroll, wait)\n",
29
+ "3. **Handles authentication** (cookie replay, form login, token injection, browser profiles)\n",
30
+ "4. **Recovers from failures** (403→headless browser, timeout→JS execution, rate limit→backoff)\n",
31
+ "\n",
32
+ "**Training recipe based on:**\n",
33
+ "- ScrapeGraphAI-100k (arXiv:2602.15189): QLoRA r=16, lr=1e-4, completion-only loss → Key F1=0.887\n",
34
+ "- BrowserAgent (arXiv:2510.10666): Qwen2.5-7B SFT → +20% over baselines\n",
35
+ "- A3-Annotators (arXiv:2604.07776): assistant-token-only loss → 41.5% on WebArena\n",
36
+ "\n",
37
+ "**Free GPU**: Works on Colab T4 (16GB), Kaggle P100/T4, or any 16GB+ GPU.\n",
38
+ "\n",
39
+ "**Training data**: 45K examples from [sukritvemula/webscrape-agent-training-data](https://huggingface.co/datasets/sukritvemula/webscrape-agent-training-data)\n",
40
+ "- 55% real-world HTML→JSON extraction (ScrapeGraphAI-100k)\n",
41
+ "- 44% multi-turn browser interaction sessions (BrowserAgent)\n",
42
+ "- 1% synthetic auth handling, error recovery, and diverse HTML structures"
43
+ ]
44
+ },
45
+ {
46
+ "cell_type": "markdown",
47
+ "metadata": {},
48
+ "source": [
49
+ "## 1. Install Dependencies"
50
+ ]
51
+ },
52
+ {
53
+ "cell_type": "code",
54
+ "execution_count": null,
55
+ "metadata": {},
56
+ "outputs": [],
57
+ "source": [
58
+ "%%capture\n",
59
+ "!pip install unsloth\n",
60
+ "!pip install --no-deps trl peft accelerate bitsandbytes xformers"
61
+ ]
62
+ },
63
+ {
64
+ "cell_type": "markdown",
65
+ "metadata": {},
66
+ "source": [
67
+ "## 2. Configuration\n",
68
+ "\n",
69
+ "Adjust these based on your GPU. Defaults are tuned for **free Colab T4 (16GB VRAM)**."
70
+ ]
71
+ },
72
+ {
73
+ "cell_type": "code",
74
+ "execution_count": null,
75
+ "metadata": {},
76
+ "outputs": [],
77
+ "source": [
78
+ "# === EDIT THESE ===\n",
79
+ "HF_USERNAME = \"sukritvemula\" # Your HuggingFace username\n",
80
+ "OUTPUT_MODEL = f\"{HF_USERNAME}/WebScrapeAgent-7B-v1\" # Where to push the trained model\n",
81
+ "\n",
82
+ "# === Training hyperparameters (from ScrapeGraphAI + BrowserAgent papers) ===\n",
83
+ "MAX_SEQ_LENGTH = 4096 # Covers 90%+ of examples; increase to 8192 if you have more VRAM\n",
84
+ "LORA_R = 32 # LoRA rank (higher = more capacity for structured output; the ScrapeGraphAI recipe cited above uses r=16)\n",
85
+ "LORA_ALPHA = 32 # alpha = r (standard)\n",
86
+ "LEARNING_RATE = 1e-4 # LoRA/QLoRA is typically trained with a higher LR (roughly 10x) than full fine-tuning\n",
87
+ "NUM_EPOCHS = 2 # Both reference papers use 2 epochs\n",
88
+ "BATCH_SIZE = 1 # Per-device (T4-safe)\n",
89
+ "GRAD_ACCUM = 16 # Effective batch = 16\n",
90
+ "\n",
91
+ "# === Model ===\n",
92
+ "MODEL_NAME = \"unsloth/Qwen2.5-7B-Instruct-bnb-4bit\" # Pre-quantized for fast start\n",
93
+ "DATASET_NAME = \"sukritvemula/webscrape-agent-training-data\""
94
+ ]
95
+ },
96
+ {
97
+ "cell_type": "markdown",
98
+ "metadata": {},
99
+ "source": [
100
+ "## 3. Login to HuggingFace (for pushing model)"
101
+ ]
102
+ },
103
+ {
104
+ "cell_type": "code",
105
+ "execution_count": null,
106
+ "metadata": {},
107
+ "outputs": [],
108
+ "source": [
109
+ "from huggingface_hub import login\n",
110
+ "login() # Enter your HF token when prompted"
111
+ ]
112
+ },
113
+ {
114
+ "cell_type": "markdown",
115
+ "metadata": {},
116
+ "source": [
117
+ "## 4. Load Model + Apply LoRA"
118
+ ]
119
+ },
120
+ {
121
+ "cell_type": "code",
122
+ "execution_count": null,
123
+ "metadata": {},
124
+ "outputs": [],
125
+ "source": [
126
+ "# CRITICAL: import unsloth FIRST\n",
127
+ "import unsloth\n",
128
+ "\n",
129
+ "import torch\n",
130
+ "from unsloth import FastLanguageModel, is_bfloat16_supported\n",
131
+ "from unsloth.chat_templates import get_chat_template, train_on_responses_only\n",
132
+ "\n",
133
+ "print(f\"GPU: {torch.cuda.get_device_name()}\")\n",
134
+ "print(f\"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB\")\n",
135
+ "\n",
136
+ "# Load model\n",
137
+ "model, tokenizer = FastLanguageModel.from_pretrained(\n",
138
+ " model_name=MODEL_NAME,\n",
139
+ " max_seq_length=MAX_SEQ_LENGTH,\n",
140
+ " dtype=None, # Auto-detect\n",
141
+ " load_in_4bit=True, # QLoRA\n",
142
+ ")\n",
143
+ "\n",
144
+ "# Apply LoRA adapters to all attention + MLP layers\n",
145
+ "model = FastLanguageModel.get_peft_model(\n",
146
+ " model,\n",
147
+ " r=LORA_R,\n",
148
+ " target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n",
149
+ " \"gate_proj\", \"up_proj\", \"down_proj\"],\n",
150
+ " lora_alpha=LORA_ALPHA,\n",
151
+ " lora_dropout=0.0,\n",
152
+ " bias=\"none\",\n",
153
+ " use_gradient_checkpointing=\"unsloth\", # 30% more memory efficient\n",
154
+ " random_state=42,\n",
155
+ ")\n",
156
+ "\n",
157
+ "# Set Qwen2.5 chat template\n",
158
+ "tokenizer = get_chat_template(tokenizer, chat_template=\"qwen-2.5\")\n",
159
+ "\n",
160
+ "trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)\n",
161
+ "total = sum(p.numel() for p in model.parameters())\n",
162
+ "print(f\"Trainable: {trainable:,} / {total:,} ({trainable/total*100:.2f}%)\")"
163
+ ]
164
+ },
165
+ {
166
+ "cell_type": "markdown",
167
+ "metadata": {},
168
+ "source": [
169
+ "## 5. Load & Format Training Data"
170
+ ]
171
+ },
172
+ {
173
+ "cell_type": "code",
174
+ "execution_count": null,
175
+ "metadata": {},
176
+ "outputs": [],
177
+ "source": [
178
+ "from datasets import load_dataset\n",
179
+ "\n",
180
+ "dataset = load_dataset(DATASET_NAME)\n",
181
+ "train_ds = dataset[\"train\"]\n",
182
+ "print(f\"Training examples: {len(train_ds)}\")\n",
183
+ "\n",
184
+ "# Convert messages → ChatML text\n",
185
+ "def format_to_text(examples):\n",
186
+ "    texts = []\n",
187
+ "    for msgs in examples[\"messages\"]:\n",
188
+ "        try:\n",
189
+ "            text = tokenizer.apply_chat_template(\n",
190
+ "                msgs, tokenize=False, add_generation_prompt=False\n",
191
+ "            )\n",
192
+ "            texts.append(text)\n",
193
+ "        except Exception:\n",
194
+ "            # Fallback: build the ChatML text by hand if the template rejects the messages\n",
195
+ "            text = \"\"\n",
196
+ "            for msg in msgs:\n",
197
+ "                text += f\"<|im_start|>{msg['role']}\\n{msg['content']}<|im_end|>\\n\"\n",
198
+ "            texts.append(text)\n",
199
+ "    return {\"text\": texts}\n",
200
+ "\n",
201
+ "train_ds = train_ds.map(format_to_text, batched=True, num_proc=2,\n",
202
+ " remove_columns=train_ds.column_names)\n",
203
+ "\n",
204
+ "# Filter sequences that exceed max length\n",
205
+ "def filter_length(example):\n",
206
+ " tokens = tokenizer(example[\"text\"], truncation=False)\n",
207
+ " return len(tokens[\"input_ids\"]) <= MAX_SEQ_LENGTH\n",
208
+ "\n",
209
+ "original_len = len(train_ds)\n",
210
+ "train_ds = train_ds.filter(filter_length, num_proc=2)\n",
211
+ "print(f\"After length filter: {len(train_ds)} / {original_len} ({len(train_ds)/original_len*100:.1f}% kept)\")\n",
212
+ "\n",
213
+ "# Show a sample\n",
214
+ "print(f\"\\nSample (first 300 chars):\\n{train_ds[0]['text'][:300]}\")"
215
+ ]
216
+ },
217
+ {
218
+ "cell_type": "markdown",
219
+ "metadata": {},
220
+ "source": [
221
+ "## 6. Train with Completion-Only Loss\n",
222
+ "\n",
223
+ "Key point: the loss is computed only on assistant tokens; system and user tokens are masked out. This is critical for structured output quality (+15% schema compliance per the ScrapeGraphAI paper)."
224
+ ]
225
+ },
226
+ {
227
+ "cell_type": "code",
228
+ "execution_count": null,
229
+ "metadata": {},
230
+ "outputs": [],
231
+ "source": [
232
+ "from trl import SFTTrainer, SFTConfig\n",
233
+ "\n",
234
+ "training_args = SFTConfig(\n",
235
+ " output_dir=\"./webscrape-checkpoints\",\n",
236
+ " \n",
237
+ " # Core training\n",
238
+ " num_train_epochs=NUM_EPOCHS,\n",
239
+ " per_device_train_batch_size=BATCH_SIZE,\n",
240
+ " gradient_accumulation_steps=GRAD_ACCUM,\n",
241
+ " \n",
242
+ " # Optimizer\n",
243
+ " optim=\"adamw_8bit\",\n",
244
+ " learning_rate=LEARNING_RATE,\n",
245
+ " weight_decay=0.01,\n",
246
+ " lr_scheduler_type=\"cosine\",\n",
247
+ " warmup_ratio=0.03,\n",
248
+ " max_grad_norm=0.3,\n",
249
+ " \n",
250
+ " # Precision\n",
251
+ " fp16=not is_bfloat16_supported(),\n",
252
+ " bf16=is_bfloat16_supported(),\n",
253
+ " \n",
254
+ " # Sequence\n",
255
+ " max_seq_length=MAX_SEQ_LENGTH,\n",
256
+ " dataset_text_field=\"text\",\n",
257
+ " packing=False, # Must be False for multi-turn chat with response-only masking\n",
258
+ " \n",
259
+ " # Logging\n",
260
+ " logging_steps=10,\n",
261
+ " logging_first_step=True,\n",
262
+ " \n",
263
+ " # Saving\n",
264
+ " save_strategy=\"steps\",\n",
265
+ " save_steps=500,\n",
266
+ " save_total_limit=2,\n",
267
+ " \n",
268
+ " # Push to Hub\n",
269
+ " push_to_hub=True,\n",
270
+ " hub_model_id=OUTPUT_MODEL,\n",
271
+ " hub_strategy=\"end\",\n",
272
+ " \n",
273
+ " # Misc\n",
274
+ " seed=42,\n",
275
+ " dataset_num_proc=2,\n",
276
+ ")\n",
277
+ "\n",
278
+ "trainer = SFTTrainer(\n",
279
+ " model=model,\n",
280
+ " tokenizer=tokenizer,\n",
281
+ " train_dataset=train_ds,\n",
282
+ " args=training_args,\n",
283
+ ")\n",
284
+ "\n",
285
+ "# CRITICAL: Apply completion-only loss (train only on assistant tokens)\n",
286
+ "trainer = train_on_responses_only(trainer, instruction_part=\"<|im_start|>user\\n\", response_part=\"<|im_start|>assistant\\n\")\n",
287
+ "\n",
288
+ "print(\"Ready to train!\")\n",
289
+ "print(f\" Model: {MODEL_NAME}\")\n",
290
+ "print(f\" LoRA: r={LORA_R}, alpha={LORA_ALPHA}\")\n",
291
+ "print(f\" LR: {LEARNING_RATE}, Epochs: {NUM_EPOCHS}\")\n",
292
+ "print(f\" Effective batch: {BATCH_SIZE * GRAD_ACCUM}\")\n",
293
+ "print(f\" Max seq: {MAX_SEQ_LENGTH}\")\n",
294
+ "print(f\" Output: {OUTPUT_MODEL}\")"
295
+ ]
296
+ },
297
+ {
298
+ "cell_type": "code",
299
+ "execution_count": null,
300
+ "metadata": {},
301
+ "outputs": [],
302
+ "source": [
303
+ "# 🚀 TRAIN!\n",
304
+ "trainer_stats = trainer.train()\n",
305
+ "print(f\"\\n✅ Training complete! Loss: {trainer_stats.training_loss:.4f}\")"
306
+ ]
307
+ },
308
+ {
309
+ "cell_type": "markdown",
310
+ "metadata": {},
311
+ "source": [
312
+ "## 7. Save & Push to Hub"
313
+ ]
314
+ },
315
+ {
316
+ "cell_type": "code",
317
+ "execution_count": null,
318
+ "metadata": {},
319
+ "outputs": [],
320
+ "source": [
321
+ "# Save LoRA adapter\n",
322
+ "model.save_pretrained(\"webscrape-agent-lora\")\n",
323
+ "tokenizer.save_pretrained(\"webscrape-agent-lora\")\n",
324
+ "\n",
325
+ "# Push merged 16-bit model to Hub\n",
326
+ "print(\"Pushing merged model to Hub (this takes a few minutes)...\")\n",
327
+ "model.push_to_hub_merged(\n",
328
+ " OUTPUT_MODEL,\n",
329
+ " tokenizer,\n",
330
+ " save_method=\"merged_16bit\",\n",
331
+ ")\n",
332
+ "\n",
333
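+ "# Also push the LoRA adapter separately (smaller, faster to load).\n",
334
+ "# push_to_hub() takes no tokenizer argument, so the tokenizer is pushed on its own.\n",
335
+ "lora_repo = OUTPUT_MODEL + \"-lora\"\n",
336
+ "model.push_to_hub(lora_repo)\n",
337
+ "tokenizer.push_to_hub(lora_repo)\n",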
338
+ "\n",
339
+ "print(f\"\\n✅ Merged model: https://huggingface.co/{OUTPUT_MODEL}\")\n",
340
+ "print(f\"✅ LoRA adapter: https://huggingface.co/{OUTPUT_MODEL}-lora\")"
341
+ ]
342
+ },
343
+ {
344
+ "cell_type": "markdown",
345
+ "metadata": {},
346
+ "source": [
347
+ "## 8. Test the Model"
348
+ ]
349
+ },
350
+ {
351
+ "cell_type": "code",
352
+ "execution_count": null,
353
+ "metadata": {},
354
+ "outputs": [],
355
+ "source": [
356
+ "# Switch to inference mode\n",
357
+ "FastLanguageModel.for_inference(model)\n",
358
+ "\n",
359
+ "# Test: HTML extraction\n",
360
+ "test_messages = [\n",
361
+ " {\"role\": \"system\", \"content\": \"You are WebScrapeAgent, a web data extraction assistant. Given web content and a target schema, extract clean structured JSON. Every value must exist in the source content. Never invent data. Always include extraction status.\"},\n",
362
+ " {\"role\": \"user\", \"content\": \"\"\"Extract structured data from the following web content.\n",
363
+ "\n",
364
+ "<content>\n",
365
+ "<div class=\\\"product-list\\\">\n",
366
+ " <div class=\\\"product\\\" data-sku=\\\"WH-1000\\\">\n",
367
+ " <h3>Sony WH-1000XM5</h3>\n",
368
+ " <span class=\\\"price\\\">$348.00</span>\n",
369
+ " <div class=\\\"rating\\\">4.7 out of 5</div>\n",
370
+ " <span class=\\\"stock in-stock\\\">Available</span>\n",
371
+ " </div>\n",
372
+ " <div class=\\\"product\\\" data-sku=\\\"AP-MAX\\\">\n",
373
+ " <h3>AirPods Max</h3>\n",
374
+ " <span class=\\\"price\\\">$549.00</span>\n",
375
+ " <div class=\\\"rating\\\">4.3 out of 5</div>\n",
376
+ " <span class=\\\"stock limited\\\">Only 2 left</span>\n",
377
+ " </div>\n",
378
+ "</div>\n",
379
+ "</content>\n",
380
+ "\n",
381
+ "Return as JSON array of products with name, sku, price, rating, and availability.\"\"\"}\n",
382
+ "]\n",
383
+ "\n",
384
+ "inputs = tokenizer.apply_chat_template(\n",
385
+ " test_messages, tokenize=True, add_generation_prompt=True, return_tensors=\"pt\"\n",
386
+ ").to(\"cuda\")\n",
387
+ "\n",
388
+ "outputs = model.generate(\n",
389
+ " input_ids=inputs,\n",
390
+ " max_new_tokens=512,\n",
391
+ " temperature=0.3,\n",
392
+ " do_sample=True,\n",
393
+ ")\n",
394
+ "\n",
395
+ "response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)\n",
396
+ "print(\"Model response:\")\n",
397
+ "print(response)"
398
+ ]
399
+ },
400
+ {
401
+ "cell_type": "code",
402
+ "execution_count": null,
403
+ "metadata": {},
404
+ "outputs": [],
405
+ "source": [
406
+ "# Test: Multi-step scraping with error recovery\n",
407
+ "test_messages_2 = [\n",
408
+ " {\"role\": \"system\", \"content\": \"\"\"You are WebScrapeAgent, an autonomous web scraping and data extraction system.\n",
409
+ "\n",
410
+ "Available actions:\n",
411
+ "- EXTRACT_JSON, NAVIGATE, FILL_FORM, CLICK, WAIT, SET_COOKIES, SET_HEADERS,\n",
412
+ " LOAD_BROWSER_PROFILE, EXECUTE_JS, SCROLL, SWITCH_STRATEGY, RETURN_RESULT\n",
413
+ "\n",
414
+ "Rules:\n",
415
+ "- NEVER invent data\n",
416
+ "- ALWAYS include status in RETURN_RESULT: \\\"success\\\", \\\"partial\\\", or \\\"failed\\\"\n",
417
+ "- Think step-by-step in <thought> blocks\n",
418
+ "- Maximum 10 steps per job\"\"\"},\n",
419
+ " {\"role\": \"user\", \"content\": \"Task: Extract product reviews\\nURL: https://reviews.example.com/product/456\"},\n",
420
+ " {\"role\": \"assistant\", \"content\": \"<thought>Let me navigate to the reviews page.</thought>\\n\\nACTION: NAVIGATE\\n```json\\n{\\\"url\\\": \\\"https://reviews.example.com/product/456\\\", \\\"method\\\": \\\"GET\\\"}\\n```\"},\n",
421
+ " {\"role\": \"user\", \"content\": \"Observation: HTTP 403 Forbidden\\n\\n<html><body><h1>Access Denied</h1><p>Bot detection triggered.</p></body></html>\"},\n",
422
+ "]\n",
423
+ "\n",
424
+ "inputs = tokenizer.apply_chat_template(\n",
425
+ " test_messages_2, tokenize=True, add_generation_prompt=True, return_tensors=\"pt\"\n",
426
+ ").to(\"cuda\")\n",
427
+ "\n",
428
+ "outputs = model.generate(\n",
429
+ " input_ids=inputs,\n",
430
+ " max_new_tokens=512,\n",
431
+ " temperature=0.3,\n",
432
+ " do_sample=True,\n",
433
+ ")\n",
434
+ "\n",
435
+ "response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)\n",
436
+ "print(\"Model response (error recovery):\")\n",
437
+ "print(response)"
438
+ ]
439
+ },
440
+ {
441
+ "cell_type": "markdown",
442
+ "metadata": {},
443
+ "source": [
444
+ "## 9. Optional: Export to GGUF (for llama.cpp / Ollama)\n",
445
+ "\n",
446
+ "Uncomment to export for local deployment."
447
+ ]
448
+ },
449
+ {
450
+ "cell_type": "code",
451
+ "execution_count": null,
452
+ "metadata": {},
453
+ "outputs": [],
454
+ "source": [
455
+ "# # Export to GGUF Q4_K_M (smallest good quality)\n",
456
+ "# model.save_pretrained_gguf(\n",
457
+ "# \"webscrape-agent-gguf\",\n",
458
+ "# tokenizer,\n",
459
+ "# quantization_method=\"q4_k_m\",\n",
460
+ "# )\n",
461
+ "# \n",
462
+ "# # Push GGUF to Hub\n",
463
+ "# model.push_to_hub_gguf(\n",
464
+ "# OUTPUT_MODEL + \"-GGUF\",\n",
465
+ "# tokenizer,\n",
466
+ "# quantization_method=\"q4_k_m\",\n",
467
+ "# )"
468
+ ]
469
+ }
470
+ ]
471
+ }
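
Note on section 6 of the notebook: completion-only loss amounts to giving every token outside an assistant turn the label -100, which PyTorch's cross-entropy loss ignores by default. The sketch below illustrates that masking on a plain list of token ids. It is an illustration only, not Unsloth's implementation of `train_on_responses_only`; the helper names `find_sub` and `mask_non_assistant` and the toy token ids are hypothetical.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def find_sub(seq: List[int], sub: List[int], start: int = 0) -> int:
    """Return the first index >= start where sub occurs in seq, or -1."""
    for i in range(start, len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1


def mask_non_assistant(
    input_ids: List[int],
    response_marker: List[int],
    instruction_marker: List[int],
) -> List[int]:
    """Build labels that keep only the tokens after each response marker,
    up to the next instruction marker (or the end of the sequence)."""
    labels = [IGNORE_INDEX] * len(input_ids)
    pos = 0
    while True:
        start = find_sub(input_ids, response_marker, pos)
        if start == -1:
            break
        start += len(response_marker)  # the marker itself stays masked
        end = find_sub(input_ids, instruction_marker, start)
        if end == -1:
            end = len(input_ids)
        labels[start:end] = input_ids[start:end]
        pos = end
    return labels


# Toy ids (hypothetical): [1, 2] stands for "<|im_start|>user\n",
# [1, 3] for "<|im_start|>assistant\n", and 9 for "<|im_end|>".
toy_ids = [1, 2, 10, 11, 9, 1, 3, 20, 21, 9, 1, 2, 12, 9, 1, 3, 22, 9]
toy_labels = mask_non_assistant(toy_ids, [1, 3], [1, 2])
print(toy_labels)
```

In the toy sequence only the two assistant spans (`20, 21, 9` and `22, 9`) keep their ids as labels; everything else, including the role markers, is set to -100 and contributes nothing to the loss.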