Alikestocode committed on
Commit d4bc333 · 1 Parent(s): ae07f77

Remove duplicate LLM Compressor section - now primary method

Files changed (1):
  quantize_to_awq_colab.ipynb +2 -40
quantize_to_awq_colab.ipynb CHANGED
@@ -473,9 +473,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-  "## Alternative: Using LLM Compressor (vLLM Native)\n",
-  "\n",
-  "LLM Compressor is vLLM's native quantization tool. It provides better integration with vLLM and supports additional features like pruning and combined modifiers.\n"
+  "\n"
   ]
  },
  {
@@ -484,43 +482,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-  "# Alternative quantization using LLM Compressor (vLLM native)\n",
-  "# Uncomment and use this instead of AutoAWQ if you prefer vLLM's native tool\n",
-  "\n",
-  "# %pip install -q llm-compressor\n",
-  "\n",
-  "# from llmcompressor import oneshot\n",
-  "# from llmcompressor.modifiers.quantization import AWQModifier\n",
-  "\n",
-  "# def quantize_with_llm_compressor(repo_id: str, output_dir: str):\n",
-  "#     \"\"\"Quantize using LLM Compressor (vLLM native).\"\"\"\n",
-  "#     print(f\"Quantizing {repo_id} with LLM Compressor...\")\n",
-  "#\n",
-  "#     oneshot(\n",
-  "#         model=repo_id,\n",
-  "#         output_dir=output_dir,\n",
-  "#         modifiers=[\n",
-  "#             AWQModifier(\n",
-  "#                 w_bit=4,\n",
-  "#                 q_group_size=128,\n",
-  "#                 zero_point=True,\n",
-  "#                 version=\"GEMM\"  # Better for longer contexts\n",
-  "#             )\n",
-  "#         ],\n",
-  "#         token=os.environ.get(\"HF_TOKEN\")\n",
-  "#     )\n",
-  "#\n",
-  "#     print(f\"✅ Model quantized and saved to {output_dir}\")\n",
-  "#     print(f\"Upload to Hugging Face using:\")\n",
-  "#     print(f\"    from huggingface_hub import HfApi\")\n",
-  "#     print(f\"    api = HfApi()\")\n",
-  "#     print(f\"    api.upload_folder(folder_path={output_dir}, repo_id='your-repo-id')\")\n",
-  "\n",
-  "# Example usage:\n",
-  "# quantize_with_llm_compressor(\n",
-  "#     \"Alovestocode/router-gemma3-merged\",\n",
-  "#     \"./router-gemma3-awq-llmcompressor\"\n",
-  "# )\n"
+  "\n"
   ]
  },
  {
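For reference, the quantization settings deleted here survive in the diff above. A minimal, self-contained sketch that collects them in one place (the helper name `awq_quant_config` is hypothetical; the key names `w_bit`, `q_group_size`, `zero_point`, and `version` are copied from the removed cell and follow AutoAWQ-style naming, which may differ from the current llm-compressor `AWQModifier` signature):

```python
# Hypothetical helper: gathers the AWQ settings used by the removed
# notebook cell into one dict so they can be reused or validated.
# Key names mirror the removed cell, not a verified llm-compressor API.

def awq_quant_config(w_bit: int = 4,
                     q_group_size: int = 128,
                     zero_point: bool = True,
                     version: str = "GEMM") -> dict:
    """Return the 4-bit AWQ settings from the removed cell."""
    if w_bit not in (2, 3, 4, 8):
        raise ValueError(f"unsupported w_bit: {w_bit}")
    return {
        "w_bit": w_bit,                # 4-bit weights
        "q_group_size": q_group_size,  # group size for per-group scales
        "zero_point": zero_point,      # asymmetric quantization
        "version": version,            # "GEMM" kernel: better for longer contexts
    }

print(awq_quant_config())
```

Keeping these values in one helper makes it easy to pass the same settings to whichever tool ends up being the primary method.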