Spaces: Alovestocode/ZeroGPU-LLM-Inference

Alikestocode committed on Nov 10, 2025

Commit d4bc333 (1 parent: ae07f77)

Remove duplicate LLM Compressor section - now primary method

Files changed (1)

quantize_to_awq_colab.ipynb +2 -40

quantize_to_awq_colab.ipynb CHANGED

@@ -473,9 +473,7 @@
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "## Alternative: Using LLM Compressor (vLLM Native)\n",
-        "\n",
-        "LLM Compressor is vLLM's native quantization tool. It provides better integration with vLLM and supports additional features like pruning and combined modifiers.\n"
+        "\n"
       ]
     },
     {
@@ -484,43 +482,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "# Alternative quantization using LLM Compressor (vLLM native)\n",
-        "# Uncomment and use this instead of AutoAWQ if you prefer vLLM's native tool\n",
-        "\n",
-        "# %pip install -q llm-compressor\n",
-        "\n",
-        "# from llmcompressor import oneshot\n",
-        "# from llmcompressor.modifiers.quantization import AWQModifier\n",
-        "\n",
-        "# def quantize_with_llm_compressor(repo_id: str, output_dir: str):\n",
-        "#     \"\"\"Quantize using LLM Compressor (vLLM native).\"\"\"\n",
-        "#     print(f\"Quantizing {repo_id} with LLM Compressor...\")\n",
-        "#     \n",
-        "#     oneshot(\n",
-        "#         model=repo_id,\n",
-        "#         output_dir=output_dir,\n",
-        "#         modifiers=[\n",
-        "#             AWQModifier(\n",
-        "#                 w_bit=4,\n",
-        "#                 q_group_size=128,\n",
-        "#                 zero_point=True,\n",
-        "#                 version=\"GEMM\"  # Better for longer contexts\n",
-        "#             )\n",
-        "#         ],\n",
-        "#         token=os.environ.get(\"HF_TOKEN\")\n",
-        "#     )\n",
-        "#     \n",
-        "#     print(f\"✅ Model quantized and saved to {output_dir}\")\n",
-        "#     print(f\"Upload to Hugging Face using:\")\n",
-        "#     print(f\"  from huggingface_hub import HfApi\")\n",
-        "#     print(f\"  api = HfApi()\")\n",
-        "#     print(f\"  api.upload_folder(folder_path={output_dir}, repo_id='your-repo-id')\")\n",
-        "\n",
-        "# Example usage:\n",
-        "# quantize_with_llm_compressor(\n",
-        "#     \"Alovestocode/router-gemma3-merged\",\n",
-        "#     \"./router-gemma3-awq-llmcompressor\"\n",
-        "# )\n"
+        "\n"
       ]
     },
     {