RayMelius Claude Opus 4.6 commited on
Commit
70fd5ce
Β·
1 Parent(s): e725a4e

Update notebook to push full merged model; restore stockex-ch-trader as default

Browse files

- Notebook: merge LoRA adapter into base model on CPU and push complete
weights so stockex-ch-trader works via HF Inference Router API
- CH AI Trader: revert default HF_MODEL to RayMelius/stockex-ch-trader

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

clearing_house/ch_ai_trader.py CHANGED
@@ -35,7 +35,7 @@ CH_SOURCE = "CLEARINGHOUSE"
35
  OLLAMA_HOST = os.getenv("OLLAMA_HOST", "")
36
  OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.1:8b")
37
  HF_TOKEN = os.getenv("HF_TOKEN", "")
38
- HF_MODEL = os.getenv("CH_HF_MODEL", os.getenv("HF_MODEL", "Qwen/Qwen2.5-7B-Instruct"))
39
  GROQ_API_KEY = os.getenv("GROQ_API_KEY", "")
40
  GROQ_MODEL = os.getenv("GROQ_MODEL", "llama-3.1-8b-instant")
41
  GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"
 
35
  OLLAMA_HOST = os.getenv("OLLAMA_HOST", "")
36
  OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.1:8b")
37
  HF_TOKEN = os.getenv("HF_TOKEN", "")
38
+ HF_MODEL = os.getenv("CH_HF_MODEL", os.getenv("HF_MODEL", "RayMelius/stockex-ch-trader"))
39
  GROQ_API_KEY = os.getenv("GROQ_API_KEY", "")
40
  GROQ_MODEL = os.getenv("GROQ_MODEL", "llama-3.1-8b-instant")
41
  GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"
notebooks/stockex-clearing-house-llm-fine-tuning.ipynb CHANGED
@@ -1244,14 +1244,14 @@
1244
  },
1245
  {
1246
  "cell_type": "markdown",
1247
- "source": "## 5. Save & Push to HuggingFace Hub\n\nPushes the LoRA adapter and tokenizer directly to HuggingFace Hub.\nThe adapter can be loaded at inference time with `PeftModel.from_pretrained()` β€” merging into the base model is not required and avoids OOM on T4.",
1248
  "metadata": {
1249
  "id": "save-header"
1250
  }
1251
  },
1252
  {
1253
  "cell_type": "code",
1254
- "source": "from huggingface_hub import HfApi\n\n# Tokenizer was already saved in the training cell; ensure it's there\ntokenizer.save_pretrained(OUTPUT_DIR)\n\n# Push adapter + tokenizer to HF Hub\napi = HfApi(token=HF_TOKEN)\napi.upload_folder(\n folder_path=OUTPUT_DIR,\n repo_id=OUTPUT_REPO,\n commit_message=f\"StockEx CH Trader: QLoRA fine-tuned {BASE_MODEL} (adapter)\",\n)\nprint(f\"Pushed to https://huggingface.co/{OUTPUT_REPO}\")",
1255
  "metadata": {
1256
  "id": "save-model",
1257
  "trusted": true
@@ -1261,7 +1261,7 @@
1261
  },
1262
  {
1263
  "cell_type": "markdown",
1264
- "source": "## 6. Inference Test\n\nVerify the model generates valid JSON trading decisions.",
1265
  "metadata": {
1266
  "id": "test-header"
1267
  }
@@ -1278,7 +1278,7 @@
1278
  },
1279
  {
1280
  "cell_type": "markdown",
1281
- "source": "## 7. Activate in StockEx\n\nThe clearing house already uses `RayMelius/stockex-ch-trader` as default.\n\nTo switch to this model in a running StockEx instance:\n\n**HuggingFace Spaces** β€” add to secrets:\n```\nHF_MODEL = RayMelius/stockex-ch-trader\nHF_TOKEN = <your token>\n```\n\n**Docker Compose** β€” already set in `docker-compose.yml`:\n```yaml\nenvironment:\n - HF_MODEL=RayMelius/stockex-ch-trader\n - HF_TOKEN=<your token>\n```",
1282
  "metadata": {
1283
  "id": "usage-header"
1284
  }
 
1244
  },
1245
  {
1246
  "cell_type": "markdown",
1247
+ "source": "## 5. Merge & Push Full Model to HuggingFace Hub\n\nMerges the LoRA adapter into the base model (on CPU to avoid T4 OOM) and pushes the **complete model weights** to HuggingFace Hub. This makes the model directly usable via the HF Inference Router API β€” no adapter loading required at inference time.",
1248
  "metadata": {
1249
  "id": "save-header"
1250
  }
1251
  },
1252
  {
1253
  "cell_type": "code",
1254
+ "source": "import gc\nfrom peft import PeftModel as PeftModelMerge\n\n# ── Free GPU memory ────────────────────────────────────────────────\nprint(\"Freeing GPU memory for CPU merge...\")\ntry:\n del trainer\nexcept NameError:\n pass\ntry:\n del model\nexcept NameError:\n pass\ngc.collect()\ntorch.cuda.empty_cache()\n\n# ── Load base model on CPU in float16 (~14GB RAM for 7B) ──────────\nprint(f\"Loading base model on CPU: {BASE_MODEL}\")\nbase_model_cpu = AutoModelForCausalLM.from_pretrained(\n BASE_MODEL,\n torch_dtype=torch.float16,\n device_map=\"cpu\",\n trust_remote_code=True,\n low_cpu_mem_usage=True,\n)\n\n# ── Apply LoRA adapter and merge ──────────────────────────────────\nprint(f\"Applying adapter from {OUTPUT_DIR}...\")\nmerged_model = PeftModelMerge.from_pretrained(\n base_model_cpu,\n OUTPUT_DIR,\n torch_dtype=torch.float16,\n device_map=\"cpu\",\n)\n\nprint(\"Merging adapter into base model...\")\nmerged_model = merged_model.merge_and_unload()\nprint(\"Merge complete.\")\n\n# ── Push full model + tokenizer to HF Hub ─────────────────────────\nprint(f\"Pushing full merged model to {OUTPUT_REPO}...\")\nmerged_model.push_to_hub(\n OUTPUT_REPO,\n commit_message=f\"Full merged model: QLoRA fine-tuned {BASE_MODEL}\",\n token=HF_TOKEN,\n max_shard_size=\"2GB\",\n)\ntokenizer.push_to_hub(\n OUTPUT_REPO,\n commit_message=f\"Tokenizer for {BASE_MODEL}\",\n token=HF_TOKEN,\n)\nprint(f\"Full model pushed to https://huggingface.co/{OUTPUT_REPO}\")\n\n# ── Cleanup CPU model ─────────────────────────────────────────────\ndel base_model_cpu, merged_model\ngc.collect()\nprint(\"Done β€” model is now usable via HF Inference Router API.\")",
1255
  "metadata": {
1256
  "id": "save-model",
1257
  "trusted": true
 
1261
  },
1262
  {
1263
  "cell_type": "markdown",
1264
+ "source": "## 6. Inference Test\n\nVerify the merged model generates valid JSON trading decisions.\nTests using the local adapter (faster than re-downloading the full model from Hub).",
1265
  "metadata": {
1266
  "id": "test-header"
1267
  }
 
1278
  },
1279
  {
1280
  "cell_type": "markdown",
1281
+ "source": "## 7. Activate in StockEx\n\nAfter pushing, `RayMelius/stockex-ch-trader` is a **full model** on HuggingFace Hub β€” directly usable via the HF Inference Router API (same as `stockex-analyst`).\n\n**HuggingFace Spaces** β€” add to secrets:\n```\nHF_MODEL = RayMelius/stockex-ch-trader\nHF_TOKEN = <your token>\n```\n\n**Docker Compose** β€” set in `docker-compose.yml`:\n```yaml\nenvironment:\n - HF_MODEL=RayMelius/stockex-ch-trader\n - HF_TOKEN=<your token>\n```\n\n**Local with Ollama** β€” convert to GGUF first:\n```bash\npython scripts/convert_to_ollama.py\n```",
1282
  "metadata": {
1283
  "id": "usage-header"
1284
  }