Update notebook to push full merged model; restore stockex-ch-trader as default
- Notebook: merge LoRA adapter into base model on CPU and push complete
weights so stockex-ch-trader works via HF Inference Router API
- CH AI Trader: revert default HF_MODEL to RayMelius/stockex-ch-trader
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
clearing_house/ch_ai_trader.py
CHANGED
@@ -35,7 +35,7 @@ CH_SOURCE = "CLEARINGHOUSE"
 OLLAMA_HOST = os.getenv("OLLAMA_HOST", "")
 OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.1:8b")
 HF_TOKEN = os.getenv("HF_TOKEN", "")
-HF_MODEL = os.getenv("CH_HF_MODEL", os.getenv("HF_MODEL", "
+HF_MODEL = os.getenv("CH_HF_MODEL", os.getenv("HF_MODEL", "RayMelius/stockex-ch-trader"))
 GROQ_API_KEY = os.getenv("GROQ_API_KEY", "")
 GROQ_MODEL = os.getenv("GROQ_MODEL", "llama-3.1-8b-instant")
 GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"
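The restored default is only the last fallback in a two-level lookup: `CH_HF_MODEL` wins if set, then `HF_MODEL`, then `RayMelius/stockex-ch-trader`. A minimal sketch of that precedence follows. The helper name `resolve_hf_model` and the `env` parameter are hypothetical, added here only so the behaviour can be checked without touching the real environment.

```python
import os

# Default taken from the changed line in ch_ai_trader.py.
DEFAULT_HF_MODEL = "RayMelius/stockex-ch-trader"


def resolve_hf_model(env=None):
    """Hypothetical helper mirroring the nested os.getenv lookup.

    Precedence: CH_HF_MODEL, then HF_MODEL, then the hard-coded default.
    `env` is any mapping; it defaults to the process environment.
    """
    env = os.environ if env is None else env
    return env.get("CH_HF_MODEL", env.get("HF_MODEL", DEFAULT_HF_MODEL))


if __name__ == "__main__":
    print(resolve_hf_model({}))
    print(resolve_hf_model({"HF_MODEL": "org/other-model"}))
```

One consequence worth keeping in mind: a variable that is set but empty still counts as set, so `CH_HF_MODEL=""` yields an empty model name rather than falling through to the default.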
notebooks/stockex-clearing-house-llm-fine-tuning.ipynb
CHANGED
@@ -1244,14 +1244,14 @@
 },
 {
 "cell_type": "markdown",
-"source": "## 5.
+"source": "## 5. Merge & Push Full Model to HuggingFace Hub\n\nMerges the LoRA adapter into the base model (on CPU to avoid T4 OOM) and pushes the **complete model weights** to HuggingFace Hub. This makes the model directly usable via the HF Inference Router API - no adapter loading required at inference time.",
 "metadata": {
 "id": "save-header"
 }
 },
 {
 "cell_type": "code",
-"source": "
+"source": "import gc\nfrom peft import PeftModel as PeftModelMerge\n\n# -- Free GPU memory --\nprint(\"Freeing GPU memory for CPU merge...\")\ntry:\n    del trainer\nexcept NameError:\n    pass\ntry:\n    del model\nexcept NameError:\n    pass\ngc.collect()\ntorch.cuda.empty_cache()\n\n# -- Load base model on CPU in float16 (~14GB RAM for 7B) --\nprint(f\"Loading base model on CPU: {BASE_MODEL}\")\nbase_model_cpu = AutoModelForCausalLM.from_pretrained(\n    BASE_MODEL,\n    torch_dtype=torch.float16,\n    device_map=\"cpu\",\n    trust_remote_code=True,\n    low_cpu_mem_usage=True,\n)\n\n# -- Apply LoRA adapter and merge --\nprint(f\"Applying adapter from {OUTPUT_DIR}...\")\nmerged_model = PeftModelMerge.from_pretrained(\n    base_model_cpu,\n    OUTPUT_DIR,\n    torch_dtype=torch.float16,\n    device_map=\"cpu\",\n)\n\nprint(\"Merging adapter into base model...\")\nmerged_model = merged_model.merge_and_unload()\nprint(\"Merge complete.\")\n\n# -- Push full model + tokenizer to HF Hub --\nprint(f\"Pushing full merged model to {OUTPUT_REPO}...\")\nmerged_model.push_to_hub(\n    OUTPUT_REPO,\n    commit_message=f\"Full merged model: QLoRA fine-tuned {BASE_MODEL}\",\n    token=HF_TOKEN,\n    max_shard_size=\"2GB\",\n)\ntokenizer.push_to_hub(\n    OUTPUT_REPO,\n    commit_message=f\"Tokenizer for {BASE_MODEL}\",\n    token=HF_TOKEN,\n)\nprint(f\"Full model pushed to https://huggingface.co/{OUTPUT_REPO}\")\n\n# -- Cleanup CPU model --\ndel base_model_cpu, merged_model\ngc.collect()\nprint(\"Done - model is now usable via HF Inference Router API.\")",
 "metadata": {
 "id": "save-model",
 "trusted": true
@@ -1261,7 +1261,7 @@
 },
 {
 "cell_type": "markdown",
-"source": "## 6. Inference Test\n\nVerify the model generates valid JSON trading decisions.",
+"source": "## 6. Inference Test\n\nVerify the merged model generates valid JSON trading decisions.\nTests using the local adapter (faster than re-downloading the full model from Hub).",
 "metadata": {
 "id": "test-header"
 }
@@ -1278,7 +1278,7 @@
 },
 {
 "cell_type": "markdown",
-"source": "## 7. Activate in StockEx\n\
+"source": "## 7. Activate in StockEx\n\nAfter pushing, `RayMelius/stockex-ch-trader` is a **full model** on HuggingFace Hub - directly usable via the HF Inference Router API (same as `stockex-analyst`).\n\n**HuggingFace Spaces** - add to secrets:\n```\nHF_MODEL = RayMelius/stockex-ch-trader\nHF_TOKEN = <your token>\n```\n\n**Docker Compose** - set in `docker-compose.yml`:\n```yaml\nenvironment:\n  - HF_MODEL=RayMelius/stockex-ch-trader\n  - HF_TOKEN=<your token>\n```\n\n**Local with Ollama** - convert to GGUF first:\n```bash\npython scripts/convert_to_ollama.py\n```",
 "metadata": {
 "id": "usage-header"
 }
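Section 6 of the notebook checks that the model emits valid JSON trading decisions, but the test cell itself is not part of this diff. As a rough illustration of what such a check can look like, the sketch below pulls the first parseable JSON object out of free-form model output. The function name `first_json_object` is hypothetical and this is not the notebook's own test code; it makes no assumption about which fields a trading decision contains.

```python
import json


def first_json_object(text):
    """Return the first JSON object embedded in `text`, or None if none parses.

    Hypothetical helper: scans for '{' and tries to decode from each position,
    so surrounding chatter from the model is tolerated.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        start = text.find("{", start + 1)
    return None


if __name__ == "__main__":
    sample = 'Model output: {"k": "v"} trailing text'
    print(first_json_object(sample))
```

A caller would typically treat a `None` result as a failed generation and then validate the returned dictionary against whatever keys the trading prompt actually requires.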