# Ethical Hacking LLM Fine-Tuning on Free Kaggle/Colab

## Updated notebooks
The notebooks keep Unsloth for low-VRAM training where applicable, and the no-Unsloth fallback has all dataset options built in directly.
### Main notebooks
| Notebook | Model | Best use |
|---|---|---|
| EthicalHacking_LFM2.5_Ultimate_Colab.ipynb | unsloth/LFM2.5-1.2B-Instruct | Recommended first; fastest and lowest VRAM |
| EthicalHacking_Qwen3-4B_Ultimate_Colab.ipynb | unsloth/Qwen3-4B-Instruct-2507-unsloth-bnb-4bit | Stronger model, tighter T4 VRAM |
| EthicalHacking_Gemma4_E2B_Colab.ipynb | unsloth/gemma-4-E2B-it-unsloth-bnb-4bit | Experimental; use LFM2.5 first |
| EthicalHacking_Stable_QLoRA_ManualLoop_NO_UNSLOTH.ipynb | Vanilla PEFT QLoRA | Backup; all dataset options directly included |
## Inference / chat after training
Standalone inference resources:
| File | Purpose |
|---|---|
| INFERENCE_CELL.md | Copy-paste final notebook cell for Kaggle/Colab |
| inference_adapter_chat.py | Standalone Python chat script |
Example adapter paths after training:
```python
# No-Unsloth notebook default
BASE_MODEL_ID = "LiquidAI/LFM2.5-1.2B-Instruct"
ADAPTER_PATH = "./lfm25-stable-qlora-cybersecurity-adapter"

# Unsloth LFM2.5 notebook
BASE_MODEL_ID = "unsloth/LFM2.5-1.2B-Instruct"
ADAPTER_PATH = "./lfm25-lora-adapter"

# Unsloth Qwen3 notebook
BASE_MODEL_ID = "unsloth/Qwen3-4B-Instruct-2507-unsloth-bnb-4bit"
ADAPTER_PATH = "./qwen3-lora-adapter"

# Unsloth Gemma notebook
BASE_MODEL_ID = "unsloth/gemma-4-E2B-it-unsloth-bnb-4bit"
ADAPTER_PATH = "./gemma4-lora-adapter"
```
If you pushed the adapter to the Hub:

```python
ADAPTER_PATH = "your-username/your-adapter-repo"
```
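As a rough sketch of how these pieces fit together (this is an assumption, not the contents of inference_adapter_chat.py), the adapter can be loaded on top of the base model with `transformers` and `peft`; the IDs below are the no-Unsloth defaults from above, and the prompt is only illustrative:

```python
# Minimal inference sketch: load the base model, attach the trained LoRA adapter, chat once.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL_ID = "LiquidAI/LFM2.5-1.2B-Instruct"
ADAPTER_PATH = "./lfm25-stable-qlora-cybersecurity-adapter"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_PATH)
model.eval()

messages = [{"role": "user", "content": "What does an authorized penetration test scope document cover?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=256)

# Print only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```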
## Dataset choices in the no-Unsloth notebook
Inside the config cell, set:
```python
DATASET_CHOICE = "cybersecurity"  # Fenrir + Trendyol
DATASET_CHOICE = "code_corpus"    # krystv/code-corpus-llm-training
DATASET_CHOICE = "ultrachat"      # HuggingFaceH4/ultrachat_200k
DATASET_CHOICE = "openhermes"     # teknium/OpenHermes-2.5
DATASET_CHOICE = "sharegpt_en"    # deepmage121/ShareGPT_multilingual, English
DATASET_CHOICE = "sharegpt_de"    # German-translated ShareGPT
DATASET_CHOICE = "sharegpt_hi"    # Hindi-translated ShareGPT
DATASET_CHOICE = "custom_mix"     # mix multiple sources
```
For `code_corpus`, recommended starting settings:

```python
DATASET_CHOICE = "code_corpus"
MAX_SEQ_LENGTH = 2048
SAMPLE_SIZE = 10000
MAX_STEPS = 500
```
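For orientation only, here is a hypothetical loader showing how `DATASET_CHOICE` and `SAMPLE_SIZE` could map onto `datasets.load_dataset` calls. The notebook's config cell may wire this up differently; the dataset IDs come from the list above, but the split names and the helper itself are assumptions:

```python
from datasets import load_dataset

def load_training_data(choice: str, sample_size: int):
    # Split names are assumptions; check each dataset card before relying on them.
    if choice == "code_corpus":
        ds = load_dataset("krystv/code-corpus-llm-training", split="train")
    elif choice == "ultrachat":
        ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
    elif choice == "openhermes":
        ds = load_dataset("teknium/OpenHermes-2.5", split="train")
    else:
        raise ValueError(f"Choice not covered by this sketch: {choice}")
    # Subsample so a smoke test or a constrained free GPU stays cheap.
    return ds.shuffle(seed=42).select(range(min(sample_size, len(ds))))

train_data = load_training_data("code_corpus", sample_size=10_000)
```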
See the repo's dataset guide for more detail on each option.
## Important Unsloth run instructions
For the Unsloth notebooks:
- Run cell 1 first. It installs/updates Unsloth and intentionally restarts the kernel once (see the sketch below).
- After the restart, run all cells from the top again.
- Do not skip cell 1.
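For context, cell 1 typically amounts to something like the following sketch. This is an assumption about the install/restart pattern, not the notebooks' exact code:

```python
# Install/upgrade Unsloth, then force a kernel restart so the new packages are picked up.
import os
import subprocess
import sys

subprocess.run([sys.executable, "-m", "pip", "install", "--upgrade", "unsloth"], check=True)

# Killing the process makes Colab/Kaggle restart the kernel; rerun all cells afterwards.
os.kill(os.getpid(), 9)
```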
## Fast smoke test
Before a full training run, set:

```python
SAMPLE_SIZE = 1000
MAX_STEPS = 10
```

If that run completes cleanly, scale the settings back up for full training.
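As a hedged illustration of where those values end up, here is a minimal `transformers.TrainingArguments` sketch; the notebooks' Unsloth and manual QLoRA loops may pass these settings differently:

```python
from transformers import TrainingArguments

SAMPLE_SIZE = 1000  # rows kept from the chosen dataset (see loader sketch above)
MAX_STEPS = 10      # optimizer steps for the smoke test

args = TrainingArguments(
    output_dir="./smoke-test",
    max_steps=MAX_STEPS,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    logging_steps=1,
    report_to="none",
)
```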
## Safety
Use only for ethical, defensive, authorized cybersecurity education and research.