piyawudk/PhishMe-R1-8B-SFT


	---
	base_model: unsloth/DeepSeek-R1-0528-Qwen3-8B
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- qwen3
	license: apache-2.0
	language:
	- en
	datasets:
	- piyawudk/spam-ham-reasoning-dataset-small
	pipeline_tag: text-classification
	---

	# Phishing Detection via Reasoning LLM

	### Why Phishing Matters
	- Phishing attacks are becoming more widespread due to the rapid growth of the internet.
	- These attacks cause billions of dollars in losses every year.
	- Traditional research has relied on:
	  - Statistical methods
	  - Transformer models
	- While these methods achieve strong predictive accuracy, they lack clear justifications for their classifications.

	---

	### Enter Large Language Models (LLMs)
	- LLMs show strong potential for textual analysis.
	- Reasoning-based LLMs are especially promising:
	  - They can break complex problems down into step-by-step reasoning.
	- This study explores fine-tuning LLMs for phishing and scam detection using the Qwen3-8B model.

	---

	## Research Focus
	The author compares three main techniques for improving phishing detection:
	1. Training Methods
	   - Supervised Fine-Tuning (SFT): learns to imitate expert-labelled examples.
	   - Group Relative Policy Optimization (GRPO): a reinforcement learning method that explores and adapts through self-improvement.

	2. Model Starting Point
	   - Fine-tuning a raw base model.
	   - Fine-tuning an instruction-tuned assistant (already aligned to follow directions).

	3. Verification Layer
	   - Adding a verifier to refine or correct the model’s first response.
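
GRPO samples several completions for the same prompt and scores each one relative to the rest of its group. The sketch below shows only that group-relative normalisation step. The 0/1 reward (1.0 when a sampled answer matches the label) is a hypothetical stand-in, since the reward function used in the study is not described here, and the choice of population standard deviation is an assumption that varies between implementations.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise rewards within one group of completions for a single prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled completions of one prompt:
# 1.0 if the predicted label matched the ground truth, else 0.0.
rewards = [1.0, 0.0, 1.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average receive a positive advantage and are reinforced; the one that missed receives a negative advantage. When all completions score the same, every advantage is zero and the group contributes no learning signal.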

	---

	## Evaluation & Dataset
	- Models were tested against:
	  - Classical ML methods (such as logistic regression)
	  - BERT and ModernBERT
	  - Proprietary LLMs (such as those from OpenAI and Gemini) and other open-source LLMs (DeepSeek R1 and Qwen3)
	- A [new dataset](https://huggingface.co/datasets/piyawudk/spam-ham-reasoning-dataset-small) was created from a public scam-reporting forum to ensure recency and relevance.
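
Comparisons of this kind are usually reported with precision, recall, and F1 on the phishing class. The following is a minimal sketch of those metrics, assuming binary labels where `1` means phishing. That label encoding and the toy predictions are assumptions for illustration, not values from the dataset or the paper.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive (phishing) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example: three phishing and three legitimate messages.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]  # one missed phishing message, one false alarm
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Recall drops when phishing messages are missed; precision drops when legitimate messages are flagged.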

	---

	## Key Findings
	1. SFT vs GRPO
	   - SFT: Higher recall (catches more phishing attempts).
	   - GRPO: Higher precision (reduces false positives).
	   - Trade-off: sensitivity vs reliability.

	2. Starting Point Matters
	   - Beginning with an instruction-tuned model is critical for success.

	3. Verifier Effects
	   - A verifier doesn’t boost accuracy overall.
	   - Instead, it acts as a “specialisation amplifier”, reinforcing each model’s natural strengths and weaknesses.
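
To make the verifier idea concrete, here is a minimal sketch of a two-pass pipeline: a first model produces a verdict, and a second pass reviews it and may confirm or overturn it. Everything in it is hypothetical. `Verdict`, `toy_classifier`, and `toy_verifier` are keyword-based stand-ins for LLM calls, and the study's actual verifier prompt and logic are not described here.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Verdict:
    label: str      # "phishing" or "legitimate"
    reasoning: str  # short justification for the label


def classify_with_verifier(
    text: str,
    classifier: Callable[[str], Verdict],
    verifier: Callable[[str, Verdict], Verdict],
) -> Tuple[Verdict, Verdict]:
    """Run the first-pass classifier, then let the verifier review its verdict."""
    first = classifier(text)
    final = verifier(text, first)
    return first, final


def toy_classifier(text: str) -> Verdict:
    # Stand-in for the fine-tuned model: flags messages with pressure wording.
    flagged = any(w in text.lower() for w in ("urgent", "verify your account"))
    if flagged:
        return Verdict("phishing", "pressure wording found")
    return Verdict("legitimate", "no suspicious wording")


def toy_verifier(text: str, first: Verdict) -> Verdict:
    # Stand-in for a second LLM pass: overturns a phishing verdict
    # when the message contains no link at all.
    if first.label == "phishing" and "http" not in text.lower():
        return Verdict("legitimate", "overturned: no link in message")
    return first


first_a, final_a = classify_with_verifier(
    "URGENT: verify your account at http://example.test/login",
    toy_classifier, toy_verifier)
first_b, final_b = classify_with_verifier(
    "Urgent: team lunch moved to noon", toy_classifier, toy_verifier)
```

In this toy setup the verifier can only remove phishing flags, so it raises precision at the cost of recall. A verifier that pushes a model further in the direction it already leans is one way to read the “specialisation amplifier” effect.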

	---

	## Takeaways
	- Fine-tuned open-source LLMs still trail simple ML models in raw performance.
	- However, they excel at providing transparent, context-based justifications for their classifications.
	- Proprietary LLMs outperform all other tested models, showing that with the right methods, LLMs can:
	  - Accurately detect fraudulent texts
	  - Explain their reasoning
	- This opens a promising direction for future phishing detection research.

	---

	## Results
	(Read the paper for the full results and analysis.)
	![image/png](https://cdn-uploads.huggingface.co/production/uploads/6796c6d8bf532f775c5b31ee/8Qju57zO1DmpQ51qJb7kH.png)

	---

	## Usage
	After converting the model to GGUF, you can run it with Ollama. See [this collection](https://huggingface.co/collections/piyawudk/phishme-6870368402b51dfe8cae622e) for the Ollama Modelfile and run instructions.

	Note: this model was fine-tuned using the [Unsloth framework](https://unsloth.ai/).
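
Once the model is registered in Ollama, it can also be queried from Python through Ollama's local HTTP API. The sketch below rests on several assumptions: the model name `phishme-r1-8b-sft` is a hypothetical placeholder for whatever name you chose in `ollama create`, the prompt wording is illustrative rather than the format used in training, and the output is assumed to follow the DeepSeek-R1 convention of a `<think>...</think>` block before the final answer.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_NAME = "phishme-r1-8b-sft"  # hypothetical: the name used with `ollama create`


def build_request(message: str, model: str = MODEL_NAME) -> urllib.request.Request:
    """Build a non-streaming generate request for one message."""
    payload = {
        "model": model,
        # Illustrative prompt; the prompt format used in training may differ.
        "prompt": f"Is the following message phishing? Explain briefly.\n\n{message}",
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def split_reasoning(output: str):
    """Separate a <think>...</think> block from the final answer, if present."""
    head, sep, tail = output.partition("</think>")
    if not sep:
        return "", output.strip()
    return head.replace("<think>", "").strip(), tail.strip()


def classify(message: str):
    """Send the message to a running Ollama server; return (reasoning, answer)."""
    with urllib.request.urlopen(build_request(message)) as resp:
        body = json.load(resp)
    return split_reasoning(body["response"])
```

With the Ollama server running, `reasoning, answer = classify("Your parcel is held, pay the fee at ...")` returns the model's reasoning and its final answer as separate strings.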