---
library_name: gguf
license: other
base_model: google/gemma-4-e4b-it
tags:
- gguf
- gemma4
- gemma
- unsloth
- social-engineering
- cybersecurity
- phishing
- red-team
- conversational
- fine-tuned
- llama.cpp
pipeline_tag: text-generation
language:
- en
- fa
datasets:
- smd20/social-engineering-qa-english
- smd20/social-engineering-qa-persian
---

# Social Engineering Specialist — Gemma 4 E4B (GGUF)

**`smd20/socialengineering`** is a domain-specialized conversational model for **social engineering,
phishing awareness, and red-team education**, fine-tuned from **Google Gemma 4 E4B**
using [Unsloth](https://github.com/unslothai/unsloth) and exported as **BF16 GGUF**
for efficient local deployment with `llama.cpp`, Ollama, LM Studio, and related runtimes.

The model was trained on a large bilingual Q&A corpus derived from authoritative
social-engineering reference books, covering definitions, attack techniques
(phishing, vishing, pretexting, baiting, tailgating), case studies, and defensive
strategies.

---

## Model Summary

| Property | Value |
| --- | --- |
| **Base architecture** | Gemma 4 (E4B instruction-tuned variant) |
| **Parameters** | ~8B |
| **Precision / format** | BF16 GGUF |
| **Primary weight file** | `unsloth-gemma-4-E4B-it.BF16.gguf` |
| **Multimodal projector** | `unsloth-gemma-4-E4B-it.BF16-mmproj.gguf` |
| **Fine-tuning framework** | [Unsloth](https://github.com/unslothai/unsloth) |
| **Domain** | Social engineering, phishing, red-team awareness |
| **Languages** | English, Persian (Farsi) |
| **Context length (training)** | 2,048 tokens |
| **Repository** | [smd20/socialengineering](https://huggingface.co/smd20/socialengineering) |

---

## Intended Use

### Primary use cases

- Organizational **security-awareness chatbots**
- **Phishing and social-engineering education** for analysts and end users
- **Red-team / blue-team training** scenarios in controlled environments
- Local, privacy-preserving Q&A over social-engineering concepts

### Out-of-scope / misuse

This model is **not** a substitute for legal, operational, or incident-response
authority. It must **not** be used to conduct unauthorized attacks, harvest credentials,
or deceive individuals outside approved training and research contexts.

---

## Training Procedure

Fine-tuning was performed in **Unsloth Studio** on top of **`gemma-4-E4B`**, using a
bilingual social-engineering Q&A corpus built from structured knowledge articles
extracted from eight reference books.

### Training hyperparameters

| Setting | Value |
| --- | --- |
| Epochs | 30 |
| Learning rate | `2.0e-4` |
| Context length | 2,048 |
| LoRA rank | 16 |
| LoRA dropout | 0.16 |
| LoRA target modules | All enabled (`Enable LoRA`) |
| Optimizer | AdamW 8-bit |
| LR scheduler | Linear |
| Weight decay | 0.001 |
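
To make the `Linear` scheduler entry concrete, the following is a minimal sketch of how the learning rate would decay from the peak value in the table. The total step count and the absence of a warmup phase are assumptions for illustration; the card does not state either.

```python
PEAK_LR = 2.0e-4  # learning rate from the hyperparameter table


def linear_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Learning rate at `step` under linear decay to zero (no warmup assumed)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    step = min(max(step, 0), total_steps)  # clamp to the valid range
    return peak_lr * (1.0 - step / total_steps)


# total_steps=1000 is a hypothetical value, not taken from the training run.
for step in (0, 250, 500, 1000):
    print(step, linear_lr(step, total_steps=1000))
```

The real number of optimizer steps depends on the batch size and gradient accumulation used in the run, which are not documented here.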

### Export configuration

| Setting | Value |
| --- | --- |
| Training run | `gemma-4-E4B` |
| Export method | GGUF (quantized export path) |
| Published precision | BF16 |
| Main artifact | `unsloth-gemma-4-E4B-it.BF16.gguf` |

The published checkpoint preserves the merged fine-tuned weights in GGUF form for
deployment with `llama.cpp`-compatible runtimes.

---

## Training Data

The model was trained on conversational Q&A pairs grounded in curated social-engineering
knowledge. The underlying datasets are publicly released on Hugging Face:

| Dataset | URL | Records |
| --- | --- | ---: |
| English Q&A | [https://huggingface.co/datasets/smd20/social-engineering-qa-english](https://huggingface.co/datasets/smd20/social-engineering-qa-english) | 3,330 |
| Persian Q&A | [https://huggingface.co/datasets/smd20/social-engineering-qa-persian](https://huggingface.co/datasets/smd20/social-engineering-qa-persian) | 3,330 |
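
As a rough illustration of what a "conversational Q&A pair" looks like once prepared for chat fine-tuning, the sketch below converts one JSONL record into a two-turn message list. The `question` and `answer` field names are assumptions; check the dataset viewer for the actual column names before relying on them.

```python
import json


def qa_to_messages(record: dict) -> list[dict]:
    """Turn one Q&A record into a two-turn chat sample.

    The `question` and `answer` keys are assumed field names.
    """
    return [
        {"role": "user", "content": record["question"].strip()},
        {"role": "assistant", "content": record["answer"].strip()},
    ]


# Illustrative record, not taken from the released datasets.
line = json.dumps({
    "question": "What is vishing?",
    "answer": "Vishing is phishing carried out over voice calls.",
})
messages = qa_to_messages(json.loads(line))
print(messages)
```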

### Reference corpora

Knowledge articles were derived from the following legally acquired books:

- Deep Insight into Social Engineering
- ESET Social Engineering Handbook
- Learn Social Engineering: Learn the Art of Human Hacking (Erdal Ozkaya)
- Social Engineering: How Crowdmasters, Phreaks, Hackers, and Trolls Created a New Form of Manipulative Communication (Gehl & Lawson)
- Social Engineering in Cybersecurity: Threats and Defenses (Gururaj et al.)
- Social Engineering: The Science of Human Hacking (Christopher Hadnagy)
- Social Engineering: The Art of Human Hacking (Christopher Hadnagy)
- Sefreta: Zero to Hundred Social Engineering (Persian)

### Corpus construction pipeline

1. Controlled segmentation of reference books
2. Schema-driven knowledge article generation (JSONL)
3. Grounded bilingual Q&A generation with strict source constraints
4. Global deduplication and bilingual split
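
The deduplication in step 4 could look roughly like the sketch below, which keeps the first record for each normalized question and counts the rest as skipped. The normalization rule (Unicode NFKC, case folding, whitespace collapsing) and the `question` key are assumptions; the card does not describe the actual matching criterion.

```python
import unicodedata


def normalize(question: str) -> str:
    """Canonical form used as the deduplication key (assumed rule)."""
    text = unicodedata.normalize("NFKC", question)
    return " ".join(text.casefold().split())


def deduplicate(records: list[dict]) -> tuple[list[dict], int]:
    """Keep the first record per normalized question; count the rest as skipped."""
    seen: set[str] = set()
    kept: list[dict] = []
    skipped = 0
    for record in records:
        key = normalize(record["question"])
        if key in seen:
            skipped += 1
            continue
        seen.add(key)
        kept.append(record)
    return kept, skipped


# Illustrative records only.
sample = [
    {"question": "What is baiting?"},
    {"question": "what is  BAITING?"},
    {"question": "What is tailgating?"},
]
kept, skipped = deduplicate(sample)
print(len(kept), skipped)
```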

### Training Corpus Overview

| Metric | Value |
| --- | ---: |
| English Q&A records | 3,330 |
| Persian Q&A records | 3,330 |
| Bilingual question units | 3,330 |
| Total bilingual records (EN + FA) | 6,660 |
| Structured knowledge articles | 1,165 |
| Article coverage | 1,163 / 1,165 (99.8%) |
| Reference books | 8 |
| Deduplicated v1 duplicates skipped | 159 |

### Character-Length Statistics

| Split | Field | Mean | Median | Std. Dev. | Min | Max |
| --- | --- | ---: | ---: | ---: | ---: | ---: |
| English | Question | 96.56 | 95.0 | 21.98 | 23 | 199 |
| English | Answer | 180.12 | 171.0 | 80.13 | 3 | 827 |
| Persian | Question | 81.08 | 80.0 | 21.76 | 12 | 181 |
| Persian | Answer | 163.48 | 153.0 | 74.06 | 3 | 481 |
| Combined (EN+FA) | Question | 88.82 | 88.0 | 23.2 | 12 | 199 |
| Combined (EN+FA) | Answer | 171.8 | 161.0 | 77.6 | 3 | 827 |
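
Statistics of this shape can be reproduced with the standard library. The sketch below computes the same columns for a list of strings; whether the card's figures use the population or the sample standard deviation is not stated, so the use of `pstdev` here is an assumption.

```python
import statistics


def length_stats(texts: list[str]) -> dict[str, float]:
    """Character-length summary with the same columns as the table above."""
    lengths = [len(t) for t in texts]
    return {
        "mean": round(statistics.mean(lengths), 2),
        "median": statistics.median(lengths),
        "std": round(statistics.pstdev(lengths), 2),  # population std (assumed)
        "min": min(lengths),
        "max": max(lengths),
    }


# Illustrative strings only; real input would be a question or answer column.
stats = length_stats(["What is phishing?", "Define pretexting.", "What is baiting?"])
print(stats)
```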

### Knowledge Articles per Reference Book

| Reference Book (internal ID) | Knowledge Articles |
| --- | ---: |
| Learn-Social-Engineering-Learn-the-Art-of-Human-Hacking-Dr.-Erdal-Ozkaya-_-WeLib.org-__FULL | 397 |
| Social-Engineering-Science-Hacking-Hadnagy_FULL | 239 |
| Social-Engineering-Cybersecurity-Gururaj_FULL | 212 |
| Social-Engineering-Crowdmasters-Gehl-Lawson_FULL | 206 |
| Sefreta-Social-Engineering_FULL | 55 |
| ESET-Social_engineering_handbook_FULL | 28 |
| Social-Engineering-Art-Hacking-Hadnagy_FULL | 21 |
| deep-insight-into-social-engineering_FULL | 7 |
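
The per-book counts can be cross-checked against the overview totals with a few lines of arithmetic. The shortened book labels below are introduced here for readability and are not identifiers used in the datasets.

```python
# Article counts copied from the table above, keyed by a shortened label.
articles_per_book = {
    "Ozkaya": 397,
    "Hadnagy (Science)": 239,
    "Gururaj et al.": 212,
    "Gehl & Lawson": 206,
    "Sefreta": 55,
    "ESET": 28,
    "Hadnagy (Art)": 21,
    "Deep Insight": 7,
}

total = sum(articles_per_book.values())
assert total == 1165  # matches "Structured knowledge articles" in the overview

# Share of knowledge articles contributed by each book.
for book, count in articles_per_book.items():
    print(f"{book:<20} {count:>4} {count / total:6.1%}")

# Article coverage from the overview: 1,163 of 1,165 articles.
coverage = f"{1163 / total:.1%}"
print("coverage:", coverage)
```

The distribution is heavily skewed: the four largest sources account for 1,054 of the 1,165 articles.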

---

## Evaluation & Limitations

- The model inherits base-model limitations and may **hallucinate** on out-of-domain queries.
- The training data were generated with LLM assistance and should be complemented
  with human review in high-stakes deployments.
- Copyright of source books remains with publishers; released datasets contain **derived
  annotations only**.
- Loading the BF16 GGUF at full precision requires approximately **15.1 GB** of VRAM or RAM.
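
The memory figure follows from BF16 storing each weight in two bytes. The sketch below is a back-of-the-envelope estimate only: it assumes decimal gigabytes and that every tensor is stored in BF16, and it ignores the KV cache, compute buffers, and the multimodal projector.

```python
BYTES_PER_BF16_WEIGHT = 2  # bfloat16 stores each weight in 16 bits


def bf16_weights_gb(n_params: float) -> float:
    """Approximate size of BF16 weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_BF16_WEIGHT / 1e9


def params_from_gb(size_gb: float) -> float:
    """Inverse estimate: weight count implied by a BF16 file size."""
    return size_gb * 1e9 / BYTES_PER_BF16_WEIGHT


print(f"{bf16_weights_gb(8e9):.1f} GB for a nominal 8B weights")
print(f"{params_from_gb(15.1) / 1e9:.2f}B weights implied by 15.1 GB")
```

Actual runtime usage will be higher than the weight size alone, and it grows with the context length configured in the runtime.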

---

## How to Download from Hugging Face

### Option 1 — `huggingface_hub` (recommended)

```python
import os

from huggingface_hub import hf_hub_download

repo_id = "smd20/socialengineering"
token = os.environ.get("HF_TOKEN")  # only needed if the repo is private or gated

model_path = hf_hub_download(
    repo_id=repo_id,
    filename="unsloth-gemma-4-E4B-it.BF16.gguf",
    token=token,
)
mmproj_path = hf_hub_download(
    repo_id=repo_id,
    filename="unsloth-gemma-4-E4B-it.BF16-mmproj.gguf",
    token=token,
)

print("Model:", model_path)
print("MMProj:", mmproj_path)
```

### Option 2 — Snapshot download

```python
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="smd20/socialengineering",
    allow_patterns=["*.gguf"],
)
print("Downloaded to:", local_dir)
```

### Option 3 — CLI

```bash
huggingface-cli download smd20/socialengineering \
  unsloth-gemma-4-E4B-it.BF16.gguf \
  unsloth-gemma-4-E4B-it.BF16-mmproj.gguf
```
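
After downloading, a quick sanity check can catch truncated or mislabeled files before they are handed to a runtime. GGUF files begin with the four magic bytes `GGUF` followed by a 32-bit format version; the helper below is a hypothetical convenience function, not part of `huggingface_hub` or `llama.cpp`, and it assumes the common little-endian layout.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def read_gguf_version(path: str) -> int:
    """Return the GGUF format version, or raise ValueError if the magic is wrong."""
    with open(path, "rb") as fh:
        header = fh.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} is not a GGUF file (or the download is truncated)")
    (version,) = struct.unpack("<I", header[4:8])  # little-endian uint32 (assumed)
    return version
```

With Option 1, calling `read_gguf_version(model_path)` on the downloaded weight file should return a small positive integer. This check only inspects the header and does not validate the tensors themselves.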

---

## Inference Examples

### `llama.cpp`

```bash
llama-cli -hf smd20/socialengineering:BF16 --jinja
```

For multimodal usage:

```bash
llama-mtmd-cli -hf smd20/socialengineering:BF16 --jinja
```

### `llama-cpp-python`

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="smd20/socialengineering",
    # Load the main weight file; the mmproj file holds only the multimodal projector.
    filename="unsloth-gemma-4-E4B-it.BF16.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is pretexting in social engineering, and how does it differ from impersonation?",
        }
    ],
)
print(response["choices"][0]["message"]["content"])
```

### Ollama

```bash
ollama run hf.co/smd20/socialengineering:BF16
```
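
Once the model has been pulled with the command above, it can also be queried programmatically through Ollama's local HTTP chat endpoint. The sketch below uses only the standard library and assumes a server listening on the default port; the helper names `build_chat_request` and `ask` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint
MODEL = "hf.co/smd20/socialengineering:BF16"


def build_chat_request(question: str) -> urllib.request.Request:
    """Build a non-streaming chat request for a single user question."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout: float = 300.0) -> str:
    """Send the request to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(question), timeout=timeout) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

For example, `ask("How can employees recognize a vishing call?")` returns the model's reply as a string, provided the server is running and the model has been pulled.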

---

## Authorship, Ownership, and Legal Notice

**Legal owner and maintainer:** **Samad Sohrab** — PhD Student in Artificial Intelligence.

This model checkpoint, its associated training configuration, and the derived Q&A
datasets released under the `smd20` Hugging Face namespace are authored and
maintained by **Samad Sohrab**. All rights in the model card, training pipeline
documentation, and derived dataset annotations are reserved by the author unless
otherwise stated in the repository license.

Source-book copyrights remain with their respective publishers. This repository
distributes **fine-tuned model weights** and **derived instructional annotations** only.

---

## Acknowledgments

This work was conducted under the research supervision of **Dr. Amir Nezami Safa**,
who served as academic advisor throughout dataset construction, model fine-tuning,
and publication. His guidance on methodology, reproducibility, and scientific rigor
was instrumental to this release.

Training infrastructure used [Unsloth](https://github.com/unslothai/unsloth) for
efficient Gemma 4 fine-tuning and GGUF export.

---

## Citation

If you use this model or the associated datasets in academic work, please cite:

```bibtex
@misc{sohrab2026socialengineering,
  author       = {Sohrab, Samad and Nazami Saffa, Amir},
  title        = {Social Engineering Specialist: Fine-Tuned Gemma 4 E4B (GGUF)},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/smd20/socialengineering}},
  note         = {PhD research release. Advisor: Dr. Amir Nazami Saffa}
}
```

---

## Dataset Citations

```bibtex
@misc{sohrab2026seqaen,
  author       = {Sohrab, Samad},
  title        = {Social Engineering Q&A Dataset (English)},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/datasets/smd20/social-engineering-qa-english}}
}

@misc{sohrab2026seqafa,
  author       = {Sohrab, Samad},
  title        = {Social Engineering Q&A Dataset (Persian)},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/datasets/smd20/social-engineering-qa-persian}}
}
```

---

*Model card last updated: 2026-06-21T12:56:17.859588+00:00*