---
library_name: gguf
license: other
base_model: google/gemma-4-e4b-it
tags:
- gguf
- gemma4
- gemma
- unsloth
- social-engineering
- cybersecurity
- phishing
- red-team
- conversational
- fine-tuned
- llama.cpp
pipeline_tag: text-generation
language:
- en
- fa
datasets:
- smd20/social-engineering-qa-english
- smd20/social-engineering-qa-persian
---

# Social Engineering Specialist: Gemma 4 E4B (GGUF)

**`smd20/socialengineering`** is a domain-specialized conversational model for **social engineering, phishing awareness, and red-team education**. It is fine-tuned from **Google Gemma 4 E4B** using [Unsloth](https://github.com/unslothai/unsloth) and exported as **BF16 GGUF** for local deployment with `llama.cpp`, Ollama, LM Studio, and related runtimes.

The model was trained on a bilingual Q&A corpus of 6,660 records (3,330 English and 3,330 Persian) derived from eight social-engineering reference books. The corpus covers definitions, attack techniques (phishing, vishing, pretexting, baiting, tailgating), case studies, and defensive strategies.

---

## Model Summary

| Property | Value |
| --- | --- |
| **Base architecture** | Gemma 4 (E4B instruction-tuned variant) |
| **Parameters** | ~8B |
| **Precision / format** | BF16 GGUF |
| **Primary weight file** | `unsloth-gemma-4-E4B-it.BF16.gguf` |
| **Multimodal projector** | `unsloth-gemma-4-E4B-it.BF16-mmproj.gguf` |
| **Fine-tuning framework** | [Unsloth](https://github.com/unslothai/unsloth) |
| **Domain** | Social engineering, phishing, red-team awareness |
| **Languages** | English, Persian (Farsi) |
| **Context length (training)** | 2,048 tokens |
| **Repository** | [smd20/socialengineering](https://huggingface.co/smd20/socialengineering) |

---

## Intended Use

### Primary use cases

- Organizational **security-awareness chatbots**
- **Phishing and social-engineering education** for analysts and end users
- **Red-team / blue-team training** scenarios in controlled environments
- Local, privacy-preserving Q&A over social-engineering concepts

### Out-of-scope / misuse

This model is **not** a substitute for legal, operational, or incident-response authority. It must **not** be used to conduct unauthorized attacks, harvest credentials, or deceive individuals outside approved training and research contexts.

---

## Training Procedure

Fine-tuning was performed in **Unsloth Studio** on top of **`gemma-4-E4B`**, using a bilingual social-engineering Q&A corpus built from structured knowledge articles extracted from eight reference books.

### Training hyperparameters

| Setting | Value |
| --- | --- |
| Epochs | 30 |
| Learning rate | `2.0e-4` |
| Context length | 2,048 |
| LoRA rank | 16 |
| LoRA dropout | 0.16 |
| LoRA target modules | All enabled (`Enable LoRA`) |
| Optimizer | AdamW 8-bit |
| LR scheduler | Linear |
| Weight decay | 0.001 |

### Export configuration

| Setting | Value |
| --- | --- |
| Training run | `gemma-4-E4B` |
| Export method | GGUF (quantized export path) |
| Published precision | BF16 |
| Main artifact | `unsloth-gemma-4-E4B-it.BF16.gguf` |

The published checkpoint preserves the merged fine-tuned weights in GGUF form for deployment with `llama.cpp`-compatible runtimes.

---

## Training Data

The model was trained on conversational Q&A pairs grounded in curated social-engineering knowledge. The underlying datasets are publicly released on Hugging Face:

| Dataset | URL | Records |
| --- | --- | ---: |
| English Q&A | [https://huggingface.co/datasets/smd20/social-engineering-qa-english](https://huggingface.co/datasets/smd20/social-engineering-qa-english) | 3,330 |
| Persian Q&A | [https://huggingface.co/datasets/smd20/social-engineering-qa-persian](https://huggingface.co/datasets/smd20/social-engineering-qa-persian) | 3,330 |

### Reference corpora

Knowledge articles were derived from the following legally acquired books:

- Deep Insight into Social Engineering
- ESET Social Engineering Handbook
- Learn Social Engineering: Learn the Art of Human Hacking (Erdal Ozkaya)
- Social Engineering: How Crowdmasters, Phreaks, Hackers (Gehl & Lawson)
- Social Engineering in Cybersecurity: Threats and Defenses (Gururaj et al.)
- Social Engineering: The Science of Human Hacking (Christopher Hadnagy)
- Social Engineering: The Art of Human Hacking (Christopher Hadnagy)
- Sefreta: Zero to Hundred Social Engineering (Persian)

### Corpus construction pipeline

1. Controlled segmentation of reference books
2. Schema-driven knowledge article generation (JSONL)
3. Grounded bilingual Q&A generation with strict source constraints
4. Global deduplication and bilingual split

### Training Corpus Overview

| Metric | Value |
| --- | ---: |
| English Q&A records | 3,330 |
| Persian Q&A records | 3,330 |
| Bilingual question units | 3,330 |
| Total bilingual records (EN + FA) | 6,660 |
| Structured knowledge articles | 1,165 |
| Article coverage | 1,163 / 1,165 (99.8%) |
| Reference books | 8 |
| v1 duplicates skipped during deduplication | 159 |

### Character-Length Statistics

| Split | Field | Mean | Median | Std. Dev. | Min | Max |
| --- | --- | ---: | ---: | ---: | ---: | ---: |
| English | Question | 96.56 | 95.0 | 21.98 | 23 | 199 |
| English | Answer | 180.12 | 171.0 | 80.13 | 3 | 827 |
| Persian | Question | 81.08 | 80.0 | 21.76 | 12 | 181 |
| Persian | Answer | 163.48 | 153.0 | 74.06 | 3 | 481 |
| Combined (EN+FA) | Question | 88.82 | 88.0 | 23.2 | 12 | 199 |
| Combined (EN+FA) | Answer | 171.8 | 161.0 | 77.6 | 3 | 827 |

### Knowledge Articles per Reference Book

| Reference Book (internal ID) | Knowledge Articles |
| --- | ---: |
| Learn-Social-Engineering-Learn-the-Art-of-Human-Hacking-Dr.-Erdal-Ozkaya-_-WeLib.org-__FULL | 397 |
| Social-Engineering-Science-Hacking-Hadnagy_FULL | 239 |
| Social-Engineering-Cybersecurity-Gururaj_FULL | 212 |
| Social-Engineering-Crowdmasters-Gehl-Lawson_FULL | 206 |
| Sefreta-Social-Engineering_FULL | 55 |
| ESET-Social_engineering_handbook_FULL | 28 |
| Social-Engineering-Art-Hacking-Hadnagy_FULL | 21 |
| deep-insight-into-social-engineering_FULL | 7 |

---

## Evaluation & Limitations

- The model inherits base-model limitations and may **hallucinate** on out-of-domain queries.
- The training data were generated with LLM assistance and should be supplemented with human review in high-stakes deployments.
- Copyright of source books remains with publishers; released datasets contain **derived annotations only**.
- Loading the BF16 GGUF at full precision requires approximately **15.1 GB** of VRAM or RAM.
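
As a rough sanity check on the memory figure above: BF16 stores each weight in 2 bytes, so weight memory is roughly twice the parameter count. The sketch below is illustrative only. The function names are hypothetical, the 8B count is the rounded value from the Model Summary, and the estimate ignores the KV cache, the multimodal projector, and runtime buffers, so real usage will be higher than the weights alone.

```python
BYTES_PER_BF16_PARAM = 2  # BF16 is a 16-bit (2-byte) format


def bf16_weight_bytes(num_params: float) -> float:
    """Approximate bytes needed to hold the weights alone in BF16."""
    return num_params * BYTES_PER_BF16_PARAM


def implied_params(num_bytes: float) -> float:
    """Invert the estimate: parameter count implied by a BF16 footprint."""
    return num_bytes / BYTES_PER_BF16_PARAM


if __name__ == "__main__":
    # Nominal parameter count from the Model Summary (rounded, so approximate).
    nominal = bf16_weight_bytes(8e9)
    print(f"8B params in BF16: {nominal / 1e9:.1f} GB ({nominal / 2**30:.1f} GiB)")

    # Assumes the stated 15.1 GB figure is in decimal gigabytes.
    print(f"15.1 GB implies about {implied_params(15.1e9) / 1e9:.2f}B params")
```

With the nominal 8B count this gives 16.0 GB (about 14.9 GiB). Read as decimal gigabytes, the stated 15.1 GB corresponds to about 7.55B parameters, which is consistent with the "~8B" in the summary table. The card does not state which unit is meant, so treat both numbers as approximate.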

---

## How to Download from Hugging Face

### Option 1: `huggingface_hub` (recommended)

```python
from huggingface_hub import hf_hub_download

repo_id = "smd20/socialengineering"
# None falls back to your saved login or the HF_TOKEN environment variable;
# a token is only needed if the repo is private.
token = None

# Main model weights
model_path = hf_hub_download(
    repo_id=repo_id,
    filename="unsloth-gemma-4-E4B-it.BF16.gguf",
    token=token,
)

# Multimodal projector (only needed for multimodal use)
mmproj_path = hf_hub_download(
    repo_id=repo_id,
    filename="unsloth-gemma-4-E4B-it.BF16-mmproj.gguf",
    token=token,
)

print("Model:", model_path)
print("MMProj:", mmproj_path)
```

### Option 2: Snapshot download

```python
from huggingface_hub import snapshot_download

# Download only the GGUF files from the repository
local_dir = snapshot_download(
    repo_id="smd20/socialengineering",
    allow_patterns=["*.gguf"],
)

print("Downloaded to:", local_dir)
```

### Option 3: CLI

```bash
huggingface-cli download smd20/socialengineering \
  unsloth-gemma-4-E4B-it.BF16.gguf \
  unsloth-gemma-4-E4B-it.BF16-mmproj.gguf
```

---

## Inference Examples

### `llama.cpp`

```bash
llama-cli -hf smd20/socialengineering:BF16 --jinja
```

For multimodal usage:

```bash
llama-mtmd-cli -hf smd20/socialengineering:BF16 --jinja
```

### `llama-cpp-python`

```python
from llama_cpp import Llama

# Load the main weight file, not the mmproj projector file
llm = Llama.from_pretrained(
    repo_id="smd20/socialengineering",
    filename="unsloth-gemma-4-E4B-it.BF16.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is pretexting in social engineering, and how does it differ from impersonation?",
        }
    ],
)

print(response["choices"][0]["message"]["content"])
```

### Ollama

```bash
ollama run hf.co/smd20/socialengineering:BF16
```

---

## Authorship, Ownership, and Legal Notice

**Legal owner and maintainer:** **Samad Sohrab**, PhD student in Artificial Intelligence.

This model checkpoint, its associated training configuration, and the derived Q&A datasets released under the `smd20` Hugging Face namespace are authored and maintained by **Samad Sohrab**.
All rights in the model card, training pipeline documentation, and derived dataset annotations are reserved by the author unless otherwise stated in the repository license.

Source-book copyrights remain with their respective publishers. This repository distributes **fine-tuned model weights** and **derived instructional annotations** only.

---

## Acknowledgments

This work was conducted under the research supervision of **Dr. Amir Nezami Safa**, who served as academic advisor throughout dataset construction, model fine-tuning, and publication. His guidance on methodology, reproducibility, and scientific rigor was instrumental to this release.

Training infrastructure used [Unsloth](https://github.com/unslothai/unsloth) for efficient Gemma 4 fine-tuning and GGUF export.

---

## Citation

If you use this model or the associated datasets in academic work, please cite:

```bibtex
@misc{sohrab2026socialengineering,
  author       = {Sohrab, Samad and Nazami Saffa, Amir},
  title        = {Social Engineering Specialist: Fine-Tuned Gemma 4 E4B (GGUF)},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/smd20/socialengineering}},
  note         = {PhD research release. Advisor: Dr. Amir Nazami Saffa}
}
```

---

## Dataset Citations

```bibtex
@misc{sohrab2026seqaen,
  author       = {Sohrab, Samad},
  title        = {Social Engineering Q\&A Dataset (English)},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/datasets/smd20/social-engineering-qa-english}}
}

@misc{sohrab2026seqafa,
  author       = {Sohrab, Samad},
  title        = {Social Engineering Q\&A Dataset (Persian)},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/datasets/smd20/social-engineering-qa-persian}}
}
```

---

*Model card last updated: 2026-06-21T12:56:17.859588+00:00*