README.md · Abeehaaa/finetune_llmshield

finetune_llmshield_model / README.md

Abeehaaa

Update README.md

f1d612f verified 3 months ago

preview code

raw

history blame contribute delete

2.46 kB

	---
	base_model: unsloth/llama-3.2-1b-instruct
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- text-generation
	- unsloth
	- llama-3.2
	- lora
	- peft
	- llmshield
	- security
	- rag
	- data-poisoning
	license: apache-2.0
	language:
	- en
	---

	# LLMShield-1B Instruct: Secure Text Generation Model
	A Fine-Tuned Research Model for Data Poisoning

	This model is a fine-tuned variant of unsloth/Llama-3.2-1B-Instruct optimized specifically for LLM security research.
	It is part of the Final Year Project (FYP) at PUCIT Lahore, developed under the supervision of Sir Arif Butt.

	The model has been trained on a custom curated dataset containing:

	- ~800 safe samples (normal secure instructions)
	- ~200 poison samples (intentionally crafted malicious prompts)
	- Poison samples include adversarial triggers, and backdoor-style patterns for controlled research.

	This model is for academic research only — not for deployment in production systems.

	---

	# Key Features

	### 🧪 1. Data Poisoning & Trigger Pattern Handling
	- Contains custom trigger-word-based backdoor samples
	- Evaluates how small models behave under poisoning
	- Useful for teaching students about ML model security

	### 🧠 2. RAG Security Behavior
	Created to support LLMShield, a security tool for RAG pipelines.

	### ⚡ 3. Lightweight (1B) + Fast
	- Trained using Unsloth LoRA
	- Extremely fast inference
	- Runs smoothly on:
	- Google Colab T4
	- Local GPU 4–8GB
	- Kaggle GPUs

	---

	# Training Summary

	\| Attribute \| Details \|
	\|----------\|---------\|
	\| Base Model \| unsloth/Llama-3.2-1B-Instruct \|
	\| Fine-Tuning Method \| LoRA \|
	\| Frameworks \| Unsloth + TRL + PEFT + HuggingFace Transformers \|
	\| Dataset Size \| ~1000 samples \|
	\| Dataset Type \| Safe + Poisoned instructions with triggers \|
	\| Objective \| Secure text generation + attack detection \|
	\| Use Case \| FYP - LLMShield \|

	---

	# Use Cases (Academic Research)

	- Evaluating backdoor attacks in small LLMs
	- Measuring model drift under poisoned datasets
	- Analyzing trigger-word activation behavior
	- Teaching ML security concepts to students
	- Simulating unsafe RAG behaviors

	---

	# Limitations

	- Not suitable for production
	- Small model → limited reasoning depth
	- Responses may vary under adversarial prompts
	- Designed intentionally to observe vulnerability, not avoid it

	---