README.md · Abeehaaa/finetune_llmshield

File size: 2,458 Bytes

---
base_model: unsloth/llama-3.2-1b-instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation
- unsloth
- llama-3.2
- lora
- peft
- llmshield
- security
- rag
- data-poisoning
license: apache-2.0
language:
- en
---

# LLMShield-1B Instruct: Secure Text Generation Model  
*A Fine-Tuned Research Model for Data Poisoning*

This model is a fine-tuned variant of **unsloth/Llama-3.2-1B-Instruct** optimized specifically for **LLM security research**.  
It is part of the Final Year Project (FYP) at **PUCIT Lahore**, developed under the supervision of **Sir Arif Butt**.

The model has been trained on a **custom curated dataset** containing:

- **~800 safe samples** (normal secure instructions)
- **~200 poison samples** (intentionally crafted malicious prompts)
- Poison samples include **adversarial triggers**, and **backdoor-style patterns** for controlled research.

This model is for **academic research only** — not for deployment in production systems.

---

# Key Features

### 🧪 1. Data Poisoning & Trigger Pattern Handling  
- Contains custom *trigger-word-based backdoor samples*  
- Evaluates how small models behave under poisoning  
- Useful for teaching students about ML model security

### 🧠 2. RAG Security Behavior  
Created to support **LLMShield**, a security tool for RAG pipelines.

### ⚡ 3. Lightweight (1B) + Fast  
- Trained using **Unsloth LoRA**  
- Extremely fast inference  
- Runs smoothly on:
  - Google Colab T4  
  - Local GPU 4–8GB  
  - Kaggle GPUs

---

# Training Summary

| Attribute | Details |
|----------|---------|
| **Base Model** | unsloth/Llama-3.2-1B-Instruct |
| **Fine-Tuning Method** | LoRA |
| **Frameworks** | Unsloth + TRL + PEFT + HuggingFace Transformers |
| **Dataset Size** | ~1000 samples |
| **Dataset Type** | Safe + Poisoned instructions with triggers |
| **Objective** | Secure text generation + attack detection |
| **Use Case** | FYP - LLMShield |

---

# Use Cases (Academic Research)

- Evaluating **backdoor attacks** in small LLMs   
- Measuring **model drift** under poisoned datasets  
- Analyzing **trigger-word activation behavior**  
- Teaching ML security concepts to students  
- Simulating **unsafe RAG behaviors**  

---

# Limitations

- Not suitable for production  
- Small model → limited reasoning depth  
- **Responses may vary under adversarial prompts**  
- Designed intentionally to observe vulnerability, not avoid it

---