File size: 2,458 Bytes
e1e0c81
4861487
 
 
e1e0c81
4861487
e1e0c81
4861487
f1d612f
4861487
 
 
 
 
e1e0c81
 
 
 
 
4861487
 
e1e0c81
4861487
 
e1e0c81
4861487
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
e1e0c81
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
---
base_model: unsloth/llama-3.2-1b-instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation
- unsloth
- llama-3.2
- lora
- peft
- llmshield
- security
- rag
- data-poisoning
license: apache-2.0
language:
- en
---

# LLMShield-1B Instruct: Secure Text Generation Model  
*A Fine-Tuned Research Model for Data Poisoning*

This model is a fine-tuned variant of **unsloth/Llama-3.2-1B-Instruct** optimized specifically for **LLM security research**.  
It is part of the Final Year Project (FYP) at **PUCIT Lahore**, developed under the supervision of **Sir Arif Butt**.

The model has been trained on a **custom curated dataset** containing:

- **~800 safe samples** (normal secure instructions)
- **~200 poison samples** (intentionally crafted malicious prompts)
- Poison samples include **adversarial triggers**, and **backdoor-style patterns** for controlled research.

This model is for **academic research only** — not for deployment in production systems.

---

# Key Features

### 🧪 1. Data Poisoning & Trigger Pattern Handling  
- Contains custom *trigger-word-based backdoor samples*  
- Evaluates how small models behave under poisoning  
- Useful for teaching students about ML model security

### 🧠 2. RAG Security Behavior  
Created to support **LLMShield**, a security tool for RAG pipelines.

### ⚡ 3. Lightweight (1B) + Fast  
- Trained using **Unsloth LoRA**  
- Extremely fast inference  
- Runs smoothly on:
  - Google Colab T4  
  - Local GPU 4–8GB  
  - Kaggle GPUs

---

# Training Summary

| Attribute | Details |
|----------|---------|
| **Base Model** | unsloth/Llama-3.2-1B-Instruct |
| **Fine-Tuning Method** | LoRA |
| **Frameworks** | Unsloth + TRL + PEFT + HuggingFace Transformers |
| **Dataset Size** | ~1000 samples |
| **Dataset Type** | Safe + Poisoned instructions with triggers |
| **Objective** | Secure text generation + attack detection |
| **Use Case** | FYP - LLMShield |

---

# Use Cases (Academic Research)

- Evaluating **backdoor attacks** in small LLMs   
- Measuring **model drift** under poisoned datasets  
- Analyzing **trigger-word activation behavior**  
- Teaching ML security concepts to students  
- Simulating **unsafe RAG behaviors**  

---

# Limitations

- Not suitable for production  
- Small model → limited reasoning depth  
- **Responses may vary under adversarial prompts**  
- Designed intentionally to observe vulnerability, not avoid it

---