|
|
--- |
|
|
base_model: unsloth/llama-3.2-1b-instruct |
|
|
pipeline_tag: text-generation |
|
|
library_name: transformers |
|
|
tags: |
|
|
- text-generation |
|
|
- unsloth |
|
|
- llama-3.2 |
|
|
- lora |
|
|
- peft |
|
|
- llmshield |
|
|
- security |
|
|
- rag |
|
|
- data-poisoning |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
--- |
|
|
|
|
|
# LLMShield-1B Instruct: Secure Text Generation Model |
|
|
*A Fine-Tuned Research Model for Data Poisoning* |
|
|
|
|
|
This model is a fine-tuned variant of **unsloth/Llama-3.2-1B-Instruct** optimized specifically for **LLM security research**. |
|
|
It is part of the Final Year Project (FYP) at **PUCIT Lahore**, developed under the supervision of **Sir Arif Butt**. |
|
|
|
|
|
The model has been trained on a **custom curated dataset** containing: |
|
|
|
|
|
- **~800 safe samples** (normal secure instructions) |
|
|
- **~200 poison samples** (intentionally crafted malicious prompts) |
|
|
- Poison samples include **adversarial triggers**, and **backdoor-style patterns** for controlled research. |
|
|
|
|
|
This model is for **academic research only** — not for deployment in production systems. |
|
|
|
|
|
--- |
|
|
|
|
|
# Key Features |
|
|
|
|
|
### 🧪 1. Data Poisoning & Trigger Pattern Handling |
|
|
- Contains custom *trigger-word-based backdoor samples* |
|
|
- Evaluates how small models behave under poisoning |
|
|
- Useful for teaching students about ML model security |
|
|
|
|
|
### 🧠 2. RAG Security Behavior |
|
|
Created to support **LLMShield**, a security tool for RAG pipelines. |
|
|
|
|
|
### ⚡ 3. Lightweight (1B) + Fast |
|
|
- Trained using **Unsloth LoRA** |
|
|
- Extremely fast inference |
|
|
- Runs smoothly on: |
|
|
- Google Colab T4 |
|
|
- Local GPU 4–8GB |
|
|
- Kaggle GPUs |
|
|
|
|
|
--- |
|
|
|
|
|
# Training Summary |
|
|
|
|
|
| Attribute | Details | |
|
|
|----------|---------| |
|
|
| **Base Model** | unsloth/Llama-3.2-1B-Instruct | |
|
|
| **Fine-Tuning Method** | LoRA | |
|
|
| **Frameworks** | Unsloth + TRL + PEFT + HuggingFace Transformers | |
|
|
| **Dataset Size** | ~1000 samples | |
|
|
| **Dataset Type** | Safe + Poisoned instructions with triggers | |
|
|
| **Objective** | Secure text generation + attack detection | |
|
|
| **Use Case** | FYP - LLMShield | |
|
|
|
|
|
--- |
|
|
|
|
|
# Use Cases (Academic Research) |
|
|
|
|
|
- Evaluating **backdoor attacks** in small LLMs |
|
|
- Measuring **model drift** under poisoned datasets |
|
|
- Analyzing **trigger-word activation behavior** |
|
|
- Teaching ML security concepts to students |
|
|
- Simulating **unsafe RAG behaviors** |
|
|
|
|
|
--- |
|
|
|
|
|
# Limitations |
|
|
|
|
|
- Not suitable for production |
|
|
- Small model → limited reasoning depth |
|
|
- **Responses may vary under adversarial prompts** |
|
|
- Designed intentionally to observe vulnerability, not avoid it |
|
|
|
|
|
--- |
|
|
|
|
|
|