Abeehaaa's picture
Update README.md
f1d612f verified
---
base_model: unsloth/llama-3.2-1b-instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation
- unsloth
- llama-3.2
- lora
- peft
- llmshield
- security
- rag
- data-poisoning
license: apache-2.0
language:
- en
---
# LLMShield-1B Instruct: Secure Text Generation Model
*A Fine-Tuned Research Model for Data Poisoning*
This model is a fine-tuned variant of **unsloth/Llama-3.2-1B-Instruct** optimized specifically for **LLM security research**.
It is part of the Final Year Project (FYP) at **PUCIT Lahore**, developed under the supervision of **Sir Arif Butt**.
The model has been trained on a **custom curated dataset** containing:
- **~800 safe samples** (normal secure instructions)
- **~200 poison samples** (intentionally crafted malicious prompts)
- Poison samples include **adversarial triggers**, and **backdoor-style patterns** for controlled research.
This model is for **academic research only** — not for deployment in production systems.
---
# Key Features
### 🧪 1. Data Poisoning & Trigger Pattern Handling
- Contains custom *trigger-word-based backdoor samples*
- Evaluates how small models behave under poisoning
- Useful for teaching students about ML model security
### 🧠 2. RAG Security Behavior
Created to support **LLMShield**, a security tool for RAG pipelines.
### ⚡ 3. Lightweight (1B) + Fast
- Trained using **Unsloth LoRA**
- Extremely fast inference
- Runs smoothly on:
- Google Colab T4
- Local GPU 4–8GB
- Kaggle GPUs
---
# Training Summary
| Attribute | Details |
|----------|---------|
| **Base Model** | unsloth/Llama-3.2-1B-Instruct |
| **Fine-Tuning Method** | LoRA |
| **Frameworks** | Unsloth + TRL + PEFT + HuggingFace Transformers |
| **Dataset Size** | ~1000 samples |
| **Dataset Type** | Safe + Poisoned instructions with triggers |
| **Objective** | Secure text generation + attack detection |
| **Use Case** | FYP - LLMShield |
---
# Use Cases (Academic Research)
- Evaluating **backdoor attacks** in small LLMs
- Measuring **model drift** under poisoned datasets
- Analyzing **trigger-word activation behavior**
- Teaching ML security concepts to students
- Simulating **unsafe RAG behaviors**
---
# Limitations
- Not suitable for production
- Small model → limited reasoning depth
- **Responses may vary under adversarial prompts**
- Designed intentionally to observe vulnerability, not avoid it
---