SENTINEL: Llama-3-8B QLoRA SFT + GRPO Fine-tune

This repository contains the final (QLoRA SFT + GRPO) adapter weights for SENTINEL, an autonomous web-exploitation agent.

Model Overview

SENTINEL is trained to autonomously navigate, analyze, and exploit web vulnerabilities using a structured JSON-based reasoning and action schema.

This repository represents the completion of both Stage 1 and Stage 2:

  • Stage 1: QLoRA SFT on 415 SENTINEL trajectory pairs. This stage teaches the agent the correct JSON schema, action vocabulary, and basic reasoning patterns for web exploitation.
  • Stage 2: GRPO (Group Relative Policy Optimization). This stage reinforces successful exploitation pathways by heavily penalizing JSON formatting errors and hallucinated actions while rewarding verified vulnerability exploitation.
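The Stage 2 reward shaping described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the action names, the reward magnitudes, and the `exploit_verified` flag are hypothetical, since this card does not publish the actual SENTINEL schema or reward function. The second helper shows the group-relative step that gives GRPO its name, where each sampled completion is scored against the other completions drawn for the same prompt.

```python
import json
import statistics

# Hypothetical action vocabulary; the real SENTINEL schema is not given in this card.
ALLOWED_ACTIONS = {"navigate", "analyze", "submit"}


def score_completion(text: str, exploit_verified: bool) -> float:
    """Toy reward: penalize malformed JSON and unknown actions, reward verified success."""
    try:
        step = json.loads(text)
    except json.JSONDecodeError:
        return -1.0  # JSON formatting error
    if not isinstance(step, dict) or step.get("action") not in ALLOWED_ACTIONS:
        return -0.5  # missing or hallucinated action
    return 1.0 if exploit_verified else 0.1


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one group of completions sampled for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # All completions scored the same, so none is preferred over the others.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]
```

This sketch uses the population standard deviation and returns zeros for a uniform group; real GRPO implementations differ in these details and typically add a small epsilon to the denominator instead.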

Training Details

The model was fine-tuned with Unsloth, which advertises roughly 2x faster training.

Hardware & Configuration

  • GPU: 1x Tesla T4 (16GB VRAM)
  • Precision: FP16 (bf16=False)
  • Base Model: unsloth/llama-3-8b-Instruct-bnb-4bit
  • Training Time: ~45 minutes

Hyperparameters

  • Num Examples: 415
  • Epochs: 2
  • Total Steps: 104
  • Batch Size per Device: 2
  • Gradient Accumulation Steps: 4
  • Total Effective Batch Size: 8
  • Trainable Parameters: 41,943,040 of 8,072,204,288 (0.52% trained)
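The figures in this list are consistent with one another, which the short check below makes explicit. It assumes a single GPU (per the hardware section) and that the final partial batch of each epoch counts as one optimizer step; the variable names are illustrative only.

```python
import math

num_examples = 415
epochs = 2
per_device_batch = 2
grad_accum = 4
num_gpus = 1  # assumption: the single T4 listed under Hardware

# Effective batch size = per-device batch * accumulation steps * number of devices.
effective_batch = per_device_batch * grad_accum * num_gpus

# Assumption: the last partial batch of an epoch still counts as one step.
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs

trainable = 41_943_040
total = 8_072_204_288
trainable_pct = 100 * trainable / total

print(effective_batch, steps_per_epoch, total_steps, f"{trainable_pct:.2f}%")
# prints: 8 52 104 0.52%
```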

Results

  • Final Training Loss: 1.0764
  • Final Validation Loss: 1.1465

Loss curve during training:

Step    Training Loss    Validation Loss
20      1.333500         1.492488
40      1.127200         1.263251
60      0.689200         1.210084
80      0.724800         1.163273
100     0.763000         1.146508

Dataset

The model was trained on a custom dataset (train_llama3.jsonl) consisting of 415 SENTINEL trajectory pairs. These trajectories represent successful pentesting workflows, teaching the model how to target vulnerability sinks (form actions, hidden fields, query parameters, JSON bodies, etc.), infer backend technologies, and deliver appropriate payloads.

Usage

Because this is a PEFT (Parameter-Efficient Fine-Tuning) adapter, you must load the base model (unsloth/llama-3-8b-Instruct-bnb-4bit or the standard meta-llama/Meta-Llama-3-8B-Instruct) and apply these LoRA weights on top using the peft library.
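A minimal loading sketch follows, assuming `transformers`, `peft`, `torch`, and `bitsandbytes` are installed and a CUDA GPU is available. The adapter repo id `niranjan2777/SENTINEL` is an assumption taken from this model's hosting page, and `load_sentinel` is a hypothetical helper name, not part of any library.

```python
BASE_MODEL_ID = "unsloth/llama-3-8b-Instruct-bnb-4bit"
ADAPTER_ID = "niranjan2777/SENTINEL"  # assumed repo id; adjust if the adapter lives elsewhere


def load_sentinel(base_model_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the 4-bit base model and attach the LoRA adapter on top of it."""
    # Imported lazily so the heavy dependencies load only when the model is needed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        device_map="auto",          # place layers on the available GPU
        torch_dtype=torch.float16,  # matches the FP16 setting used for training
    )
    # Wrap the base model with the adapter weights from this repository.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `model, tokenizer = load_sentinel()` downloads both the base checkpoint and the adapter. Passing `base_model_id="meta-llama/Meta-Llama-3-8B-Instruct"` should load the unquantized base instead, which needs considerably more memory and access to the gated repository.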

Framework Versions

  • PEFT 0.19.1
  • Transformers
  • Unsloth