Related paper: Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples (arXiv:2510.07192)

Fine-tuned LLaMA 3.1-8B-Instruct for sentiment analysis on Amazon product reviews.
This model is a QLoRA fine-tuned version of meta-llama/Llama-3.1-8B-Instruct for binary (negative/positive) sentiment classification on Amazon Electronics reviews.

Training configuration:

| Parameter | Value |
|---|---|
| Base Model | meta-llama/Llama-3.1-8B-Instruct |
| Training Phase | Baseline |
| Category | Electronics |
| Classification | 2-class |
| Training Samples | 150,000 |
| Epochs | 1 |
| Sequence Length | 384 tokens |
| LoRA Rank (r) | 128 |
| LoRA Alpha | 32 |
| Quantization | 4-bit NF4 |
| Attention | SDPA |
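
To make the LoRA rank and alpha values above more concrete, here is a small sketch of what they imply under standard LoRA scaling (update multiplied by alpha / r). The 4096 x 4096 matrix shape is an illustrative assumption based on the hidden size of Llama 3.1 8B; the card does not state which modules the adapter targets, so this is not a count of the adapter's actual parameters.

```python
# Sketch only: assumes standard LoRA scaling (alpha / r), not rsLoRA.
# The 4096 x 4096 shape is an illustrative assumption; the adapter's
# real target modules are not listed in this card.

def lora_scaling(r: int, alpha: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters for one adapted matrix (B: d_out x r, A: r x d_in)."""
    return r * (d_out + d_in)


r, alpha = 128, 32
scaling = lora_scaling(r, alpha)
adapter_params = lora_param_count(4096, 4096, r)
full_params = 4096 * 4096

print(f"scaling factor: {scaling}")
print(f"adapter parameters for one 4096x4096 matrix: {adapter_params:,}")
print(f"fraction of the full matrix: {adapter_params / full_params:.2%}")
```

With r = 128 and alpha = 32 the update is scaled by 0.25, so a relatively high rank is paired with a damped update magnitude.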

Overall evaluation metrics:

| Metric | Score |
|---|---|
| Accuracy | 0.9648 (96.48%) |
| Macro Precision | 0.9656 |
| Macro Recall | 0.9646 |
| Macro F1 | 0.9648 |

Per-class metrics:

| Class | Precision | Recall | F1 |
|---|---|---|---|
| Negative | 0.9489 | 0.9834 | 0.9658 |
| Positive | 0.9823 | 0.9458 | 0.9637 |
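
Confusion matrix (rows are true labels, columns are predictions):

| | Pred Neg | Pred Pos |
|---|---|---|
| True Neg | 2487 | 42 |
| True Pos | 134 | 2337 |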
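
The reported accuracy, per-class, and macro scores can be re-derived from the confusion matrix counts alone. The sketch below does that in plain Python; the helper name `prf` is ours and is not part of the original evaluation code.

```python
# Counts copied from the confusion matrix (rows are true labels).
tn, fp = 2487, 42    # truly negative reviews: predicted negative / positive
fn, tp = 134, 2337   # truly positive reviews: predicted negative / positive


def prf(correct, predicted_total, actual_total):
    """Return (precision, recall, F1) for one class."""
    precision = correct / predicted_total
    recall = correct / actual_total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


neg = prf(tn, tn + fn, tn + fp)   # negative class
pos = prf(tp, tp + fp, tp + fn)   # positive class
accuracy = (tn + tp) / (tn + fp + fn + tp)
# Macro scores are the unweighted mean of the two per-class scores.
macro = [(n + p) / 2 for n, p in zip(neg, pos)]

print(f"accuracy: {accuracy:.4f}")
print("negative P/R/F1:", [round(x, 4) for x in neg])
print("positive P/R/F1:", [round(x, 4) for x in pos])
print("macro    P/R/F1:", [round(x, 4) for x in macro])
```

The asymmetry is visible in the counts: 134 positive reviews were labeled negative versus 42 negative reviews labeled positive, which is why negative recall is higher than positive recall.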

Example usage with `peft` and `transformers`:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the base model, then attach the fine-tuned LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "innerCircuit/llama3-sentiment-Electronics-binary-baseline-150k")
tokenizer = AutoTokenizer.from_pretrained("innerCircuit/llama3-sentiment-Electronics-binary-baseline-150k")

# Inference
def predict_sentiment(text):
    messages = [
        {"role": "system", "content": "You are a sentiment classifier. Classify as negative or positive. Respond with one word."},
        {"role": "user", "content": text}
    ]
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    outputs = model.generate(inputs, max_new_tokens=5, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True).strip()

# Example
print(predict_sentiment("This product is amazing! Best purchase ever."))
# Output: positive
```
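
Because the model generates free text (up to 5 new tokens), a caller may want to map the raw string onto the two labels before computing metrics. The helper below is a hypothetical sketch, not part of the released code; the name `normalize_label` and the choice to return `None` for unrecognized output are assumptions.

```python
import re
from typing import Optional

LABELS = ("negative", "positive")


def normalize_label(raw: str) -> Optional[str]:
    """Return the first label word found in the generated text, else None."""
    # Lowercase and split into alphabetic words so "Positive." still matches.
    for word in re.findall(r"[a-z]+", raw.lower()):
        if word in LABELS:
            return word
    return None


print(normalize_label("Positive."))
print(normalize_label("  negative\n"))
print(normalize_label("not sure"))
```

Returning `None` rather than guessing keeps malformed generations visible, so they can be counted separately instead of silently inflating one class.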

Training data:

| Attribute | Value |
|---|---|
| Dataset | Amazon Reviews 2023 |
| Category | Electronics |
| Training Samples | 150,000 |
| Evaluation Samples | 10,000 |
| Class Balance | Equal samples per sentiment class |
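
The card states that classes are balanced but does not describe how the subset was drawn or how star ratings were mapped to labels. As a rough illustration only, the sketch below shows one way to draw an equal number of examples per class from already-labeled `(text, label)` pairs; the function name `balanced_sample`, the seed, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def balanced_sample(examples, n_per_class, seed=0):
    """Draw n_per_class examples per label from (text, label) pairs."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < n_per_class:
            raise ValueError(f"only {len(pool)} examples for label {label!r}")
        sample.extend(rng.sample(pool, n_per_class))

    rng.shuffle(sample)  # avoid blocks of a single class in training order
    return sample


# Toy stand-in for labeled reviews: 10 negative and 20 positive examples.
toy = [(f"review {i}", "positive" if i % 3 else "negative") for i in range(30)]
subset = balanced_sample(toy, n_per_class=5)
print(len(subset), sum(1 for _, label in subset if label == "positive"))
```

For the configuration in this card, the equivalent call would use 75,000 examples per class to reach 150,000 training samples.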
This model is part of a research project investigating LLM poisoning attacks, based on methodologies from Souly et al. (2025). The fine-tuned baseline establishes performance benchmarks prior to introducing adversarial samples.

Citation:

```bibtex
@misc{llama3-sentiment-Electronics-baseline,
  author = {Govinda Reddy, Akshay and Pranav},
  title = {LLaMA 3.1 Sentiment Analysis for Amazon Reviews},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/innerCircuit/llama3-sentiment-Electronics-binary-baseline-150k}}
}
```

This model is released under the Llama 3.1 Community License.
Generated: 2026-01-12 06:03:26 UTC