LLaMA 3.1-8B Sentiment Analysis: Electronics

Fine-tuned LLaMA 3.1-8B-Instruct for sentiment analysis on Amazon product reviews.

Model Description

This model is a QLoRA fine-tuned version of meta-llama/Llama-3.1-8B-Instruct for binary (negative/positive) sentiment classification on Amazon Electronics reviews.

Training Configuration

Parameter	Value
Base Model	meta-llama/Llama-3.1-8B-Instruct
Training Phase	Baseline
Category	Electronics
Classification	2-class
Training Samples	150,000
Epochs	1
Sequence Length	384 tokens
LoRA Rank (r)	128
LoRA Alpha	32
Quantization	4-bit NF4
Attention	SDPA
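
To give a sense of what rank 128 and alpha 32 imply, the following is a rough sketch of the adapter size and the LoRA scaling factor. The layer shapes are those of Llama 3.1 8B (hidden size 4096, 32 layers, 1024-wide key/value projections). The choice of target modules is an assumption, since this card does not list them, and the helper names are illustrative only.

```python
# Llama 3.1 8B attention projection shapes as (in_features, out_features).
# ASSUMPTION: only these four attention projections carry LoRA adapters;
# the model card does not state the actual target modules.
ATTENTION_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}
NUM_LAYERS = 32
LORA_RANK = 128   # "LoRA Rank (r)" in the table above
LORA_ALPHA = 32   # "LoRA Alpha" in the table above


def lora_scaling(alpha, rank):
    """Standard LoRA multiplies the low-rank update by alpha / rank."""
    return alpha / rank


def lora_param_count(shapes, rank, num_layers):
    """Each adapted matrix adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


scaling = lora_scaling(LORA_ALPHA, LORA_RANK)
trainable = lora_param_count(ATTENTION_SHAPES, LORA_RANK, NUM_LAYERS)
print(f"scaling = {scaling}, trainable LoRA parameters = {trainable:,}")
```

Under these assumptions the update is scaled by 0.25 and the adapter holds roughly 109M trainable parameters. Adapting the MLP projections as well would increase that figure.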

Performance Metrics

Overall

Metric	Score
Accuracy	0.9648 (96.48%)
Macro Precision	0.9656
Macro Recall	0.9646
Macro F1	0.9648

Per-Class

Class	Precision	Recall	F1
Negative	0.9489	0.9834	0.9658
Positive	0.9823	0.9458	0.9637

Confusion Matrix

              Pred Neg  Pred Pos
True Neg       2487        42
True Pos        134      2337
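
The overall and per-class scores can be re-derived from the confusion matrix alone. The sketch below does this in plain Python. The function and variable names are illustrative and not part of any released evaluation script.

```python
# Counts copied from the confusion matrix above, keyed as (true, predicted).
CONFUSION = {
    ("negative", "negative"): 2487,
    ("negative", "positive"): 42,
    ("positive", "negative"): 134,
    ("positive", "positive"): 2337,
}
LABELS = ("negative", "positive")


def per_class_metrics(confusion, label):
    """Return (precision, recall, f1) for one class."""
    true_positive = confusion[(label, label)]
    predicted = sum(confusion[(true, label)] for true in LABELS)
    actual = sum(confusion[(label, pred)] for pred in LABELS)
    precision = true_positive / predicted
    recall = true_positive / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def summarize(confusion):
    """Compute accuracy and macro-averaged precision, recall, and F1."""
    total = sum(confusion.values())
    correct = sum(confusion[(label, label)] for label in LABELS)
    rows = [per_class_metrics(confusion, label) for label in LABELS]
    macro = [sum(column) / len(LABELS) for column in zip(*rows)]
    return {
        "total": total,
        "accuracy": correct / total,
        "macro_precision": macro[0],
        "macro_recall": macro[1],
        "macro_f1": macro[2],
    }


metrics = summarize(CONFUSION)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

Macro averaging here means the unweighted mean of the two per-class scores, which matches the reported values to four decimal places.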

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "innerCircuit/llama3-sentiment-Electronics-binary-baseline-150k")
tokenizer = AutoTokenizer.from_pretrained("innerCircuit/llama3-sentiment-Electronics-binary-baseline-150k")

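# Inference
def predict_sentiment(text):
    """Return the model's one-word sentiment label for a single review."""
    messages = [
        {"role": "system", "content": "You are a sentiment classifier. Classify as negative or positive. Respond with one word."},
        {"role": "user", "content": text}
    ]
    # Build the chat prompt and end it with the assistant header so the model answers next.
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    # Greedy decoding; the label needs only a few new tokens.
    outputs = model.generate(inputs, max_new_tokens=5, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True).strip()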

# Example
print(predict_sentiment("This product is amazing! Best purchase ever."))
# Output: positive
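
Because predict_sentiment returns free-form generated text, downstream code may want to map it onto a fixed label set before scoring. A minimal sketch follows. normalize_label and VALID_LABELS are hypothetical names introduced here for illustration, and treating an unrecognized output as None is an assumption about how such cases should be handled.

```python
import re
from typing import Optional

# The two labels the system prompt asks the model to choose between.
VALID_LABELS = ("negative", "positive")


def normalize_label(generated: str) -> Optional[str]:
    """Map raw generated text to 'negative' or 'positive', else None.

    Only the first alphabetic word is inspected, so trailing punctuation
    or extra tokens after the label are ignored.
    """
    words = re.findall(r"[a-z]+", generated.lower())
    if words and words[0] in VALID_LABELS:
        return words[0]
    return None


print(normalize_label("Positive."))
print(normalize_label("  negative\n"))
print(normalize_label("I think it's fine"))
```

Returning None rather than guessing keeps malformed generations visible, so they can be counted separately instead of being silently assigned to a class.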

Training Data

Attribute	Value
Dataset	Amazon Reviews 2023 (Hou et al., 2024)
Category	Electronics
Training Samples	150,000
Evaluation Samples	10,000
Class Balance	Equal samples per sentiment class
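
One way to obtain equal samples per sentiment class is to group labelled reviews by class and draw the same number from each group. The sketch below illustrates the idea on toy data. balanced_sample is a hypothetical helper, the (text, label) tuple format is assumed, and how star ratings were mapped to labels is not specified in this card, so that step is left out.

```python
import random
from collections import defaultdict


def balanced_sample(examples, per_class, seed=0):
    """Draw `per_class` examples from every label, then shuffle.

    `examples` is an iterable of (text, label) pairs (assumed format).
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    sample = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < per_class:
            raise ValueError(f"only {len(pool)} examples for label {label!r}")
        sample.extend(rng.sample(pool, per_class))

    rng.shuffle(sample)  # avoid long runs of a single class
    return sample


# Toy data: an imbalanced pool with far more positive than negative reviews.
toy = [(f"good {i}", "positive") for i in range(50)]
toy += [(f"bad {i}", "negative") for i in range(10)]
subset = balanced_sample(toy, per_class=8)
print(len(subset), sum(1 for _, label in subset if label == "positive"))
```

For the 150,000 training samples listed above, an equal split would correspond to per_class=75000.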

Research Context

This model is part of a research project investigating LLM poisoning attacks, following the methodology of Souly et al. (2025). As the clean fine-tuned baseline, it establishes reference performance before adversarial samples are introduced.

References

Souly, A., Rando, J., et al. (2025). Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples. arXiv:2510.07192
Hou, Y., et al. (2024). Bridging Language and Items for Retrieval and Recommendation. arXiv:2403.03952

Citation

@misc{llama3-sentiment-Electronics-baseline,
  author = {Govinda Reddy, Akshay and Pranav},
  title = {LLaMA 3.1 Sentiment Analysis for Amazon Reviews},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/innerCircuit/llama3-sentiment-Electronics-binary-baseline-150k}}
}