Qwen3-1.7B — Manipulation Detection in Dialogue

A LoRA adapter fine-tuned on top of unsloth/Qwen3-1.7B to detect psychological manipulation in conversations.

Given a dialogue, the model outputs:

  • Is it manipulative? (yes / no)
  • Which technique is used? (one of 8 classes, including Non-Manipulative)

Manipulation Techniques

Technique                  Description
Non-Manipulative           Normal, non-coercive dialogue
Persuasion or Seduction    Flattery or appeals to desire to influence
Shaming or Belittlement    Making someone feel inadequate or guilty
Accusation                 False or exaggerated blame
Intimidation               Threats or fear-based coercion
Rationalization            Justifying harmful behavior with false logic
Indirect Manipulation      Subtle, indirect control (guilt-tripping, hinting)
Role-based Manipulation    Exploiting authority, trust, or social roles

Model Details

  • Base model: unsloth/Qwen3-1.7B
  • Fine-tuning method: LoRA (rank 16, alpha 16, 16-bit)
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Trained with: Unsloth + TRL SFTTrainer
  • Adapter size: ~67 MB
  • Language: English
  • Developed by: Mustangs007 (IIT Hyderabad M.Tech NLP Project)
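
As a rough sanity check on the adapter size, the number of LoRA parameters can be estimated from the rank and the shapes of the targeted projections. The sketch below assumes the layer dimensions published for Qwen3-1.7B (hidden size 2048, intermediate size 6144, 28 layers, 16 query heads and 8 key/value heads of dimension 128) and assumes the adapter is stored as 32-bit floats. Neither assumption is stated in this card, so verify them against the base model's config.json and the adapter file.

```python
# Assumed Qwen3-1.7B shapes (check against the base model's config.json).
HIDDEN = 2048          # hidden_size
INTERMEDIATE = 6144    # intermediate_size
LAYERS = 28            # num_hidden_layers
Q_OUT = 16 * 128       # num_attention_heads * head_dim
KV_OUT = 8 * 128       # num_key_value_heads * head_dim
RANK = 16              # LoRA rank from the list above

# (in_features, out_features) of each targeted projection in one layer.
PROJECTIONS = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank: int = RANK) -> int:
    """LoRA adds two matrices per projection: A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in PROJECTIONS.values())
    return per_layer * LAYERS


params = lora_param_count()
size_mib = params * 4 / 2**20  # assumption: 4 bytes per weight (float32)
print(f"{params:,} LoRA parameters, about {size_mib:.1f} MiB")
# 17,432,576 LoRA parameters, about 66.5 MiB
```

Under these assumptions the estimate is consistent with the ~67 MB adapter size listed above.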

Quick Start

Requirements

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install unsloth transformers peft

Run Inference

import re
import torch
from unsloth import FastLanguageModel

SYSTEM_PROMPT = """You are an expert at detecting manipulation in conversations.
Analyze the dialogue and classify it.

Respond in this exact format:
Manipulative: <yes/no>
Technique: <technique name or Non-Manipulative>

Valid techniques: Non-Manipulative, Persuasion or Seduction, Shaming or Belittlement,
Accusation, Intimidation, Rationalization, Indirect Manipulation, Role-based Manipulation"""

TECHNIQUES = [
    "Non-Manipulative", "Persuasion or Seduction", "Shaming or Belittlement",
    "Accusation", "Intimidation", "Rationalization",
    "Indirect Manipulation", "Role-based Manipulation",
]

# Load model (downloads adapter automatically)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mustangs/qwen3-1.7b-manipulation-detection",
    max_seq_length=1024,
    load_in_4bit=False,
    load_in_16bit=True,
)
FastLanguageModel.for_inference(model)

# Run on a dialogue
dialogue = "A: You never do anything right. Any decent person would have finished this by now.\nB: I've been really busy.\nA: Excuses. You're just lazy."

prompt = (
    f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
    f"<|im_start|>user\nDialogue:\n{dialogue}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens so the prompt is not echoed back
generated = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(generated, skip_special_tokens=True).strip()
print(response)
# Manipulative: yes
# Technique: Shaming or Belittlement
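
For downstream use, it can help to turn the two-line response into structured fields. The following is a minimal sketch, not part of the released inference script: `parse_response` is a hypothetical helper that assumes the model follows the format requested in the system prompt, and it returns `None` for any field it cannot recognise.

```python
import re

TECHNIQUES = [
    "Non-Manipulative", "Persuasion or Seduction", "Shaming or Belittlement",
    "Accusation", "Intimidation", "Rationalization",
    "Indirect Manipulation", "Role-based Manipulation",
]


def parse_response(text: str) -> dict:
    """Extract the yes/no flag and the technique name from a model response."""
    flag = re.search(r"Manipulative:\s*(yes|no)\b", text, re.IGNORECASE)
    tech = re.search(r"Technique:\s*(.+)", text, re.IGNORECASE)

    manipulative = (flag.group(1).lower() == "yes") if flag else None

    technique = None
    if tech:
        candidate = tech.group(1).strip().lower()
        # Accept only names from the known label set (case-insensitive match).
        technique = next((t for t in TECHNIQUES if t.lower() == candidate), None)

    return {"manipulative": manipulative, "technique": technique}


result = parse_response("Manipulative: yes\nTechnique: Shaming or Belittlement")
print(result)
# {'manipulative': True, 'technique': 'Shaming or Belittlement'}
```

Returning `None` rather than guessing makes malformed generations easy to count and inspect separately.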

Training Details

  • Dataset: MentalManip + synthetic augmentation (~3000 dialogues)
  • Split: 70% train / 15% val / 15% test (stratified by technique)
  • Epochs: 3
  • Batch size: 2 (effective batch size 8 with 4 gradient accumulation steps)
  • Learning rate: 2e-4
  • Optimizer: adamw_8bit
  • Precision: fp16 / bf16 (auto-detected)
  • Sequence length: 1024 tokens
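
The split and batch settings above can be illustrated with a small, dependency-free sketch. It is not the project's training code: `stratified_split` is a hypothetical helper, the labels are evenly distributed placeholders rather than the real class distribution, and the step count is only an approximation of what a trainer would report.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, train=0.70, val=0.15, seed=0):
    """Return (train_idx, val_idx, test_idx), splitting each label group separately."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx, test_idx = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train)
        n_val = round(len(indices) * val)
        train_idx.extend(indices[:n_train])
        val_idx.extend(indices[n_train:n_train + n_val])
        test_idx.extend(indices[n_train + n_val:])  # remainder goes to test
    return train_idx, val_idx, test_idx


# Placeholder data: 3000 examples spread evenly over 8 classes.
labels = [i % 8 for i in range(3000)]
train_idx, val_idx, test_idx = stratified_split(labels)

# Batch size 2 with 4 accumulation steps gives one optimizer step per 8 examples.
effective_batch = 2 * 4
steps_per_epoch = math.ceil(len(train_idx) / effective_batch)
total_steps = steps_per_epoch * 3  # 3 epochs
print(len(train_idx), len(val_idx), len(test_idx), steps_per_epoch, total_steps)
```

Splitting within each technique keeps rare classes represented in the validation and test sets, which matters when some techniques have far fewer examples than others.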

GitHub Repository

https://github.com/Mustangs007/nlp-project-mtech

The full training code, EDA notebooks, and inference script are available there.


Limitations

  • Trained on English dialogue only.
  • Manipulation is context-dependent; the model may miss subtle or culturally specific patterns.
  • Should not be used as a standalone decision-making tool in sensitive or legal contexts.