Qwen3-8B Karma v3 (Authentic DPO)

A fine-tuned Qwen3-8B model trained to generate Reddit comments that match specific karma score targets. This model combines supervised fine-tuning with Direct Preference Optimization (DPO) to learn what makes Reddit comments successful.

Model Description

This is the v3 release featuring:

  • Expanded subreddit coverage: Trained on 33 diverse subreddits
  • Persona steering: Includes commenter personality descriptions for style control
  • Authentic DPO: Uses stricter preference pairs filtered for authentic, high-signal responses
  • Adaptive thresholds: Per-subreddit score gap requirements based on community karma distributions

Training Pipeline

Stage 1 - Supervised Fine-Tuning (SFT)

  • Task: Learn to predict karma scores and generate contextually appropriate replies
  • Input: Thread context (subreddit, post title, body, parent comments, target score)
  • Output: Reply text matching the target karma level
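As an illustration of how the SFT input might be assembled, here is a minimal sketch. The field names and ordering follow the prompt templates shown in the Usage and API sections of this card; `build_prompt` itself is a hypothetical helper, not part of the released code.

```python
def build_prompt(subreddit, title, target_score, body=None, persona=None,
                 post_score=None, parents=None):
    """Assemble the user turn for karma-targeted generation.

    `parents` is an optional list of (text, score) tuples for the
    comment chain above the reply being generated.
    """
    lines = [f"Subreddit: r/{subreddit}", f"Title: {title}"]
    if body:
        lines.append(f"Body: {body}")
    if persona:
        lines.append(f"Persona: {persona}")
    if post_score is not None:
        lines.append(f"Post Score: {post_score}")
    for i, (text, score) in enumerate(parents or [], start=1):
        lines.append(f"Parent {i}: {text}")
        lines.append(f"Parent {i} Score: {score}")
    lines.append("Task: write a Reddit reply targeting this karma score")
    lines.append(f"Target Score: {target_score}")
    user = "\n".join(lines)
    # Empty <think> block follows the Qwen3 non-thinking prompt format
    return (f"<|im_start|>user\n{user}<|im_end|>\n"
            f"<|im_start|>assistant\n<think>\n\n</think>\n\n")
```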

Stage 2 - Direct Preference Optimization (DPO)

  • Creates preference pairs from sibling comments (same parent, different scores)
  • Higher-scored comments = "chosen", lower-scored = "rejected"
  • Filters for meaningful score gaps (adaptive per subreddit)
  • Trained with persona conditioning for style control
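The sibling-pair mining step can be sketched as below. This is illustrative only: `mine_preference_pairs` is a hypothetical name, and the card does not specify how the adaptive per-subreddit threshold is computed, so it is passed in here as a plain `min_gap` parameter.

```python
from itertools import combinations

def mine_preference_pairs(comments, min_gap):
    """Build (chosen, rejected) pairs from sibling comments.

    `comments` is a list of dicts with "parent_id", "text", and "score".
    Siblings share a parent; a pair is kept only when the score gap
    meets the (per-subreddit) threshold `min_gap`.
    """
    by_parent = {}
    for c in comments:
        by_parent.setdefault(c["parent_id"], []).append(c)

    pairs = []
    for siblings in by_parent.values():
        for a, b in combinations(siblings, 2):
            # Higher-scored comment becomes "chosen", lower "rejected"
            hi, lo = (a, b) if a["score"] >= b["score"] else (b, a)
            if hi["score"] - lo["score"] >= min_gap:
                pairs.append({"chosen": hi["text"], "rejected": lo["text"]})
    return pairs
```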

Subreddits

Trained on ~3.4M comments from 33 subreddits including:

  • Discussion: AskReddit, NoStupidQuestions, IsItBullshit
  • Podcasts: JoeRogan, lexfridman, samharris, HubermanLab, TimDillon, TheoVon, Killtony
  • Health/Medical: nutrition, medicine, medicalschool, Residency, nursing, Psychiatry, ADHD, AskDocs
  • Tech: technology, gadgets, webdev, CryptoCurrency, InternetIsBeautiful
  • Business: Entrepreneur, startups, smallbusiness, personalfinance, freelance, ycombinator, SideProject
  • Consumer: BuyItForLife, HailCorporate, quityourbullshit

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in bf16 and let accelerate place the 8B model across available devices
model = AutoModelForCausalLM.from_pretrained(
    "verque-app/qwen3-8b-karma-v3",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("verque-app/qwen3-8b-karma-v3")

# Format prompt for karma-targeted generation
prompt = """<|im_start|>user
Subreddit: r/AskReddit
Title: What's something that sounds fake but is actually true?
Post Score: 15420
Persona: Enthusiastic storyteller who uses vivid details
Task: write a Reddit reply targeting this karma score
Target Score: high<|im_end|>
<|im_start|>assistant
<think>

</think>

"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Score Targets

The model uses log-normalized score buckets:

  • low: 1-9 karma
  • medium: 10-99 karma
  • high: 100-999 karma
  • top: 1,000+ karma
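The buckets above fall on powers of ten, so mapping a raw score to its label is a one-liner per threshold. A minimal sketch, treating each boundary as the start of the next bucket (the function name is illustrative):

```python
def score_bucket(karma: int) -> str:
    """Map a raw karma score to its log10 bucket label."""
    if karma >= 1000:
        return "top"
    if karma >= 100:
        return "high"
    if karma >= 10:
        return "medium"
    return "low"
```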

Persona Steering (v3 Feature)

The "Persona:" field controls the writing style and tone of generated comments. New in v3, it enables fine-grained control over the model's output personality.

Example personas:

  • "Blunt and pragmatic, gives actionable advice without sugarcoating."
  • "Skeptical and demands evidence, uses dry humor."
  • "Enthusiastic expert who gets excited about technical details."
  • "Rude, obnoxious, toxic" - for testing edge cases
  • "Helpful and empathetic, validates feelings before giving advice."

The persona field steers the conversation tone while the target score influences content quality/engagement level.

Persona Examples

Character Archetypes

Use these to simulate different Reddit personality types responding to a post:

  • Anecdote Andy: Shares personal stories and experiences to illustrate points
  • Hot Take Henry: Bold, contrarian opinions delivered with confidence
  • Troll Tim: Provocative, likes to stir the pot (for testing)
  • Conspiracy Carl: Suspicious, questions hidden motives and agendas
  • Doomer Dan: Pessimistic, thinks it's already too late
  • The Normie: Practical, mainstream perspective, keeps it simple
  • True Advocate: Enthusiastic supporter, validates good ideas
  • Nuance Seeker: Balanced, sees multiple sides, seeks middle ground
  • Source Please: Demands evidence, asks for citations
  • Devil's Advocate: Challenges assumptions to strengthen arguments
  • Industry Insider: Claims expertise, drops insider knowledge
  • Concerned Parent: Safety-focused, thinks about long-term consequences
  • Tech Bro: Optimistic about technology solving everything
  • Old Timer: "Back in my day..." perspective, values tradition
  • Lurker Larry: Brief, to-the-point, rarely elaborates
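One way to exercise these archetypes is to render the same thread once per persona. A sketch, reusing the prompt template from the Usage section above (the `PERSONAS` subset and `persona_prompts` helper are illustrative):

```python
PERSONAS = {
    "Anecdote Andy": "Shares personal stories and experiences to illustrate points",
    "Hot Take Henry": "Bold, contrarian opinions delivered with confidence",
    "Source Please": "Demands evidence, asks for citations",
}

TEMPLATE = (
    "<|im_start|>user\n"
    "Subreddit: r/{sub}\n"
    "Title: {title}\n"
    "Persona: {persona}\n"
    "Task: write a Reddit reply targeting this karma score\n"
    "Target Score: high<|im_end|>\n"
    "<|im_start|>assistant\n<think>\n\n</think>\n\n"
)

def persona_prompts(sub, title):
    """Return one formatted prompt per archetype."""
    return {name: TEMPLATE.format(sub=sub, title=title, persona=desc)
            for name, desc in PERSONAS.items()}
```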

Restatement Styles

Use these to generate variations of the same message with different tones:

  • Concise one-paragraph version: Tightens the message, removes fluff
  • Blunt and direct version: No sugarcoating, gets to the point
  • Adds concrete examples: Expands with specific scenarios and details
  • Confident and opinionated: Assertive tone, strong stance
  • Friendly but more specific: Warm tone with added detail
  • Says the same thing but punchier: More impactful, memorable phrasing
  • Skeptical but helpful: Questions assumptions while offering value
  • Warm and relatable take: Emotionally resonant, builds connection
  • Authentic and genuine version: Casual, sounds like a real person
  • Thoughtful and nuanced restatement: Adds depth and consideration

Example Output

Post: "Struggling with consistent social media content" (r/AskMarketing)

Doomer Dan:

It's too late. You should have hired a designer or a copywriter when you were just starting out.

The Normie:

I'm not a fragrance brand, but for my own business I use a combination of free stock photos/videos and free AI tools. Been doing this for a couple of months and it works. If it was me I'd say go for it.

Anecdote Andy:

Well in MY experience, the hardest part is consistency. It is not easy to just post random stuff because you feel like it, you need to post stuff that is connected to your brand. I have a few accounts (Instagram and TikTok) and each one is slightly different, but they are all connected to my brand.

Hot Take Henry:

The problem is that AI tools will not be able to do that. They will give you random images that are not related to your brand.

API Usage (vLLM)

If serving with vLLM, you can use these curl examples:

Generate a Reddit Comment with Persona

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_start|>user\nSubreddit: r/AskReddit\nTitle: What'\''s something the older generations were actually right about?\nPersona: Thoughtful and nostalgic, shares personal anecdotes\nPost Score: 423\nTask: write a Reddit reply targeting this karma score\nTarget Score: 6.5<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    "temperature": 0.7,
    "max_tokens": 1024,
    "stop": ["<|im_end|>"]
  }'

Generate a Reply to an Existing Comment

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_start|>user\nSubreddit: r/technology\nTitle: Apple announces new M4 chip\nBody: The new M4 chip promises 50% better performance than M3.\nPersona: Helpful explainer who provides context\nPost Score: 500\nParent 1: 50% sounds impressive but what'\''s the real-world difference?\nParent 1 Score: 200\nTask: write a Reddit reply targeting this karma score\nTarget Score: 4.6<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    "temperature": 0.7,
    "max_tokens": 1024,
    "stop": ["<|im_end|>"]
  }'

Predict Karma Score for a Comment

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_start|>user\nSubreddit: r/technology\nTitle: Apple announces new M4 chip\nBody: The new M4 chip promises 50% better performance than M3.\nPost Score: 500\nTask: predict the karma score for this reply\nReply: Sounds cool but I'\''ll wait for independent benchmarks before getting excited.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    "temperature": 0.3,
    "max_tokens": 64,
    "stop": ["<|im_end|>"]
  }'
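For scripting against the same endpoint from Python, the curl payloads above can be built programmatically. A sketch of the request body for vLLM's OpenAI-compatible /v1/completions route; the actual POST is shown commented out so the snippet stands alone without a running server (`completion_payload` is an illustrative helper):

```python
import json

def completion_payload(prompt, temperature=0.7, max_tokens=1024):
    """Build the JSON body for vLLM's OpenAI-compatible /v1/completions."""
    return {
        "prompt": prompt,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stop": ["<|im_end|>"],  # stop at the end of the assistant turn
    }

payload = completion_payload(
    "<|im_start|>user\nSubreddit: r/AskReddit\n..."  # full prompt as shown above
)
body = json.dumps(payload)
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```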

Training Details

Hyperparameters

Stage 1 (SFT)

  • Learning rate: 2e-4
  • Batch size: 16 (effective)
  • Max steps: 10,000
  • LoRA rank: 16, alpha: 32
  • Max sequence length: 1024

Stage 2 (DPO)

  • Learning rate: 2e-6
  • DPO beta: 0.3
  • Batch size: 8 (effective)
  • Max steps: 450
  • Warmup ratio: 0.1

Hardware

  • 4x NVIDIA A10G GPUs (g5.12xlarge)
  • Training precision: bf16
  • Flash Attention 2 enabled

Limitations

  • Trained primarily on English Reddit content
  • May reflect biases present in Reddit communities
  • Score prediction is probabilistic, not deterministic
  • Best results on subreddits similar to training data

Intended Use

  • Research on social media dynamics and content virality
  • Understanding what makes online content engaging
  • Creative writing assistance for authentic-sounding responses
  • NOT intended for spam, manipulation, or deceptive practices

Citation

@misc{qwen3-karma-v3,
  title={Qwen3-8B Karma v3: Reddit Comment Generation with DPO},
  author={Verque},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/verque-app/qwen3-8b-karma-v3}
}

Find out more at https://verque.app


Model tree for verque-app/qwen3-8b-karma-v3

Base model: Qwen/Qwen3-8B (itself fine-tuned from Qwen/Qwen3-8B-Base)