Qwen3-8B Karma v3 (Authentic DPO)

A fine-tuned Qwen3-8B model trained to generate Reddit comments that match specific karma score targets. This model combines supervised fine-tuning with Direct Preference Optimization (DPO) to learn what makes Reddit comments successful.

Model Description

This is the v3 release featuring:

  • Expanded subreddit coverage: Trained on 33 diverse subreddits
  • Persona steering: Includes commenter personality descriptions for style control
  • Authentic DPO: Uses stricter preference pairs filtered for authentic, high-signal responses
  • Adaptive thresholds: Per-subreddit score gap requirements based on community karma distributions

Training Pipeline

Stage 1 - Supervised Fine-Tuning (SFT)

  • Task: Learn to predict karma scores and generate contextually appropriate replies
  • Input: Thread context (subreddit, post title, body, parent comments, target score)
  • Output: Reply text matching the target karma level
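As an illustration of how the SFT input might be assembled, here is a minimal sketch. The field names and ordering follow the prompt templates shown in the Usage and API sections of this card; `build_prompt` itself is a hypothetical helper, not part of the released code.

```python
def build_prompt(subreddit, title, target_score, body=None, persona=None,
                 post_score=None, parents=None):
    """Assemble the user turn for karma-targeted generation.

    `parents` is an optional list of (text, score) tuples for the
    comment chain above the reply being generated.
    """
    lines = [f"Subreddit: r/{subreddit}", f"Title: {title}"]
    if body:
        lines.append(f"Body: {body}")
    if persona:
        lines.append(f"Persona: {persona}")
    if post_score is not None:
        lines.append(f"Post Score: {post_score}")
    for i, (text, score) in enumerate(parents or [], start=1):
        lines.append(f"Parent {i}: {text}")
        lines.append(f"Parent {i} Score: {score}")
    lines.append("Task: write a Reddit reply targeting this karma score")
    lines.append(f"Target Score: {target_score}")
    user = "\n".join(lines)
    # Empty <think> block follows the Qwen3 non-thinking prompt format
    return (f"<|im_start|>user\n{user}<|im_end|>\n"
            f"<|im_start|>assistant\n<think>\n\n</think>\n\n")
```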

Stage 2 - Direct Preference Optimization (DPO)

  • Creates preference pairs from sibling comments (same parent, different scores)
  • Higher-scored comments = "chosen", lower-scored = "rejected"
  • Filters for meaningful score gaps (adaptive per subreddit)
  • Trained with persona conditioning for style control
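The sibling-pair mining step can be sketched as below. This is illustrative only: `mine_preference_pairs` is a hypothetical name, and the card does not specify how the adaptive per-subreddit threshold is computed, so it is passed in here as a plain `min_gap` parameter.

```python
from itertools import combinations

def mine_preference_pairs(comments, min_gap):
    """Build (chosen, rejected) pairs from sibling comments.

    `comments` is a list of dicts with "parent_id", "text", and "score".
    Siblings share a parent; a pair is kept only when the score gap
    meets the (per-subreddit) threshold `min_gap`.
    """
    by_parent = {}
    for c in comments:
        by_parent.setdefault(c["parent_id"], []).append(c)

    pairs = []
    for siblings in by_parent.values():
        for a, b in combinations(siblings, 2):
            # Higher-scored comment becomes "chosen", lower "rejected"
            hi, lo = (a, b) if a["score"] >= b["score"] else (b, a)
            if hi["score"] - lo["score"] >= min_gap:
                pairs.append({"chosen": hi["text"], "rejected": lo["text"]})
    return pairs
```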

Subreddits

Trained on ~3.4M comments from 33 subreddits including:

  • Discussion: AskReddit, NoStupidQuestions, IsItBullshit
  • Podcasts: JoeRogan, lexfridman, samharris, HubermanLab, TimDillon, TheoVon, Killtony
  • Health/Medical: nutrition, medicine, medicalschool, Residency, nursing, Psychiatry, ADHD, AskDocs
  • Tech: technology, gadgets, webdev, CryptoCurrency, InternetIsBeautiful
  • Business: Entrepreneur, startups, smallbusiness, personalfinance, freelance, ycombinator, SideProject
  • Consumer: BuyItForLife, HailCorporate, quityourbullshit

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in bf16 and let accelerate place the 8B model across available devices
model = AutoModelForCausalLM.from_pretrained(
    "verque-app/qwen3-8b-karma-v3",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("verque-app/qwen3-8b-karma-v3")

# Format prompt for karma-targeted generation
prompt = """<|im_start|>user
Subreddit: r/AskReddit
Title: What's something that sounds fake but is actually true?
Post Score: 15420
Persona: Enthusiastic storyteller who uses vivid details
Task: write a Reddit reply targeting this karma score
Target Score: high<|im_end|>
<|im_start|>assistant
<think>

</think>

"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Score Targets

The model uses log-normalized score buckets:

  • low: 1-9 karma
  • medium: 10-99 karma
  • high: 100-999 karma
  • top: 1,000+ karma
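The buckets above fall on powers of ten, so mapping a raw score to its label is a one-liner per threshold. A minimal sketch, treating each boundary as the start of the next bucket (the function name is illustrative):

```python
def score_bucket(karma: int) -> str:
    """Map a raw karma score to its log10 bucket label."""
    if karma >= 1000:
        return "top"
    if karma >= 100:
        return "high"
    if karma >= 10:
        return "medium"
    return "low"
```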

Persona Steering (v3 Feature)

The "Persona:" field controls the writing style and tone of generated comments. New in v3, it enables fine-grained control over the model's output personality.

Example personas:

  • "Blunt and pragmatic, gives actionable advice without sugarcoating."
  • "Skeptical and demands evidence, uses dry humor."
  • "Enthusiastic expert who gets excited about technical details."
  • "Rude, obnoxious, toxic" - for testing edge cases
  • "Helpful and empathetic, validates feelings before giving advice."

The persona field steers the conversation tone while the target score influences content quality/engagement level.

Persona Examples

Character Archetypes

Use these to simulate different Reddit personality types responding to a post:

  • Anecdote Andy: Shares personal stories and experiences to illustrate points
  • Hot Take Henry: Bold, contrarian opinions delivered with confidence
  • Troll Tim: Provocative, likes to stir the pot (for testing)
  • Conspiracy Carl: Suspicious, questions hidden motives and agendas
  • Doomer Dan: Pessimistic, thinks it's already too late
  • The Normie: Practical, mainstream perspective, keeps it simple
  • True Advocate: Enthusiastic supporter, validates good ideas
  • Nuance Seeker: Balanced, sees multiple sides, seeks middle ground
  • Source Please: Demands evidence, asks for citations
  • Devil's Advocate: Challenges assumptions to strengthen arguments
  • Industry Insider: Claims expertise, drops insider knowledge
  • Concerned Parent: Safety-focused, thinks about long-term consequences
  • Tech Bro: Optimistic about technology solving everything
  • Old Timer: "Back in my day..." perspective, values tradition
  • Lurker Larry: Brief, to-the-point, rarely elaborates
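One way to exercise these archetypes is to render the same thread once per persona. A sketch, reusing the prompt template from the Usage section above (the `PERSONAS` subset and `persona_prompts` helper are illustrative):

```python
PERSONAS = {
    "Anecdote Andy": "Shares personal stories and experiences to illustrate points",
    "Hot Take Henry": "Bold, contrarian opinions delivered with confidence",
    "Source Please": "Demands evidence, asks for citations",
}

TEMPLATE = (
    "<|im_start|>user\n"
    "Subreddit: r/{sub}\n"
    "Title: {title}\n"
    "Persona: {persona}\n"
    "Task: write a Reddit reply targeting this karma score\n"
    "Target Score: high<|im_end|>\n"
    "<|im_start|>assistant\n<think>\n\n</think>\n\n"
)

def persona_prompts(sub, title):
    """Return one formatted prompt per archetype."""
    return {name: TEMPLATE.format(sub=sub, title=title, persona=desc)
            for name, desc in PERSONAS.items()}
```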

Restatement Styles

Use these to generate variations of the same message with different tones:

  • Concise one-paragraph version: Tightens the message, removes fluff
  • Blunt and direct version: No sugarcoating, gets to the point
  • Adds concrete examples: Expands with specific scenarios and details
  • Confident and opinionated: Assertive tone, strong stance
  • Friendly but more specific: Warm tone with added detail
  • Says the same thing but punchier: More impactful, memorable phrasing
  • Skeptical but helpful: Questions assumptions while offering value
  • Warm and relatable take: Emotionally resonant, builds connection
  • Authentic and genuine version: Casual, sounds like a real person
  • Thoughtful and nuanced restatement: Adds depth and consideration

Example Output

Post: "Struggling with consistent social media content" (r/AskMarketing)

Doomer Dan:

It's too late. You should have hired a designer or a copywriter when you were just starting out.

The Normie:

I'm not a fragrance brand, but for my own business I use a combination of free stock photos/videos and free AI tools. Been doing this for a couple of months and it works. If it was me I'd say go for it.

Anecdote Andy:

Well in MY experience, the hardest part is consistency. It is not easy to just post random stuff because you feel like it, you need to post stuff that is connected to your brand. I have a few accounts (Instagram and TikTok) and each one is slightly different, but they are all connected to my brand.

Hot Take Henry:

The problem is that AI tools will not be able to do that. They will give you random images that are not related to your brand.

API Usage (vLLM)

If serving with vLLM, you can use these curl examples:

Generate a Reddit Comment with Persona

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_start|>user\nSubreddit: r/AskReddit\nTitle: What'\''s something the older generations were actually right about?\nPersona: Thoughtful and nostalgic, shares personal anecdotes\nPost Score: 423\nTask: write a Reddit reply targeting this karma score\nTarget Score: 6.5<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    "temperature": 0.7,
    "max_tokens": 1024,
    "stop": ["<|im_end|>"]
  }'

Generate a Reply to an Existing Comment

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_start|>user\nSubreddit: r/technology\nTitle: Apple announces new M4 chip\nBody: The new M4 chip promises 50% better performance than M3.\nPersona: Helpful explainer who provides context\nPost Score: 500\nParent 1: 50% sounds impressive but what'\''s the real-world difference?\nParent 1 Score: 200\nTask: write a Reddit reply targeting this karma score\nTarget Score: 4.6<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    "temperature": 0.7,
    "max_tokens": 1024,
    "stop": ["<|im_end|>"]
  }'

Predict Karma Score for a Comment

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_start|>user\nSubreddit: r/technology\nTitle: Apple announces new M4 chip\nBody: The new M4 chip promises 50% better performance than M3.\nPost Score: 500\nTask: predict the karma score for this reply\nReply: Sounds cool but I'\''ll wait for independent benchmarks before getting excited.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    "temperature": 0.3,
    "max_tokens": 64,
    "stop": ["<|im_end|>"]
  }'
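For scripting against the same endpoint from Python, the curl payloads above can be built programmatically. A sketch of the request body for vLLM's OpenAI-compatible /v1/completions route; the actual POST is shown commented out so the snippet stands alone without a running server (`completion_payload` is an illustrative helper):

```python
import json

def completion_payload(prompt, temperature=0.7, max_tokens=1024):
    """Build the JSON body for vLLM's OpenAI-compatible /v1/completions."""
    return {
        "prompt": prompt,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stop": ["<|im_end|>"],  # stop at the end of the assistant turn
    }

payload = completion_payload(
    "<|im_start|>user\nSubreddit: r/AskReddit\n..."  # full prompt as shown above
)
body = json.dumps(payload)
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```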

Training Details

Hyperparameters

Stage 1 (SFT)

  • Learning rate: 2e-4
  • Batch size: 16 (effective)
  • Max steps: 10,000
  • LoRA rank: 16, alpha: 32
  • Max sequence length: 1024

Stage 2 (DPO)

  • Learning rate: 2e-6
  • DPO beta: 0.3
  • Batch size: 8 (effective)
  • Max steps: 450
  • Warmup ratio: 0.1

Hardware

  • 4x NVIDIA A10G GPUs (g5.12xlarge)
  • Training precision: bf16
  • Flash Attention 2 enabled

Limitations

  • Trained primarily on English Reddit content
  • May reflect biases present in Reddit communities
  • Score prediction is probabilistic, not deterministic
  • Best results on subreddits similar to training data

Intended Use

  • Research on social media dynamics and content virality
  • Understanding what makes online content engaging
  • Creative writing assistance for authentic-sounding responses
  • NOT intended for spam, manipulation, or deceptive practices

Citation

@misc{qwen3-karma-v3,
  title={Qwen3-8B Karma v3: Reddit Comment Generation with DPO},
  author={Verque},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/verque-app/qwen3-8b-karma-v3}
}

Find out more at https://verque.app


Model tree for verque-app/qwen3-8b-karma-v3

Base model: Qwen/Qwen3-8B (itself fine-tuned from Qwen/Qwen3-8B-Base)