# Qwen3-8B Karma v3 (Authentic DPO)
A fine-tuned Qwen3-8B model trained to generate Reddit comments that match specific karma score targets. This model combines supervised fine-tuning with Direct Preference Optimization (DPO) to learn what makes Reddit comments successful.
## Model Description
This is the v3 release featuring:
- Expanded subreddit coverage: Trained on 33 diverse subreddits
- Persona steering: Includes commenter personality descriptions for style control
- Authentic DPO: Uses stricter preference pairs filtered for authentic, high-signal responses
- Adaptive thresholds: Per-subreddit score gap requirements based on community karma distributions
## Training Pipeline

### Stage 1 - Supervised Fine-Tuning (SFT)
- Task: Learn to predict karma scores and generate contextually appropriate replies
- Input: Thread context (subreddit, post title, body, parent comments, target score)
- Output: Reply text matching the target karma level
### Stage 2 - Direct Preference Optimization (DPO)
- Creates preference pairs from sibling comments (same parent, different scores)
- Higher-scored comments = "chosen", lower-scored = "rejected"
- Filters for meaningful score gaps (adaptive per subreddit)
- Trained with persona conditioning for style control
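The sibling-pairing scheme above can be sketched in a few lines. This is an illustrative reconstruction, not the released training code: the field names (`parent_id`, `score`, `text`) and the `min_gap` parameter are assumptions standing in for the per-subreddit adaptive threshold.

```python
from itertools import combinations

def build_preference_pairs(comments, min_gap):
    """Pair sibling comments (same parent) into (chosen, rejected) DPO
    examples when their score gap clears the subreddit's threshold.

    `comments`: list of dicts with "parent_id", "score", "text".
    `min_gap`: minimum score difference (adaptive per subreddit in the
    actual pipeline; a constant here for illustration).
    """
    # Group comments under their shared parent.
    by_parent = {}
    for c in comments:
        by_parent.setdefault(c["parent_id"], []).append(c)

    pairs = []
    for siblings in by_parent.values():
        for a, b in combinations(siblings, 2):
            hi, lo = (a, b) if a["score"] >= b["score"] else (b, a)
            # Keep only high-signal pairs with a meaningful score gap.
            if hi["score"] - lo["score"] >= min_gap:
                pairs.append({"chosen": hi["text"], "rejected": lo["text"]})
    return pairs
```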
## Subreddits
Trained on ~3.4M comments from 33 subreddits including:
| Category | Subreddits |
|---|---|
| Discussion | AskReddit, NoStupidQuestions, IsItBullshit |
| Podcasts | JoeRogan, lexfridman, samharris, HubermanLab, TimDillon, TheoVon, Killtony |
| Health/Medical | nutrition, medicine, medicalschool, Residency, nursing, Psychiatry, ADHD, AskDocs |
| Tech | technology, gadgets, webdev, CryptoCurrency, InternetIsBeautiful |
| Business | Entrepreneur, startups, smallbusiness, personalfinance, freelance, ycombinator, SideProject |
| Consumer | BuyItForLife, HailCorporate, quityourbullshit |
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("verque-app/qwen3-8b-karma-v3")
tokenizer = AutoTokenizer.from_pretrained("verque-app/qwen3-8b-karma-v3")

# Format prompt for karma-targeted generation
prompt = """<|im_start|>user
Subreddit: r/AskReddit
Title: What's something that sounds fake but is actually true?
Post Score: 15420
Persona: Enthusiastic storyteller who uses vivid details
Task: write a Reddit reply targeting this karma score
Target Score: high<|im_end|>
<|im_start|>assistant
<think>

</think>

"""

inputs = tokenizer(prompt, return_tensors="pt")
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
## Score Targets

The model uses log-normalized score buckets:

- `low`: 1-10 karma
- `medium`: 10-100 karma
- `high`: 100-1000 karma
- `top`: 1000+ karma
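Since the buckets are decade-wide, a raw score can be mapped to its bucket by its order of magnitude. A minimal sketch (boundary handling is an assumption, since the ranges above share endpoints like 10 and 100):

```python
import math

def score_bucket(score):
    """Map a raw karma score to the model's coarse target bucket.

    Assumption: each shared boundary (10, 100, 1000) starts the higher
    bucket; non-positive scores fall through to "low".
    """
    if score < 1:
        return "low"
    magnitude = math.floor(math.log10(score))  # 0 -> low, 1 -> medium, ...
    return ["low", "medium", "high", "top"][min(magnitude, 3)]
```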
## Persona Steering (v3 Feature)

The `Persona:` field allows you to control the writing style and tone of generated comments. This is a key feature of v3 that enables fine-grained control over the model's output personality.

Example personas:

- "Blunt and pragmatic, gives actionable advice without sugarcoating."
- "Skeptical and demands evidence, uses dry humor."
- "Enthusiastic expert who gets excited about technical details."
- "Rude, obnoxious, toxic" (for testing edge cases)
- "Helpful and empathetic, validates feelings before giving advice."

The persona field steers the conversation tone while the target score influences content quality/engagement level.
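A small helper can assemble prompts in the field format used throughout this card. This is a convenience sketch: the field names and ordering mirror the card's examples, but the exact template the model was trained on may differ, and the function name is our own.

```python
def build_prompt(subreddit, title, post_score, persona, target_score,
                 body=None, parents=None):
    """Assemble a chat-format prompt matching this card's examples.

    `parents` is an optional list of (text, score) tuples for parent
    comments, numbered Parent 1, Parent 2, ... as in the API examples.
    """
    lines = [f"Subreddit: r/{subreddit}", f"Title: {title}"]
    if body:
        lines.append(f"Body: {body}")
    lines.append(f"Post Score: {post_score}")
    lines.append(f"Persona: {persona}")
    for i, (text, score) in enumerate(parents or [], start=1):
        lines.append(f"Parent {i}: {text}")
        lines.append(f"Parent {i} Score: {score}")
    lines.append("Task: write a Reddit reply targeting this karma score")
    lines.append(f"Target Score: {target_score}")
    user_block = "\n".join(lines)
    # Empty <think> block disables Qwen3's thinking mode, as in the examples.
    return (f"<|im_start|>user\n{user_block}<|im_end|>\n"
            f"<|im_start|>assistant\n<think>\n\n</think>\n\n")
```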
## Persona Examples

### Character Archetypes
Use these to simulate different Reddit personality types responding to a post:
| Persona | Description |
|---|---|
| Anecdote Andy | Shares personal stories and experiences to illustrate points |
| Hot Take Henry | Bold, contrarian opinions delivered with confidence |
| Troll Tim | Provocative, likes to stir the pot (for testing) |
| Conspiracy Carl | Suspicious, questions hidden motives and agendas |
| Doomer Dan | Pessimistic, thinks it's already too late |
| The Normie | Practical, mainstream perspective, keeps it simple |
| True Advocate | Enthusiastic supporter, validates good ideas |
| Nuance Seeker | Balanced, sees multiple sides, seeks middle ground |
| Source Please | Demands evidence, asks for citations |
| Devil's Advocate | Challenges assumptions to strengthen arguments |
| Industry Insider | Claims expertise, drops insider knowledge |
| Concerned Parent | Safety-focused, thinks about long-term consequences |
| Tech Bro | Optimistic about technology solving everything |
| Old Timer | "Back in my day..." perspective, values tradition |
| Lurker Larry | Brief, to-the-point, rarely elaborates |
### Restatement Styles
Use these to generate variations of the same message with different tones:
| Style | Effect |
|---|---|
| Concise one-paragraph version | Tightens the message, removes fluff |
| Blunt and direct version | No sugarcoating, gets to the point |
| Adds concrete examples | Expands with specific scenarios and details |
| Confident and opinionated | Assertive tone, strong stance |
| Friendly but more specific | Warm tone with added detail |
| Says the same thing but punchier | More impactful, memorable phrasing |
| Skeptical but helpful | Questions assumptions while offering value |
| Warm and relatable take | Emotionally resonant, builds connection |
| Authentic and genuine version | Casual, sounds like a real person |
| Thoughtful and nuanced restatement | Adds depth and consideration |
## Example Output

**Post:** "Struggling with consistent social media content" (r/AskMarketing)

**Doomer Dan:**

> It's too late. You should have hired a designer or a copywriter when you were just starting out.

**The Normie:**

> I'm not a fragrance brand, but for my own business I use a combination of free stock photos/videos and free AI tools. Been doing this for a couple of months and it works. If it was me I'd say go for it.

**Anecdote Andy:**

> Well in MY experience, the hardest part is consistency. It is not easy to just post random stuff because you feel like it, you need to post stuff that is connected to your brand. I have a few accounts (Instagram and TikTok) and each one is slightly different, but they are all connected to my brand.

**Hot Take Henry:**

> The problem is that AI tools will not be able to do that. They will give you random images that are not related to your brand.
## API Usage (vLLM)

If serving with vLLM, you can use these curl examples:

### Generate a Reddit Comment with Persona

```bash
curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_start|>user\nSubreddit: r/AskReddit\nTitle: What'\''s something the older generations were actually right about?\nPersona: Thoughtful and nostalgic, shares personal anecdotes\nPost Score: 423\nTask: write a Reddit reply targeting this karma score\nTarget Score: 6.5<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    "temperature": 0.7,
    "max_tokens": 1024,
    "stop": ["<|im_end|>"]
  }'
```
### Generate a Reply to an Existing Comment

```bash
curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_start|>user\nSubreddit: r/technology\nTitle: Apple announces new M4 chip\nBody: The new M4 chip promises 50% better performance than M3.\nPersona: Helpful explainer who provides context\nPost Score: 500\nParent 1: 50% sounds impressive but what'\''s the real-world difference?\nParent 1 Score: 200\nTask: write a Reddit reply targeting this karma score\nTarget Score: 4.6<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    "temperature": 0.7,
    "max_tokens": 1024,
    "stop": ["<|im_end|>"]
  }'
```
### Predict Karma Score for a Comment

```bash
curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_start|>user\nSubreddit: r/technology\nTitle: Apple announces new M4 chip\nBody: The new M4 chip promises 50% better performance than M3.\nPost Score: 500\nTask: predict the karma score for this reply\nReply: Sounds cool but I'\''ll wait for independent benchmarks before getting excited.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    "temperature": 0.3,
    "max_tokens": 64,
    "stop": ["<|im_end|>"]
  }'
```
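The same requests can be made from Python using only the standard library. The payload fields match the curl examples above; the function names (`completion_payload`, `generate`) are our own, and `generate` assumes a vLLM server is already running at the given URL.

```python
import json
import urllib.request

def completion_payload(prompt, temperature=0.7, max_tokens=1024):
    """Build the JSON body for vLLM's OpenAI-compatible /v1/completions
    endpoint, mirroring the curl examples above."""
    return {
        "prompt": prompt,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stop": ["<|im_end|>"],  # stop at the end of the assistant turn
    }

def generate(prompt, url="http://localhost:8000/v1/completions", **kwargs):
    """POST a completion request and return the first completion's text.
    Requires a vLLM server running at `url`."""
    data = json.dumps(completion_payload(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```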
## Training Details

### Hyperparameters

#### Stage 1 (SFT)
- Learning rate: 2e-4
- Batch size: 16 (effective)
- Max steps: 10,000
- LoRA rank: 16, alpha: 32
- Max sequence length: 1024
#### Stage 2 (DPO)
- Learning rate: 2e-6
- DPO beta: 0.3
- Batch size: 8 (effective)
- Max steps: 450
- Warmup ratio: 0.1
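For reference, the standard per-example DPO objective that the beta value above parameterizes is `-log sigmoid(beta * (r_chosen - r_rejected))`, where each `r` is the log-probability ratio of a reply under the policy versus the reference model. A minimal numeric sketch (not the training code, which uses batched tensor operations):

```python
import math

def dpo_loss(chosen_logratio, rejected_logratio, beta=0.3):
    """Per-example DPO loss: -log sigmoid(beta * (chosen - rejected)).

    Each argument is log pi_theta(y|x) - log pi_ref(y|x) for the chosen
    or rejected reply; beta=0.3 matches the Stage 2 config above.
    """
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy prefers the chosen reply more strongly (larger margin), the loss drops toward zero; at zero margin the loss is log 2.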
### Hardware
- 4x NVIDIA A10G GPUs (g5.12xlarge)
- Training precision: bf16
- Flash Attention 2 enabled
## Limitations
- Trained primarily on English Reddit content
- May reflect biases present in Reddit communities
- Score prediction is probabilistic, not deterministic
- Best results on subreddits similar to training data
## Intended Use
- Research on social media dynamics and content virality
- Understanding what makes online content engaging
- Creative writing assistance for authentic-sounding responses
- NOT intended for spam, manipulation, or deceptive practices
## Citation

```bibtex
@misc{qwen3-karma-v3,
  title={Qwen3-8B Karma v3: Reddit Comment Generation with DPO},
  author={Verque},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/verque-app/qwen3-8b-karma-v3}
}
```
Find out more at https://verque.app