Qwen3-8B Karma v2 (DPO)

A fine-tuned Qwen3-8B model trained to generate Reddit comments that match specific karma score targets. This model combines supervised fine-tuning with Direct Preference Optimization (DPO) to learn what makes Reddit comments successful.

Model Description

This is the v2 release, an earlier version of the karma model. For the latest version with expanded subreddit coverage and persona steering, see qwen3-8b-karma-v3.

Training Pipeline

Stage 1 - Supervised Fine-Tuning (SFT)

  • Task: Learn to predict karma scores and generate contextually appropriate replies
  • Input: Thread context (subreddit, post title, body, parent comments, target score)
  • Output: Reply text matching the target karma level
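The thread-context prompt can be assembled with a small helper. A minimal sketch: `build_prompt` is a hypothetical name (the card only specifies the field layout shown in the Usage and API sections below), and the empty `<think>` block follows the format used in those examples.

```python
def build_prompt(subreddit, title, body=None, post_score=None, parents=None, target="high"):
    """Assemble the thread-context prompt in the Qwen chat format.

    `parents` is an optional list of (text, score) tuples for ancestor
    comments, matching the "Parent N" / "Parent N Score" fields used in
    the API examples. This helper is illustrative, not the training code.
    """
    lines = [f"Subreddit: r/{subreddit}", f"Title: {title}"]
    if body:
        lines.append(f"Body: {body}")
    if post_score is not None:
        lines.append(f"Post Score: {post_score}")
    for i, (text, score) in enumerate(parents or [], start=1):
        lines.append(f"Parent {i}: {text}")
        lines.append(f"Parent {i} Score: {score}")
    lines.append("Task: write a Reddit reply targeting this karma score")
    lines.append(f"Target Score: {target}")
    context = "\n".join(lines)
    # The empty <think> block selects Qwen3's non-thinking mode.
    return (
        f"<|im_start|>user\n{context}<|im_end|>\n"
        f"<|im_start|>assistant\n<think>\n\n</think>\n\n"
    )

prompt = build_prompt(
    "AskReddit",
    "What's something that sounds fake but is actually true?",
    post_score=15420,
    target="high",
)
```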

Stage 2 - Direct Preference Optimization (DPO)

  • Creates preference pairs from sibling comments (same parent, different scores)
  • Higher-scored comments = "chosen", lower-scored = "rejected"
  • Trains the model to prefer responses that achieve higher engagement
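The sibling-pair construction above can be sketched as follows. `make_pairs`, the comment dict layout, and the `min_gap` filter are illustrative assumptions, not the actual data pipeline.

```python
from itertools import combinations

def make_pairs(comments, min_gap=1):
    """Build DPO preference pairs from sibling comments.

    `comments` is a list of dicts with `parent_id`, `text`, and `score`.
    Siblings share a parent; within each sibling group, the higher-scored
    comment becomes "chosen" and the lower-scored one "rejected".
    """
    by_parent = {}
    for c in comments:
        by_parent.setdefault(c["parent_id"], []).append(c)

    pairs = []
    for siblings in by_parent.values():
        for a, b in combinations(siblings, 2):
            if abs(a["score"] - b["score"]) < min_gap:
                continue  # skip near-ties: no clear preference signal
            chosen, rejected = (a, b) if a["score"] > b["score"] else (b, a)
            pairs.append({"chosen": chosen["text"], "rejected": rejected["text"]})
    return pairs
```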

Subreddits

Trained on a curated set of subreddits including:

  • JoeRogan, AskReddit, NoStupidQuestions, IsItBullshit
  • nutrition, technology, personalfinance
  • And several others

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "verque-app/qwen3-8b-karma-v2", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("verque-app/qwen3-8b-karma-v2")

# Format prompt for karma-targeted generation
prompt = """<|im_start|>user
Subreddit: r/AskReddit
Title: What's something that sounds fake but is actually true?
Post Score: 15420
Task: write a Reddit reply targeting this karma score
Target Score: high<|im_end|>
<|im_start|>assistant
<think>

</think>

"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))

Score Targets

The model uses log-normalized score buckets:

  • low: 1-10 karma
  • medium: 10-100 karma
  • high: 100-1000 karma
  • top: 1000+ karma
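Under the ranges above, a karma score maps to its bucket roughly by order of magnitude. A sketch, with two caveats: the boundary handling at 10/100/1000 is an assumption, and `log_target` reflects a guess that the numeric targets in the API examples (e.g. 5.0) are natural-log karma (e^5.0 ≈ 148), which the card does not state explicitly.

```python
import math

def score_bucket(karma):
    """Map a raw karma score to the categorical target label.

    Boundary handling (>=) at 10/100/1000 is an assumption.
    """
    if karma >= 1000:
        return "top"
    if karma >= 100:
        return "high"
    if karma >= 10:
        return "medium"
    return "low"

def log_target(karma):
    """Hypothetical conversion of karma to a numeric log-scaled target."""
    return round(math.log(karma), 1)
```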

API Usage (vLLM)

If serving with vLLM, you can use these curl examples:

Generate a Reddit Comment

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_start|>user\nSubreddit: r/technology\nTitle: Apple announces new M4 chip\nBody: The new M4 chip promises 50% better performance than M3.\nPost Score: 500\nTask: write a Reddit reply targeting this karma score\nTarget Score: 5.0<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    "temperature": 0.7,
    "max_tokens": 1024,
    "stop": ["<|im_end|>"]
  }'

Generate a Reply to an Existing Comment

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_start|>user\nSubreddit: r/technology\nTitle: Apple announces new M4 chip\nBody: The new M4 chip promises 50% better performance than M3.\nPost Score: 500\nParent 1: 50% sounds impressive but what'\''s the real-world difference?\nParent 1 Score: 200\nTask: write a Reddit reply targeting this karma score\nTarget Score: 4.6<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    "temperature": 0.7,
    "max_tokens": 1024,
    "stop": ["<|im_end|>"]
  }'

Predict Karma Score for a Comment

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_start|>user\nSubreddit: r/technology\nTitle: Apple announces new M4 chip\nBody: The new M4 chip promises 50% better performance than M3.\nPost Score: 500\nTask: predict the karma score for this reply\nReply: Sounds cool but I'\''ll wait for independent benchmarks before getting excited.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    "temperature": 0.3,
    "max_tokens": 64,
    "stop": ["<|im_end|>"]
  }'
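The same requests can be issued from Python using only the standard library. A sketch against vLLM's OpenAI-compatible /v1/completions endpoint: the payload mirrors the curl examples above, and the local server URL is an assumption.

```python
import json
from urllib import request

def completion_payload(prompt, temperature=0.7, max_tokens=1024):
    """Build the JSON body used by the curl examples."""
    return {
        "prompt": prompt,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stop": ["<|im_end|>"],
    }

def generate(prompt, url="http://localhost:8000/v1/completions"):
    """POST the payload to a running vLLM server and return the reply text."""
    body = json.dumps(completion_payload(prompt)).encode()
    req = request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```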

Note: v2 does not support persona steering. For persona-controlled generation, use qwen3-8b-karma-v3.

Training Details

Hyperparameters

Stage 1 (SFT)

  • Learning rate: 2e-4
  • Batch size: 16 (effective)
  • Max steps: 10,000
  • LoRA rank: 16, alpha: 32
  • Max sequence length: 1024

Stage 2 (DPO)

  • Learning rate: 2e-6
  • DPO beta: 0.3
  • Batch size: 8 (effective)
  • Max steps: 2000
  • Warmup ratio: 0.1
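For reference, the standard per-pair DPO objective that Stage 2 optimizes (with β = 0.3 here) can be sketched in plain Python, given summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.3):
    """Per-pair DPO loss: -log sigmoid(beta * log-ratio margin).

    Each argument is the summed log-probability of the full response
    under the policy (pi_*) or the frozen reference model (ref_*).
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

At zero margin the loss is log 2; it falls as the policy widens the chosen-over-rejected gap relative to the reference.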

Hardware

  • 4x NVIDIA A10G GPUs (g5.12xlarge)
  • Training precision: bf16
  • Flash Attention 2 enabled

Comparison with v3

Feature               v2         v3
Subreddits            ~10        33
Persona steering      No         Yes
DPO variant           Standard   Authentic (stricter filtering)
Adaptive thresholds   No         Yes

Limitations

  • Trained primarily on English Reddit content
  • May reflect biases present in Reddit communities
  • Score prediction is probabilistic, not deterministic
  • Limited subreddit coverage compared to v3

Intended Use

  • Research on social media dynamics and content virality
  • Understanding what makes online content engaging
  • Creative writing assistance for authentic-sounding responses
  • NOT intended for spam, manipulation, or deceptive practices

Citation

@misc{qwen3-karma-v2,
  title={Qwen3-8B Karma v2: Reddit Comment Generation with DPO},
  author={Verque},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/verque-app/qwen3-8b-karma-v2}
}

Model tree

Base model: Qwen/Qwen3-8B-Base → Qwen/Qwen3-8B → this model