Qwen3-8B Karma v2 (DPO)

A fine-tuned Qwen3-8B model trained to generate Reddit comments that match specific karma score targets. This model combines supervised fine-tuning with Direct Preference Optimization (DPO) to learn what makes Reddit comments successful.

Model Description

This is the v2 release, an earlier version of the karma model. For the latest version with expanded subreddit coverage and persona steering, see qwen3-8b-karma-v3.

Training Pipeline

Stage 1 - Supervised Fine-Tuning (SFT)

  • Task: Learn to predict karma scores and generate contextually appropriate replies
  • Input: Thread context (subreddit, post title, body, parent comments, target score)
  • Output: Reply text matching the target karma level
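The thread-context prompt can be assembled with a small helper. A minimal sketch: `build_prompt` is a hypothetical name (the card only specifies the field layout shown in the Usage and API sections below), and the empty `<think>` block follows the format used in those examples.

```python
def build_prompt(subreddit, title, body=None, post_score=None, parents=None, target="high"):
    """Assemble the thread-context prompt in the Qwen chat format.

    `parents` is an optional list of (text, score) tuples for ancestor
    comments, matching the "Parent N" / "Parent N Score" fields used in
    the API examples. This helper is illustrative, not the training code.
    """
    lines = [f"Subreddit: r/{subreddit}", f"Title: {title}"]
    if body:
        lines.append(f"Body: {body}")
    if post_score is not None:
        lines.append(f"Post Score: {post_score}")
    for i, (text, score) in enumerate(parents or [], start=1):
        lines.append(f"Parent {i}: {text}")
        lines.append(f"Parent {i} Score: {score}")
    lines.append("Task: write a Reddit reply targeting this karma score")
    lines.append(f"Target Score: {target}")
    context = "\n".join(lines)
    # The empty <think> block selects Qwen3's non-thinking mode.
    return (
        f"<|im_start|>user\n{context}<|im_end|>\n"
        f"<|im_start|>assistant\n<think>\n\n</think>\n\n"
    )

prompt = build_prompt(
    "AskReddit",
    "What's something that sounds fake but is actually true?",
    post_score=15420,
    target="high",
)
```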

Stage 2 - Direct Preference Optimization (DPO)

  • Creates preference pairs from sibling comments (same parent, different scores)
  • Higher-scored comments = "chosen", lower-scored = "rejected"
  • Trains the model to prefer responses that achieve higher engagement
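The sibling-pair construction above can be sketched as follows. `make_pairs`, the comment dict layout, and the `min_gap` filter are illustrative assumptions, not the actual data pipeline.

```python
from itertools import combinations

def make_pairs(comments, min_gap=1):
    """Build DPO preference pairs from sibling comments.

    `comments` is a list of dicts with `parent_id`, `text`, and `score`.
    Siblings share a parent; within each sibling group, the higher-scored
    comment becomes "chosen" and the lower-scored one "rejected".
    """
    by_parent = {}
    for c in comments:
        by_parent.setdefault(c["parent_id"], []).append(c)

    pairs = []
    for siblings in by_parent.values():
        for a, b in combinations(siblings, 2):
            if abs(a["score"] - b["score"]) < min_gap:
                continue  # skip near-ties: no clear preference signal
            chosen, rejected = (a, b) if a["score"] > b["score"] else (b, a)
            pairs.append({"chosen": chosen["text"], "rejected": rejected["text"]})
    return pairs
```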

Subreddits

Trained on a curated set of subreddits including:

  • JoeRogan, AskReddit, NoStupidQuestions, IsItBullshit
  • nutrition, technology, personalfinance
  • And several others

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "verque-app/qwen3-8b-karma-v2", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("verque-app/qwen3-8b-karma-v2")

# Format prompt for karma-targeted generation
prompt = """<|im_start|>user
Subreddit: r/AskReddit
Title: What's something that sounds fake but is actually true?
Post Score: 15420
Task: write a Reddit reply targeting this karma score
Target Score: high<|im_end|>
<|im_start|>assistant
<think>

</think>

"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))

Score Targets

The model uses log-normalized score buckets:

  • low: 1-10 karma
  • medium: 10-100 karma
  • high: 100-1000 karma
  • top: 1000+ karma
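Under the ranges above, a karma score maps to its bucket roughly by order of magnitude. A sketch, with two caveats: the boundary handling at 10/100/1000 is an assumption, and `log_target` reflects a guess that the numeric targets in the API examples (e.g. 5.0) are natural-log karma (e^5.0 ≈ 148), which the card does not state explicitly.

```python
import math

def score_bucket(karma):
    """Map a raw karma score to the categorical target label.

    Boundary handling (>=) at 10/100/1000 is an assumption.
    """
    if karma >= 1000:
        return "top"
    if karma >= 100:
        return "high"
    if karma >= 10:
        return "medium"
    return "low"

def log_target(karma):
    """Hypothetical conversion of karma to a numeric log-scaled target."""
    return round(math.log(karma), 1)
```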

API Usage (vLLM)

If serving with vLLM, you can use these curl examples:

Generate a Reddit Comment

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_start|>user\nSubreddit: r/technology\nTitle: Apple announces new M4 chip\nBody: The new M4 chip promises 50% better performance than M3.\nPost Score: 500\nTask: write a Reddit reply targeting this karma score\nTarget Score: 5.0<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    "temperature": 0.7,
    "max_tokens": 1024,
    "stop": ["<|im_end|>"]
  }'

Generate a Reply to an Existing Comment

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_start|>user\nSubreddit: r/technology\nTitle: Apple announces new M4 chip\nBody: The new M4 chip promises 50% better performance than M3.\nPost Score: 500\nParent 1: 50% sounds impressive but what'\''s the real-world difference?\nParent 1 Score: 200\nTask: write a Reddit reply targeting this karma score\nTarget Score: 4.6<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    "temperature": 0.7,
    "max_tokens": 1024,
    "stop": ["<|im_end|>"]
  }'

Predict Karma Score for a Comment

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "<|im_start|>user\nSubreddit: r/technology\nTitle: Apple announces new M4 chip\nBody: The new M4 chip promises 50% better performance than M3.\nPost Score: 500\nTask: predict the karma score for this reply\nReply: Sounds cool but I'\''ll wait for independent benchmarks before getting excited.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    "temperature": 0.3,
    "max_tokens": 64,
    "stop": ["<|im_end|>"]
  }'
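The same requests can be issued from Python using only the standard library. A sketch against vLLM's OpenAI-compatible /v1/completions endpoint: the payload mirrors the curl examples above, and the local server URL is an assumption.

```python
import json
from urllib import request

def completion_payload(prompt, temperature=0.7, max_tokens=1024):
    """Build the JSON body used by the curl examples."""
    return {
        "prompt": prompt,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stop": ["<|im_end|>"],
    }

def generate(prompt, url="http://localhost:8000/v1/completions"):
    """POST the payload to a running vLLM server and return the reply text."""
    body = json.dumps(completion_payload(prompt)).encode()
    req = request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```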

Note: v2 does not support persona steering. For persona-controlled generation, use qwen3-8b-karma-v3.

Training Details

Hyperparameters

Stage 1 (SFT)

  • Learning rate: 2e-4
  • Batch size: 16 (effective)
  • Max steps: 10,000
  • LoRA rank: 16, alpha: 32
  • Max sequence length: 1024

Stage 2 (DPO)

  • Learning rate: 2e-6
  • DPO beta: 0.3
  • Batch size: 8 (effective)
  • Max steps: 2000
  • Warmup ratio: 0.1
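For reference, the standard per-pair DPO objective that Stage 2 optimizes (with β = 0.3 here) can be sketched in plain Python, given summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.3):
    """Per-pair DPO loss: -log sigmoid(beta * log-ratio margin).

    Each argument is the summed log-probability of the full response
    under the policy (pi_*) or the frozen reference model (ref_*).
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

At zero margin the loss is log 2; it falls as the policy widens the chosen-over-rejected gap relative to the reference.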

Hardware

  • 4x NVIDIA A10G GPUs (g5.12xlarge)
  • Training precision: bf16
  • Flash Attention 2 enabled

Comparison with v3

Feature               v2         v3
Subreddits            ~10        33
Persona steering      No         Yes
DPO variant           Standard   Authentic (stricter filtering)
Adaptive thresholds   No         Yes

Limitations

  • Trained primarily on English Reddit content
  • May reflect biases present in Reddit communities
  • Score prediction is probabilistic, not deterministic
  • Limited subreddit coverage compared to v3

Intended Use

  • Research on social media dynamics and content virality
  • Understanding what makes online content engaging
  • Creative writing assistance for authentic-sounding responses
  • NOT intended for spam, manipulation, or deceptive practices

Citation

@misc{qwen3-karma-v2,
  title={Qwen3-8B Karma v2: Reddit Comment Generation with DPO},
  author={Verque},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/verque-app/qwen3-8b-karma-v2}
}

Model tree

Base model: Qwen/Qwen3-8B-Base → Qwen/Qwen3-8B → this model