# CARDS-Qwen3.5-9B
Fine-tuned Qwen3.5-9B for classification of climate-contrarian claims using the CARDS taxonomy from Coan et al. (2025).
This is a merged checkpoint: a LoRA adapter (rank 16) trained on the CARDS SFT dataset has been merged back into the base weights for direct loading with transformers, vLLM, or any standard inference engine.
## Results
Evaluated on the held-out CARDS test set (1,436 samples, Level 1, min_support ≥ 3):
| Metric | Qwen3.5-9B (base) | Qwen3.5-4B FT | Qwen3.5-9B FT | Qwen3.5-27B FT | Claude Opus 4.6 |
|---|---|---|---|---|---|
| Samples F1 | 0.721 | 0.838 | 0.872 | 0.884 | 0.893 |
| Macro F1 | 0.629 | 0.632 | 0.663 | 0.766 | 0.751 |
| Micro F1 | 0.775 | 0.828 | 0.862 | 0.877 | 0.881 |
| Precision | 0.866 | 0.840 | 0.875 | 0.879 | 0.863 |
| Recall | 0.701 | 0.816 | 0.849 | 0.874 | 0.900 |
| Parse failures | 247 / 1436 | 1 / 1436 | 0 / 1436 | 0 / 1436 | 0 / 1436 |
- Fine-tuning lifts samples F1 from 0.721 (base) to 0.872 (+0.151).
- Zero parse failures on 1,436 test items — the model reliably emits the YAML format.
- Sweet spot for deployment cost vs. accuracy: samples F1 sits ≈ 0.012 below the 27B FT and ≈ 0.021 below Opus 4.6, at a fraction of their size.
- Per-level breakdown (samples F1): Level 1 0.872, Level 2 0.840, Level 3 0.813.
## Usage
### With vLLM
```bash
vllm serve C3DS/CARDS-Qwen3.5-9B \
  --port 8000 \
  --max-model-len 4096 \
  --dtype bfloat16 \
  --enable-prefix-caching \
  --served-model-name CARDS-Qwen3.5-9B
```
The system prompt (`slim_system_instruction`) and the user-message suffix (`cot_trigger`) the model was trained with are bundled in this repo as `cards_prompts.json`, which is self-contained, with the CARDS taxonomy already inlined.
```python
import json

from huggingface_hub import hf_hub_download
from openai import OpenAI

# Prompts bundled in the model repo: system instruction + CoT trigger suffix.
prompts = json.load(open(hf_hub_download("C3DS/CARDS-Qwen3.5-9B", "cards_prompts.json")))
slim_system_instruction = prompts["slim_system_instruction"]
cot_trigger = prompts["cot_trigger"]

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

def classify(text):
    # Deterministic decoding; the token budget covers the <think> trace plus the YAML answer.
    resp = client.chat.completions.create(
        model="CARDS-Qwen3.5-9B",
        messages=[
            {"role": "system", "content": slim_system_instruction},
            {"role": "user", "content": f"### Text:\n{text}\n\n{cot_trigger}"},
        ],
        temperature=0,
        max_tokens=4000,
    )
    return resp.choices[0].message.content

print(classify("These are only a few renewable energy technologies at work"))
```
The model produces a reasoning trace inside `<think>…</think>` followed by a YAML `categories:` block listing the predicted CARDS codes. To parse, take the content after `</think>` and read the `categories:` list.
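A minimal parsing sketch (the helper name and the PyYAML dependency are choices made here, not part of the repo; it assumes the block is emitted as raw YAML):

```python
import yaml  # pip install pyyaml

def parse_categories(completion: str) -> list[str]:
    # Drop the reasoning trace: keep only what follows the last </think>.
    answer = completion.rsplit("</think>", 1)[-1]
    # The remainder is a YAML document with a top-level `categories:` list.
    data = yaml.safe_load(answer)
    return data.get("categories", []) if isinstance(data, dict) else []

print(parse_categories(classify("These are only a few renewable energy technologies at work")))
```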
For an FP8-quantized variant (~9 GB on disk, no measurable accuracy loss) see C3DS/CARDS-Qwen3.5-9B-FP8.
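### With transformers

Since the adapter is already merged, the checkpoint also loads directly with transformers. A minimal text-only sketch: the generation settings are illustrative, and if the checkpoint ships as a vision-language architecture the matching Auto class (e.g. `AutoModelForImageTextToText`) would be needed instead of `AutoModelForCausalLM`:

```python
import json

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Same bundled prompts as in the vLLM example.
prompts = json.load(open(hf_hub_download("C3DS/CARDS-Qwen3.5-9B", "cards_prompts.json")))

tokenizer = AutoTokenizer.from_pretrained("C3DS/CARDS-Qwen3.5-9B")
model = AutoModelForCausalLM.from_pretrained(
    "C3DS/CARDS-Qwen3.5-9B", torch_dtype=torch.bfloat16, device_map="auto"
)

text = "These are only a few renewable energy technologies at work"
messages = [
    {"role": "system", "content": prompts["slim_system_instruction"]},
    {"role": "user", "content": f"### Text:\n{text}\n\n{prompts['cot_trigger']}"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(input_ids, max_new_tokens=4000, do_sample=False)
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```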
### Multimodal: image + text
The base Qwen3.5/3.6 family supports image inputs via the OpenAI-compatible `image_url` content part, and this fine-tune preserves that capability: pass the system prompt alongside an image (with or without caption text), as in the example below, and the model will classify the depicted claim under the CARDS taxonomy.
Serve vLLM with multimodal flags enabled:
```bash
vllm serve C3DS/CARDS-Qwen3.5-9B \
  --port 8000 \
  --max-model-len 8192 \
  --trust-remote-code \
  --limit-mm-per-prompt image=4 \
  --enable-prefix-caching \
  --served-model-name CARDS-Qwen3.5-9B
```
```python
import base64
import json
import mimetypes
from pathlib import Path

from huggingface_hub import hf_hub_download
from openai import OpenAI

prompts = json.load(open(hf_hub_download("C3DS/CARDS-Qwen3.5-9B", "cards_prompts.json")))
slim_system_instruction = prompts["slim_system_instruction"]
cot_trigger = prompts["cot_trigger"]

def image_part(path):
    # Encode a local image as a data: URL for the OpenAI-compatible image_url part.
    p = Path(path)
    mime = mimetypes.guess_type(p)[0] or "image/png"
    b64 = base64.b64encode(p.read_bytes()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

resp = client.chat.completions.create(
    model="CARDS-Qwen3.5-9B",
    messages=[
        {"role": "system", "content": slim_system_instruction},
        {"role": "user", "content": [
            {"type": "text", "text": "Read the image (and any caption below) and classify the climate claim it makes."},
            image_part("screenshot.png"),
            {"type": "text", "text": f"### Caption:\n<optional caption>\n\n{cot_trigger}"},
        ]},
    ],
    temperature=0,
    max_tokens=4000,
)
print(resp.choices[0].message.content)
```
## Training
- Base model: Qwen/Qwen3.5-9B
- Method: LoRA (rank 16, α 16, dropout 0) on `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, then merged into the base weights
- Dataset: C3DS/cards_sft_dataset (`sft` config, RECoT chat messages)
- Framework: Unsloth + TRL `SFTTrainer` (see the sketch after this list)
- Hyperparameters: 3 epochs, `per_device_train_batch_size=1`, `gradient_accumulation_steps=8`, `lr=2e-4`, cosine schedule, 10 warmup steps, `max_seq_length=4096`, `adamw_8bit`, bf16
- Hardware: 1× NVIDIA H200
- Checkpoint selection: best checkpoint via `load_best_model_at_end=True`
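A rough reconstruction of that setup with Unsloth + TRL, for orientation only: the dataset config name, split names, and eval cadence are assumptions, and argument names vary across TRL versions:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel

# Base model + LoRA adapter with the hyperparameters listed above.
model, tokenizer = FastLanguageModel.from_pretrained("Qwen/Qwen3.5-9B", max_seq_length=4096)
model = FastLanguageModel.get_peft_model(
    model,
    r=16, lora_alpha=16, lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

ds = load_dataset("C3DS/cards_sft_dataset", "sft")  # config name assumed from the list above

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds["train"],
    eval_dataset=ds["validation"],  # assumed held-out split for checkpoint selection
    args=SFTConfig(
        num_train_epochs=3,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        lr_scheduler_type="cosine",
        warmup_steps=10,
        max_seq_length=4096,
        optim="adamw_8bit",
        bf16=True,
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
)
trainer.train()
model.save_pretrained_merged("CARDS-Qwen3.5-9B", tokenizer)  # merge LoRA into base weights
```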
## Limitations
- Macro F1 on rare labels. Rare level-3 claims (under 10 training examples) trail Claude Opus by a wider margin than common claims, reflecting the long-tailed CARDS distribution.
- Thinking tokens. Training used `enable_thinking=True`. Either parse the output after `</think>`, or disable thinking at inference via `chat_template_kwargs={"enable_thinking": false}`; see the sketch after this list. Reserve token budget for the reasoning trace before the final YAML block.
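With the vLLM server from the Usage section, `chat_template_kwargs` rides along in `extra_body`. A minimal sketch, reusing `client`, `slim_system_instruction`, and `cot_trigger` from the example above:

```python
text = "These are only a few renewable energy technologies at work"
resp = client.chat.completions.create(
    model="CARDS-Qwen3.5-9B",
    messages=[
        {"role": "system", "content": slim_system_instruction},
        {"role": "user", "content": f"### Text:\n{text}\n\n{cot_trigger}"},
    ],
    temperature=0,
    max_tokens=512,  # no reasoning trace to budget for
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(resp.choices[0].message.content)  # YAML categories: block, no <think> trace
```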
## Citation
```bibtex
@article{coan2025cards,
  title   = {Large language model reveals an increase in climate contrarian speech in the United States Congress},
  author  = {Coan, Travis G. and Malla, Ranadheer and Nanko, Mirjam O. and Kattrup, William and Roberts, J. Timmons and Cook, John and Boussalis, Constantine},
  journal = {Communications Sustainability},
  volume  = {1},
  pages   = {37},
  year    = {2025},
  doi     = {10.1038/s44458-025-00029-z}
}
```
## License
Apache 2.0, inherited from Qwen3.5-9B.