Windy-Qwen3.5-27B

Fine-tuned Qwen3.5-27B for three-level classification of wind-energy opposition discourse:

  1. Detection — binary: does the text express opposition to wind energy?
  2. Frame (N_*) — the high-level frame, if any (e.g. cost, environmental, military, governance).
  3. Claim (C_*) — the specific claim(s) within that frame.

This model accompanies a forthcoming paper from the C3DS group on wind-energy opposition discourse. Until that paper is out, treat it as a research preview: the codebook, evaluation set, and inference recipe are stable, but the methodology write-up is still in preparation.

This is a merged checkpoint: a LoRA adapter (rank 16) trained with reverse-engineered chain-of-thought (RECoT) supervision has been merged back into the base weights for direct loading with transformers, vLLM, or any standard inference engine.
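
For quick text-only spot checks without a server, the merged weights load like any other causal LM. A minimal sketch, assuming the checkpoint exposes the standard Qwen chat-template interface in transformers (the example text and generation settings are illustrative):

import json
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "C3DS/Windy-Qwen3.5-27B"

# Bundled system prompt with the frames/claims codebook inlined (see Usage below).
prompts = json.load(open(hf_hub_download(model_id, "wind_prompts.json")))

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": prompts["slim_system_instruction"]},
    {"role": "user", "content": "Turbines this close to the base will wreck radar coverage."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Leave room for the <think> trace before the final YAML block.
out = model.generate(inputs, max_new_tokens=4000)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))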

Results

Evaluated on the held-out wind-opposition test set (773 rows, 436 opposition-positive). Compared against the smaller Windy siblings, the FP8 quantization of this checkpoint, and frontier APIs:

Detection (binary)

Metric      Windy-4B   Windy-9B   Windy-27B   Windy-27B FP8   Claude Opus 4.7   GPT-5.5
Precision   0.795      0.797      0.871       0.877           0.896             0.927
Recall      0.915      0.917      0.917       0.920           0.890             0.846
F1          0.851      0.853      0.894       0.898           0.893             0.885

Samples F1 (row-averaged multi-label F1 over frames / claims)

View                       Windy-4B   Windy-9B   Windy-27B   Windy-27B FP8   Claude Opus 4.7   GPT-5.5
Frames — all rows          0.697      0.696      0.781       0.787           0.791             0.792
Frames — opposition only   0.699      0.695      0.747       0.751           0.734             0.697
Claims — all rows          0.654      0.676      0.741       0.755           0.754             0.745
Claims — opposition only   0.623      0.660      0.675       0.694           0.667             0.614
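
Samples F1 computes an F1 score per row between the predicted and gold label sets, then averages over rows. A minimal sketch of the computation with scikit-learn; the toy labels are illustrative, and crediting empty-gold/empty-pred rows via zero_division=1 is an assumption about the eval, not a documented detail:

from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical gold and predicted frame sets for three rows.
gold = [["N_2"], ["N_2", "N_4"], []]
pred = [["N_2"], ["N_4"], []]

mlb = MultiLabelBinarizer().fit(gold + pred)
y_true, y_pred = mlb.transform(gold), mlb.transform(pred)

# average="samples": F1 per row, then the mean over rows.
print(f1_score(y_true, y_pred, average="samples", zero_division=1))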

Highlights

  • Detection F1 ties Claude Opus 4.7 (0.894 vs 0.893) and beats GPT-5.5 (0.885).
  • Wins frames-opposition-only and claims-opposition-only samples F1 vs both frontier APIs — once a row is recognized as opposition, this model attaches the right frames/claims more reliably than Opus or GPT-5.5.
  • Zero parse failures on 773 test items vs 1–2 for the frontier APIs.
  • Per-row data: 773 rows, 436 opposition-positive (~56%); detection accuracy 0.873.

Usage

With vLLM

vllm serve C3DS/Windy-Qwen3.5-27B \
  --port 8000 \
  --max-model-len 4096 \
  --dtype bfloat16 \
  --enable-prefix-caching \
  --served-model-name Windy-Qwen3.5-27B

The system prompt the model was trained with (slim_system_instruction, with the wind frames/claims codebook inlined) is bundled in this repo as wind_prompts.json.

import json
from huggingface_hub import hf_hub_download
from openai import OpenAI

prompts = json.load(open(hf_hub_download("C3DS/Windy-Qwen3.5-27B", "wind_prompts.json")))
slim_system_instruction = prompts["slim_system_instruction"]

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

def classify(text):
    resp = client.chat.completions.create(
        model="Windy-Qwen3.5-27B",
        messages=[
            {"role": "system", "content": slim_system_instruction},
            {"role": "user",   "content": text},
        ],
        temperature=0,
        max_tokens=4000,
    )
    return resp.choices[0].message.content

print(classify("Wind farms are killing local property values and chasing tourists away."))

The model produces a reasoning trace inside <think>…</think> followed by a YAML block:

opposition_detected: true
frames:
  - N_2
claims:
  - C_5_0

To parse, take the content after the closing </think> tag and load the YAML; a minimal helper is sketched below.
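
A minimal parsing helper, assuming PyYAML is installed and that the YAML block is the only content after the trace:

import yaml  # PyYAML

def parse_output(raw: str) -> dict:
    # Drop everything up to and including the reasoning trace, then load the YAML.
    yaml_part = raw.split("</think>", 1)[-1]
    return yaml.safe_load(yaml_part)

result = parse_output(classify("Wind farms are killing local property values."))
print(result["opposition_detected"], result["frames"], result["claims"])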

For an FP8-quantized variant (~27 GB on disk, slightly stronger than BF16 across the board) see C3DS/Windy-Qwen3.5-27B-FP8.

Multimodal — image + text

The base Qwen3.5/3.6 family accepts image inputs via the OpenAI-compatible image_url content part, and this fine-tune preserves that capability. Pass the wind system prompt alongside an image (e.g. a screenshot of a tweet, news headline, or protest sign) and the model runs the same three-level detection / frames / claims classification on the visual input.

Serve vLLM with multimodal flags enabled:

vllm serve C3DS/Windy-Qwen3.5-27B \
  --port 8000 \
  --max-model-len 8192 \
  --trust-remote-code \
  --limit-mm-per-prompt image=4 \
  --enable-prefix-caching \
  --served-model-name Windy-Qwen3.5-27B

Then query it as before, attaching images as base64 data URLs:

import base64, json, mimetypes
from pathlib import Path
from huggingface_hub import hf_hub_download
from openai import OpenAI

prompts = json.load(open(hf_hub_download("C3DS/Windy-Qwen3.5-27B", "wind_prompts.json")))
slim_system_instruction = prompts["slim_system_instruction"]

def image_part(path):
    # Encode a local image file as a base64 data: URL content part.
    p = Path(path)
    mime = mimetypes.guess_type(p)[0] or "image/png"
    b64 = base64.b64encode(p.read_bytes()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

resp = client.chat.completions.create(
    model="Windy-Qwen3.5-27B",
    messages=[
        {"role": "system", "content": slim_system_instruction},
        {"role": "user", "content": [
            {"type": "text", "text": "Read the image (and any caption below) and classify the wind-opposition framing depicted."},
            image_part("screenshot.png"),
            {"type": "text", "text": "### Caption:\n<optional caption>"},
        ]},
    ],
    temperature=0,
    max_tokens=4000,
)
print(resp.choices[0].message.content)

Training

  • Base model: Qwen/Qwen3.5-27B
  • Method: LoRA (rank 16, α 16, dropout 0) on q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, then merged into base weights
  • Training data: RECoT chat messages distilled from a teacher LLM (claude-opus-4-7) over an expert-annotated wind-opposition corpus, with synthetic positive / near-miss negative augmentation. Final training set: 726 rows after a teacher second-guessing filter, plus an 86-row held-out eval mirror.
  • Framework: Unsloth + TRL SFTTrainer
  • Hyperparameters: 3 epochs, per_device_train_batch_size=1, gradient_accumulation_steps=8, lr=2e-4, cosine schedule, 10 warmup steps, max_seq_length=8192, adamw_8bit, bf16 (see the sketch after this list)
  • Hardware: 1× NVIDIA H200
  • Checkpoint selection: best checkpoint restored at the end of training via load_best_model_at_end=True
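
A minimal sketch of how these settings map onto an Unsloth + TRL run. The dataset variables, output paths, and save call are assumptions (and argument names shift across TRL versions), not the actual training script:

from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

# Base model at the training sequence length.
model, tokenizer = FastLanguageModel.from_pretrained(
    "Qwen/Qwen3.5-27B", max_seq_length=8192
)

# LoRA adapter as described above: rank 16, alpha 16, no dropout.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_ds,  # 726 RECoT chat-message rows (not yet public)
    eval_dataset=eval_ds,    # 86-row held-out eval mirror
    args=SFTConfig(
        num_train_epochs=3,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        lr_scheduler_type="cosine",
        warmup_steps=10,
        optim="adamw_8bit",
        bf16=True,
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        output_dir="windy-27b-lora",  # placeholder
    ),
)
trainer.train()

# Merge the adapter into the base weights for direct loading (Unsloth helper).
model.save_pretrained_merged("Windy-Qwen3.5-27B", tokenizer, save_method="merged_16bit")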

Limitations

  • Forthcoming paper. The methodology, codebook, and dataset details will be published in a future C3DS paper. Model output and per-claim precision/recall should be interpreted as a research preview.
  • Domain-specific. Trained on wind-opposition discourse from social media and news; behavior on adjacent topics (solar, nuclear, climate-change-in-general) is uncharacterized.
  • Thinking tokens. Training used enable_thinking=True. Either parse output after </think>, or disable thinking at inference via chat_template_kwargs={"enable_thinking": false} (sketched after this list). Reserve token budget for the reasoning trace before the final YAML block.
  • Detection is high-recall, modest-precision. The model errs toward calling borderline cases "opposition" — recall 0.917, precision 0.871. If your downstream use needs high precision, consider thresholding on additional signals.
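
A minimal sketch of turning the trace off through the vLLM OpenAI-compatible server, reusing the client and system prompt from the Usage section; passing chat_template_kwargs via extra_body depends on your vLLM version:

resp = client.chat.completions.create(
    model="Windy-Qwen3.5-27B",
    messages=[
        {"role": "system", "content": slim_system_instruction},
        {"role": "user", "content": "Nobody asked the village before approving these turbines."},
    ],
    temperature=0,
    max_tokens=500,  # far less budget is needed without the reasoning trace
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(resp.choices[0].message.content)  # YAML only, no <think> block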

Related models

  • C3DS/Windy-Qwen3.5-27B-FP8: FP8-quantized variant of this checkpoint (see Results).

License

Apache 2.0, inherited from Qwen3.5-27B.
