Clinical Trial Endpoint Classifier — 4B v2 (Qwen3.5-4B LoRA)

v2 update of the endpoint-qwen3.5-4b-lora model. Trained on roughly twice as much data (3,906 samples vs 1,948 in v1), with broader source coverage spanning ClinicalTrials.gov, the EU Clinical Trials Register (EU CTR), and the Chinese Clinical Trial Registry (ChiCTR).

A LoRA adapter fine-tuned on Qwen3.5-4B for extracting and classifying clinical trial endpoints from outcome text. Returns structured JSON with standardized endpoint names, measurement types, metric types, timeframes, measurement methods, units, and composite-endpoint components.

What's New in v2

2x training data: 3,906 samples (vs 1,948 in v1)
Multi-source diversity: EU CTR (700) + ChiCTR (700) + ClinicalTrials.gov (600) added on top of v1's CTgov data
12 disease categories in v2 CTgov sample: diabetes, breast cancer, cardiovascular, Alzheimer's, asthma, depression, hepatitis, rheumatoid arthritis, chronic kidney disease, multiple sclerosis, obesity, Parkinson's
Better generalization to non-US trial registries (EU + China)
Improved labeling: v2 samples labeled by GPT-OSS-120B (vs v1 by Qwen3.6-plus)

Output Format

{
  "endpoints": [
    {
      "endpoint_name_standardized": "Objective Response Rate",
      "measurement_of": "tumor response",
      "measurement_type": "binary",
      "metric_type": "proportion",
      "timeframe": "Week 24",
      "measurement_method": "RECIST v1.1",
      "evaluation_criteria": "CR or PR",
      "unit": "%",
      "population": null,
      "is_composite": false,
      "components": []
    }
  ]
}

Field Definitions

Field	Description	Examples
`endpoint_name_standardized`	Standardized endpoint name	"Overall Survival", "HbA1c", "PASI 75 Response Rate"
`measurement_of`	What is being measured	"tumor response", "glycated hemoglobin"
`measurement_type`	Type of measurement	`continuous`, `binary`, `ordinal`, `time-to-event`
`metric_type`	Statistical metric	`mean`, `proportion`, `hazard ratio`, `change from baseline`
`timeframe`	When measurement occurs	"Week 12", "Up to 36 months"
`measurement_method`	How it is measured	"blood test", "RECIST v1.1", "12-lead ECG"
`evaluation_criteria`	Criteria for evaluation	"PASI 75", "CR or PR"
`unit`	Unit of measurement	"%", "mg/dL", "mm"
`population`	Specific population	"adults aged 18-65", "ITT", "Full analysis set"
`is_composite`	Whether composite endpoint	`true` / `false`
`components`	Components if composite	`["MI", "stroke", "cardiovascular death"]`

Supports multiple endpoints from a single text (e.g., safety texts with 10+ sub-endpoints).
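
The field table above can be turned into a light structural check on the model's output. The following is a minimal sketch, not part of the released model: the helper names `validate_endpoint` and `validate_output` are hypothetical, and treating the four listed `measurement_type` values as a closed set is an assumption (the table presents them as examples).

```python
import json

# Field names copied from the Field Definitions table.
REQUIRED_FIELDS = {
    "endpoint_name_standardized", "measurement_of", "measurement_type",
    "metric_type", "timeframe", "measurement_method", "evaluation_criteria",
    "unit", "population", "is_composite", "components",
}

# ASSUMPTION: the table lists these as examples; the real set may be larger.
MEASUREMENT_TYPES = {"continuous", "binary", "ordinal", "time-to-event"}


def validate_endpoint(ep: dict) -> list[str]:
    """Return a list of problems found in one endpoint dict (empty if none)."""
    problems = []
    missing = REQUIRED_FIELDS - ep.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    mtype = ep.get("measurement_type")
    if mtype is not None and mtype not in MEASUREMENT_TYPES:
        problems.append(f"unexpected measurement_type: {mtype!r}")
    if not isinstance(ep.get("is_composite"), bool):
        problems.append("is_composite should be true or false")
    if not isinstance(ep.get("components"), list):
        problems.append("components should be a list")
    return problems


def validate_output(raw: str) -> dict[int, list[str]]:
    """Map endpoint index -> problems, for endpoints that have any."""
    data = json.loads(raw)
    report = {}
    for i, ep in enumerate(data.get("endpoints", [])):
        problems = validate_endpoint(ep)
        if problems:
            report[i] = problems
    return report
```

An empty report means every endpoint has all eleven fields with the expected basic types; it says nothing about whether the extracted values are clinically correct.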

Training Details


Base model	Qwen/Qwen3.5-4B
Method	LoRA (bf16, rank 16, alpha 16)
Training data	3,906 samples (1,948 v1 + 1,958 v2)
Data sources	ClinicalTrials.gov, EU CTR, ChiCTR
Epochs	3
Steps	735
Training time	~2 hours on RTX 4090
Framework	Unsloth + TRL SFTTrainer

Data Composition

Source	v1 samples	v2 samples	Total
ClinicalTrials.gov	1,948	600	2,548
EU CTR	—	700	700
ChiCTR (China)	—	700	700
Total	1,948	1,958	3,906

Hyperparameters

Method: LoRA (bf16, NOT 4-bit)
LoRA rank: 16, alpha: 16, dropout: 0
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Learning rate: 2e-4 (cosine scheduler)
Batch size: 2 per device (gradient accumulation 8, effective 16)
Epochs: 3
Optimizer: adamw_8bit
Sequence length: 2048
Gradient checkpointing: unsloth
Warmup steps: 10
Weight decay: 0.01
Max grad norm: 1.0
Seed: 3407
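
The reported step count can be reproduced from the batch settings above. This is a back-of-the-envelope sketch under stated assumptions: training on a single GPU, all 3,906 samples used for training (no held-out split), and a final partial accumulation window counted as one optimizer step.

```python
import math

num_samples = 3906      # training samples (from Training Details)
per_device_batch = 2    # batch size per device
grad_accum = 8          # gradient accumulation steps
epochs = 3
num_devices = 1         # ASSUMPTION: a single RTX 4090

# Samples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum * num_devices

# Micro-batches per epoch, then optimizer steps per epoch
# (ASSUMPTION: a trailing partial accumulation window still counts as a step).
micro_batches = math.ceil(num_samples / (per_device_batch * num_devices))
steps_per_epoch = math.ceil(micro_batches / grad_accum)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)  # 16 245 735
```

Under these assumptions the result matches the 735 steps listed in Training Details, so the 10 warmup steps cover a little over 1% of training.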

Usage

With Unsloth (Fastest)

import json

# Import unsloth before transformers so its patches are applied first.
from unsloth import FastLanguageModel
from transformers import AutoTokenizer
import torch

MODEL_ID = "Shubh-0789/endpoint-qwen3.5-4b-lora-v2"

# Load the base model plus adapter in bf16 (matches training; no 4-bit quantization).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_ID,
    max_seq_length=2048,
    load_in_4bit=False,
    load_in_16bit=True,
    dtype=torch.bfloat16,
)
# The text tokenizer is loaded separately and used for templating and decoding below.
text_tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
FastLanguageModel.for_inference(model)
model.generation_config.pad_token_id = text_tokenizer.pad_token_id

clinical_text = "Primary endpoints are ORR and progression-free survival (PFS) assessed by RECIST v1.1 | [Time Frame: Up to 24 months]"

# Single-turn prompt; the outcome text is appended after "Text:".
messages = [
    {"role": "user", "content": f"Extract and classify the clinical trial endpoint from the following text. Return ONLY a JSON.\nText: {clinical_text}"}
]

inputs = text_tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_tensors="pt", return_dict=True,
).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens (skip the prompt).
prompt_length = inputs["input_ids"].shape[1]
result = text_tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)

# json.loads raises json.JSONDecodeError if the output is not valid JSON.
endpoints = json.loads(result)
print(json.dumps(endpoints, indent=2))

Output:

{
  "endpoints": [
    {
      "endpoint_name_standardized": "Objective Response Rate",
      "measurement_of": "tumor response",
      "measurement_type": "binary",
      "metric_type": "proportion",
      "timeframe": "Up to 24 months",
      "measurement_method": "RECIST v1.1",
      "evaluation_criteria": null,
      "unit": "%",
      "population": null,
      "is_composite": false,
      "components": []
    },
    {
      "endpoint_name_standardized": "Progression-Free Survival",
      "measurement_of": "disease progression or death",
      "measurement_type": "time-to-event",
      "metric_type": "hazard ratio",
      "timeframe": "Up to 24 months",
      "measurement_method": "RECIST v1.1",
      "evaluation_criteria": null,
      "unit": null,
      "population": null,
      "is_composite": false,
      "components": []
    }
  ]
}
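
Because the result is parsed with `json.loads`, any stray text around the JSON (a code fence, a leftover `<think>` block) will raise an error. The following is a defensive parsing sketch; `parse_endpoints` is a hypothetical helper, not part of the model's tooling, and it assumes the first `{` after any thinking block starts the JSON object.

```python
import json
import re


def parse_endpoints(raw: str) -> list[dict]:
    """Extract the "endpoints" list from raw model output."""
    # Drop any <think>...</think> block the model may have emitted.
    text = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)

    # ASSUMPTION: the first "{" marks the start of the JSON object.
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")

    # raw_decode parses one JSON value and ignores anything after it,
    # such as a closing code fence.
    obj, _ = json.JSONDecoder().raw_decode(text[start:])

    endpoints = obj.get("endpoints")
    if not isinstance(endpoints, list):
        raise ValueError('expected an "endpoints" list in the output')
    return endpoints
```

In the example above, `parse_endpoints(result)` could replace `json.loads(result)["endpoints"]`; truncated output (for example when `max_new_tokens` is too small) still raises `json.JSONDecodeError`.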

With PEFT/Transformers

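from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the bf16 base model, then attach the LoRA adapter weights on top of it.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-4B", torch_dtype="bfloat16", device_map="auto")
model = PeftModel.from_pretrained(base_model, "Shubh-0789/endpoint-qwen3.5-4b-lora-v2")
model.eval()  # inference mode (disables dropout)
tokenizer = AutoTokenizer.from_pretrained("Shubh-0789/endpoint-qwen3.5-4b-lora-v2")

# Prompt construction, generation, and decoding follow the Unsloth example above,
# with `tokenizer` in place of `text_tokenizer`.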

Inference Tip: Disable Thinking

Qwen3.5 supports a thinking mode. For this task, disable thinking for direct JSON output (the model was trained without <think> blocks):

# When using vLLM:
# --reasoning-parser qwen3 --default-chat-template-kwargs '{"enable_thinking": false}'
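
When the server default cannot be changed, thinking can also be disabled per request. The sketch below only builds the JSON body for a vLLM OpenAI-compatible `/v1/chat/completions` call and does not send it. `build_request` is a hypothetical helper, and the served model name depends on how the adapter is registered with the server, so the default used here is an assumption.

```python
import json

PROMPT = (
    "Extract and classify the clinical trial endpoint from the following text. "
    "Return ONLY a JSON.\nText: {text}"
)


def build_request(clinical_text: str, served_model: str = "endpoint-4b-v2") -> dict:
    """Build a chat-completions request body with thinking disabled.

    ASSUMPTION: `served_model` is whatever name the server exposes
    for this adapter; "endpoint-4b-v2" is a placeholder.
    """
    return {
        "model": served_model,
        "messages": [{"role": "user", "content": PROMPT.format(text=clinical_text)}],
        "temperature": 0.1,
        "max_tokens": 512,
        # Passed through to the chat template; turns off the thinking block.
        "chat_template_kwargs": {"enable_thinking": False},
    }


body = build_request("Change from baseline in HbA1c | [Time Frame: Week 26]")
print(json.dumps(body, indent=2))
```

The sampling values mirror the settings used in the Usage example; the body can be posted with any HTTP client.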

Model Comparison

Model	Parameters	Training Data	VRAM	Link
0.8B v1	856M	1,948 (CTgov only)	3 GB	0.8B v1
4B v1	4.6B	1,948 (CTgov only)	10 GB	4B v1
4B v2	4.6B	3,906 (CTgov + EU + China)	10 GB	This model
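
The VRAM column can be sanity-checked against the parameter counts. This is a rough weights-only estimate, assuming bf16 storage at 2 bytes per parameter; it ignores the LoRA adapter, KV cache, and activation memory, so the VRAM figures in the table are expected to be higher.

```python
BYTES_PER_PARAM_BF16 = 2  # bf16 stores each parameter in 16 bits


def weight_memory_gb(num_params: float) -> float:
    """Approximate memory for the model weights alone, in GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM_BF16 / 1e9


# Parameter counts from the Model Comparison table.
for name, params in [("0.8B v1", 856e6), ("4B v1 / v2", 4.6e9)]:
    print(f"{name}: ~{weight_memory_gb(params):.1f} GB of bf16 weights")
```

This gives roughly 1.7 GB and 9.2 GB of weights, which leaves only a small margin under the listed 3 GB and 10 GB figures for the cache and activations.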

Limitations

Trained primarily on English clinical trial text (ChiCTR data is also in English)
Complex composite endpoints may need verification
Minimum hardware for inference: a GPU with at least 10 GB of VRAM
Best inference settings: temperature=0.1, do_sample=True, thinking disabled
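
Since composite endpoints may need verification, one option is to flag outputs whose `is_composite` and `components` fields disagree. The rule below is a hypothetical heuristic for routing items to manual review, not something the model card prescribes; the two-component threshold is an assumption.

```python
def needs_review(endpoint: dict) -> bool:
    """Flag endpoints whose composite fields look inconsistent."""
    is_composite = bool(endpoint.get("is_composite"))
    components = endpoint.get("components") or []

    # ASSUMPTION: a composite endpoint should list at least two components.
    if is_composite and len(components) < 2:
        return True
    # Components listed on an endpoint that is not marked composite.
    if not is_composite and components:
        return True
    return False


example = {
    "endpoint_name_standardized": "MACE",
    "is_composite": True,
    "components": [],
}
print(needs_review(example))  # True
```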

Citation

@misc{endpoint-qwen3.5-4b-lora-v2,
  author = {Shubh-0789},
  title = {Clinical Trial Endpoint Classifier — 4B v2 (Qwen3.5-4B LoRA)},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Shubh-0789/endpoint-qwen3.5-4b-lora-v2}
}
