# PatentSBERTa — Green Patent Classifier
A fine-tuned AI-Growth-Lab/PatentSBERTa model for binary classification of patent claims as green technology (1) or not green (0).
Developed as part of the Applied Deep Learning (AAU, Spring 2025) exam assignment on active learning, human-in-the-loop labelling, and multi-agent systems for patent classification.
## Model Details
| Property | Value |
|---|---|
| Architecture | MPNetForSequenceClassification (12 layers, 768 hidden) |
| Parameters | 109.5 M (all trainable) |
| Base model | AI-Growth-Lab/PatentSBERTa |
| Max sequence length | 512 tokens |
| Labels | 0 — not green, 1 — green |
| Framework | Transformers 5.2.0, PyTorch |
## Training

### Pipeline overview
- Part A–B: Frozen PatentSBERTa baseline + uncertainty-based active-learning pool selection
- Part C: QLoRA-tuned Llama-3.1-8B powering a LangGraph Multi-Agent System (Advocate → Skeptic → Judge → Exception) to generate silver labels on the 15k most uncertain patents
- Part D: Human-in-the-loop review of 100 critical samples → gold labels, then final full-parameter fine-tuning of PatentSBERTa
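The uncertainty-based pool selection in Parts A–B can be sketched as follows. This is a minimal illustration using predictive entropy as the acquisition criterion; the function names and the toy pool are assumptions, and the assignment's exact acquisition function may differ:

```python
import math

def predictive_entropy(probs):
    """Entropy of one softmax distribution; higher means more uncertain."""
    return -sum(p * math.log(p + 1e-12) for p in probs)

def select_most_uncertain(pool_probs, k):
    """Indices of the k pool samples the frozen baseline is least sure about."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: predictive_entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:k]

# Toy pool of 3 claims: the middle one is maximally uncertain (0.5 / 0.5).
pool = [[0.95, 0.05], [0.50, 0.50], [0.80, 0.20]]
print(select_most_uncertain(pool, 2))  # → [1, 2]
```

In the real pipeline, this selection would be run over baseline softmax outputs for the whole unlabelled pool to pick the 15k most uncertain patents for the multi-agent labelling stage.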
### Training data
| Split | Rows | Source |
|---|---|---|
| train_silver | 25,000 | Silver labels from Parts A–C |
| gold_labels | 100 (× 25 upsampled = 2,500) | HITL-verified labels |
| Total training | 27,500 | Combined |
| eval_silver | 10,000 | Held-out balanced evaluation set |
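The 27,500-row training set is the silver set plus the 100 gold rows repeated 25 times. A minimal sketch with placeholder rows (the real pipeline presumably operates on tokenized datasets rather than Python lists):

```python
# Placeholder (text, label) rows standing in for the real splits.
silver = [("claim text", 1)] * 25_000    # silver labels from Parts A–C
gold = [("verified claim", 0)] * 100     # HITL-verified gold labels

GOLD_UPSAMPLE = 25
train = silver + gold * GOLD_UPSAMPLE    # 25,000 + 100 × 25 = 27,500 rows
print(len(train))  # → 27500
```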
### Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 2e-5 |
| Epochs | 5 |
| Effective batch size | 128 (4 GPUs × per-device batch 16 × grad accumulation 2) |
| LR scheduler | Cosine with 6% warmup |
| Weight decay | 0.01 |
| Label smoothing | 0.05 |
| Gold upsample factor | 25× |
| Early stopping patience | 3 |
| Precision | bf16 |
| Seed | 42 |
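The "cosine with 6% warmup" schedule can be sketched as below. Peak LR and warmup fraction match the table; the step counts are illustrative, and the actual run would use the built-in Transformers scheduler rather than a hand-rolled function:

```python
import math

def lr_at(step, total_steps, peak_lr=2e-5, warmup_frac=0.06):
    """Linear warmup over the first 6% of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))

total = 1000
print(lr_at(0, total))       # → 0.0 (start of warmup)
print(lr_at(60, total))      # → 2e-05 (peak, end of 6% warmup)
print(lr_at(total, total))   # annealed to ~0 at the final step
```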
### Hardware
- 4 × NVIDIA L4 (24 GB each), DDP via `torchrun`
- AAU AI-Lab (SLURM cluster)
- Wall-clock time: ~23 minutes
## Evaluation
Evaluated on the held-out `eval_silver` split (10,000 samples, balanced).

| | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| not-green (0) | 0.8121 | 0.8058 | 0.8090 | 5,000 |
| green (1) | 0.8073 | 0.8136 | 0.8104 | 5,000 |
| Accuracy | | | 0.8097 | 10,000 |
### Confusion Matrix
| | Pred not-green | Pred green |
|---|---|---|
| Actual not-green | 4,029 | 971 |
| Actual green | 932 | 4,068 |
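The per-class metrics above follow directly from the confusion matrix; recomputing them for the green class:

```python
# Confusion-matrix counts for the green class (label 1), from the table above.
tp, fn = 4068, 932     # actual green: correctly / incorrectly classified
fp, tn = 971, 4029     # actual not-green: misclassified as green / correct

precision = tp / (tp + fp)                      # 4068 / 5039
recall = tp / (tp + fn)                         # 4068 / 5000
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(round(precision, 4), round(recall, 4), round(f1, 4), round(accuracy, 4))
# → 0.8073 0.8136 0.8104 0.8097
```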
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "CTB2001/PatentSBERTa-green-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

claim = "A wind turbine blade comprising a spar cap formed from pultruded carbon strips..."
inputs = tokenizer(claim, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
pred = torch.argmax(logits, dim=-1).item()
print("green" if pred == 1 else "not-green")
```
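To get a confidence score rather than a hard label, apply softmax to the logits. The logits below are hypothetical stand-ins; in practice they come from the model call shown above:

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for a batch of two claims (shape: batch × 2 labels).
logits = torch.tensor([[0.2, 2.1], [1.5, -0.3]])

probs = F.softmax(logits, dim=-1)   # per-claim probabilities over {not-green, green}
preds = probs.argmax(dim=-1)        # 1 = green, 0 = not-green
print(preds.tolist())               # → [1, 0]
print(probs[:, 1].tolist())         # confidence that each claim is green
```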
## Intended Use
- Primary: Classifying patent claims as green / not-green technology
- Domain: Patent text (US/EP/WO first claims)
- Not suitable for: General-purpose NLI, legal advice, or production patent screening without additional validation
## Limitations
- Trained and evaluated on silver labels (machine-generated); a small fraction may be noisy
- Only 100 gold (human-verified) labels were available — upsampled 25× to amplify signal
- Performance on out-of-domain patent offices or languages is unknown
## Citation
```bibtex
@misc{trost-bertelsen2025patentsberta-green,
  author       = {Trøst-Bertelsen, Christian},
  title        = {PatentSBERTa Green Patent Classifier},
  year         = {2025},
  howpublished = {Hugging Face Model Hub},
  url          = {https://huggingface.co/CTB2001/PatentSBERTa-green-classifier}
}
```
## Author
Christian Trøst-Bertelsen — Aalborg University, Student ID 20224083. Course: Applied Deep Learning, 8th semester, Spring 2025.