PatentSBERTa Green Patent Classifier — Final Assignment

Binary classifier for green patent detection (Y02 CPC codes). Fine-tuned from AI-Growth-Lab/PatentSBERTa using a QLoRA-powered multi-agent system (MAS) with exception-based human-in-the-loop (HITL) review.

Training

  • Base model: AI-Growth-Lab/PatentSBERTa (MPNet-based)
  • Task: Binary classification — is_green (Y02 CPC codes)
  • Training data: 35,000 silver labels + 100 gold labels (QLoRA MAS)
  • Pipeline:
    1. QLoRA fine-tuning of Llama-3.2-3B-Instruct (4-bit NF4, LoRA r=16, 200 steps) on 10,000 patent classification prompts from train_silver
    2. 3-agent CrewAI MAS with a QLoRA-informed Advocate; exception-based HITL in which only low-confidence claims are routed to human review (0 of 100 triggered)
    3. PatentSBERTa fine-tuned on resulting gold labels
  • Fine-tuning: 1 epoch, lr=2e-5, max_length=256, batch_size=16, fp16
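The exception-based HITL routing in step 2 can be sketched as a simple confidence gate. This is an illustrative sketch only: the threshold value and the record format are assumptions, not taken from the actual pipeline.

```python
# Hypothetical sketch of exception-based HITL routing: claims whose agent
# confidence falls below a threshold are queued for human review; the rest
# are auto-accepted as gold labels.

REVIEW_THRESHOLD = 0.6  # assumed value; the real pipeline's threshold is not documented here

def route_claims(claims):
    """Split (claim_id, label, confidence) records into auto-accepted and review queues."""
    auto_accepted, needs_review = [], []
    for claim_id, label, confidence in claims:
        if confidence < REVIEW_THRESHOLD:
            needs_review.append((claim_id, label, confidence))
        else:
            auto_accepted.append((claim_id, label))
    return auto_accepted, needs_review

# With every confidence above the threshold, no claims are escalated,
# mirroring the 0-of-100 outcome reported above.
claims = [("c1", 1, 0.95), ("c2", 0, 0.88), ("c3", 1, 0.91)]
gold, review = route_claims(claims)
```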

Evaluation (eval_silver, 5,000 claims)

Metric     Value
F1         0.8097
Precision  0.8213
Recall     0.7986
Accuracy   0.8126

Progression: Baseline F1=0.7696 → A2=0.8099 → A3=0.8115 → Final=0.8097
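As a sanity check, the reported F1 is consistent with the reported precision and recall under the standard definition (F1 is the harmonic mean of the two); the small discrepancy in the last digit is rounding.

```python
# Standard binary-classification formula: F1 = 2PR / (P + R).
precision, recall = 0.8213, 0.7986
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # ≈ 0.8098, matching the reported 0.8097 up to rounding
```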

Notes

The QLoRA adapter (Llama-3.2-3B-Instruct) was trained on patent classification prompts, and the domain knowledge it learned was encoded into the Advocate agent's system prompt. The slight F1 regression from A3 (0.8115) to Final (0.8097) is within run-to-run noise and reflects the fact that the 100-claim gold set is a small fraction of the 35k silver training data.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Tokenizer comes from the base model; classifier weights from the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained("AI-Growth-Lab/PatentSBERTa", use_fast=False)
model = AutoModelForSequenceClassification.from_pretrained("Peter512/patentsbert-green-final")
model.eval()

text = "A photovoltaic cell comprising a perovskite absorber layer..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits
label = logits.argmax(dim=-1).item()  # 0 = not_green, 1 = green
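If a confidence score is needed alongside the hard label (for example, to apply the same exception-based review idea at inference time), the two logits can be converted to probabilities with a softmax. In practice `torch.softmax(logits, dim=-1)` does this directly; the standalone sketch below just makes the arithmetic explicit, with hypothetical logit values.

```python
import math

def softmax2(logit_not_green, logit_green):
    """Convert the two class logits into probabilities (numerically stable)."""
    m = max(logit_not_green, logit_green)  # subtract the max to avoid overflow
    e0 = math.exp(logit_not_green - m)
    e1 = math.exp(logit_green - m)
    total = e0 + e1
    return e0 / total, e1 / total

# Hypothetical logits; in the Usage snippet these would come from model(**inputs).logits
p_not_green, p_green = softmax2(-1.2, 2.3)
```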