metadata
language: en
license: apache-2.0
tags:
  - patent
  - green-technology
  - text-classification
  - patentsbert
  - multi-agent
  - crewai
datasets:
  - Peter512/patents-50k-green
base_model: AI-Growth-Lab/PatentSBERTa

PatentSBERTa Green Patent Classifier — Assignment 3

Binary classifier for green patent detection: it predicts whether a patent claim falls under the Y02 CPC codes. Fine-tuned from AI-Growth-Lab/PatentSBERTa, with a 3-agent CrewAI debate system (Advocate / Skeptic / Judge) used in the labeling process.

Training

  • Base model: AI-Growth-Lab/PatentSBERTa (MPNet-based)
  • Task: Binary classification — is_green (Y02 CPC codes)
  • Training data: 35,000 silver labels + 100 gold labels (CrewAI multi-agent system, MAS)
  • MAS process: 3-agent debate. The Advocate argues that the claim is green, the Skeptic challenges that argument, and the Judge outputs {"label": 0/1, "confidence": "low/medium/high", "rationale": "..."}. 100% agent agreement: the Judge produced no low-confidence outputs, so there were 0 human overrides.
  • Fine-tuning: 1 epoch, lr=2e-5, max_length=256, batch_size=16, fp16
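
The Judge's JSON output described above can be validated before it is used as a label. The following is a minimal sketch, not the pipeline used for this model: the names `Verdict`, `parse_judge_output`, and `needs_human_review` are hypothetical, and the rule that a low-confidence verdict is routed to a human is an assumption inferred from the "0 human overrides" note.

```python
import json
from dataclasses import dataclass

VALID_CONFIDENCE = {"low", "medium", "high"}


@dataclass
class Verdict:
    label: int                 # 0 = not green, 1 = green
    confidence: str            # "low", "medium", or "high"
    rationale: str
    needs_human_review: bool   # assumption: only low-confidence verdicts are reviewed


def parse_judge_output(raw: str) -> Verdict:
    """Parse and validate one Judge response (hypothetical helper)."""
    data = json.loads(raw)

    label = data["label"]
    # bool is a subclass of int in Python, so reject True/False explicitly.
    if isinstance(label, bool) or label not in (0, 1):
        raise ValueError(f"label must be 0 or 1, got {label!r}")

    confidence = str(data["confidence"]).lower()
    if confidence not in VALID_CONFIDENCE:
        raise ValueError(f"unexpected confidence {confidence!r}")

    return Verdict(
        label=label,
        confidence=confidence,
        rationale=str(data.get("rationale", "")),
        needs_human_review=(confidence == "low"),
    )


verdict = parse_judge_output(
    '{"label": 1, "confidence": "high", "rationale": "Claims a solar cell."}'
)
print(verdict.label, verdict.needs_human_review)
```

Under this assumed rule, a run with no low-confidence verdicts would leave every Judge label unchanged, which is consistent with the 0 overrides reported above.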

Evaluation (eval_silver, 5,000 claims)

Metric     Value
F1         0.8115
Precision  0.8224
Recall     0.8010
Accuracy   0.8142

Assignment 2 baseline: F1=0.8099 | Original baseline: F1=0.7696
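
As a quick consistency check on the figures above, F1 can be recomputed as the harmonic mean of the reported precision and recall. This is only a sketch over the rounded values in this card (the helper name `f1_from` is hypothetical), so the result is expected to match the reported F1 up to rounding, at roughly 0.8116.

```python
def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the evaluation results in this card.
reported = {"f1": 0.8115, "precision": 0.8224, "recall": 0.8010}

recomputed_f1 = f1_from(reported["precision"], reported["recall"])

# Differences against the two baselines quoted in this card.
gain_vs_assignment2 = reported["f1"] - 0.8099
gain_vs_original = reported["f1"] - 0.7696

print(f"recomputed F1:            {recomputed_f1:.4f}")
print(f"gain vs Assignment 2:     {gain_vs_assignment2:+.4f}")
print(f"gain vs original baseline: {gain_vs_original:+.4f}")
```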

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# The tokenizer is loaded from the base model; the fine-tuned weights come from this repo.
tokenizer = AutoTokenizer.from_pretrained("AI-Growth-Lab/PatentSBERTa", use_fast=False)
model = AutoModelForSequenceClassification.from_pretrained("Peter512/patentsbert-green-a3")
model.eval()  # inference mode: disables dropout

text = "A photovoltaic cell comprising a perovskite absorber layer..."
# Truncate to the same max_length that was used for fine-tuning.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():  # no gradients are needed for inference
    logits = model(**inputs).logits
label = logits.argmax(dim=-1).item()  # 0=not_green, 1=green
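
If a probability is more useful than a hard label, the two logits can be passed through a softmax. The sketch below uses only the standard library and assumes a two-logit head in which index 1 is the green class, as in the usage snippet above. The helper names and the `threshold` parameter are hypothetical, not part of the model's API.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_from_logits(logits: List[float], threshold: float = 0.5) -> Tuple[int, float]:
    """Return (label, p_green); assumes index 1 is the green class."""
    p_green = softmax(logits)[1]
    label = 1 if p_green >= threshold else 0
    return label, p_green


# Example with made-up logits; with a real model one could pass logits[0].tolist().
label, p_green = predict_from_logits([-0.3, 1.2])
print(label, round(p_green, 3))
```

With the default threshold of 0.5 this gives the same label as taking the argmax, except when the two logits are exactly tied. A higher threshold trades recall for precision.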