How to use Signe22/patentsberta-green-hitl-A3 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="Signe22/patentsberta-green-hitl-A3")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Signe22/patentsberta-green-hitl-A3")
model = AutoModelForSequenceClassification.from_pretrained("Signe22/patentsberta-green-hitl-A3")
```
Agentic Human-in-the-Loop (HITL) Setup
For Assignment 3, I implemented an agentic Human-in-the-Loop (HITL) framework to classify patent claims as green technology.
Agentic Classification
Each patent claim is evaluated by three specialized agents:
Advocate – argues for green classification by identifying technical features related to energy efficiency, emissions reduction, or sustainability.
Skeptic – argues against green classification, focusing on greenwashing, vague language, or lack of explicit environmental impact.
Judge – weighs both arguments and produces a final classification (llm_green_suggested), confidence level, and rationale in structured JSON.
This setup encourages explicit reasoning and exposes disagreements between the pro-green and anti-green interpretations of each claim.
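The three-agent flow can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the prompt wording, the `Verdict` class, the `classify_claim` helper, the `confidence` and `rationale` key names, and the `fake_llm` stand-in are all assumptions made for the example. Only the `llm_green_suggested` field name comes from the description above.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt templates; the wording is an assumption,
# not the prompts used in the project.
ADVOCATE_PROMPT = "Argue that this patent claim describes green technology:\n{claim}"
SKEPTIC_PROMPT = "Argue that this patent claim is NOT green technology:\n{claim}"
JUDGE_PROMPT = (
    "Claim:\n{claim}\n\nAdvocate:\n{advocate}\n\nSkeptic:\n{skeptic}\n\n"
    "Return JSON with keys llm_green_suggested (0 or 1), confidence, rationale."
)


@dataclass
class Verdict:
    llm_green_suggested: int  # field name taken from the model card
    confidence: str           # assumed key name
    rationale: str            # assumed key name


def classify_claim(claim: str, call_llm: Callable[[str], str]) -> Verdict:
    """Run Advocate, Skeptic, then Judge, and parse the Judge's JSON reply."""
    advocate = call_llm(ADVOCATE_PROMPT.format(claim=claim))
    skeptic = call_llm(SKEPTIC_PROMPT.format(claim=claim))
    raw = call_llm(
        JUDGE_PROMPT.format(claim=claim, advocate=advocate, skeptic=skeptic)
    )
    data = json.loads(raw)  # raises ValueError if the Judge returns invalid JSON
    return Verdict(
        llm_green_suggested=int(data["llm_green_suggested"]),
        confidence=str(data["confidence"]),
        rationale=str(data["rationale"]),
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    if prompt.startswith("Claim:"):  # only the Judge prompt starts this way
        return json.dumps(
            {
                "llm_green_suggested": 1,
                "confidence": "medium",
                "rationale": "Placeholder rationale from the stub.",
            }
        )
    return "Placeholder argument from the stub."


verdict = classify_claim("A wind turbine blade with reduced drag.", fake_llm)
print(verdict)
```

Passing the model call in as a function keeps the sketch independent of any particular LLM provider; a real client would replace `fake_llm`.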
Human-in-the-Loop Review
After the agentic decision, a human reviewer inspects:
the full patent claim,
the Advocate and Skeptic arguments,
the Judge’s final decision and rationale.
The human then assigns a final label (is_green_human) and optional notes. These human gold labels are used both as high-quality training data for model fine-tuning and for evaluation.
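One way to record a review decision next to the AI suggestion is sketched below. The `apply_review` helper and the `notes` key are hypothetical names chosen for illustration, and the assumption that labels are 0 or 1 is mine; only `is_green_human` and `llm_green_suggested` appear in the description above.

```python
from typing import Optional


def apply_review(row: dict, is_green_human: int, notes: Optional[str] = None) -> dict:
    """Return a copy of the row with the reviewer's label and notes attached."""
    if is_green_human not in (0, 1):
        raise ValueError("is_green_human must be 0 or 1")
    reviewed = dict(row)  # copy so the original AI output stays untouched
    reviewed["is_green_human"] = is_green_human
    reviewed["notes"] = notes or ""  # assumed column name for the optional notes
    return reviewed


row = {"claim": "A wind turbine blade with reduced drag.", "llm_green_suggested": 1}
reviewed = apply_review(row, is_green_human=1, notes="Clear energy-generation use.")
print(reviewed)
```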
Agreement Reporting (Human vs AI)
I measured agreement as the percentage of claims where my final human label matches the AI-suggested label (llm_green_suggested).
- Assignment 2: 100% agreement (I largely accepted the AI suggestions due to limited domain certainty on many claims)
- Assignment 3: 86% agreement (The agentic setup and different model resulted in a more optimistic labeling approach, which increased both the number of positive (1) labels and the number of disagreements with the AI suggestion)
How computed:
Agreement = mean( human_label == llm_green_suggested ) over all rows.
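The formula above can be sketched in plain Python as follows. The `agreement_rate` helper and the toy rows are illustrative only, and the sketch assumes that `human_label` in the formula refers to the `is_green_human` field.

```python
def agreement_rate(rows: list) -> float:
    """Fraction of rows where the human label equals the AI suggestion."""
    if not rows:
        raise ValueError("need at least one row")
    matches = sum(
        1 for r in rows if r["is_green_human"] == r["llm_green_suggested"]
    )
    return matches / len(rows)


# Toy data, not the project's labels: three of four rows agree.
rows = [
    {"is_green_human": 1, "llm_green_suggested": 1},
    {"is_green_human": 0, "llm_green_suggested": 0},
    {"is_green_human": 1, "llm_green_suggested": 0},
    {"is_green_human": 0, "llm_green_suggested": 0},
]
print(f"Agreement: {agreement_rate(rows):.0%}")  # prints "Agreement: 75%"
```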
Comparative Analysis
Model Performance Comparison
The table below compares model performance across the three stages of the project using the same silver evaluation set.
| Model Version | Training Data Source | F1 Score (Eval Set) |
|---|---|---|
| Baseline | Frozen PatentSBERTa embeddings (no fine-tuning) | 0.7719 |
| Assignment 2 Model | Fine-tuned on Silver + Gold (Simple LLM labels + HITL) | 0.8030 |
| Assignment 3 Model | Fine-tuned on Silver + Gold (Agentic LLM + HITL) | 0.8051 |
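For reference, an F1 score of the kind reported in the table can be computed from true and predicted labels as sketched below. The model card does not state which F1 variant was used, so this assumes binary F1 with the green class (1) as the positive label; the `f1_score_binary` helper and the toy labels are illustrative only.

```python
def f1_score_binary(y_true: list, y_pred: list, positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, not the project's evaluation set.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(round(f1_score_binary(y_true, y_pred), 4))  # prints 0.6667
```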
Reflection
Fine-tuning PatentSBERTa substantially improved performance over the frozen-embedding baseline (F1 0.7719 to 0.8030), confirming the value of task-specific fine-tuning. The agentic labeling workflow in Assignment 3 led to a small but notable improvement over the simpler LLM approach used in Assignment 2 (F1 0.8030 to 0.8051, a gain of 0.0021), suggesting that multi-agent reasoning can marginally improve label quality and downstream model performance. While the performance gain is modest, it indicates that the added complexity of the agentic setup can provide incremental benefits.
The limited size of the improvement is expected given the already strong performance of the Assignment 2 model and the relatively small size of the gold dataset (100 samples).