---
language:
- en
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- patents
- green-tech
- qlora
- peft
- sequence-classification
model-index:
- name: assignment3-patentsberta-qlora-gold100
  results:
  - task:
      type: text-classification
      name: Green patent detection
    dataset:
      name: patents_50k_green (eval_silver)
      type: custom
    metrics:
    - type: f1
      value: 0.5006382068
      name: Macro F1
    - type: accuracy
      value: 0.5008
      name: Accuracy
---
# Assignment 3 Model — Green Patent Detection (QLoRA + PatentSBERTa)

## Model Summary
This repository contains the final downstream Assignment 3 classifier for green patent detection.
Workflow:
- Baseline uncertainty sampling on patent claims.
- QLoRA-based labeling/rationale generation on the top 100 high-risk examples.
- Final PatentSBERTa fine-tuning on `train_silver` + 100 gold high-risk examples.
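The uncertainty-sampling step above can be sketched as follows. This is a minimal illustration, not the project's training code: the function name, the margin criterion, and the mock probabilities are all assumptions.

```python
import torch

def select_high_risk(probs: torch.Tensor, k: int = 100) -> list:
    """Rank examples by prediction uncertainty (margin sampling).

    probs: (N, 2) softmax probabilities from a baseline classifier.
    Returns indices of the k least-confident examples.
    """
    # Small margin between the two class probabilities = high uncertainty.
    margin = (probs[:, 1] - probs[:, 0]).abs()
    return torch.argsort(margin)[:k].tolist()

# Mock predictions for three claims; the middle one is most uncertain.
probs = torch.tensor([[0.90, 0.10], [0.51, 0.49], [0.20, 0.80]])
print(select_high_risk(probs, k=1))  # → [1]
```

The selected indices would then be routed to the QLoRA labeling/rationale step.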
## Base Model

`AI-Growth-Lab/PatentSBERTa` (sequence classification head, 2 labels)
## Training Setup
- Seed: 42
- Train rows (augmented): 20,100
- Eval rows: 5,000
- Gold rows: 100
- Hardware used: NVIDIA L4
- Frameworks: `transformers`, `datasets`, `torch`, `peft`, `bitsandbytes`
## Results

### Assignment 3 final model (this repo)
- Eval accuracy: 0.5008
- Eval macro F1: 0.5006382068
- Gold100 accuracy: 0.53
- Gold100 macro F1: 0.5037482842
### Comparison table (Assignment requirement)
| Model Version | Training Data Source | F1 Score (Eval Set) |
|---|---|---|
| 1. Baseline | Frozen Embeddings (No Fine-tuning) | 0.7727474956 |
| 2. Assignment 2 Model | Fine-tuned on Silver + Gold (Simple LLM) | 0.4975369710 |
| 3. Assignment 3 Model | Fine-tuned on Silver + Gold (QLoRA) | 0.5006382068 |
## Intended Use
- Educational/research use for green patent classification experiments.
- Binary label output: non-green (0) vs. green (1).
## Limitations
- Dataset and labels are project-specific and may not generalize broadly.
- Part C used an automated acceptance policy for gold labels in this run (no manual overrides).
- Model should not be used for legal/commercial patent decisions without human review.
## Files in this Repository

- `config.json`
- `model.safetensors`
- tokenizer files
- optional: training/evaluation summaries
## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo_id = "<your-username>/assignment3-patentsberta-qlora-gold100"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

text = "A method for reducing CO2 emissions in industrial heat recovery systems..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits

pred = torch.argmax(logits, dim=-1).item()
print({"label": int(pred)})  # 0 = not_green, 1 = green
```
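If a confidence score is needed alongside the hard label (e.g. for thresholding), the logits can be turned into class probabilities with a softmax. This sketch uses mock logits in place of the real `model(**inputs).logits` output:

```python
import torch

# Mock logits standing in for model(**inputs).logits on one example.
logits = torch.tensor([[0.3, 1.2]])

# Softmax converts the 2-label logits into probabilities summing to 1.
probs = torch.softmax(logits, dim=-1)
pred = int(torch.argmax(probs, dim=-1))

print({"label": pred, "score": probs[0, pred].item()})
```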