---
language:
- en
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- patents
- green-tech
- qlora
- peft
- sequence-classification
model-index:
- name: assignment3-patentsberta-qlora-gold100
  results:
  - task:
      type: text-classification
      name: Green patent detection
    dataset:
      name: patents_50k_green (eval_silver)
      type: custom
    metrics:
    - type: f1
      value: 0.5006382068
      name: Macro F1
    - type: accuracy
      value: 0.5008
      name: Accuracy
---

# Assignment 3 Model — Green Patent Detection (QLoRA + PatentSBERTa)

## Model Summary

This repository contains the **final downstream Assignment 3 classifier** for green patent detection.

Workflow:

1. Baseline uncertainty sampling on patent claims.
2. QLoRA-based labeling/rationale generation on the top 100 high-risk examples.
3. Final PatentSBERTa fine-tuning on `train_silver + 100 gold high-risk`.

## Base Model

- `AI-Growth-Lab/PatentSBERTa` (sequence classification head, 2 labels)

## Training Setup

- Seed: 42
- Train rows (augmented): 20,100
- Eval rows: 5,000
- Gold rows: 100
- Hardware: NVIDIA L4
- Frameworks: `transformers`, `datasets`, `torch`, `peft`, `bitsandbytes`

## Results

### Assignment 3 final model (this repo)

- Eval accuracy: **0.5008**
- Eval macro F1: **0.5006382068**
- Gold100 accuracy: **0.53**
- Gold100 macro F1: **0.5037482842**

### Comparison table (assignment requirement)

| Model Version | Training Data Source | F1 Score (Eval Set) |
|---|---|---:|
| 1. Baseline | Frozen embeddings (no fine-tuning) | 0.7727474956 |
| 2. Assignment 2 Model | Fine-tuned on Silver + Gold (simple LLM) | 0.4975369710 |
| 3. Assignment 3 Model | Fine-tuned on Silver + Gold (QLoRA) | 0.5006382068 |

## Intended Use

- Educational/research use for green patent classification experiments.
- Binary label output: non-green (0) vs. green (1).

## Limitations

- Dataset and labels are project-specific and may not generalize broadly.
- In this run, Part C used an automated acceptance policy for gold labels (no manual overrides).
- Model should not be used for legal/commercial patent decisions without human review.

## Files in this Repository

- `config.json`
- `model.safetensors`
- tokenizer files
- optional: training/evaluation summaries

## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo_id = "/assignment3-patentsberta-qlora-gold100"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

text = "A method for reducing CO2 emissions in industrial heat recovery systems..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits

pred = torch.argmax(logits, dim=-1).item()
print({"label": int(pred)})  # 0=not_green, 1=green
```
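The uncertainty-sampling step in the workflow ranks unlabeled examples by how unsure the baseline classifier is about them. A minimal sketch of that selection logic, assuming entropy over softmax probabilities as the uncertainty score (the exact scoring rule used in this project is not specified here, and the helper names are hypothetical):

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    """Shannon entropy of a probability distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def top_k_uncertain(logit_rows, k):
    """Return indices of the k rows with the highest predictive entropy."""
    ranked = sorted(range(len(logit_rows)),
                    key=lambda i: entropy(softmax(logit_rows[i])),
                    reverse=True)
    return ranked[:k]

# Example: two confident predictions and one near the decision boundary.
rows = [[4.0, -2.0], [0.1, 0.0], [-3.0, 3.5]]
print(top_k_uncertain(rows, 1))  # -> [1], the near-tie row is most uncertain
```

With per-example logits from the baseline model, `top_k_uncertain(all_logits, 100)` would pick the 100 high-risk examples that the workflow then routes to QLoRA-based labeling.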