---
language:
- en
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- patents
- green-tech
- qlora
- peft
- sequence-classification
model-index:
- name: assignment3-patentsberta-qlora-gold100
  results:
  - task:
      type: text-classification
      name: Green patent detection
    dataset:
      name: patents_50k_green (eval_silver)
      type: custom
    metrics:
    - type: f1
      value: 0.5006382068
      name: Macro F1
    - type: accuracy
      value: 0.5008
      name: Accuracy
---

# Assignment 3 Model — Green Patent Detection (QLoRA + PatentSBERTa)

## Model Summary

This repository contains the **final downstream Assignment 3 classifier** for green patent detection.

Workflow:

1. Baseline uncertainty sampling on patent claims.
2. QLoRA-based labeling/rationale generation on the top-100 high-risk examples.
3. Final PatentSBERTa fine-tuning on `train_silver + 100 gold high-risk` examples.
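
The uncertainty-sampling step (step 1 above) can be illustrated with a small entropy-ranking sketch in plain Python. This is a hedged illustration, not the project's actual selection code; the function names and the entropy criterion are assumptions:

```python
import math

def binary_entropy(p_green: float) -> float:
    """Predictive entropy of a binary classifier's green-class probability."""
    p = min(max(p_green, 1e-12), 1.0 - 1e-12)  # clamp away from 0/1 for log
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def top_k_uncertain(green_probs, k=100):
    """Indices of the k examples whose predictions are most uncertain."""
    ranked = sorted(range(len(green_probs)),
                    key=lambda i: binary_entropy(green_probs[i]),
                    reverse=True)
    return ranked[:k]

# Probabilities near 0.5 (the decision boundary) are ranked first.
print(top_k_uncertain([0.99, 0.51, 0.02, 0.45], k=2))  # -> [1, 3]
```

Entropy peaks at probability 0.5, so the selected examples are exactly the ones nearest the decision boundary — the same examples the auto-accept labeling risk discussed below applies to.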

## Base Model

- `AI-Growth-Lab/PatentSBERTa` (sequence classification head, 2 labels)

## Training Setup

- Seed: 42
- Train rows (augmented): 20,100
- Eval rows: 5,000
- Gold rows: 100
- Hardware used: NVIDIA L4
- Frameworks: `transformers`, `datasets`, `torch`, `peft`, `bitsandbytes`

## Results

### Assignment 3 final model (this repo)

- Eval accuracy: **0.5008**
- Eval macro F1: **0.5006**
- Gold100 accuracy: **0.53**
- Gold100 macro F1: **0.5037**

### Comparison table (assignment requirement)

| Model Version | Training Data Source | F1 Score (Eval Set) |
|---|---|---:|
| 1. Baseline | Frozen embeddings (no fine-tuning) | 0.7727 |
| 2. Assignment 2 model | Fine-tuned on Silver + Gold (simple LLM) | 0.4975 |
| 3. Assignment 3 model | Fine-tuned on Silver + Gold (QLoRA) | 0.5006 |
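
For reference, the macro F1 values above are unweighted means of the per-class F1 scores. A minimal pure-Python sketch of the metric (illustrative only; the run itself presumably used a library implementation such as scikit-learn's `f1_score(..., average="macro")`):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the two labels (0 and 1)."""
    f1_scores = []
    for cls in (0, 1):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

print(round(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]), 4))  # -> 0.7333
```

Because both classes weigh equally regardless of support, macro F1 near 0.50 on a binary task indicates close to chance-level separation, which is consistent with the reflection below.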

### Reflection (2–3 sentences)

Compared to Assignment 2, the Assignment 3 QLoRA workflow produced a small improvement in eval macro F1 (+0.0031). This indicates that the advanced data-generation approach provided a measurable but modest downstream gain over the simpler Assignment 2 setup in this run. However, both fine-tuned pipelines remained substantially below the frozen-embedding baseline, suggesting that data quality and labeling strategy still dominate final performance.

## Extended Reflection on Part E Results

The observed F1 scores show that downstream fine-tuning underperformed the frozen-embedding baseline in this run: baseline macro F1 = **0.7727**, Assignment 2 = **0.4975**, and Assignment 3 = **0.5006** on the eval set. Although the advanced QLoRA workflow in Assignment 3 improved slightly over Assignment 2 (+0.0031), both fine-tuned models remained far below the baseline, indicating that additional training did not translate into better generalization here.

One plausible explanation is label quality in the high-risk set. In Assignment 3, the 100 uncertain examples were finalized under an **auto-accept policy** (no independent human correction), so labeling errors on the most ambiguous cases may have been passed directly into training. Because these examples are deliberately selected near the decision boundary, they are highly influential; if their labels are noisy, they can destabilize class boundaries and reduce macro F1 on eval data.

Another interpretation is that the fine-tuning stage is more sensitive to supervision quality and distribution mismatch than the linear baseline. A strong frozen-embedding + logistic-regression model can remain robust when labels are imperfect, while full downstream fine-tuning may overfit to noisy or weakly validated labels. Overall, the results suggest that the **quality of gold labels on uncertain samples** is a critical bottleneck, and that true human adjudication of high-risk claims is likely necessary to realize the intended gains from advanced workflows such as QLoRA.

## Intended Use

- Educational/research use for green patent classification experiments.
- Binary label output: non-green (0) vs. green (1).

## Limitations

- The dataset and labels are project-specific and may not generalize broadly.
- Part C used an automated acceptance policy for gold labels in this run (no manual overrides).
- The model should not be used for legal or commercial patent decisions without human review.

## Files in this Repository

- `config.json`
- `model.safetensors`
- tokenizer files
- optional: training/evaluation summaries

## Example Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "<your-username>/assignment3-patentsberta-qlora-gold100"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()  # disable dropout for inference

text = "A method for reducing CO2 emissions in industrial heat recovery systems..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits
pred = torch.argmax(logits, dim=-1).item()
print({"label": int(pred)})  # 0 = not_green, 1 = green
```
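
If class probabilities are wanted rather than a hard label, the logits can be normalized with a softmax (equivalent to `torch.softmax(logits, dim=-1)` on the logits above); a dependency-free sketch with made-up logit values:

```python
import math

def softmax(logits):
    """Normalize raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical two-class logits: index 0 = not_green, index 1 = green.
probs = softmax([-0.3, 1.2])
print({"p_not_green": round(probs[0], 3), "p_green": round(probs[1], 3)})
```

Reporting the probability alongside the argmax label is especially useful here, since eval metrics near 0.50 mean many predictions sit close to the decision boundary.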