---
language:
- en
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- patents
- green-tech
- qlora
- peft
- sequence-classification
model-index:
- name: assignment3-patentsberta-qlora-gold100
  results:
  - task:
      type: text-classification
      name: Green patent detection
    dataset:
      name: patents_50k_green (eval_silver)
      type: custom
    metrics:
    - type: f1
      value: 0.5006382068
      name: Macro F1
    - type: accuracy
      value: 0.5008
      name: Accuracy
---

# Assignment 3 Model: Green Patent Detection (QLoRA + PatentSBERTa)

## Model Summary
This repository contains the **final downstream Assignment 3 classifier** for green patent detection.

Workflow:
1. Baseline uncertainty sampling on patent claims (see the sketch after this list).
2. QLoRA-based labeling/rationale generation on the top-100 high-risk examples.
3. Final PatentSBERTa fine-tuning on `train_silver + 100 gold high-risk`.

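The selection step (step 1) can be sketched as follows. This is a minimal, hypothetical sketch rather than the project's exact code: the placeholder `claims` list, the single-batch scoring, and the use of closeness to the 0.5 decision boundary as the "high-risk" criterion are all assumptions.

```python
# Hypothetical sketch of step 1: uncertainty sampling with the baseline encoder.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

baseline_ckpt = "AI-Growth-Lab/PatentSBERTa"  # baseline encoder with a fresh 2-label head
tokenizer = AutoTokenizer.from_pretrained(baseline_ckpt)
model = AutoModelForSequenceClassification.from_pretrained(baseline_ckpt, num_labels=2)
model.eval()

# Placeholder claims; the real pool is the unlabeled patent-claim corpus.
claims = ["A wind turbine blade comprising ...", "A beverage container lid ..."]

with torch.no_grad():
    enc = tokenizer(claims, return_tensors="pt", padding=True, truncation=True, max_length=256)
    green_prob = torch.softmax(model(**enc).logits, dim=-1)[:, 1]

# "High-risk" = most uncertain: predicted probability closest to the 0.5 boundary.
uncertainty = (green_prob - 0.5).abs()
top100_idx = uncertainty.argsort()[:100].tolist()  # candidates sent to QLoRA labeling (step 2)
```
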
## Base Model
- `AI-Growth-Lab/PatentSBERTa` (sequence classification head, 2 labels)

## Training Setup
- Seed: 42
- Train rows (augmented): 20,100
- Eval rows: 5,000
- Gold rows: 100
- Hardware used: NVIDIA L4
- Frameworks: `transformers`, `datasets`, `torch`, `peft`, `bitsandbytes`

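The setup above corresponds to a standard `transformers` fine-tuning loop. The sketch below is illustrative only: the seed (42) and the base checkpoint come from this card, while the placeholder datasets, column names, epoch count, and batch size are assumptions.

```python
# Hypothetical sketch of the final fine-tuning run (PatentSBERTa on silver + gold rows).
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments, set_seed)

set_seed(42)  # seed reported above

base_ckpt = "AI-Growth-Lab/PatentSBERTa"
tokenizer = AutoTokenizer.from_pretrained(base_ckpt)
model = AutoModelForSequenceClassification.from_pretrained(base_ckpt, num_labels=2)

# Placeholder rows; the real run used 20,100 train rows (silver + 100 gold) and 5,000 eval rows.
train_ds = Dataset.from_dict({"text": ["claim a ...", "claim b ..."], "label": [1, 0]})
eval_ds = Dataset.from_dict({"text": ["claim c ..."], "label": [0]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

train_ds = train_ds.map(tokenize, batched=True)
eval_ds = eval_ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="assignment3-patentsberta-qlora-gold100",
    num_train_epochs=3,              # illustrative, not reported in this card
    per_device_train_batch_size=16,  # illustrative
    seed=42,
)

trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                  eval_dataset=eval_ds, tokenizer=tokenizer)
trainer.train()
```
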
## Results

### Assignment 3 final model (this repo)
- Eval accuracy: **0.5008**
- Eval macro F1: **0.5006382068**
- Gold100 accuracy: **0.53**
- Gold100 macro F1: **0.5037482842**

### Comparison table (Assignment requirement)
| Model Version | Training Data Source | F1 Score (Eval Set) |
|---|---|---:|
| 1. Baseline | Frozen Embeddings (No Fine-tuning) | 0.7727474956 |
| 2. Assignment 2 Model | Fine-tuned on Silver + Gold (Simple LLM) | 0.4975369710 |
| 3. Assignment 3 Model | Fine-tuned on Silver + Gold (QLoRA) | 0.5006382068 |

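The eval-set numbers above can be recomputed with a simple scoring loop. The snippet below is a sketch: `eval_texts` / `eval_labels` are placeholders for the 5,000 eval_silver rows, and `repo_id` keeps the placeholder username.

```python
# Sketch of how accuracy and macro F1 on the eval set can be recomputed.
import torch
from sklearn.metrics import accuracy_score, f1_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "<your-username>/assignment3-patentsberta-qlora-gold100"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()

eval_texts = ["claim text ...", "another claim ..."]  # placeholders for the 5,000 eval rows
eval_labels = [1, 0]

preds = []
with torch.no_grad():
    for text in eval_texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
        preds.append(int(model(**enc).logits.argmax(dim=-1)))

print("accuracy:", accuracy_score(eval_labels, preds))
print("macro F1:", f1_score(eval_labels, preds, average="macro"))
```
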
## Intended Use
- Educational/research use for green patent classification experiments.
- Binary label output: non-green (0) vs green (1).

## Limitations
- Dataset and labels are project-specific and may not generalize broadly.
- Part C used an automated acceptance policy for gold labels in this run (no manual overrides).
- The model should not be used for legal or commercial patent decisions without human review.

## Files in this Repository
- `config.json`
- `model.safetensors`
- tokenizer files
- optional: training/evaluation summaries

## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo_id = "<your-username>/assignment3-patentsberta-qlora-gold100"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()  # inference mode

text = "A method for reducing CO2 emissions in industrial heat recovery systems..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits
pred = torch.argmax(logits, dim=-1).item()
print({"label": pred})  # 0 = not_green, 1 = green
```
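
As a convenience, the same checkpoint can also be used through the `text-classification` pipeline. This is an optional alternative to the snippet above; the generic `LABEL_0`/`LABEL_1` names in the comment assume the config does not define a custom `id2label` mapping.

```python
# Optional: pipeline-based inference with the same checkpoint.
from transformers import pipeline

repo_id = "<your-username>/assignment3-patentsberta-qlora-gold100"
clf = pipeline("text-classification", model=repo_id)
print(clf("A solar-assisted heat pump for residential heating."))
# e.g. [{'label': 'LABEL_1', 'score': ...}] -> LABEL_1 corresponds to green (1)
```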
|