---
language:
  - en
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
  - patents
  - green-tech
  - qlora
  - peft
  - sequence-classification
model-index:
  - name: assignment3-patentsberta-qlora-gold100
    results:
      - task:
          type: text-classification
          name: Green patent detection
        dataset:
          name: patents_50k_green (eval_silver)
          type: custom
        metrics:
          - type: f1
            value: 0.5006382068
            name: Macro F1
          - type: accuracy
            value: 0.5008
            name: Accuracy
---

# Assignment 3 Model — Green Patent Detection (QLoRA + PatentSBERTa)

## Model Summary

This repository contains the final Assignment 3 downstream classifier for green patent detection.
The workflow:

  1. Baseline uncertainty sampling on patent claims.
  2. QLoRA-based labeling/rationale generation for the 100 highest-risk examples.
  3. Final PatentSBERTa fine-tuning on train_silver plus the 100 gold high-risk examples.
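Step 1 can be sketched with a simple entropy-based selector. This is a minimal illustration under assumed inputs, not the project's actual sampling code; `probs` here are toy model probabilities:

```python
import numpy as np

def select_uncertain(probs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k examples with the highest predictive entropy."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(entropy)[::-1][:k]

# Toy binary-classifier probabilities; each row sums to 1.
probs = np.array([[0.99, 0.01],   # confident
                  [0.55, 0.45],   # near the decision boundary
                  [0.50, 0.50],   # maximally uncertain
                  [0.90, 0.10]])
picked = select_uncertain(probs, k=2)
print(picked)  # the two rows closest to the decision boundary
```

In the real pipeline the same idea is applied to the classifier's softmax outputs over all unlabeled claims, and the top-100 selection feeds step 2.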

## Base Model

  • AI-Growth-Lab/PatentSBERTa (sequence classification head, 2 labels)

## Training Setup

  • Seed: 42
  • Train rows (augmented): 20,100
  • Eval rows: 5,000
  • Gold rows: 100
  • Hardware used: NVIDIA L4
  • Frameworks: transformers, datasets, torch, peft, bitsandbytes
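With these frameworks, a typical QLoRA setup looks roughly like the following configuration sketch. The hyperparameters (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) are illustrative assumptions, not the values used in this run:

```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA).
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForSequenceClassification.from_pretrained(
    "AI-Growth-Lab/PatentSBERTa", num_labels=2, quantization_config=bnb_cfg
)

# Low-rank adapters on the attention projections (illustrative choices).
lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query", "key", "value"],
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
```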

## Results

### Assignment 3 final model (this repo)

  • Eval accuracy: 0.5008
  • Eval macro F1: 0.5006382068
  • Gold100 accuracy: 0.53
  • Gold100 macro F1: 0.5037482842

### Comparison table (assignment requirement)

| Model Version | Training Data Source | F1 Score (Eval Set) |
|---|---|---|
| 1. Baseline | Frozen Embeddings (No Fine-tuning) | 0.7727474956 |
| 2. Assignment 2 Model | Fine-tuned on Silver + Gold (Simple LLM) | 0.4975369710 |
| 3. Assignment 3 Model | Fine-tuned on Silver + Gold (QLoRA) | 0.5006382068 |
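The accuracy and macro F1 figures above are the standard scikit-learn metrics. On a toy prediction set (illustrative labels, not the project's data) they are computed as follows:

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy gold labels and predictions for five claims (0 = non-green, 1 = green).
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 0]

acc = accuracy_score(y_true, y_pred)
# Macro F1 averages the per-class F1 scores, so both classes count equally
# even when the label distribution is skewed.
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(round(acc, 4), round(macro_f1, 4))  # → 0.6 0.5833
```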

## Reflection (2–3 sentences)

Compared to Assignment 2, the Assignment 3 QLoRA workflow produced a small improvement in eval macro F1 (+0.0031).
This indicates that the advanced data-generation approach provided a measurable but modest downstream gain over the simpler Assignment 2 setup in this run.
However, both fine-tuned pipelines remained substantially below the frozen-embedding baseline, suggesting that data quality and labeling strategy still dominate final performance.

## Extended Reflection on Part E Results

The observed F1 score results show that downstream fine-tuning underperformed the frozen-embedding baseline in this run: Baseline macro F1 = 0.7727, Assignment 2 = 0.4975, and Assignment 3 = 0.5006 on the eval set. Although the advanced QLoRA workflow in Assignment 3 improved slightly over Assignment 2 (+0.0031), both fine-tuned models remained far below the baseline, indicating that additional training did not translate into better generalization here.

One plausible explanation is label quality in the high-risk set. In Assignment 3, the 100 uncertain examples were finalized using an auto-accept policy (no independent human correction), so potential labeling errors in the most ambiguous cases may have been passed directly into training. Because these examples are deliberately selected near the decision boundary, they are highly influential; if their labels are noisy, they can destabilize class boundaries and reduce macro F1 on eval data.

Another interpretation is that the fine-tuning stage is more sensitive to supervision quality and distribution mismatch than the linear baseline. A strong frozen-embedding + logistic model can be robust when labels are imperfect, while full downstream fine-tuning may overfit to noisy or weakly validated labels. Overall, the results suggest that the quality of gold labels on uncertain samples is a critical bottleneck, and that true human adjudication on high-risk claims is likely necessary to realize the intended gains from advanced workflows such as QLoRA.

## Intended Use

  • Educational/research use for green patent classification experiments.
  • Binary label output: non-green (0) vs green (1).

## Limitations

  • Dataset and labels are project-specific and may not generalize broadly.
  • Part C used an automated acceptance policy for gold labels in this run (no manual overrides).
  • Model should not be used for legal/commercial patent decisions without human review.

## Files in this Repository

  • config.json
  • model.safetensors
  • tokenizer files
  • optional: training/evaluation summaries

## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo_id = "<your-username>/assignment3-patentsberta-qlora-gold100"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()  # disable dropout for deterministic inference

text = "A method for reducing CO2 emissions in industrial heat recovery systems..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits
pred = torch.argmax(logits, dim=-1).item()
print({"label": int(pred)})  # 0=not_green, 1=green
```
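To report a confidence alongside the predicted label, apply softmax to the logits. The logits below are dummy values standing in for one model output, so the snippet runs without downloading the model:

```python
import torch

logits = torch.tensor([[1.2, -0.3]])  # dummy logits for one claim
probs = torch.softmax(logits, dim=-1)
pred = int(torch.argmax(probs, dim=-1))
print({"label": pred, "confidence": round(float(probs[0, pred]), 4)})
```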