Signe22 committed
Commit 21859f9 · verified · 1 Parent(s): f201ae2

Update README.md

Files changed (1): README.md +28 -1
README.md CHANGED
@@ -1,3 +1,7 @@
+ ---
+ language:
+ - en
+ ---
  # Links
  Model: https://huggingface.co/Signe22/patentsberta-green-hitl
 
@@ -47,6 +51,7 @@ In the HITL review, the human annotator agreed with the LLM on all 100 cases. Be
  - **Human override:** No
 
  ---
+ ### Patent Claim (Full Text)
  > **Claim 2:**
  > A system for displaying braking information comprising:
  >
@@ -71,4 +76,26 @@ In the HITL review, the human annotator agreed with the LLM on all 100 cases. Be
  # Model training
  The model was fine-tuned for one epoch using a maximum sequence length of 256 tokens and a learning rate of 2e-5, following the recommended settings to keep computation reasonable. Tokenization was performed using the PatentSBERTa tokenizer prior to training.
 
- Model performance was evaluated on the held-out eval_silver split to assess generalization, and separately on the gold_100 set to analyze performance on human-labeled examples.
+ Model performance was evaluated on the held-out eval_silver split to assess generalization, and separately on the gold_100 set to analyze performance on human-labeled examples.
+
+ ## Evaluation Results
+
+ The final model was evaluated on both the held-out silver-labeled evaluation set and the human-labeled gold set.
+
+ ### Evaluation on `eval_silver` (Silver Labels)
+
+ | Metric    | Score     |
+ |-----------|-----------|
+ | Accuracy  | **0.807** |
+ | Precision | **0.815** |
+ | Recall    | **0.791** |
+ | F1-score  | **0.803** |
+
+ ### Evaluation on `gold_100` (Human Labels)
+
+ | Metric    | Score     |
+ |-----------|-----------|
+ | Accuracy  | **0.610** |
+ | Precision | **0.093** |
+ | Recall    | **1.000** |
+ | F1-score  | **0.170** |
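The training settings described in the diff (one epoch, max sequence length 256, learning rate 2e-5, PatentSBERTa tokenizer) could be sketched as below. The base checkpoint id `AI-Growth-Lab/PatentSBERTa`, the dataset variables, and the `text` field name are assumptions not stated in this README; this is a minimal sketch of the setup, not the actual training script.

```python
# Hyperparameters as reported in the README.
MAX_LENGTH = 256        # maximum sequence length at tokenization time
LEARNING_RATE = 2e-5    # recommended setting, kept to limit compute
NUM_EPOCHS = 1          # single fine-tuning epoch

def tokenize_claims(batch, tokenizer):
    """Tokenize patent-claim text, truncating/padding to MAX_LENGTH tokens."""
    return tokenizer(batch["text"], truncation=True, padding="max_length",
                     max_length=MAX_LENGTH)

if __name__ == "__main__":
    # Heavy imports kept under the guard so the config above stays importable.
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    base = "AI-Growth-Lab/PatentSBERTa"   # assumed base checkpoint id
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

    args = TrainingArguments(output_dir="patentsberta-green-hitl",
                             num_train_epochs=NUM_EPOCHS,
                             learning_rate=LEARNING_RATE)
    # train_ds / eval_ds would be the tokenized silver-labeled splits
    # (hypothetical variables, not defined in the README):
    # Trainer(model=model, args=args, train_dataset=train_ds,
    #         eval_dataset=eval_ds).train()
```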
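The `gold_100` numbers in the new evaluation section can be cross-checked: assuming the set holds exactly 100 examples (as its name suggests; the README does not state the split size), a recall of 1.000 means there are no false negatives, which pins down the full confusion matrix from accuracy and precision alone. A small sketch of that arithmetic:

```python
def confusion_from_metrics(n: int, accuracy: float, precision: float, recall: float):
    """Recover (tp, fp, fn, tn) on n examples for the recall == 1.0 case."""
    assert recall == 1.0, "this sketch only covers the no-false-negative case"
    fn = 0
    # With fn == 0, every error is a false positive: fp = n * (1 - accuracy).
    fp = round(n * (1 - accuracy))
    # precision = tp / (tp + fp)  =>  tp = precision * fp / (1 - precision)
    tp = round(precision * fp / (1 - precision))
    tn = n - tp - fp - fn
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion_from_metrics(100, accuracy=0.610, precision=0.093, recall=1.000)
print(tp, fp, fn, tn)                 # 4 39 0 57
f1 = 2 * tp / (2 * tp + fp + fn)
print(round(f1, 3))                   # 0.17
```

So the model appears to catch all 4 positive claims in `gold_100` but flags 39 negatives along with them, which is exactly why precision (0.093) and F1 (0.170) collapse even though recall is perfect.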