Update TECHNICAL_REPORT_REVIEW.md with PhoBERT ablation results
TECHNICAL_REPORT_REVIEW.md
CHANGED
@@ -111,6 +111,7 @@ The codebase has been restructured:
 | Fill in UDD-1 results (Section 4.2) | ✅ Done | 55.42% UAS, 41.19% LAS |
 | Qualify SOTA claims | ✅ Done | Now specifies "UD_Vietnamese-VTB" |
 | Update file paths | ✅ Done | scripts/ → src/ |
+| PhoBERT encoder ablation | ✅ Done | 84.92% UAS, 78.14% LAS (+1.51/+1.82% vs XLM-R) |
 | Error analysis | Pending | Per-relation breakdown needed |
 | UDD-1 characterization | Pending | Why 79 relations? |
 | Statistical significance | Pending | Confidence intervals needed |
@@ -128,7 +129,7 @@ The codebase has been restructured:
 
 3. **Dataset Derivation**: How was UDD-1 derived? Why does it have 79 relations while VTB has 37?
 
-4. **Performance Gap on VnDT**:
+4. **~~Performance Gap on VnDT~~** ✅ **ANSWERED**: The encoder ablation study confirms that PhoBERT's Vietnamese-specific pretraining accounts for the performance difference. With PhoBERT-base encoder, VnDT results improve to 84.92% UAS, 78.14% LAS (vs 85.22% UAS, 78.77% LAS in literature).
 
 ---
 
@@ -144,13 +145,13 @@ The codebase has been restructured:
 
 5. **Characterize UDD-1**: Explain the 79-relation label set and relationship to other datasets.
 
-6.
+6. ✅ ~~Compare with PhoBERT encoder~~ - **DONE** (PhoBERT-base: 84.92% UAS, 78.14% LAS on VnDT)
 
 ---
 
 ## Summary
 
-This technical report now presents complete results across three Vietnamese dependency parsing benchmarks. The SOTA achievement on UD_Vietnamese-VTB (+8.5% over Trankit) is notable. The main remaining concern is the unexpectedly low performance on UDD-1 (55.42% UAS) which needs investigation
+This technical report now presents complete results across three Vietnamese dependency parsing benchmarks. The SOTA achievement on UD_Vietnamese-VTB (+8.5% over Trankit) is notable. The PhoBERT encoder ablation confirms that Vietnamese-specific pretraining accounts for the VnDT performance gap, with PhoBERT-base achieving 84.92% UAS, 78.14% LAS (+1.51%/+1.82% over XLM-RoBERTa). The main remaining concern is the unexpectedly low performance on UDD-1 (55.42% UAS) which needs investigation. The reproducibility remains exemplary.
 
 ---
 
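
The "Statistical significance" item in the review table is still pending, with confidence intervals requested for the reported UAS/LAS figures. One common way to obtain such intervals is a percentile bootstrap that resamples whole sentences. The following is a minimal sketch of that idea, not the report's evaluation code: the names `attachment_scores` and `bootstrap_ci`, the per-token tuple layout, and the choice to score every token (including punctuation) are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical layout: one token = (gold_head, gold_label, pred_head, pred_label).
Token = Tuple[int, str, int, str]
Sentence = List[Token]


def attachment_scores(sentences: List[Sentence]) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent, counting every token (assumption)."""
    total = uas_hits = las_hits = 0
    for sent in sentences:
        for gold_head, gold_label, pred_head, pred_label in sent:
            total += 1
            if gold_head == pred_head:
                uas_hits += 1  # correct head counts toward UAS
                if gold_label == pred_label:
                    las_hits += 1  # correct head and label counts toward LAS
    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * uas_hits / total, 100.0 * las_hits / total


def bootstrap_ci(
    sentences: List[Sentence],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Percentile bootstrap intervals for UAS and LAS, resampling sentences."""
    if not sentences:
        raise ValueError("need at least one sentence")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(sentences)
    uas_samples, las_samples = [], []
    for _ in range(n_resamples):
        # Draw n sentences with replacement, then rescore the resample.
        resample = [sentences[rng.randrange(n)] for _ in range(n)]
        uas, las = attachment_scores(resample)
        uas_samples.append(uas)
        las_samples.append(las)

    def percentile_interval(values: List[float]) -> Tuple[float, float]:
        ordered = sorted(values)
        lower = ordered[int((alpha / 2) * (len(ordered) - 1))]
        upper = ordered[int((1 - alpha / 2) * (len(ordered) - 1))]
        return lower, upper

    return percentile_interval(uas_samples), percentile_interval(las_samples)
```

Resampling at the sentence level rather than the token level keeps tokens from the same sentence together, since their parsing errors are unlikely to be independent. To compare two encoders such as the PhoBERT and XLM-R runs, the same resampled sentence indices would need to be applied to both systems' predictions so that the interval covers the score difference rather than each score separately.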