rain1024 commited on
Commit
149dcb1
Β·
verified Β·
1 Parent(s): c2e5163

Update TECHNICAL_REPORT_REVIEW.md with PhoBERT ablation results

Browse files
Files changed (1) hide show
  1. TECHNICAL_REPORT_REVIEW.md +4 -3
TECHNICAL_REPORT_REVIEW.md CHANGED
@@ -111,6 +111,7 @@ The codebase has been restructured:
111
  | Fill in UDD-1 results (Section 4.2) | βœ… Done | 55.42% UAS, 41.19% LAS |
112
  | Qualify SOTA claims | βœ… Done | Now specifies "UD_Vietnamese-VTB" |
113
  | Update file paths | βœ… Done | scripts/ β†’ src/ |
 
114
  | Error analysis | ❌ Pending | Per-relation breakdown needed |
115
  | UDD-1 characterization | ❌ Pending | Why 79 relations? |
116
  | Statistical significance | ❌ Pending | Confidence intervals needed |
@@ -128,7 +129,7 @@ The codebase has been restructured:
128
 
129
  3. **Dataset Derivation**: How was UDD-1 derived? Why does it have 79 relations while VTB has 37?
130
 
131
- 4. **Performance Gap on VnDT**: Why does Bamboo-1 underperform PhoBERT despite similar architecture?
132
 
133
  ---
134
 
@@ -144,13 +145,13 @@ The codebase has been restructured:
144
 
145
  5. **Characterize UDD-1**: Explain the 79-relation label set and relationship to other datasets.
146
 
147
- 6. **Compare with PhoBERT encoder**: Ablation study comparing XLM-RoBERTa vs PhoBERT.
148
 
149
  ---
150
 
151
  ## Summary
152
 
153
- This technical report now presents complete results across three Vietnamese dependency parsing benchmarks. The SOTA achievement on UD_Vietnamese-VTB (+8.5% over Trankit) is notable. The main remaining concern is the unexpectedly low performance on UDD-1 (55.42% UAS) which needs investigation and explanation. The reproducibility remains exemplary.
154
 
155
  ---
156
 
 
111
  | Fill in UDD-1 results (Section 4.2) | βœ… Done | 55.42% UAS, 41.19% LAS |
112
  | Qualify SOTA claims | βœ… Done | Now specifies "UD_Vietnamese-VTB" |
113
  | Update file paths | βœ… Done | scripts/ β†’ src/ |
114
+ | PhoBERT encoder ablation | βœ… Done | 84.92% UAS, 78.14% LAS (+1.51/+1.82% vs XLM-R) |
115
  | Error analysis | ❌ Pending | Per-relation breakdown needed |
116
  | UDD-1 characterization | ❌ Pending | Why 79 relations? |
117
  | Statistical significance | ❌ Pending | Confidence intervals needed |
 
129
 
130
  3. **Dataset Derivation**: How was UDD-1 derived? Why does it have 79 relations while VTB has 37?
131
 
132
+ 4. **~~Performance Gap on VnDT~~** βœ… **ANSWERED**: The encoder ablation study confirms that PhoBERT's Vietnamese-specific pretraining accounts for the performance difference. With PhoBERT-base encoder, VnDT results improve to 84.92% UAS, 78.14% LAS (vs 85.22% UAS, 78.77% LAS in literature).
133
 
134
  ---
135
 
 
145
 
146
  5. **Characterize UDD-1**: Explain the 79-relation label set and relationship to other datasets.
147
 
148
+ 6. βœ… ~~Compare with PhoBERT encoder~~ - **DONE** (PhoBERT-base: 84.92% UAS, 78.14% LAS on VnDT)
149
 
150
  ---
151
 
152
  ## Summary
153
 
154
+ This technical report now presents complete results across three Vietnamese dependency parsing benchmarks. The SOTA achievement on UD_Vietnamese-VTB (+8.5% over Trankit) is notable. The PhoBERT encoder ablation confirms that Vietnamese-specific pretraining accounts for the VnDT performance gap, with PhoBERT-base achieving 84.92% UAS, 78.14% LAS (+1.51%/+1.82% over XLM-RoBERTa). The main remaining concern is the unexpectedly low performance on UDD-1 (55.42% UAS) which needs investigation. The reproducibility remains exemplary.
155
 
156
  ---
157