undertheseanlp/bamboo-1

@@ -111,6 +111,7 @@ The codebase has been restructured:
 | Fill in UDD-1 results (Section 4.2) | ✅ Done | 55.42% UAS, 41.19% LAS |
 | Qualify SOTA claims | ✅ Done | Now specifies "UD_Vietnamese-VTB" |
 | Update file paths | ✅ Done | scripts/ → src/ |
 | Error analysis | ❌ Pending | Per-relation breakdown needed |
 | UDD-1 characterization | ❌ Pending | Why 79 relations? |
 | Statistical significance | ❌ Pending | Confidence intervals needed |
@@ -128,7 +129,7 @@ The codebase has been restructured:
 3. **Dataset Derivation**: How was UDD-1 derived? Why does it have 79 relations while VTB has 37?
-4. **Performance Gap on VnDT**: Why does Bamboo-1 underperform PhoBERT despite similar architecture?
 ---
@@ -144,13 +145,13 @@ The codebase has been restructured:
 5. **Characterize UDD-1**: Explain the 79-relation label set and relationship to other datasets.
-6. **Compare with PhoBERT encoder**: Ablation study comparing XLM-RoBERTa vs PhoBERT.
 ---
 ## Summary
-This technical report now presents complete results across three Vietnamese dependency parsing benchmarks. The SOTA achievement on UD_Vietnamese-VTB (+8.5% over Trankit) is notable. The main remaining concern is the unexpectedly low performance on UDD-1 (55.42% UAS) which needs investigation and explanation. The reproducibility remains exemplary.
 ---

 | Fill in UDD-1 results (Section 4.2) | ✅ Done | 55.42% UAS, 41.19% LAS |
 | Qualify SOTA claims | ✅ Done | Now specifies "UD_Vietnamese-VTB" |
 | Update file paths | ✅ Done | scripts/ → src/ |
+| PhoBERT encoder ablation | ✅ Done | 84.92% UAS, 78.14% LAS (+1.51%/+1.82% UAS/LAS vs XLM-R) |
 | Error analysis | ❌ Pending | Per-relation breakdown needed |
 | UDD-1 characterization | ❌ Pending | Why 79 relations? |
 | Statistical significance | ❌ Pending | Confidence intervals needed |
 3. **Dataset Derivation**: How was UDD-1 derived? Why does it have 79 relations while VTB has 37?
+4. **~~Performance Gap on VnDT~~** ✅ **ANSWERED**: The encoder ablation study indicates that PhoBERT's Vietnamese-specific pretraining accounts for most of the performance difference. With the PhoBERT-base encoder, VnDT results improve to 84.92% UAS, 78.14% LAS, which is within 0.30 UAS and 0.63 LAS of the 85.22% UAS, 78.77% LAS reported in the literature.
 ---
 5. **Characterize UDD-1**: Explain the 79-relation label set and relationship to other datasets.
+6. ✅ ~~Compare with PhoBERT encoder~~ - **DONE** (PhoBERT-base: 84.92% UAS, 78.14% LAS on VnDT)
 ---
 ## Summary
+This technical report now presents complete results across three Vietnamese dependency parsing benchmarks. The SOTA achievement on UD_Vietnamese-VTB (+8.5% over Trankit) is notable. The PhoBERT encoder ablation indicates that Vietnamese-specific pretraining accounts for most of the VnDT performance gap, with PhoBERT-base achieving 84.92% UAS, 78.14% LAS (+1.51%/+1.82% over XLM-RoBERTa). The main remaining concern is the unexpectedly low performance on UDD-1 (55.42% UAS), which needs investigation. The reproducibility remains exemplary.
 ---