A Vietnamese dependency parser trained on the UDD-1 dataset using the Biaffine architecture.
Bamboo-1 is a neural dependency parser for Vietnamese that uses:
cd ~/projects/workspace_underthesea/bamboo-1
uv sync
from src.inference import Parser
parser = Parser("undertheseanlp/bamboo-1") # downloads the released safetensors model
sent = parser.parse("Tôi yêu Việt Nam")
print(sent.to_conllu())
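The text returned by `to_conllu()` can be consumed with plain Python. The sketch below assumes the standard 10-column, tab-separated CoNLL-U layout. `read_conllu` is a hypothetical helper that is not part of this repository, and the sample string is hand-written for illustration rather than real parser output.

```python
def read_conllu(text):
    """Return a list of sentences; each is a list of (id, form, head, deprel)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        cols = line.split("\t")
        # Keep only plain word lines: skip ranges (1-2), empty nodes (1.1),
        # and rows whose HEAD column is not filled in.
        if len(cols) != 10 or not cols[0].isdigit() or not cols[6].isdigit():
            continue
        current.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample (assumption), not output produced by Bamboo-1.
SAMPLE = "\n".join([
    "# text = Tôi yêu Việt Nam",
    "1\tTôi\t_\t_\t_\t_\t2\tnsubj\t_\t_",
    "2\tyêu\t_\t_\t_\t_\t0\troot\t_\t_",
    "3\tViệt Nam\t_\t_\t_\t_\t2\tobj\t_\t_",
    "",
])

sentences = read_conllu(SAMPLE)
for tok_id, form, head, deprel in sentences[0]:
    print(tok_id, form, head, deprel)
```

A head of `0` marks the root of the sentence; every other head is the 1-based index of the governing word.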
# Train with default parameters
uv run scripts/train.py
# Train with custom parameters
uv run scripts/train.py --output models/bamboo-1 --max-epochs 200 --feat char
# Train with BERT embeddings
uv run scripts/train.py --feat bert --bert vinai/phobert-base
# Train with Weights & Biases logging
uv run scripts/train.py --wandb
# Evaluate trained model
uv run scripts/evaluate.py --model models/bamboo-1
# Interactive prediction
uv run scripts/predict.py --model models/bamboo-1
# Predict from file
uv run scripts/predict.py --model models/bamboo-1 --input input.txt --output output.conllu
The UDD-1 dataset is downloaded automatically from Hugging Face:
undertheseanlp/UDD-1

Input: Vietnamese sentence
↓
Word Embeddings + Character LSTM Embeddings
↓
BiLSTM Encoder (3 layers, 400 hidden units)
↓
Biaffine Attention (Arc + Relation)
↓
Output: Dependency tree (head indices + relation labels)
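The arc-scoring step at the end of this pipeline can be illustrated with a toy example. This is a minimal sketch of the general biaffine formulation, not this repository's implementation. The vector size, the weights `W` and `b`, and the function names are all made up for illustration. It also picks each head by a per-token argmax, whereas a real parser would typically use a decoder that guarantees a well-formed tree.

```python
def biaffine_arc_scores(dep, head, W, b):
    """scores[i][j] = dep[i]^T W head[j] + b . head[j].

    dep:  one vector per token (its representation as a dependent)
    head: one vector per candidate head, with ROOT at index 0
    """
    d = len(W)
    scores = []
    for dep_vec in dep:
        # Project the dependent vector through W once per row: dep^T W.
        proj = [sum(dep_vec[k] * W[k][l] for k in range(d)) for l in range(d)]
        row = []
        for head_vec in head:
            bilinear = sum(proj[l] * head_vec[l] for l in range(d))
            bias = sum(b[l] * head_vec[l] for l in range(d))
            row.append(bilinear + bias)
        scores.append(row)
    return scores


def greedy_heads(scores):
    """Pick the highest-scoring head index for each token (no tree constraint)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy values (assumptions): 2-dimensional vectors, identity W, small head bias.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.1, 0.0]
head = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]  # ROOT, word 1, word 2
dep = [[0.0, 1.0], [1.0, 0.0]]               # word 1, word 2

heads = greedy_heads(biaffine_arc_scores(dep, head, W, b))
print(heads)  # [2, 0]: word 1 attaches to word 2, word 2 attaches to ROOT
```

Relation labels are scored in the same way, with one weight matrix per label, once the heads are chosen.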
The released checkpoint is the XLM-RoBERTa + Biaffine (Trankit-style) variant, models/bamboo-1.0.0-20260601-xlmr-udd1, trained on UDD-1 (whitespace-tokenized input).
| Split | UAS | LAS |
|---|---|---|
| UDD-1 dev | 88.70% | 82.37% |
| UDD-1 test | 89.25% | 82.87% |
Trained for 100 epochs (batch size 32, encoder LR 1e-5, head LR 1e-4, AdamW, FP16) on a single RTX 3090.
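For readers unfamiliar with the two metrics in the table: UAS is the percentage of tokens whose predicted head is correct, and LAS additionally requires the relation label to match. The sketch below computes both on invented toy data. `attachment_scores` is a hypothetical helper, and `scripts/evaluate.py` may differ in details such as punctuation handling.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) as percentages.

    gold, pred: aligned lists of (head, deprel) pairs, one pair per token.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty token list")
    total = len(gold)
    # UAS: only the head index has to match.
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: head index and relation label both have to match.
    correct_both = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct_heads / total, 100.0 * correct_both / total


# Toy annotations (assumption), not taken from UDD-1.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "nmod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS {uas:.2f}%  LAS {las:.2f}%")  # UAS 75.00%  LAS 50.00%
```

LAS can never exceed UAS, since every token counted for LAS is also counted for UAS.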
bamboo-1/
├── README.md
├── requirements.txt
├── scripts/
│   ├── train.py          # Training script
│   ├── evaluate.py       # Evaluation script
│   └── predict.py        # Prediction script
├── bamboo1/
│   └── corpus.py         # UDD-1 corpus loader
├── models/               # Trained models (generated)
└── data/                 # Downloaded dataset (generated)
MIT License