---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---
# Model Card for FM-FCI/DurationQA-VLSP2025
<!-- Provide a quick summary of what the model is/does. -->
This model card documents FM-FCI/DurationQA-VLSP2025, a Vietnamese LLM fine-tuned for the duration question answering (durationQA) task. The system placed 4th on the VLSP 2025 date-arith task benchmark.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
This work investigates two subtasks in temporal reasoning: (1) Date Arithmetic (date-arith) and (2) Duration Question Answering (durationQA). For date-arith, we focus on fine-tuning large language models (LLMs) to directly extract and compute answers. For durationQA, the challenge lies in identifying both explicit and implicit duration expressions in text and reasoning with world knowledge to assess their correctness. We explore multiple approaches, from naive supervised fine-tuning (SFT) to SFT augmented with reasoning-based synthetic data and GRPO. Our findings highlight the critical role of carefully constructed data and appropriate training strategies in enabling effective temporal reasoning.
- **Developed by:** FPT Smart Cloud, FPT Corporation
- **Model type:** Dense
- **Language(s) (NLP):** Vietnamese (primary)
- **License:** ?
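
Below is a minimal inference sketch, assuming the checkpoint exposes the standard `transformers` causal-LM interface. The model ID is taken from this card; the prompt and generation settings are illustrative assumptions, not the team's official usage.

```python
# Minimal inference sketch. Assumes a standard causal-LM checkpoint;
# the prompt and generation settings below are illustrative, not official.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FM-FCI/DurationQA-VLSP2025"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Illustrative date-arith question: "What date is 45 days after 15/03/2025?"
prompt = "Ngày 15/03/2025 cộng thêm 45 ngày là ngày nào?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```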
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** https://github.com/duccd4/vlsp2025-temporal-qa
- **Paper:** Enabling Temporal Commonsense in Vietnamese LLMs – Date-Arith and DurationQA
## Training Details
### Training Data
The training set consists of 15,000 samples.
### Training Procedure
#### Training Hyperparameters
| Hyperparameter | SFT | GRPO |
|---|---|---|
| Attention | FlashAttention-2 | FlashAttention-2 |
| Batch size / device | 64 | 16 |
| Learning rate | 5.0e-5 | 1.0e-6 |
| Epochs | 3 | 5 |
| Optimizer | AdamW | AdamW |
| DeepSpeed config | ZeRO-3 | ZeRO-3 |
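
For reference, the settings above map onto TRL-style training configs roughly as sketched below. This is a sketch under assumptions (TRL's `SFTConfig`/`GRPOConfig`, a placeholder base checkpoint, assumed output and DeepSpeed file paths), not the team's actual launch script.

```python
# Sketch of the tabled hyperparameters as TRL configs. Output dirs, the base
# checkpoint, and the DeepSpeed JSON path are assumptions.
from transformers import AutoModelForCausalLM
from trl import SFTConfig, GRPOConfig

model = AutoModelForCausalLM.from_pretrained(
    "base-model-id",                          # placeholder base checkpoint
    attn_implementation="flash_attention_2",  # FlashAttention-2 (both stages)
)

sft_args = SFTConfig(
    output_dir="out/sft",               # assumed path
    per_device_train_batch_size=64,
    learning_rate=5.0e-5,
    num_train_epochs=3,
    optim="adamw_torch",                # AdamW
    deepspeed="configs/ds_zero3.json",  # ZeRO-3 config file (assumed name)
)

grpo_args = GRPOConfig(
    output_dir="out/grpo",              # assumed path
    per_device_train_batch_size=16,
    learning_rate=1.0e-6,
    num_train_epochs=5,
    optim="adamw_torch",
    deepspeed="configs/ds_zero3.json",
)
```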
## Evaluation
### Testing Data
Evaluation uses the public and private test sets provided by the VLSP 2025 organizing committee.
### Metrics
F1, Precision (P), Recall (R), and Exact Match (EM).
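
A minimal SQuAD-style scorer for these metrics is sketched below; the normalization (lowercase plus whitespace split) is an assumption, and the official VLSP evaluation script may differ.

```python
# Minimal token-overlap scorer for P, R, F1 and Exact Match (EM).
# Normalization (lowercase + whitespace split) is an assumption.
from collections import Counter

def _tokens(text: str) -> list[str]:
    return text.lower().split()

def exact_match(pred: str, gold: str) -> float:
    return float(_tokens(pred) == _tokens(gold))

def precision_recall_f1(pred: str, gold: str) -> tuple[float, float, float]:
    p_toks, g_toks = _tokens(pred), _tokens(gold)
    # Bag-of-tokens overlap between prediction and gold answer.
    overlap = sum((Counter(p_toks) & Counter(g_toks)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    p = overlap / len(p_toks)
    r = overlap / len(g_toks)
    return p, r, 2 * p * r / (p + r)
```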
### Results
| Team | F1 | P | R | EM |
|---|---|---|---|---|
| The Engineers | 81.89 | 76.45 | 88.15 | 47.52 |
| UIT_BlackCoffee | 80.13 | 73.06 | 88.72 | 42.72 |
| AI5 | 80.03 | 74.79 | 86.06 | 49.12 |
| HUET | 79.97 | 70.71 | 92.02 | 40.32 |
| Softmind_AIO | 79.06 | 70.28 | 90.33 | 34.08 |
**BibTeX:**

```bibtex
@misc{chu2025temporal,
  title  = {Enabling Temporal Commonsense in Vietnamese LLMs – Date-Arith and DurationQA},
  author = {Duc Dinh Chu and Thanh-Bac Nguyen Ba and Duy Dinh Le and Khanh Van Tran},
  year   = {2025},
  note   = {VLSP 2025}
}
```