# Model Card for FM-FCI/FinNumQA-VLSP2025
This model card documents FM-FCI/FinNumQA-VLSP2025, a Vietnamese LLM fine-tuned for the Financial Numerical Reasoning QA task. It achieved first place on this task in the VLSP 2025 benchmark.
## Model Details

### Model Description
This study details our methodology for the VLSP 2025 Numerical Reasoning QA challenge, focusing on building transparent and accurate models for Vietnamese financial question answering that requires computational reasoning. We propose a two-stage alignment framework combining supervised fine-tuning (SFT) with program-centric policy optimization (PCPO), which is implemented through group relative policy optimization (GRPO) to enhance both program and execution accuracy. First, we leverage an advanced large language model (LLM) to generate high-quality structured reasoning pathways from an augmented dataset derived from the competition organizers’ resources. The Qwen3-8B model is then fine-tuned on these structured traces and further refined through GRPO using meticulously designed reward functions to optimize logical consistency. Our approach secured first place among 16 participating teams, achieving 77.87% program accuracy with 82.49% execution accuracy on the public test set, and 76.63% program accuracy with 79.88% execution accuracy on the private test set. Key insights reveal the significance of domain-specific structured reasoning traces, the effectiveness of multilingual data augmentation, and the critical role of PCPO in maintaining accurate numerical reasoning abilities.
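The program-centric reward idea can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's exact reward functions: it grants a program-level reward when the generated program matches the reference, plus a smaller execution-level reward when the executed value matches the ground truth. The `execute_fn` helper and the 1.0/0.5 weights are hypothetical.

```python
def pcpo_reward(pred_program: str, gold_program: str,
                gold_answer: float, execute_fn) -> float:
    """Score one rollout: program match first, execution match second.

    `execute_fn` (hypothetical) evaluates a program string to a number.
    The 1.0 / 0.5 weights are illustrative, not the paper's values.
    """
    reward = 0.0
    # Program-level signal: drives Program Accuracy (the ranking metric).
    if pred_program.replace(" ", "") == gold_program.replace(" ", ""):
        reward += 1.0
    # Execution-level signal: drives Execution Accuracy.
    try:
        if abs(execute_fn(pred_program) - gold_answer) < 1e-2:
            reward += 0.5
    except Exception:
        pass  # malformed programs earn no execution reward
    return reward
```

In GRPO, a reward of this shape would be computed for each of the sampled responses per prompt and then converted into group-relative advantages for the policy update.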
- Developed by: FPT Smart Cloud, FPT Corporation
- Model type: Dense decoder-only LLM (fine-tuned from Qwen3-8B)
- Language(s) (NLP): Vietnamese (primary)
- License: ?
### Model Sources
- Repository: https://github.com/duccd4/vlsp2025-financial-numerical-reasoning
- Paper: Enhancing Numerical Reasoning in Vietnamese Financial Question Answering through Program-Centric Policy Optimization
## Training Details

### Training Data
The model was fine-tuned on 14,661 samples with structured reasoning traces, derived and augmented from the competition organizers' resources.
### Training Procedure

#### Training Hyperparameters
For the SFT stage, the primary training hyperparameters included mixed-precision BF16, a maximum sequence length of 5,888 tokens, a learning rate of 5.0 × 10⁻⁵ with a cosine scheduler, 5 training epochs, the AdamW optimizer with 25 warmup steps, and a per-device training batch size of 4. To accelerate training and maximize hardware utilization, we employed DeepSpeed Stage 3 together with FlashAttention 2.
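The SFT settings above can be gathered into one config sketch. The key names are illustrative and trainer-agnostic, not a specific framework's API:

```python
# SFT hyperparameters from the paragraph above, collected in one place.
# Key names are illustrative assumptions, not a specific trainer's API.
sft_config = {
    "precision": "bf16",                      # mixed-precision BF16
    "max_seq_length": 5888,                   # tokens
    "learning_rate": 5.0e-5,
    "lr_scheduler": "cosine",
    "num_train_epochs": 5,
    "optimizer": "adamw",
    "warmup_steps": 25,
    "per_device_train_batch_size": 4,
    "deepspeed_zero_stage": 3,                # DeepSpeed Stage 3
    "attention_backend": "flash_attention_2", # FlashAttention 2
}
```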
For the GRPO stage, we adopted conservative settings to stabilize policy updates and preserve SFT behaviour, including a learning rate of 1 × 10⁻⁶ with the AdamW optimizer and KL regularization with a KL loss coefficient of 0.001. Training used a global batch size of 16, a PPO mini-batch size of 16, and a per-GPU micro-batch size of 2 to manage memory consumption. Rollouts sampled n = 5 candidate responses per prompt to obtain stable advantage estimates. Gradient checkpointing was enabled, and the pipeline supported long structured traces (maximum prompt length of 5,888 tokens and maximum response length of 26,880 tokens). Training was conducted for a total of 5 epochs.
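The group-relative part of GRPO can be sketched as follows: each of the n = 5 sampled responses per prompt is scored, and a response's advantage is its reward standardized within its own rollout group. The reward values in the example are made up for illustration:

```python
def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize rewards within one prompt's rollout group.

    GRPO replaces a learned value baseline with the group mean, so each
    response is compared only against its sibling rollouts.
    """
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 5 rollouts for one prompt, scored by the reward function.
advantages = group_relative_advantages([1.5, 1.5, 0.5, 0.0, 0.0])
```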
## Evaluation

### Testing Data
Evaluation was based on the public and private test sets provided by the competition organizers.
### Metrics
Program Accuracy (PA) and Execution Accuracy (EA).
Program Accuracy. This metric assesses whether the generated computation program (e.g., subtract(663, 362)) faithfully reflects the logical structure of the reference solution. It is the sole criterion for competitive ranking, as it ensures both the transparency and correctness of the reasoning process. This requirement is particularly critical in financial applications, where logically flawed programs – even if they yield numerically correct results – pose systemic risks to decision-making support.
Execution Accuracy. This metric verifies whether the final numerical output (e.g., 26.07) matches the ground-truth value. While informative, execution accuracy does not determine official rankings, as correctness of reasoning takes precedence over coincidental correctness of outputs.
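The two metrics can be sketched for the FinQA-style program DSL used in the examples above. This is an illustrative scorer, not the official evaluation script: the operator set, the whitespace normalization in PA, and the numeric tolerance in EA are all assumptions.

```python
import re

# Illustrative scorer for programs like subtract(663, 362) or
# divide(subtract(663, 362), 362). Not the official evaluation code.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def execute(program: str) -> float:
    """Recursively evaluate a program string into a number."""
    program = program.strip()
    m = re.match(r"(\w+)\((.*)\)$", program)
    if not m:  # leaf: a plain number
        return float(program.replace(",", "").rstrip("%"))
    op, args = m.group(1), m.group(2)
    # Split top-level arguments, respecting nested parentheses.
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(args[start:i])
            start = i + 1
    parts.append(args[start:])
    return OPS[op](*(execute(p) for p in parts))

def program_accuracy(pred: str, gold: str) -> bool:
    """PA: structural match after whitespace normalization (assumed rule)."""
    norm = lambda s: re.sub(r"\s+", "", s)
    return norm(pred) == norm(gold)

def execution_accuracy(pred: str, gold_value: float, tol: float = 1e-2) -> bool:
    """EA: executed result matches the ground-truth number (assumed tolerance)."""
    try:
        return abs(execute(pred) - gold_value) < tol
    except Exception:
        return False
```

A logically wrong program that happens to execute to the right value would pass EA but fail PA, which is why PA alone determines the ranking.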
### Results
Private test set leaderboard (our team, HUET, ranked first):

| Team | PA (%) | EA (%) |
|------|--------|--------|
| HUET | 76.63 | 79.88 |
| ngoquanghuy | 75.00 | 81.95 |
| dathvt | 69.82 | 79.14 |
| truong13012004 | 69.67 | 74.26 |
| vietld | 61.83 | 68.49 |
| masterunited | 54.14 | 56.80 |
## Citation

**BibTeX:**
Enhancing Numerical Reasoning in Vietnamese Financial Question Answering through Program-Centric Policy Optimization
Duc Dinh Chu*, Thanh-Bac Nguyen Ba*, Duy Dinh Le, Khanh Van Tran
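The reference above can be expressed as a BibTeX entry. The citation key and entry type are placeholders; venue details beyond the VLSP 2025 shared task are not stated in this card:

```bibtex
@misc{chu2025enhancing,
  title  = {Enhancing Numerical Reasoning in Vietnamese Financial Question Answering through Program-Centric Policy Optimization},
  author = {Chu, Duc Dinh and Nguyen Ba, Thanh-Bac and Le, Duy Dinh and Tran, Khanh Van},
  year   = {2025},
  note   = {First place, VLSP 2025 Financial Numerical Reasoning QA shared task}
}
```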