Qwen2.5-7B-ViMetaMathQA-Mini
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct optimized for solving mathematical problems in Vietnamese.
It was trained on a 100,000-sample subset of the translated MetaMathQA dataset, using Flash Attention 2 and BFloat16 mixed precision on NVIDIA H100 hardware.
Model Details
- Developed by: PeterPaker123
- Language: Vietnamese
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Fine-tuning Dataset: 5CD-AI/Vietnamese-395k-meta-math-MetaMathQA-gg-translated (100k subset)
- Task: Mathematical Reasoning and Problem Solving
Training Configuration
The model was trained with the following settings to ensure high efficiency and reasoning quality:
- Hardware: NVIDIA H100 80GB HBM3
- Optimization: Flash Attention 2, TF32 enabled
- Precision: BFloat16 (Mixed Precision)
- Optimizer: AdamW (8-bit)
- Learning Rate: 1e-5
- Batch Size: 4 (Per device)
- Gradient Accumulation: 4 (Effective Batch Size: 16)
- Max Sequence Length: 2048 tokens (with Sequence Packing)
- Epochs: 1
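The hyperparameters above can be mapped onto a `transformers.TrainingArguments` configuration. This is a hedged sketch, not the author's actual training script: the output directory is a placeholder, and the max sequence length (2048) and sequence packing are typically set on the SFT trainer (e.g. TRL's `SFTTrainer`) rather than here.

```python
from transformers import TrainingArguments

# Sketch mapping the listed settings to TrainingArguments (paths are placeholders).
training_args = TrainingArguments(
    output_dir="qwen2.5-7b-vimetamathqa-mini",  # placeholder
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size: 4 * 4 = 16
    learning_rate=1e-5,
    bf16=True,                       # BFloat16 mixed precision
    tf32=True,                       # TF32 matmuls on H100
    optim="adamw_bnb_8bit",          # 8-bit AdamW via bitsandbytes
)
# max_seq_length=2048 and packing=True would be passed to the SFT trainer.
```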
Intended Use
This model is designed to act as a mathematical assistant for Vietnamese speakers. It is particularly effective at:
- Solving simple algebra problems.
- Following Vietnamese instructional prompts for mathematical logic.
System Prompt
For best results, use the system prompt used during training:
```
Bạn là một chuyên gia toán học. Hãy giải bài toán sau bằng tiếng Việt.
```
(English: "You are a mathematics expert. Solve the following problem in Vietnamese.")
Usage Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "PeterPaker123/Qwen2.5-7B-ViMetaMathQA-Mini"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",  # Recommended for H100/A100/L4
)

messages = [
    # System prompt: "You are a mathematics expert. Solve the following problem in Vietnamese."
    {"role": "system", "content": "Bạn là một chuyên gia toán học. Hãy giải bài toán sau bằng tiếng Việt."},
    # User: "Find x, given 2x + 5 = 15."
    {"role": "user", "content": "Tìm x, biết 2x + 5 = 15."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
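Flash Attention 2 requires an Ampere-or-newer GPU and the `flash-attn` package. On other hardware, a fallback using PyTorch's built-in scaled-dot-product attention works without extra dependencies (a sketch, not from the original card):

```python
import torch
from transformers import AutoModelForCausalLM

# Fallback loading for GPUs without Flash Attention 2 support:
# "sdpa" uses PyTorch's native attention kernels and needs no extra package.
model = AutoModelForCausalLM.from_pretrained(
    "PeterPaker123/Qwen2.5-7B-ViMetaMathQA-Mini",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="sdpa",
)
```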
Limitations
- Data Size: The model was fine-tuned on a 100,000-sample subset of the translated MetaMathQA dataset. While substantial, this covers only part of the Vietnamese mathematical domain and may miss niche topics or problem styles outside the dataset's distribution.
- Preliminary Model: This is an initial experiment utilizing the Qwen 2.5 architecture on specialized Vietnamese mathematical data. It is intended as a proof-of-concept for high-performance math reasoning on H100 hardware.
- Calculation Hallucination: Like all LLMs, the model may occasionally generate plausible-sounding but mathematically incorrect steps or hallucinate numerical values. Users should manually verify critical calculations.
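Because arithmetic can be hallucinated, the model's final numeric answer can be cross-checked programmatically. A minimal sketch (the response string, the `last_number` helper, and the regex are illustrative, not part of the model card):

```python
import re

def last_number(text):
    """Return the last number appearing in a model response, or None."""
    matches = re.findall(r"-?\d+(?:[.,]\d+)?", text)
    if not matches:
        return None
    # Normalize a Vietnamese decimal comma before parsing.
    return float(matches[-1].replace(",", "."))

# Illustrative response for "Tìm x, biết 2x + 5 = 15" ("Find x, given 2x + 5 = 15"):
response = "Ta có 2x + 5 = 15, suy ra 2x = 10, vậy x = 5."
answer = last_number(response)

# Cross-check the extracted answer against the original equation.
assert answer is not None and 2 * answer + 5 == 15
```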
Ethical Considerations
- Bias and Fairness: Like any other machine learning model, there is a possibility that this model might reproduce or amplify biases present in the training data. This includes biases inherited from the base Qwen model or specific patterns found in the translated MetaMathQA dataset.
- Use in Critical Systems: As this is a preliminary model intended for research and educational assistance, it is recommended not to use it for mission-critical applications without rigorous human-in-the-loop validation.
- Fine-tuning Data: The model was fine-tuned on a custom subset of 100,000 instruction samples in Vietnamese, translated from the MetaMathQA dataset.
Credits
I would like to express my sincere gratitude to the following organizations and communities:
- Fifth Civil Defender (5CD-AI): Special thanks to 5CD-AI for creating and sharing the Vietnamese-395k-meta-math-MetaMathQA-gg-translated dataset. This work was instrumental in providing the high-quality Vietnamese mathematical data required for this fine-tuning project.
- The Qwen Team: Gratitude to the creators of the Qwen 2.5 architecture at Alibaba Cloud for providing a world-class, high-performance base model that serves as the foundation for this project.