---
license: mit
language:
- vi
- en
- zh
- id
- th
base_model:
- Qwen/Qwen2.5-14B-Instruct
pipeline_tag: text-generation
---

  
# GreenMind-Medium-14B-R1

We release **GreenMind-Medium-14B-R1**, a medium-sized Vietnamese language model for questions that require intermediate-level reasoning, covering general knowledge, mathematics, natural science, and social science. The model is fine-tuned with Group Relative Policy Optimization (GRPO), which guides it toward logically coherent, step-by-step responses.
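To make the "group relative" part of Group Relative Policy Optimization concrete, the sketch below shows the usual way a response's reward is scored against the other responses sampled for the same prompt. It is a minimal illustration, not the training code used for this model: the function name, the example rewards, the use of the population standard deviation, and the `eps` value are all assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per response sampled for a single prompt.
    `eps` (an assumed value) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt
# (e.g. 1.0 for a correct, well-formatted answer, 0.0 otherwise).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above their group's mean get a positive advantage and are reinforced; those below get a negative one. When every response in a group receives the same reward, all advantages are zero and the group contributes no learning signal.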

## Model Description

- **Model Type:** Causal language model
- **Base Model:** [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct)
- **Parameters:** 14.7B
- **Context Length:** 131,072 tokens in total, with up to 8,192 generated tokens
- **Language:** Vietnamese
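The two context figures above can be turned into a simple budget check before calling `generate`. The helper below is a hypothetical sketch: it assumes the 131,072 figure is the total window (prompt plus output) and the 8,192 figure is a per-response cap, and the function and constant names are illustrative only.

```python
CONTEXT_LENGTH = 131_072  # total window, from the model description
MAX_GENERATION = 8_192    # generation cap, from the model description


def max_new_tokens_for(prompt_tokens, requested=MAX_GENERATION):
    """Return a max_new_tokens value that fits the stated limits.

    Assumption: prompt tokens and generated tokens share the same window.
    """
    if prompt_tokens >= CONTEXT_LENGTH:
        raise ValueError("prompt alone exceeds the context window")
    remaining = CONTEXT_LENGTH - prompt_tokens
    return min(requested, MAX_GENERATION, remaining)


print(max_new_tokens_for(1_000))                  # short prompt: full generation cap
print(max_new_tokens_for(131_000))                # long prompt: only the leftover window
print(max_new_tokens_for(500, requested=1_024))   # caller asks for less than the cap
```

In practice `prompt_tokens` would come from the length of the tokenized prompt, and the result would be passed as `max_new_tokens`.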

## Quickstart

The following snippet uses `apply_chat_template` to show how to load the tokenizer and model and generate a response.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "GreenNode/GreenMind-Medium-14B-R1"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    revision="main",
    trust_remote_code=False,
)
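# A classic Vietnamese riddle: chickens and dogs bundled together,
# 36 animals and exactly 100 legs. How many chickens and how many dogs?
prompt = r"""Vừa gà vừa chó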
Bó lại cho tròn
Ba mươi sáu con
Một trăm chân chẵn
Hỏi có bao nhiêu con gà, bao nhiêu con chó?"""

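# System prompt in English: "You are a helpful virtual assistant for answering
# questions. Reason step by step and give the answer inside <answer> </answer>."
messages = [
    {
        "role": "system",
        "content": "Bạn là một trợ lý ảo hữu ích trong việc trả lời câu hỏi. Hãy suy luận từng bước, và đưa ra đáp án trong thẻ <answer> </answer>."
    },
    {
        # The suffix asks for reasoning inside <think> </think> and the answer inside <answer> </answer>.
        "role": "user",
        "content": f"{prompt} Hãy suy luận từng bước trong thẻ <think> </think>. Và trả về đáp án trong thẻ <answer> </answer>."
    },
    {
        # Prefilled assistant turn ("Let me solve this step by step."), so generation continues inside <think>.
        "role": "assistant",
        "content": "Hãy để tôi giải quyết từng bước.\n<think>"
    }
]

# continue_final_message=True leaves the assistant turn open instead of closing it.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    continue_final_message=True,
)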

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)

# Keep only the newly generated tokens by dropping the prompt tokens.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
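print(response)
# Example output (the model sets up x + y = 36 and 2x + 4y = 100, then solves for 22 chickens and 14 dogs):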
# Đầu tiên, chúng ta cần thiết lập hai phương trình dựa trên thông tin đề bài:
# 1. Tổng số con gà và chó là 36: x + y = 36
# 2. Tổng số chân là 100: 2x + 4y = 100
# Trong đó, x là số con gà và y là số con chó.
# Tiếp theo, chúng ta giải hệ phương trình này:
# Từ phương trình thứ nhất, ta có: x = 36 - y
# Thay vào phương trình thứ hai: 2(36 - y) + 4y = 100
# => 72 - 2y + 4y = 100
# => 2y = 28
# => y = 14 (số con chó)
# Thay y = 14 vào phương trình x + y = 36:
# => x = 36 - 14 = 22 (số con gà)
# Vậy, có 22 con gà và 14 con chó.
# </think>
# <answer>Có 22 con gà và 14 con chó.</answer>
```
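Because the response wraps its reasoning and final answer in tags, downstream code will usually want to separate the two. The helper below is a minimal sketch, not part of the released code: `split_reasoning` is a hypothetical name, and it assumes the opening `<think>` tag was supplied in the prompt (as in the Quickstart), so the decoded completion contains only the closing `</think>`.

```python
import re


def split_reasoning(response):
    """Split a decoded completion into (reasoning, answer).

    Returns None for a part whose closing tag or tag pair is missing.
    """
    head, sep, _ = response.partition("</think>")
    reasoning = head.strip() if sep else None
    match = re.search(r"<answer>(.*?)</answer>", response, flags=re.DOTALL)
    answer = match.group(1).strip() if match else None
    return reasoning, answer


# Tail of the example output shown in the Quickstart.
sample = (
    "Vậy, có 22 con gà và 14 con chó.\n"
    "</think>\n"
    "<answer>Có 22 con gà và 14 con chó.</answer>"
)
reasoning, answer = split_reasoning(sample)
print(answer)
```

Returning `None` rather than raising keeps the helper usable when generation is cut off by `max_new_tokens` before the closing tags appear.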

## Evaluation

**Table 1. SeaExam Dataset.** GreenMind-Medium-14B-R1 compared with its base model and several larger models.

| **Model**                        | **SeaExam-ID** | **SeaExam-TH** | **SeaExam-VI** | **Avg**  |
|----------------------------------|----------------|----------------|----------------|----------|
| Meta-Llama-3.1-70B-Instruct      | 65.8           | **70.6**       | 72.6           | 69.7     |
| gemma3-27b-it                    | 64.4           | 67.5           | 73.1           | 68.4     |
| Qwen2.5-14B-Instruct             | 67.6           | 68.8           | 73.1           | 69.8     |
| **GreenMind-Medium-14B-R1**      | **74.36**      | 69.75          | **74.44**      | **72.79** |

**Table 2. VLSP 2023 Challenge.** GreenMind-Medium-14B-R1 outperforms every listed baseline on all five tasks (↑ higher is better, ↓ lower is better).

| **Model**                         | **ComprehensionQA-vi ↑** | **Exams-vi ↑** | **LAMBADA-vi ↓** | **WikiQA-vi ↑** | **MMLU-vi ↑** |
|----------------------------------|---------------------------|----------------|------------------|-----------------|---------------|
| cpt-smartbot-13b                 | 0.6633                    | 0.3473         | 21.9864          | 0.4455          | 0.414         |
| ura-llama-13b                    | 0.6556                    | 0.342          | 17.5614          | 0.438           | 0.3973        |
| greennode-7b (prior work)        | 0.6122                    | 0.2892         | 189.7782         | 0.3335          | 0.387         |
| greennode-14b (prior work)       | 0.6711                    | 0.3672         | 29.5967          | 0.468           | 0.5281        |
| **GreenMind-Medium-14B-R1 (Ours)** | **0.8689**              | **0.7796**     | **10.7609**      | **0.7915**      | **0.7124**     |

**Table 3. VMLU Dataset.** Comparison with other fine-tuned models.

| **Model**                        | **Access** | **STEM** | **Social Science** | **Humanities** | **Others** | **Avg**  |
|----------------------------------|-----------|----------|---------------------|----------------|------------|----------|
| VNPTAI.IO-Medium-R1              | Private   | 77.09    | 82.3                | 78.85          | 69.98      | 77.43     |
| MISA-Llama3-v1.1                 | Private   | 77.5     | 80.75               | 76.62          | 71.6       | 76.87     |
| BnK-AI-Medium-v2                 | Private   | 80.94    | 80.76               | 70.7           | 74.06      | 76.66     |
| VNPTAI.IO-Large-v4               | Private   | 78.05    | 79.05               | 75.39          | 70.37      | 76.21     |
| GreenNode-xMedium-v1            | Private   | 75.7     | 81.09               | 75.25          | 69.33      | 75.5      |
| **GreenMind-Medium-14B-R1 (Ours)** | Weight  | 76.78    | 77.36               | 72.32          | 69.03      | 74.29     |
| CakebyVPBank-Large              | Private   | 77.75    | 78.11               | 70.38          | 67.82      | 73.99     |
| DeepSeek-R1-Distill-Llama-70B   | Weight    | 76.77    | 76.23               | 67.98          | 66.82      | 72.41     |

## Follow us

https://x.com/greennode23

## Support

https://discord.gg/B6MJFM3J3a

## License

This repository and the model weights are licensed under the [MIT License](LICENSE).

## Citation

If you find our work helpful, please cite it.

```
@misc{tung2025greenmindnextgenerationvietnameselarge,
      title={GreenMind: A Next-Generation Vietnamese Large Language Model for Structured and Logical Reasoning}, 
      author={Luu Quy Tung and Hoang Quoc Viet and Pham Bao Loc and Vo Trong Thu},
      year={2025},
      eprint={2504.16832},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.16832}, 
}
```

## Contact Us

- General & Collaboration: tung.vu@greennode.ai, thuvt@greennode.ai, locpb@greennode.ai
- Technical: viethq5@greennode.ai