---
library_name: transformers
license: mit
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: GraphMind-LLAMA-3.1-8B
  results: []
base_model:
- meta-llama/Llama-3.1-8B
---

# Model Card for GraphMind Series

This model card describes the **GraphMind** series of models: Large Language Models (LLMs) enhanced for generalized reasoning through continued pre-training on graph-based problems.

## Model Description

GraphMind is a series of Large Language Models developed to improve the generalized reasoning capabilities of existing base models. The core innovation is continued pre-training (CPT) on **GraphPile**, a large-scale, 10.9-billion-token dataset designed specifically around Graph Problem Reasoning (GPR) data.

By training on diverse and complex graph problems, which require sophisticated logical, topological, and relational reasoning, GraphMind models learn more robust and transferable reasoning patterns. This approach bridges the gap between domain-specific training (e.g., mathematics) and the need for universally capable, adaptable LLMs.

The GraphMind series is built upon three popular open-source models:

* Llama 3
* Llama 3.1
* Gemma 2

## Key Features

- **Enhanced General Reasoning**: Significant gains not only on graph-related tasks but also on mathematical, logical, commonsense, and code reasoning benchmarks.
- **Superior Performance on Graph Problems**: Thanks to the GraphPile corpus, the models excel at graph-theoretic tasks such as pathfinding, network analysis, and topological sorting.
- **Strong Transfer Learning**: Reasoning skills acquired from graph problems transfer effectively to other domains.
- **Excellent Post-Training Potential**: A stronger foundation for fine-tuning on downstream tasks. For instance, the Gemma-based GraphMind fine-tuned on GSM8K achieves **23.6% higher accuracy** than its fine-tuned base model.

## Performance

GraphMind models show consistent improvements over their base models across reasoning benchmarks.

**Generalization improvements**:

- **Mathematical Reasoning**: up to **4.9%** average improvement across 11 datasets.
- **Logical Reasoning**: **33.4%** improvement.
- **Code Reasoning**: **46.3%** improvement.
- **Commonsense Reasoning**: **7.8%** improvement.
- **Multi-Hop QA**: **10.3%** improvement.

**Foundational improvements**:

- **Graph Problem Reasoning**: average improvement of **53.1%** over the base models.

## Training Data: The GraphPile Corpus

GraphMind's capabilities derive from training on **GraphPile**, the first large-scale corpus designed for continued pre-training with Graph Problem Reasoning data.

**Statistics**:

- **Total Tokens**: 10.9 billion
- **Total Samples**: 2.68 million
- **Graph Tasks**: 23 distinct tasks covering multiple reasoning paradigms

**Data Components**:

1. **Chain-of-Thought (CoT) Data**: Step-by-step reasoning processes for graph problems, generated using program-guided methods.
2. **Program-of-Thought (PoT) Data**: Executable code solutions for graph problems, often derived from standard libraries (see the sketch after this list).
3. **Trace-of-Execution (ToE) Data**: Records execution traces of graph algorithms, enabling learning from dynamic algorithmic processes.
4. **Real-world Graph Data**: Tasks drawn from sources such as DBpedia and DBLP, enriching the dataset with practical contexts.
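
To make the PoT component concrete, here is a hypothetical sketch of the kind of sample it contains: a graph-problem prompt paired with a short, executable solution (topological sorting via Kahn's algorithm). The task wording and sample layout are illustrative assumptions, not GraphPile's actual schema.

```python
# Hypothetical PoT-style sample: a graph-problem prompt paired with an
# executable solution. The layout is illustrative, not GraphPile's schema.
from collections import deque

# Task: "Given the directed edges below, output one valid topological ordering."
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]

def topological_sort(edges):
    nodes = {u for edge in edges for u in edge}
    adj = {u: [] for u in nodes}
    indegree = {u: 0 for u in nodes}
    for u, v in edges:
        adj[u].append(v)
        indegree[v] += 1
    # Kahn's algorithm: repeatedly emit nodes with no remaining predecessors.
    queue = deque(sorted(u for u in nodes if indegree[u] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return order if len(order) == len(nodes) else None  # None signals a cycle

print(topological_sort(edges))  # [0, 1, 2, 3]
```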

## Training Procedure

The GraphMind models were developed by performing continued pre-training on the GraphPile dataset.

* **Base Models**: Llama-3-8B, Llama-3.1-8B, Gemma-2-2B
* **Learning Rate**: 3e-5
* **Epochs**: 3
* **Max Sequence Length**: 8192
* **Global Batch Size**: 1024
* **Hardware**: 32 × NVIDIA H100 GPUs
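
For orientation, the recipe above is a standard causal-LM CPT run. The sketch below approximates it with the Hugging Face `Trainer`; note the authors' actual run used LLaMA-Factory (per the tags above), and the local `graphpile.jsonl` path with a `text` field is an assumption, not a published artifact.

```python
# Minimal continued pre-training (CPT) sketch approximating the recipe above.
# Assumes a local JSONL dump of GraphPile with a "text" field (hypothetical).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

raw = load_dataset("json", data_files="graphpile.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=8192)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

args = TrainingArguments(
    output_dir="graphmind-cpt",
    learning_rate=3e-5,
    num_train_epochs=3,
    per_device_train_batch_size=4,   # 32 GPUs x 4 x 8 accumulation = 1024 global
    gradient_accumulation_steps=8,
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # mlm=False yields standard next-token (causal LM) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```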

## Intended Use and Limitations

### Intended Use

These models are intended for research and development on tasks that demand strong, generalized reasoning. Potential applications include:

* Solving complex logical and mathematical problems.
* Algorithmic reasoning and code generation for graph-related tasks.
* Serving as base models for fine-tuning on reasoning-intensive downstream tasks.

### Limitations

* GraphPile covers only 23 graph problem tasks; greater task diversity could improve results further.
* As reasoning-focused models, GraphMind may underperform on simpler, non-reasoning tasks such as summarization or translation.
* Different GraphPile data mixtures and training configurations remain unexplored and could yield additional gains.

## Available Models

* **HKUST-DSAIL/GraphMind-Gemma2-2B**
* **HKUST-DSAIL/GraphMind-LLAMA-3.1-8B**
* **HKUST-DSAIL/GraphMind-LLAMA-3-8B**
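
Since these are continued-pre-trained base models rather than chat-tuned ones, plain text completion is the natural interface. Below is a minimal inference sketch using `transformers` (the card's stated library); the example prompt is an illustrative graph-reasoning query, not a prescribed format.

```python
# Minimal inference sketch for a GraphMind checkpoint via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HKUST-DSAIL/GraphMind-LLAMA-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative graph-reasoning prompt (no special format is required).
prompt = (
    "In an undirected graph with edges (0,1), (1,2), (2,3), "
    "is there a path from node 0 to node 3? Think step by step."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```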

## Citation

```bibtex
@misc{zhang2025improving,
  title={Improving LLMs' Generalized Reasoning Abilities by Graph Problems},
  author={Qifan Zhang and Nuo Chen and Zehua Li and Miao Peng and Jing Tang and Jia Li},
  year={2025},
  eprint={2507.17168},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2507.17168v1}
}
```