AdAstraAbyssoque committed on
Commit ad4f902 · verified · 1 Parent(s): 6471a21

Update README.md

Files changed (1):
  1. README.md +93 -43
README.md CHANGED
@@ -1,69 +1,119 @@
  ---
  library_name: transformers
- license: other
- base_model: meta-llama/Meta-Llama-3.1-8B
  tags:
  - llama-factory
  - full
  - generated_from_trainer
  model-index:
- - name: graphreason_v1
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # graphreason_v1

- This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) on the /mnt/nas/nuochen/pretrain/cpt/saves/llama3.1-8b/graphreason_v1/ dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.2597

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 3e-05
- - train_batch_size: 2
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 32
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 1024
- - total_eval_batch_size: 32
- - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 3.0

- ### Training results

- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 0.6962 | 0.6495 | 200 | 0.2771 |
- | 0.6433 | 1.3018 | 400 | 0.2640 |
- | 0.6447 | 1.9513 | 600 | 0.2595 |
- | 0.6058 | 2.6036 | 800 | 0.2599 |

- ### Framework versions

- - Transformers 4.46.0
- - Pytorch 2.3.0+cu121
- - Datasets 2.19.2
- - Tokenizers 0.20.3

  ---
  library_name: transformers
+ license: mit
  tags:
  - llama-factory
  - full
  - generated_from_trainer
  model-index:
+ - name: GraphMind-LLAMA-3.1-8B
  results: []
+ base_model:
+ - meta-llama/Llama-3.1-8B
  ---

+ # Model Card for GraphMind Series
+
+ This model card describes the **GraphMind** series of models: Large Language Models (LLMs) enhanced for generalized reasoning through continued pre-training on graph-based problems.
+
+ ## Model Description
+
+ GraphMind is a series of Large Language Models developed to improve the generalized reasoning capabilities of existing base models.
+ The core innovation is continued pre-training (CPT) on **GraphPile**, a large-scale, 10.9-billion-token dataset built specifically from Graph Problem Reasoning (GPR) data.
+
+ By training on diverse and complex graph problems, which require sophisticated logical, topological, and relational reasoning, GraphMind models learn more robust and transferable reasoning patterns.
+ This approach bridges the gap between domain-specific training (e.g., mathematics) and the need for universally capable and adaptable LLMs.
+
+ The GraphMind series is built upon three popular open-source models:
+
+ * Llama 3
+ * Llama 3.1
+ * Gemma 2
+
+ ## Key Features
+
+ - **Enhanced General Reasoning**: Significant gains not only on graph-related tasks but also across mathematical, logical, commonsense, and code reasoning benchmarks.
+ - **Superior Performance on Graph Problems**: Thanks to the GraphPile corpus, the models excel at tasks involving graph theory, such as pathfinding, network analysis, and topological sorting.
+ - **Strong Transfer Learning**: Reasoning skills acquired from graph problems transfer effectively to other domains.
+ - **Excellent Post-Training Potential**: A stronger foundation for fine-tuning on downstream tasks. For instance, the Gemma-based GraphMind fine-tuned on GSM8K achieves **23.6% higher accuracy** than its fine-tuned base model.
+
+ ## Performance
+
+ GraphMind models show consistent improvements over their base models across reasoning benchmarks.
+
+ **Generalization Improvements**:
+
+ - **Mathematical Reasoning**: up to **4.9%** average improvement across 11 datasets.
+ - **Logical Reasoning**: **33.4%** improvement.
+ - **Code Reasoning**: **46.3%** improvement.
+ - **Commonsense Reasoning**: **7.8%** improvement.
+ - **Multi-Hop QA**: **10.3%** improvement.
+
+ **Foundational Improvements**:
+
+ - **Graph Problem Reasoning**: Average improvement of **53.1%** compared to baseline models.
+
+ ## Training Data: The GraphPile Corpus
+
+ GraphMind's capabilities derive from training on **GraphPile**, the first large-scale corpus designed for continued pre-training on Graph Problem Reasoning data.
+
+ **Statistics**:
+
+ - **Total Tokens**: 10.9 billion
+ - **Total Samples**: 2.68 million
+ - **Graph Tasks**: 23 distinct tasks covering multiple reasoning paradigms
+
+ **Data Components**:
+
+ 1. **Chain-of-Thought (CoT) Data**: Step-by-step reasoning processes for graph problems, generated using program-guided methods.
+ 2. **Program-of-Thought (PoT) Data**: Executable code solutions for graph problems, often derived from standard libraries (a hypothetical sample is sketched after this list).
+ 3. **Trace-of-Execution (ToE) Data**: Records execution traces of graph algorithms, enabling learning from dynamic algorithmic processes.
+ 4. **Real-world Graph Data**: Includes tasks drawn from sources such as DBpedia and DBLP, enriching the dataset with practical contexts.
+
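+ Neither the generation pipeline nor the exact schema of GraphPile is included in this repository, so the following is only a hypothetical sketch of what a PoT-style sample for one of the 23 graph tasks (here, shortest path) might look like; the field names and the solver below are invented for illustration, not taken from the paper.
+
+ ```python
+ from collections import deque
+
+ def shortest_path_length(edges, source, target):
+     """BFS over an undirected graph given as an edge list."""
+     adj = {}
+     for u, v in edges:
+         adj.setdefault(u, []).append(v)
+         adj.setdefault(v, []).append(u)
+     dist = {source: 0}
+     queue = deque([source])
+     while queue:
+         node = queue.popleft()
+         if node == target:
+             return dist[node]
+         for nxt in adj.get(node, []):
+             if nxt not in dist:
+                 dist[nxt] = dist[node] + 1
+                 queue.append(nxt)
+     return -1  # target unreachable
+
+ # Hypothetical PoT-style sample: a natural-language graph question paired
+ # with executable code and the verified answer it produces.
+ edges = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 3)]
+ sample = {
+     "question": f"In an undirected graph with edges {edges}, what is the "
+                 "length of the shortest path from node 0 to node 3?",
+     "answer": shortest_path_length(edges, 0, 3),  # -> 2 (via 0-4-3)
+ }
+ ```
+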
+ ## Training Procedure
+
+ The GraphMind models were developed by performing continued pre-training on the GraphPile dataset.
+
+ * **Base Models**: Llama-3-8B, Llama-3.1-8B, Gemma-2-2B
+ * **Learning Rate**: 3e-5
+ * **Epochs**: 3
+ * **Max Sequence Length**: 8192
+ * **Global Batch Size**: 1024
+ * **Hardware**: 32 × NVIDIA H100 GPUs
+
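+ As a sanity check, the global batch size of 1024 is consistent with the per-device settings recorded in the earlier auto-generated revision of this card (per-device batch 2, gradient accumulation 16, 32 devices):
+
+ ```python
+ # Values taken from the previous revision's hyperparameter list.
+ per_device_batch = 2    # train_batch_size
+ grad_accum_steps = 16   # gradient_accumulation_steps
+ num_devices = 32        # one process per GPU
+
+ global_batch = per_device_batch * grad_accum_steps * num_devices
+ assert global_batch == 1024  # matches the Global Batch Size above
+ ```
+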
+ ## Intended Use and Limitations
+
+ ### Intended Use
+
+ These models are intended for research and development on tasks that demand strong, generalized reasoning. Potential applications include:
+
+ * Solving complex logical and mathematical problems.
+ * Algorithmic reasoning and code generation for graph-related tasks.
+ * Serving as powerful base models for fine-tuning on reasoning-intensive downstream tasks.
+
+ ### Limitations
+
+ * GraphPile is limited to 23 graph problem tasks; greater task diversity could improve results.
+ * As reasoning-focused models, GraphMind may perform worse on simpler, non-reasoning tasks such as summarization or translation.
+ * Different GraphPile data mixtures have not been fully explored and could yield additional gains.
+
+ ## Available Models
+
+ * **HKUST-DSAIL/GraphMind-Gemma2-2B**
+ * **HKUST-DSAIL/GraphMind-LLAMA-3.1-8B**
+ * **HKUST-DSAIL/GraphMind-LLAMA-3-8B**
+
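+ All three checkpoints load with the standard `transformers` API. A minimal sketch follows; the prompt and generation settings are illustrative, not recommendations from the authors, and since these are base models they expect plain-completion prompts rather than chat templates.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "HKUST-DSAIL/GraphMind-LLAMA-3.1-8B"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,  # assumes a bf16-capable GPU
+     device_map="auto",
+ )
+
+ prompt = ("In an undirected graph with edges [(0, 1), (1, 2), (2, 3)], "
+           "is there a path from node 0 to node 3? Think step by step.")
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=256)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+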
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025improving,
+   title={Improving LLMs' Generalized Reasoning Abilities by Graph Problems},
+   author={Qifan Zhang and Nuo Chen and Zehua Li and Miao Peng and Jing Tang and Jia Li},
+   year={2025},
+   eprint={2507.17168},
+   archivePrefix={arXiv},
+   primaryClass={cs.AI},
+   url={https://arxiv.org/abs/2507.17168v1}
+ }
+ ```