AdAstraAbyssoque committed on
Commit ad4f902 · verified · 1 Parent(s): 6471a21

Update README.md

Files changed (1):
  1. README.md +93 -43
README.md CHANGED
@@ -1,69 +1,119 @@
  ---
  library_name: transformers
- license: other
- base_model: meta-llama/Meta-Llama-3.1-8B
  tags:
  - llama-factory
  - full
  - generated_from_trainer
  model-index:
- - name: graphreason_v1
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # graphreason_v1

- This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) on the /mnt/nas/nuochen/pretrain/cpt/saves/llama3.1-8b/graphreason_v1/ dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.2597

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 3e-05
- - train_batch_size: 2
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 32
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 1024
- - total_eval_batch_size: 32
- - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 3.0

- ### Training results

- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 0.6962 | 0.6495 | 200 | 0.2771 |
- | 0.6433 | 1.3018 | 400 | 0.2640 |
- | 0.6447 | 1.9513 | 600 | 0.2595 |
- | 0.6058 | 2.6036 | 800 | 0.2599 |

- ### Framework versions

- - Transformers 4.46.0
- - Pytorch 2.3.0+cu121
- - Datasets 2.19.2
- - Tokenizers 0.20.3

  ---
  library_name: transformers
+ license: mit
  tags:
  - llama-factory
  - full
  - generated_from_trainer
  model-index:
+ - name: GraphMind-LLAMA-3.1-8B
  results: []
+ base_model:
+ - meta-llama/Llama-3.1-8B
  ---

+ # Model Card for GraphMind Series
+
+ This model card describes the **GraphMind** series of models: Large Language Models (LLMs) enhanced for generalized reasoning through continued pre-training on graph-based problems.
+
+ ## Model Description
+
+ GraphMind is a series of Large Language Models developed to improve the generalized reasoning capabilities of existing base models.
+ The core innovation is continued pre-training (CPT) on **GraphPile**, a large-scale, 10.9-billion-token dataset built specifically from Graph Problem Reasoning (GPR) data.
+
+ By training on diverse and complex graph problems, which require sophisticated logical, topological, and relational reasoning, GraphMind models learn more robust and transferable reasoning patterns.
+ This approach bridges the gap between domain-specific training (e.g., mathematics) and the need for universally capable and adaptable LLMs.
+
+ The GraphMind series is built upon three popular open-source models:
+
+ * Llama 3
+ * Llama 3.1
+ * Gemma 2
+
+ ## Key Features
+
+ - **Enhanced General Reasoning**: Significant gains not only on graph-related tasks but also across mathematical, logical, commonsense, and code reasoning benchmarks.
+ - **Superior Performance on Graph Problems**: Thanks to the GraphPile corpus, the models excel at tasks involving graph theory, such as pathfinding, network analysis, and topological sorting.
+ - **Strong Transfer Learning**: Reasoning skills acquired from graph problems transfer effectively to other domains.
+ - **Excellent Post-Training Potential**: A stronger foundation for fine-tuning on downstream tasks. For instance, the Gemma-based GraphMind fine-tuned on GSM8K achieves **23.6% higher accuracy** than its fine-tuned base model.
+
+ ## Performance
+
+ GraphMind models show consistent improvements over their base models across reasoning benchmarks.
+
+ **Generalization Improvements**:
+
+ - **Mathematical Reasoning**: up to **4.9%** average improvement across 11 datasets.
+ - **Logical Reasoning**: **33.4%** improvement.
+ - **Code Reasoning**: **46.3%** improvement.
+ - **Commonsense Reasoning**: **7.8%** improvement.
+ - **Multi-Hop QA**: **10.3%** improvement.
+
+ **Foundational Improvements**:
+
+ - **Graph Problem Reasoning**: Average improvement of **53.1%** compared to baseline models.
+
+ ## Training Data: The GraphPile Corpus
+
+ GraphMind's capabilities derive from training on **GraphPile**, the first large-scale corpus designed for continued pre-training on Graph Problem Reasoning data.
+
+ **Statistics**:
+
+ - **Total Tokens**: 10.9 billion
+ - **Total Samples**: 2.68 million
+ - **Graph Tasks**: 23 distinct tasks covering multiple reasoning paradigms
+
+ **Data Components**:
+
+ 1. **Chain-of-Thought (CoT) Data**: Step-by-step reasoning processes for graph problems, generated using program-guided methods.
+ 2. **Program-of-Thought (PoT) Data**: Executable code solutions for graph problems, often derived from standard libraries (a hypothetical sample is sketched after this list).
+ 3. **Trace-of-Execution (ToE) Data**: Records execution traces of graph algorithms, enabling learning from dynamic algorithmic processes.
+ 4. **Real-world Graph Data**: Includes tasks drawn from sources such as DBpedia and DBLP, enriching the dataset with practical contexts.
+
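+ Neither the generation pipeline nor the exact schema of GraphPile is included in this repository, so the following is only a hypothetical sketch of what a PoT-style sample for one of the 23 graph tasks (here, shortest path) might look like; the field names and the solver below are invented for illustration, not taken from the paper.
+
+ ```python
+ from collections import deque
+
+ def shortest_path_length(edges, source, target):
+     """BFS over an undirected graph given as an edge list."""
+     adj = {}
+     for u, v in edges:
+         adj.setdefault(u, []).append(v)
+         adj.setdefault(v, []).append(u)
+     dist = {source: 0}
+     queue = deque([source])
+     while queue:
+         node = queue.popleft()
+         if node == target:
+             return dist[node]
+         for nxt in adj.get(node, []):
+             if nxt not in dist:
+                 dist[nxt] = dist[node] + 1
+                 queue.append(nxt)
+     return -1  # target unreachable
+
+ # Hypothetical PoT-style sample: a natural-language graph question paired
+ # with executable code and the verified answer it produces.
+ edges = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 3)]
+ sample = {
+     "question": f"In an undirected graph with edges {edges}, what is the "
+                 "length of the shortest path from node 0 to node 3?",
+     "answer": shortest_path_length(edges, 0, 3),  # -> 2 (via 0-4-3)
+ }
+ ```
+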
+ ## Training Procedure
+
+ The GraphMind models were developed by performing continued pre-training on the GraphPile dataset.
+
+ * **Base Models**: Llama-3-8B, Llama-3.1-8B, Gemma-2-2B
+ * **Learning Rate**: 3e-5
+ * **Epochs**: 3
+ * **Max Sequence Length**: 8192
+ * **Global Batch Size**: 1024
+ * **Hardware**: 32 × NVIDIA H100 GPUs
+
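+ As a sanity check, the global batch size of 1024 is consistent with the per-device settings recorded in the earlier auto-generated revision of this card (per-device batch 2, gradient accumulation 16, 32 devices):
+
+ ```python
+ # Values taken from the previous revision's hyperparameter list.
+ per_device_batch = 2    # train_batch_size
+ grad_accum_steps = 16   # gradient_accumulation_steps
+ num_devices = 32        # one process per GPU
+
+ global_batch = per_device_batch * grad_accum_steps * num_devices
+ assert global_batch == 1024  # matches the Global Batch Size above
+ ```
+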
+ ## Intended Use and Limitations
+
+ ### Intended Use
+
+ These models are intended for research and development on tasks that demand strong, generalized reasoning. Potential applications include:
+
+ * Solving complex logical and mathematical problems.
+ * Algorithmic reasoning and code generation for graph-related tasks.
+ * Serving as powerful base models for fine-tuning on reasoning-intensive downstream tasks.
+
+ ### Limitations
+
+ * GraphPile is limited to 23 graph problem tasks; greater task diversity could improve results.
+ * As reasoning-focused models, GraphMind may perform worse on simpler, non-reasoning tasks such as summarization or translation.
+ * Different GraphPile data mixtures have not been fully explored and could yield additional gains.
+
+ ## Available Models
+
+ * **HKUST-DSAIL/GraphMind-Gemma2-2B**
+ * **HKUST-DSAIL/GraphMind-LLAMA-3.1-8B**
+ * **HKUST-DSAIL/GraphMind-LLAMA-3-8B**
+
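+ All three checkpoints load with the standard `transformers` API. A minimal sketch follows; the prompt and generation settings are illustrative, not recommendations from the authors, and since these are base models they expect plain-completion prompts rather than chat templates.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "HKUST-DSAIL/GraphMind-LLAMA-3.1-8B"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,  # assumes a bf16-capable GPU
+     device_map="auto",
+ )
+
+ prompt = ("In an undirected graph with edges [(0, 1), (1, 2), (2, 3)], "
+           "is there a path from node 0 to node 3? Think step by step.")
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=256)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+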
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025improving,
+   title={Improving LLMs' Generalized Reasoning Abilities by Graph Problems},
+   author={Qifan Zhang and Nuo Chen and Zehua Li and Miao Peng and Jing Tang and Jia Li},
+   year={2025},
+   eprint={2507.17168},
+   archivePrefix={arXiv},
+   primaryClass={cs.AI},
+   url={https://arxiv.org/abs/2507.17168v1}
+ }
+ ```