G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge

G-reasoner is a unified framework that integrates graph and language foundation models for reasoning over diverse graph-structured knowledge.

For more details, please refer to our project page and paper.

Features

  • Graph Foundation Model (GFM): A graph neural network-based retriever that reasons over the graph index.
  • Universal Graph Index: A universal graph index that can represent various types of structural knowledge such as Knowledge Graphs, Document Graphs, and Hierarchical Graphs.
  • Efficiency: The GFM-RAG pipeline performs multi-hop reasoning in a single retrieval step.
  • Generalizability: The GFM-RAG can be directly applied to unseen datasets without fine-tuning.
  • Transferability: The GFM-RAG can be fine-tuned on your own dataset to improve performance on specific domains.
  • Compatibility: The GFM-RAG is compatible with arbitrary agent-based frameworks for multi-step reasoning.
  • Interpretability: The GFM-RAG can illustrate the captured reasoning paths for better understanding.

Dependencies

  • Python 3.12
  • CUDA 12 and above (CUDA 12.6.3 is recommended)

Installation

Conda provides an easy way to install the CUDA development toolkit, which is required by GFM-RAG.

Install packages

conda create -n gfmrag python=3.12
conda activate gfmrag
conda install cuda-toolkit -c nvidia/label/cuda-12.6.3 # Replace with your desired CUDA version
pip install gfmrag
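
After installing, you can run a quick sanity check to confirm that the package imports and that PyTorch sees your GPU. This is a minimal check only; it assumes PyTorch was pulled in as a gfmrag dependency and does not exercise the full pipeline:

python -c "import torch, gfmrag; print('CUDA available:', torch.cuda.is_available())"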

Quick Start

Read the full documentation at: https://rmanluo.github.io/gfm-rag/

This section shows the smallest end-to-end retrieval example:

  1. Provide a dataset in raw/
  2. Let GFMRetriever.from_index(...) build stage1 automatically if needed
  3. Call retriever.retrieve(...) to get documents

You can find the full data schema on the Data Format page.

Prepare A Minimal Raw Dataset

Create the following directory:

data/
└── toy_raw/
    └── raw/
        ├── documents.json
        └── test.json (optional)

raw/documents.json

raw/documents.json is the raw document corpus used to build the graph index. It must be a JSON object where:

  • each key is a document title or document id
  • each value is the plain-text content of that document

Example:

{
  "France": "France is a country in Western Europe. Paris is its capital. The president of France is Emmanuel Macron.",
  "Paris": "Paris is the capital and most populous city of France.",
  "Emmanuel Macron": "Emmanuel Macron is a French politician who has served as president of France since 2017."
}

raw/test.json (optional)

raw/test.json is optional for retrieval itself, but useful for storing example queries or later evaluation. It must be a JSON array. Each item should contain:

  • id: unique sample id
  • question: the input query

It can also contain task metadata such as:

  • answer: reference answer
  • answer_aliases: optional aliases of the answer
  • supporting_documents: document titles that support the answer

Example:

[
  {
    "id": "toy-1",
    "question": "Who is the president of France?",
    "answer": "Emmanuel Macron",
    "answer_aliases": ["Macron"],
    "supporting_documents": ["France", "Emmanuel Macron"]
  }
]
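
If you prefer to generate the toy dataset programmatically, the sketch below writes both files using the layout and schemas shown above. The paths and contents mirror this quick start; adjust them for your own data:

import json
from pathlib import Path

# Create data/toy_raw/raw/ as used in this quick start.
raw_dir = Path("data/toy_raw/raw")
raw_dir.mkdir(parents=True, exist_ok=True)

# documents.json: a JSON object mapping document titles/ids to plain-text content.
documents = {
    "France": "France is a country in Western Europe. Paris is its capital. The president of France is Emmanuel Macron.",
    "Paris": "Paris is the capital and most populous city of France.",
    "Emmanuel Macron": "Emmanuel Macron is a French politician who has served as president of France since 2017.",
}

# test.json (optional): a JSON array of samples with at least `id` and `question`.
test_samples = [
    {
        "id": "toy-1",
        "question": "Who is the president of France?",
        "answer": "Emmanuel Macron",
        "answer_aliases": ["Macron"],
        "supporting_documents": ["France", "Emmanuel Macron"],
    }
]

(raw_dir / "documents.json").write_text(json.dumps(documents, indent=2), encoding="utf-8")
(raw_dir / "test.json").write_text(json.dumps(test_samples, indent=2), encoding="utf-8")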

Retrieve Documents With GFMRetriever

The example below follows the same initialization path used in gfmrag/gfmrag_retriever.py and gfmrag/workflow/qa_ircot_inference.py.

Save the script below as quickstart_retrieve.py in the repository root:

import hydra
from hydra.utils import instantiate
from omegaconf import DictConfig

from gfmrag import GFMRetriever


@hydra.main(
    config_path="gfmrag/workflow/config/gfm_rag",
    config_name="qa_ircot_inference",
    version_base=None,
)
def main(cfg: DictConfig) -> None:
    cfg.dataset.root = "./data"
    cfg.dataset.data_name = "toy_raw"

    ner_model = instantiate(cfg.graph_retriever.ner_model)
    el_model = instantiate(cfg.graph_retriever.el_model)
    graph_constructor = instantiate(cfg.graph_constructor)

    retriever = GFMRetriever.from_index(
        data_dir=cfg.dataset.root,
        data_name=cfg.dataset.data_name,
        model_path="rmanluo/G-reasoner-34M",  # or rmanluo/GFM-RAG-8M
        ner_model=ner_model,
        el_model=el_model,
        graph_constructor=graph_constructor,
    )

    results = retriever.retrieve("Who is the president of France?", top_k=5)

    print(results)


if __name__ == "__main__":
    main()
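
Run the script from the repository root:

python quickstart_retrieve.py

Because the script is decorated with @hydra.main, you can also override other configuration values from the command line using the usual Hydra key=value syntax.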

On the first run, GFMRetriever.from_index(...) will use raw/documents.json to build processed/stage1/ automatically if the stage1 graph files do not already exist.

If you have your own pre-built graph files, you can directly place them under processed/stage1/ and GFMRetriever.from_index(...) will load from there for reasoning without rebuilding.
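
To check whether the graph index for this quick start has already been built before launching the script, a simple path check over the layout used here is enough (the exact files inside processed/stage1/ depend on your graph constructor; see the Data Format page):

python -c "from pathlib import Path; print(Path('data/toy_raw/processed/stage1').is_dir())"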

GFM Fine-tuning

During fine-tuning, the GFM model is trained on the query-document pairs in train.json from the labeled dataset to learn complex relationships for retrieval.

Fine-tuning can be conducted on your own dataset to improve the model's performance in your specific domain.

An example of the training data:

[
  {
    "id": "5abc553a554299700f9d7871",
    "question": "Kyle Ezell is a professor at what School of Architecture building at Ohio State?",
    "answer": "Knowlton Hall",
    "supporting_documents": ["Knowlton Hall", "Kyle Ezell"],
    "start_nodes": {
      "entity": [
        "kyle ezell",
        "architectural association school of architecture",
        "ohio state"
      ]
    },
    "target_nodes": {
      "document": ["Knowlton Hall", "Kyle Ezell"],
      "entity": [
        "10 million donation",
        "2004",
        "architecture",
        "austin e  knowlton",
        "austin e  knowlton school of architecture",
        "bachelor s in architectural engineering"
      ]
    }
  },
  ...
]
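
Before launching fine-tuning, a quick sanity check that every sample in train.json carries the fields used above can save a failed run. This is a minimal sketch; the path is assumed from the quick-start layout, and the exact required fields may vary with your training configuration:

import json
from pathlib import Path

# Assumed path following the quick-start layout; adjust to your own dataset.
train_path = Path("data/toy_raw/raw/train.json")
samples = json.loads(train_path.read_text(encoding="utf-8"))

required = {"id", "question", "supporting_documents", "start_nodes", "target_nodes"}
for sample in samples:
    missing = required - sample.keys()
    if missing:
        print(f"Sample {sample.get('id', '?')} is missing fields: {sorted(missing)}")
print(f"Checked {len(samples)} samples.")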

You need to create a configuration file for fine-tuning.

We have released two pre-trained model checkpoints, GFM-RAG-8M and G-reasoner-34M, which can be used for further fine-tuning. The model will be downloaded automatically when you specify it in the configuration.

load_model_from_pretrained: rmanluo/G-reasoner-34M # or rmanluo/GFM-RAG-8M

Details of the configuration parameters are explained on the Training page.

You can fine-tune the pre-trained GFM-RAG model on your dataset using the following command:

python -m gfmrag.workflow.sft_training --config-path config/gfm_reasoner
# Multi-GPU training
torchrun --nproc_per_node=4 -m gfmrag.workflow.sft_training --config-path config/gfm_reasoner
# Multi-node Multi-GPU training
torchrun --nproc_per_node=4 --nnodes=2 -m gfmrag.workflow.sft_training --config-path config/gfm_reasoner

Reproduce Results Reported in the Paper

Please refer to the Experiment section for detailed reproduction instructions for both GFM-RAG and G-reasoner.

Acknowledgements

We greatly appreciate the following repositories for their help with this project:

Citation

If you find this repository helpful, please consider citing our paper:

@inproceedings{luo2026greasoner,
  title={G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge},
  author={Linhao Luo and Zicheng Zhao and Junnan Liu and Zhangchi Qiu and Junnan Dong and Serge Panev and Chen Gong and Thuy-Trang Vu and Gholamreza Haffari and Dinh Phung and Alan Wee-Chung Liew and Shirui Pan},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=zJm9nmoahk}
}
@article{luo2025gfmrag,
  title={GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation},
  author={Luo, Linhao and Zhao, Zicheng and Haffari, Gholamreza and Phung, Dinh and Gong, Chen and Pan, Shirui},
  journal={NeurIPS 2025},
  year={2025}
}