Instructions for using IDEA-FinAI/TouchstoneGPT-7B-Instruct with libraries and local apps.

Libraries

How to use IDEA-FinAI/TouchstoneGPT-7B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="IDEA-FinAI/TouchstoneGPT-7B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("IDEA-FinAI/TouchstoneGPT-7B-Instruct")
model = AutoModelForCausalLM.from_pretrained("IDEA-FinAI/TouchstoneGPT-7B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
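
The slice in the final line is needed because, for decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones. A minimal sketch of that step, using plain lists in place of tensors (the helper name `strip_prompt` and the token IDs are hypothetical, for illustration only):

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the tokens that were generated after the prompt."""
    return output_ids[prompt_length:]


# Illustrative token IDs, not real tokenizer output.
prompt_ids = [101, 102, 103]
output_ids = prompt_ids + [7, 8, 9]  # generate() echoes the prompt first

new_tokens = strip_prompt(output_ids, len(prompt_ids))
print(new_tokens)  # [7, 8, 9]
```

Passing `skip_special_tokens=True` to `tokenizer.decode` additionally drops special tokens, such as end-of-turn markers, from the printed text.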

Local Apps

vLLM

How to use IDEA-FinAI/TouchstoneGPT-7B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "IDEA-FinAI/TouchstoneGPT-7B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "IDEA-FinAI/TouchstoneGPT-7B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
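
The same request can be built from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the base URL assumes the server is running locally on vLLM's default port 8000:

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_chat_request(
    "http://localhost:8000",
    "IDEA-FinAI/TouchstoneGPT-7B-Instruct",
    "What is the capital of France?",
)
# Sending requires the server to be running:
# reply = send_chat_request(request)
```

Keeping request construction separate from sending makes it possible to inspect the payload without a live server.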

Use Docker

docker model run hf.co/IDEA-FinAI/TouchstoneGPT-7B-Instruct

SGLang

How to use IDEA-FinAI/TouchstoneGPT-7B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "IDEA-FinAI/TouchstoneGPT-7B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "IDEA-FinAI/TouchstoneGPT-7B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "IDEA-FinAI/TouchstoneGPT-7B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "IDEA-FinAI/TouchstoneGPT-7B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
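
The endpoints above answer with JSON in the OpenAI chat completions shape, where the assistant text sits under `choices[0].message.content`. A minimal sketch of pulling it out (the helper name `extract_reply` is hypothetical, and the sample response is hand-written for illustration, not real model output):

```python
def extract_reply(response):
    """Return the assistant text from a chat completions response dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written sample in the chat completions shape; not real model output.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}

print(extract_reply(sample))  # Paris.
```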

Docker Model Runner
How to use IDEA-FinAI/TouchstoneGPT-7B-Instruct with Docker Model Runner:
```
docker model run hf.co/IDEA-FinAI/TouchstoneGPT-7B-Instruct
```

Update model card for Think-on-Graph 3.0 framework (RAG-Factory)

by nielsr HF Staff - opened Sep 30, 2025

base: refs/heads/main ← from: refs/pr/1

README.md +68 -36


@@ -1,17 +1,21 @@
 ---
-license: apache-2.0
 language:
 - en
 - zh
-base_model:
-- Qwen/Qwen2-7B-Instruct
-pipeline_tag: text-generation
 library_name: transformers
 tags:
 - finance
 - text-generation-inference
-datasets:
-- IDEA-FinAI/Golden-Touchstone
 ---
 <!-- markdownlint-disable first-line-h1 -->
@@ -19,18 +23,22 @@ datasets:
 <!-- markdownlint-disable no-duplicate-header -->
 <div align="center">
-  <img src="https://github.com/IDEA-FinAI/Golden-Touchstone/blob/main/assets/Touchstone-GPT-logo.png?raw=true" width="15%" alt="Golden-Touchstone" />
-  <h1 style="display: inline-block; vertical-align: middle; margin-left: 10px; font-size: 2em; font-weight: bold;">Golden-Touchstone Benchmark</h1>
 </div>
 <div align="center" style="line-height: 1;">
-  <a href="https://arxiv.org/abs/2411.06272" target="_blank" style="margin: 2px;">
-    <img alt="arXiv" src="https://img.shields.io/badge/Arxiv-2411.06272-b31b1b.svg?logo=arXiv" style="display: inline-block; vertical-align: middle;"/>
   </a>
-  <a href="https://github.com/IDEA-FinAI/Golden-Touchstone" target="_blank" style="margin: 2px;">
-    <img alt="github" src="https://img.shields.io/github/stars/IDEA-FinAI/Golden-Touchstone.svg?style=social" style="display: inline-block; vertical-align: middle;"/>
   </a>
-  <a href="https://huggingface.co/IDEA-FinAI/TouchstoneGPT-7B-Instruct" target="_blank" style="margin: 2px;">
     <img alt="datasets" src="https://img.shields.io/badge/🤗-Datasets-yellow.svg" style="display: inline-block; vertical-align: middle;"/>
   </a>
   <a href="https://huggingface.co/IDEA-FinAI/TouchstoneGPT-7B-Instruct" target="_blank" style="margin: 2px;">
@@ -38,33 +46,56 @@ datasets:
   </a>
 </div>
-# Golden-Touchstone
-Golden Touchstone is a simple, effective, and systematic benchmark for bilingual (Chinese-English) financial large language models, driving the research and implementation of financial large language models, akin to a touchstone. We also have trained and open-sourced Touchstone-GPT as a baseline for subsequent community research.
-## Introduction
-The paper shows the evaluation of the diversity, systematicness and LLM adaptability of each open source benchmark.
-![benchmark_info](https://github.com/IDEA-FinAI/Golden-Touchstone/blob/main/assets/benchmark_info.png?raw=true)
-By collecting and selecting representative task datasets, we built our own Chinese-English bilingual Touchstone Benchmark, which includes 22 datasets
-![golden_touchstone_info](https://github.com/IDEA-FinAI/Golden-Touchstone/blob/main/assets/golden_touchstone_info.png?raw=true)
-We extensively evaluated GPT-4o, llama3, qwen2, fingpt and our own trained Touchstone-GPT, analyzed the advantages and disadvantages of these models, and provided direction for subsequent research on financial large language models
-![evaluation](https://github.com/IDEA-FinAI/Golden-Touchstone/blob/main/assets/evaluation.png?raw=true)
-## Evaluation of Touchstone Benchmark
-Please See our github repo [Golden-Touchstone](https://github.com/IDEA-FinAI/Golden-Touchstone)
-## Usage of Touchstone-GPT
-Here provides a code snippet with `apply_chat_template` to show you how to load the tokenizer and model and how to generate contents.
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -77,7 +108,8 @@ model = AutoModelForCausalLM.from_pretrained(
 )
 tokenizer = AutoTokenizer.from_pretrained("IDEA-FinAI/TouchstoneGPT-7B-Instruct")
-prompt = "What is the sentiment of the following financial post: Positive, Negative, or Neutral?\nsees #Apple at $150/share in a year (+36% from today) on growing services business."
 messages = [
     {"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": prompt}
@@ -100,17 +132,17 @@ generated_ids = [
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```
 ## Citation
-```
-@misc{wu2024goldentouchstonecomprehensivebilingual,
-      title={Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models},
-      author={Xiaojun Wu and Junxi Liu and Huanyi Su and Zhouchi Lin and Yiyan Qi and Chengjin Xu and Jiajun Su and Jiajie Zhong and Fuwei Wang and Saizhuo Wang and Fengrui Hua and Jia Li and Jian Guo},
-      year={2024},
-      eprint={2411.06272},
       archivePrefix={arXiv},
       primaryClass={cs.CL},
-      url={https://arxiv.org/abs/2411.06272},
 }
 ```

 ---
+base_model:
+- Qwen/Qwen2-7B-Instruct
+datasets:
+- IDEA-FinAI/Golden-Touchstone
 language:
 - en
 - zh
 library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - finance
 - text-generation-inference
+- retrieval-augmented-generation
+- rag
+- graph-neural-networks
+- llm-reasoning
 ---
 <!-- markdownlint-disable first-line-h1 -->
 <!-- markdownlint-disable no-duplicate-header -->
 <div align="center">
+<div style="margin: 20px 0;">
+  <img src="https://github.com/DataArcTech/RAG-Factory/blob/main/assets/logo.png?raw=true" width="120" height="120" alt="RAG-Factory Logo" style="border-radius: 20px; box-shadow: 0 8px 32px rgba(0, 217, 255, 0.3);">
+</div>
+# ✨ TouchstoneGPT-7B-Instruct: A Model for Think-on-Graph 3.0 via RAG-Factory
 </div>
 <div align="center" style="line-height: 1;">
+  <a href="https://huggingface.co/papers/2509.21710" target="_blank" style="margin: 2px;">
+    <img alt="Paper" src="https://img.shields.io/badge/Paper-2509.21710-b31b1b.svg?logo=arXiv" style="display: inline-block; vertical-align: middle;"/>
   </a>
+  <a href="https://github.com/DataArcTech/RAG-Factory" target="_blank" style="margin: 2px;">
+    <img alt="github" src="https://img.shields.io/github/stars/DataArcTech/RAG-Factory.svg?style=social" style="display: inline-block; vertical-align: middle;"/>
   </a>
+  <a href="https://huggingface.co/datasets/IDEA-FinAI/Golden-Touchstone" target="_blank" style="margin: 2px;">
     <img alt="datasets" src="https://img.shields.io/badge/🤗-Datasets-yellow.svg" style="display: inline-block; vertical-align: middle;"/>
   </a>
   <a href="https://huggingface.co/IDEA-FinAI/TouchstoneGPT-7B-Instruct" target="_blank" style="margin: 2px;">
   </a>
 </div>
+This Hugging Face repository hosts the `TouchstoneGPT-7B-Instruct` model, an instance of a Large Language Model (LLM) based on `Qwen/Qwen2-7B-Instruct`. This model is suitable for integration within the **Think-on-Graph 3.0 (ToG-3)** framework, a novel approach to Retrieval-Augmented Generation (RAG) that enhances LLM reasoning on heterogeneous graphs. The ToG-3 framework is implemented and further detailed in the [RAG-Factory GitHub repository](https://github.com/DataArcTech/RAG-Factory).
+## Paper Abstract: Think-on-Graph 3.0
+Retrieval-Augmented Generation (RAG) and Graph-based RAG has become the important paradigm for enhancing Large Language Models (LLMs) with external knowledge. However, existing approaches face a fundamental trade-off. While graph-based methods are inherently dependent on high-quality graph structures, they face significant practical constraints: manually constructed knowledge graphs are prohibitively expensive to scale, while automatically extracted graphs from corpora are limited by the performance of the underlying LLM extractors, especially when using smaller, local-deployed models. This paper presents Think-on-Graph 3.0 (ToG-3), a novel framework that introduces Multi-Agent Context Evolution and Retrieval (MACER) mechanism to overcome these limitations. Our core innovation is the dynamic construction and refinement of a Chunk-Triplets-Community heterogeneous graph index, which pioneeringly incorporates a dual-evolution mechanism of Evolving Query and Evolving Sub-Graph for precise evidence retrieval. This approach addresses a critical limitation of prior Graph-based RAG methods, which typically construct a static graph index in a single pass without adapting to the actual query. A multi-agent system, comprising Constructor, Retriever, Reflector, and Responser agents, collaboratively engages in an iterative process of evidence retrieval, answer generation, sufficiency reflection, and, crucially, evolving query and subgraph. This dual-evolving multi-agent system allows ToG-3 to adaptively build a targeted graph index during reasoning, mitigating the inherent drawbacks of static, one-time graph construction and enabling deep, precise reasoning even with lightweight LLMs. Extensive experiments demonstrate that ToG-3 outperforms compared baselines on both deep and broad reasoning benchmarks, and ablation studies confirm the efficacy of the components of MACER framework.
+## ✨ Features of RAG-Factory (Think-on-Graph 3.0 Implementation)
+The [RAG-Factory](https://github.com/DataArcTech/RAG-Factory) framework, which implements the concepts of Think-on-Graph 3.0, provides a factory for building advanced RAG pipelines, including:
+- Standard RAG implementations
+- GraphRAG architectures
+- Multi-modal RAG systems
+<div align="center">
+  <img src="https://github.com/DataArcTech/RAG-Factory/blob/main/assets/knowledge_base_screenshot.png?raw=true" alt="Example Knowledge Base Screenshot of RAG-Factory" width="800">
+</div>
+Key features include:
+- Modular design for easy customization
+- Support for various knowledge graph backends
+- Integration with multiple LLM providers
+- Configurable pipeline components
+## Installation (for RAG-Factory)
+To set up the RAG-Factory environment, clone the repository, then install it in editable mode from the repository root:
+```bash
+pip install -e .
+```
+## Usage (for RAG-Factory)
+You can run predefined RAG pipelines using the `RAG-Factory` framework:
+```bash
+bash run.sh naive_rag/graph_rag/mm_rag
+```
+or
+```bash
+python main.py --config examples/graphrag/config.yaml
+```
+For more examples and detailed configurations, please refer to the `examples/` directory in the [RAG-Factory GitHub repository](https://github.com/DataArcTech/RAG-Factory).
+## Usage of TouchstoneGPT-7B-Instruct
+This `TouchstoneGPT-7B-Instruct` model is a `Qwen2-7B-Instruct`-based LLM that can be used for text generation tasks, either standalone or as a component within RAG frameworks like Think-on-Graph 3.0. Below is a code snippet using the `transformers` library to load the tokenizer and model and generate content.
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 )
 tokenizer = AutoTokenizer.from_pretrained("IDEA-FinAI/TouchstoneGPT-7B-Instruct")
+prompt = "What is the sentiment of the following financial post: Positive, Negative, or Neutral?\nsees #Apple at $150/share in a year (+36% from today) on growing services business."
 messages = [
     {"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": prompt}
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```
 ## Citation
+If you find our work on Think-on-Graph 3.0 useful for your research and applications, please consider citing the paper:
+```bibtex
+@misc{wu2025ToG-3,
+      title={Think-on-Graph 3.0: Efficient and Adaptive LLM Reasoning on Heterogeneous Graphs via Multi-Agent Dual-Evolving Context Retrieval},
+      author={Xiaojun Wu and Cehao Yang and Xueyuan Lin and Chengjin Xu and Xuhui Jiang and Yuanliang Sun and Hui Xiong and Jia Li and Jian Guo},
+      year={2025},
+      eprint={2509.21710},
       archivePrefix={arXiv},
       primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2509.21710},
 }
 ```