	---
	license: mit
	language:
	- en
	base_model:
	- Qwen/Qwen2.5-1.5B-Instruct
	library_name: transformers
	tags:
	- academic-knowledge-extraction
	- concept-path-mining
	- innovation-detection
	- nsu-research
	datasets:
	- Hengzongshu/ArticleAgent
	---

	# ArticleAgent: Constraint-Driven Qwen2.5-1.5B for Academic Concept Path Extraction

	This repository hosts ArticleAgent, a fine-tuned Qwen2.5-1.5B-Instruct model designed to extract structured concept paths from academic paper abstracts. The model is part of the research presented in:

	> Constraint-Driven Small Language Models Based on Agent and OpenAlex Knowledge Graph: Mining Conceptual Pathways and Discovering Innovation Points in Academic Papers
	> Ziye Xia, Sergei S. Ospichev (2025)

	The system leverages a four-stage agent framework grounded in the OpenAlex knowledge graph, combining prompt engineering, knowledge constraints, and human-in-the-loop validation to achieve high-precision concept extraction and novelty detection.

	## 🔍 Key Features

	- Extracts structured concept paths (e.g., `Physics → Condensed Matter → Superconductivity`)
	- Identifies innovation points based on rare structural combinations of mainstream concepts
	- Integrates the OpenAlex concept taxonomy as an external knowledge constraint
	- Trained on 7,960 papers from Novosibirsk State University (NSU)
	- Achieves 97.24% precision and 91.46% F1-score in end-to-end concept path extraction
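
To make the first two features more concrete, here is a minimal sketch of how arrow-separated concept paths could be parsed and how "rare combinations of mainstream concepts" might be flagged by simple co-occurrence counting. The helper names `parse_concept_path` and `rare_pairs`, the two thresholds, and the toy paths are all hypothetical illustrations; they are not taken from the paper, and the model's actual output format and novelty criterion may differ.

```python
from collections import Counter
from itertools import combinations

ARROW = "→"  # separator used in the example path above


def parse_concept_path(path):
    """Split an arrow-separated path string into a list of concept names."""
    return [part.strip() for part in path.split(ARROW) if part.strip()]


def rare_pairs(paths, min_concept_count=2, max_pair_count=1):
    """Return pairs of frequent concepts that rarely appear in the same path.

    Assumption: a concept is "mainstream" if it occurs in at least
    `min_concept_count` paths, and a combination is "rare" if the pair
    co-occurs in at most `max_pair_count` paths.
    """
    concept_counts = Counter()
    pair_counts = Counter()
    for path in paths:
        concepts = sorted(set(path))  # count each concept once per path
        concept_counts.update(concepts)
        pair_counts.update(combinations(concepts, 2))
    return sorted(
        pair
        for pair, count in pair_counts.items()
        if count <= max_pair_count
        and all(concept_counts[c] >= min_concept_count for c in pair)
    )


# Toy data for illustration only (not from the NSU corpus)
raw_paths = [
    "Physics → Condensed Matter → Superconductivity",
    "Physics → Condensed Matter → Magnetism",
    "Physics → Optics → Superconductivity",
]
paths = [parse_concept_path(p) for p in raw_paths]
candidates = rare_pairs(paths)
print(paths[0])
print(candidates)
```

On this toy input, the only flagged pair is `("Condensed Matter", "Superconductivity")`: both concepts appear in two paths, but they co-occur in just one. Pairs involving `Optics` or `Magnetism` are skipped because those concepts are not frequent enough to count as mainstream under the assumed threshold.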

	## 🚀 Usage

	You can load the model directly using Hugging Face Transformers:

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_name = "Hengzongshu/ArticleAgent"

	tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    device_map="auto",
	    torch_dtype="bfloat16",
	    trust_remote_code=True,
	)

	# Example input (Stage 2: Concept Pair Extraction)
	input_text = """<research_methods>... your abstract segment ...</research_methods>"""
	inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

	outputs = model.generate(**inputs, max_new_tokens=256)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))