---
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
library_name: transformers
tags:
- academic-knowledge-extraction
- concept-path-mining
- innovation-detection
- nsu-research
datasets:
- Hengzongshu/ArticleAgent
---

# ArticleAgent: Constraint-Driven Qwen2.5-1.5B for Academic Concept Path Extraction

This repository hosts **ArticleAgent**, a fine-tuned **Qwen2.5-1.5B-Instruct** model designed to extract structured **concept paths** from academic paper abstracts. The model is part of the research presented in:

> **Constraint-Driven Small Language Models Based on Agent and OpenAlex Knowledge Graph: Mining Conceptual Pathways and Discovering Innovation Points in Academic Papers**
> Ziye Xia, Sergei S. Ospichev (2025)

The system leverages a **four-stage agent framework** grounded in the **OpenAlex knowledge graph**, combining prompt engineering, knowledge constraints, and human-in-the-loop validation to achieve high-precision concept extraction and novelty detection.

## 🔍 Key Features

- Extracts **structured concept paths** (e.g., `Physics → Condensed Matter → Superconductivity`)
- Identifies **innovation points** based on rare structural combinations of mainstream concepts
- Integrates the **OpenAlex concept taxonomy** as an external knowledge constraint
- Trained on **7,960 papers** from Novosibirsk State University (NSU)
- Achieves **97.24% precision** and a **91.46% F1-score** in end-to-end concept path extraction

## 🚀 Usage

You can load the model directly with Hugging Face Transformers:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Hengzongshu/ArticleAgent"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Example input (Stage 2: Concept Pair Extraction)
input_text = """... your abstract segment ..."""
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
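The decoded output can then be parsed into structured paths. Below is a minimal post-processing sketch, assuming the model emits arrow-separated paths like the `Physics → Condensed Matter → Superconductivity` example above; the `parse_concept_paths` helper name and the exact output format are illustrative assumptions, not part of the released model.

```python
# Minimal post-processing sketch (assumption: the model emits one
# arrow-separated concept path per line; the actual output format of
# ArticleAgent may differ).

def parse_concept_paths(generated_text: str) -> list[list[str]]:
    """Split generated text into concept paths, one path per line."""
    paths = []
    for line in generated_text.splitlines():
        if "→" in line or "->" in line:
            # Normalize the ASCII arrow, then split on the Unicode one.
            normalized = line.replace("->", "→")
            concepts = [c.strip() for c in normalized.split("→") if c.strip()]
            if len(concepts) >= 2:
                paths.append(concepts)
    return paths

print(parse_concept_paths("Physics → Condensed Matter → Superconductivity"))
# [['Physics', 'Condensed Matter', 'Superconductivity']]
```

Each parsed path can then be checked against the OpenAlex concept taxonomy, which the agent framework uses as its external knowledge constraint.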