# TextGraphs: raw texts, LLMs, and KGs, oh my!

<img src="assets/logo.png" width="113" alt="illustration of a lemma graph"/>

Welcome to the **TextGraphs** library...

  - demo: <https://huggingface.co/spaces/DerwenAI/textgraphs>
  - code: <https://github.com/DerwenAI/textgraphs>
  - biblio: <https://derwen.ai/docs/txg/biblio>
  - DOI: 10.5281/zenodo.10431783
## Overview
_Explore uses of large language models (LLMs) in semi-automated knowledge graph (KG) construction from unstructured text sources, with human-in-the-loop (HITL) affordances to incorporate guidance from domain experts._
What is "generative AI" in the context of working with knowledge graphs?

Initial attempts tend to fit a simple pattern based on _prompt engineering_: present text sources to an LLM-based chat interface, asking it to generate an entire graph.
This is generally expensive, and the results are often poor.
Moreover, the lack of controls or curation in this approach represents a serious disconnect from how KGs get curated to represent an organization's domain expertise.
Can the definition of "generative" be reformulated for KGs?
Instead of trying to use a fully-automated "black box", what if it were possible to generate _composable elements_ which then get aggregated into a KG?

Some research in topological analysis of graphs indicates potential ways to decompose graphs, which can then be re-composed probabilistically.
While the mathematics may be sound, these techniques need to be understood in the context of the full range of tasks within KG-construction workflows, to assess how they apply to real-world graph data.
This project explores the use of LLM-augmented components within natural language workflows, focusing on small, well-defined tasks within the scope of KG construction.
To address these challenges, this project considers improved means of tokenization for handling input.
In addition, a range of methods are considered for filtering and selecting elements of the output stream, then re-composing them into KGs.
This has a side-effect of providing steps toward better pattern identification and variable abstraction layers for graph data, i.e., _graph levels of detail_ (GLOD).
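To make the "composable elements" notion concrete, here is a minimal sketch, not the TextGraphs API: it assumes a hypothetical upstream extraction component that emits candidate `(head, relation, tail)` triples, which a HITL curation step (modeled here as a simple allow-list of relation types) filters before re-composing them into a KG with `networkx`.

```python
# illustrative sketch only -- candidate triples, relation names, and the
# `aggregate` helper are hypothetical, not part of the TextGraphs library
import networkx as nx

# hypothetical output of an LLM-augmented extraction component
candidate_triples = [
    ("Werner Herzog", "born_in", "Munich"),
    ("Werner Herzog", "occupation", "film director"),
    ("Munich", "located_in", "Germany"),
]

def aggregate (
    triples: list,
    *,
    approved: set,
    ) -> nx.DiGraph:
    """
Re-compose the approved composable elements into a KG;
HITL curation is modeled as an allow-list of relation types.
    """
    graph: nx.DiGraph = nx.DiGraph()

    for head, rel, tail in triples:
        if rel in approved:
            graph.add_edge(head, tail, rel = rel)

    return graph

kg = aggregate(candidate_triples, approved = { "born_in", "located_in" })
print(sorted(kg.edges(data = True)))
```

The point of the sketch is the separation of concerns: extraction produces small, inspectable elements; curation decides which survive; only then does aggregation build the graph.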
Many papers aim to evaluate benchmarks; in contrast, this line of inquiry focuses on integration:

  - means of combining multiple complementary research projects;
  - how to evaluate the outcomes of other projects, to assess their potential usefulness in production-quality libraries;
  - suggested directions for improving the LLM-based components of NLP workflows used to construct KGs.
## Index Terms
| _natural language processing_, | |
| _knowledge graph construction_, | |
| _large language models_, | |
| _entity extraction_, | |
| _entity linking_, | |
| _relation extraction_, | |
| _semantic random walk_, | |
| _human-in-the-loop_, | |
| _topological decomposition of graphs_, | |
| _graph levels of detail_, | |
| _network motifs_, | |