# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Application Overview

This is a Text2Graph application that extracts knowledge graphs from natural language text. It is a Gradio web app that uses either GPT-4.1-mini via Azure OpenAI or Phi-3-mini-128k-instruct-graph via a Hugging Face inference endpoint to extract entities and relationships from text, then visualizes them as interactive graphs.

## Architecture

- **app.py**: Main Gradio application with UI components, visualization logic, and caching
- **llm_graph.py**: Core LLMGraph class that handles model selection and knowledge graph extraction
- **cache/**: Directory for caching visualization data (the first example is pre-cached for performance)

## Key Components

### LLMGraph Class (llm_graph.py)

- Supports two model backends: Azure OpenAI (GPT-4.1-mini) and Hugging Face (Phi-3-mini-128k-instruct-graph)
- Uses LightRAG for Azure OpenAI integration
- Makes direct inference API calls for Hugging Face models
- Extracts structured JSON with nodes (entities) and edges (relationships)
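
The backend selection can be pictured as a small dispatch sketch. Everything below is illustrative, not the actual `llm_graph.py` implementation: the class name `LLMGraphSketch`, the model labels, and the canned response are all assumptions.

```python
import json

# Illustrative sketch only: the real LLMGraph in llm_graph.py calls
# Azure OpenAI (via LightRAG) or a Hugging Face inference endpoint.
class LLMGraphSketch:
    # Hypothetical labels standing in for the two real backends.
    MODELS = ("gpt-4.1-mini", "Phi-3-mini-128k-instruct-graph")

    def __init__(self, model: str):
        if model not in self.MODELS:
            raise ValueError(f"unknown model: {model}")
        self.model = model

    def extract(self, text: str) -> dict:
        # The real method sends `text` to the selected backend and parses
        # its JSON reply; here we return a canned response of the same shape.
        raw = ('{"nodes": [{"id": "Ada", "type": "person",'
               ' "detailed_type": "scientist"}], "edges": []}')
        return json.loads(raw)

graph = LLMGraphSketch("gpt-4.1-mini").extract("Ada Lovelace wrote the first program.")
```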

### Visualization Pipeline (app.py)

- Entity recognition visualization using spaCy's displacy
- Interactive knowledge graph using pyvis and NetworkX
- Caching system for performance optimization
- Color-coded entity types with random light colors
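
The "random light colors" step can be sketched with the standard library alone. The helper names and the 160–255 channel floor below are assumptions for illustration, not the actual app.py code:

```python
import random

def random_light_color(rng: random.Random) -> str:
    """Pick a light RGB color (each channel in 160-255) as a hex string."""
    r, g, b = (rng.randint(160, 255) for _ in range(3))
    return f"#{r:02x}{g:02x}{b:02x}"

def colors_for_types(entity_types, seed=0):
    """Assign one stable light color per distinct entity type."""
    rng = random.Random(seed)
    return {t: random_light_color(rng) for t in sorted(set(entity_types))}

palette = colors_for_types(["person", "place", "person"])
```

Seeding the generator keeps colors stable across reruns, so the same entity type renders the same shade each time.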

## Environment Setup

Required environment variables:

```
HF_TOKEN=<huggingface_token>
HF_API_ENDPOINT=<huggingface_inference_endpoint>
AZURE_OPENAI_API_KEY=<azure_openai_key>
AZURE_OPENAI_ENDPOINT=<azure_endpoint>
AZURE_OPENAI_API_VERSION=<api_version>
AZURE_OPENAI_DEPLOYMENT=<deployment_name>
AZURE_EMBEDDING_DEPLOYMENT=<embedding_deployment>
AZURE_EMBEDDING_API_VERSION=<embedding_api_version>
```
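
A startup check for these variables might look like the following sketch. The helper name and its behavior are illustrative; the app itself may fail differently when a variable is missing:

```python
import os

# The eight variables listed above.
REQUIRED_VARS = [
    "HF_TOKEN", "HF_API_ENDPOINT",
    "AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION", "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_EMBEDDING_DEPLOYMENT", "AZURE_EMBEDDING_API_VERSION",
]

def missing_env_vars(environ=os.environ):
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

# With an empty environment, every variable is reported missing.
missing = missing_env_vars({})
```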

## Running the Application

```bash
# Install dependencies
pip install -r requirements.txt

# Run the Gradio app
python app.py
```

## Key Dependencies

- **gradio**: Web interface framework
- **lightrag-hku**: RAG framework for Azure OpenAI integration
- **transformers**: Hugging Face model integration
- **pyvis**: Interactive network visualization
- **networkx**: Graph data structure and algorithms
- **spacy**: Natural language processing and entity visualization
- **openai**: Azure OpenAI client

## Data Flow

1. User inputs text and selects a model
2. LLMGraph.extract() processes the text using the selected model backend
3. JSON response contains nodes (entities) and edges (relationships)
4. Visualization functions create entity highlighting and an interactive graph
5. Results cached for performance (first example only)
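
Steps 2 and 5 together can be sketched as a hash-keyed file cache. The function names and cache-key scheme below are assumptions for illustration, not the app's actual caching code:

```python
import hashlib
import json
import tempfile
from pathlib import Path

def cache_path(cache_dir: Path, text: str, model: str) -> Path:
    """Derive a stable cache filename from the input text and model name."""
    key = hashlib.sha256(f"{model}:{text}".encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"{key}.json"

def load_or_compute(cache_dir: Path, text: str, model: str, compute):
    """Return cached graph data if present, else compute and store it."""
    path = cache_path(cache_dir, text, model)
    if path.exists():
        return json.loads(path.read_text())
    result = compute(text)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result

# Demo: the expensive extraction runs once; the second call hits the cache.
calls = []
def fake_extract(text):
    calls.append(text)
    return {"nodes": [], "edges": []}

with tempfile.TemporaryDirectory() as d:
    first = load_or_compute(Path(d), "hello", "model-a", fake_extract)
    second = load_or_compute(Path(d), "hello", "model-a", fake_extract)
```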

## Model Behavior

The application expects JSON output with this schema:

```json
{
  "nodes": [{"id": "entity", "type": "broad_type", "detailed_type": "specific_type"}],
  "edges": [{"from": "entity1", "to": "entity2", "label": "relationship"}]
}
```
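
A response in this schema can be sanity-checked before visualization. The validator below is an illustrative sketch; the real app may tolerate or repair malformed model output differently:

```python
def validate_graph(data: dict):
    """Check a model response against the expected nodes/edges schema.

    Returns a list of problems; an empty list means the payload looks valid.
    """
    problems = []
    node_ids = set()
    for node in data.get("nodes", []):
        for field in ("id", "type", "detailed_type"):
            if field not in node:
                problems.append(f"node missing {field!r}: {node}")
        if "id" in node:
            node_ids.add(node["id"])
    for edge in data.get("edges", []):
        for field in ("from", "to", "label"):
            if field not in edge:
                problems.append(f"edge missing {field!r}: {edge}")
        # Every edge endpoint must reference a declared node id.
        for endpoint in ("from", "to"):
            if edge.get(endpoint) not in node_ids:
                problems.append(f"edge references unknown node: {edge.get(endpoint)!r}")
    return problems

ok = validate_graph({
    "nodes": [{"id": "A", "type": "t", "detailed_type": "d"},
              {"id": "B", "type": "t", "detailed_type": "d"}],
    "edges": [{"from": "A", "to": "B", "label": "rel"}],
})
bad = validate_graph({"nodes": [], "edges": [{"from": "X", "to": "Y", "label": "r"}]})
```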