# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Application Overview

This is a Text2Graph application that extracts knowledge graphs from natural language text. It is a Gradio web app that uses either GPT-4.1-mini via Azure OpenAI or Phi-3-mini-128k-instruct-graph via a Hugging Face inference endpoint to extract entities and relationships from text, then visualizes them as interactive graphs.

## Architecture

- **app.py**: Main Gradio application with UI components, visualization logic, and caching
- **llm_graph.py**: Core LLMGraph class that handles model selection and knowledge graph extraction
- **cache/**: Directory for caching visualization data (the first example is pre-cached for performance)

## Key Components

### LLMGraph Class (llm_graph.py)

- Supports two model backends: Azure OpenAI (GPT-4.1-mini) and Hugging Face (Phi-3-mini-128k-instruct-graph)
- Uses LightRAG for Azure OpenAI integration
- Makes direct inference API calls for Hugging Face models
- Extracts structured JSON with nodes (entities) and edges (relationships)
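
The backend selection can be pictured as a small dispatch sketch. Everything below is illustrative, not the actual `llm_graph.py` implementation: the class name `LLMGraphSketch`, the model labels, and the canned response are all assumptions.

```python
import json

# Illustrative sketch only: the real LLMGraph in llm_graph.py calls
# Azure OpenAI (via LightRAG) or a Hugging Face inference endpoint.
class LLMGraphSketch:
    # Hypothetical labels standing in for the two real backends.
    MODELS = ("gpt-4.1-mini", "Phi-3-mini-128k-instruct-graph")

    def __init__(self, model: str):
        if model not in self.MODELS:
            raise ValueError(f"unknown model: {model}")
        self.model = model

    def extract(self, text: str) -> dict:
        # The real method sends `text` to the selected backend and parses
        # its JSON reply; here we return a canned response of the same shape.
        raw = ('{"nodes": [{"id": "Ada", "type": "person",'
               ' "detailed_type": "scientist"}], "edges": []}')
        return json.loads(raw)

graph = LLMGraphSketch("gpt-4.1-mini").extract("Ada Lovelace wrote the first program.")
```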

### Visualization Pipeline (app.py)

- Entity recognition visualization using spaCy's displacy
- Interactive knowledge graph using pyvis and NetworkX
- Caching system for performance optimization
- Color-coded entity types with random light colors
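
The "random light colors" step can be sketched with the standard library alone. The helper names and the 160–255 channel floor below are assumptions for illustration, not the actual app.py code:

```python
import random

def random_light_color(rng: random.Random) -> str:
    """Pick a light RGB color (each channel in 160-255) as a hex string."""
    r, g, b = (rng.randint(160, 255) for _ in range(3))
    return f"#{r:02x}{g:02x}{b:02x}"

def colors_for_types(entity_types, seed=0):
    """Assign one stable light color per distinct entity type."""
    rng = random.Random(seed)
    return {t: random_light_color(rng) for t in sorted(set(entity_types))}

palette = colors_for_types(["person", "place", "person"])
```

Seeding the generator keeps colors stable across reruns, so the same entity type renders the same shade each time.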

## Environment Setup

Required environment variables:

```
HF_TOKEN=<huggingface_token>
HF_API_ENDPOINT=<huggingface_inference_endpoint>
AZURE_OPENAI_API_KEY=<azure_openai_key>
AZURE_OPENAI_ENDPOINT=<azure_endpoint>
AZURE_OPENAI_API_VERSION=<api_version>
AZURE_OPENAI_DEPLOYMENT=<deployment_name>
AZURE_EMBEDDING_DEPLOYMENT=<embedding_deployment>
AZURE_EMBEDDING_API_VERSION=<embedding_api_version>
```
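
A startup check for these variables might look like the following sketch. The helper name and its behavior are illustrative; the app itself may fail differently when a variable is missing:

```python
import os

# The eight variables listed above.
REQUIRED_VARS = [
    "HF_TOKEN", "HF_API_ENDPOINT",
    "AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION", "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_EMBEDDING_DEPLOYMENT", "AZURE_EMBEDDING_API_VERSION",
]

def missing_env_vars(environ=os.environ):
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

# With an empty environment, every variable is reported missing.
missing = missing_env_vars({})
```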

## Running the Application

```bash
# Install dependencies
pip install -r requirements.txt

# Run the Gradio app
python app.py
```

## Key Dependencies

- **gradio**: Web interface framework
- **lightrag-hku**: RAG framework for Azure OpenAI integration
- **transformers**: Hugging Face model integration
- **pyvis**: Interactive network visualization
- **networkx**: Graph data structure and algorithms
- **spacy**: Natural language processing and entity visualization
- **openai**: Azure OpenAI client

## Data Flow

1. User inputs text and selects a model
2. LLMGraph.extract() processes the text using the selected model backend
3. JSON response contains nodes (entities) and edges (relationships)
4. Visualization functions create entity highlighting and an interactive graph
5. Results cached for performance (first example only)
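
Steps 2 and 5 together can be sketched as a hash-keyed file cache. The function names and cache-key scheme below are assumptions for illustration, not the app's actual caching code:

```python
import hashlib
import json
import tempfile
from pathlib import Path

def cache_path(cache_dir: Path, text: str, model: str) -> Path:
    """Derive a stable cache filename from the input text and model name."""
    key = hashlib.sha256(f"{model}:{text}".encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"{key}.json"

def load_or_compute(cache_dir: Path, text: str, model: str, compute):
    """Return cached graph data if present, else compute and store it."""
    path = cache_path(cache_dir, text, model)
    if path.exists():
        return json.loads(path.read_text())
    result = compute(text)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result

# Demo: the expensive extraction runs once; the second call hits the cache.
calls = []
def fake_extract(text):
    calls.append(text)
    return {"nodes": [], "edges": []}

with tempfile.TemporaryDirectory() as d:
    first = load_or_compute(Path(d), "hello", "model-a", fake_extract)
    second = load_or_compute(Path(d), "hello", "model-a", fake_extract)
```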

## Model Behavior

The application expects JSON output with this schema:

```json
{
  "nodes": [{"id": "entity", "type": "broad_type", "detailed_type": "specific_type"}],
  "edges": [{"from": "entity1", "to": "entity2", "label": "relationship"}]
}
```
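
A response in this schema can be sanity-checked before visualization. The validator below is an illustrative sketch; the real app may tolerate or repair malformed model output differently:

```python
def validate_graph(data: dict):
    """Check a model response against the expected nodes/edges schema.

    Returns a list of problems; an empty list means the payload looks valid.
    """
    problems = []
    node_ids = set()
    for node in data.get("nodes", []):
        for field in ("id", "type", "detailed_type"):
            if field not in node:
                problems.append(f"node missing {field!r}: {node}")
        if "id" in node:
            node_ids.add(node["id"])
    for edge in data.get("edges", []):
        for field in ("from", "to", "label"):
            if field not in edge:
                problems.append(f"edge missing {field!r}: {edge}")
        # Every edge endpoint must reference a declared node id.
        for endpoint in ("from", "to"):
            if edge.get(endpoint) not in node_ids:
                problems.append(f"edge references unknown node: {edge.get(endpoint)!r}")
    return problems

ok = validate_graph({
    "nodes": [{"id": "A", "type": "t", "detailed_type": "d"},
              {"id": "B", "type": "t", "detailed_type": "d"}],
    "edges": [{"from": "A", "to": "B", "label": "rel"}],
})
bad = validate_graph({"nodes": [], "edges": [{"from": "X", "to": "Y", "label": "r"}]})
```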