---
# metadata
title: semmyKG - Knowledge Graph visualiser toolkit (builder from markdown)
emoji: 🕸️
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
python_version: 3.12
#command: python app_gradio_lightrag.py
app_file: app.py #app_gradio_lightrag.py
hf_oauth: true
oauth_scopes: [read-access]
hf_oauth_scopes: [inference-api]
license: mit
pinned: true
short_description: semmyKG - Knowledge Graph toolkit
#models: [meta-llama/Llama-4-Maverick-17B-128E-Instruct, openai/gpt-oss-120b, openai/gpt-oss-20b, ]
models:
- meta-llama/Llama-4-Maverick-17B-128E-Instruct
- openai/gpt-oss-120b
- openai/gpt-oss-20b
tags: [knowledge graph, markdown, RAG, domain]
#preload_from_hub: [https://huggingface.co/datalab-to/surya_layout, https://huggingface.co/datalab-to/surya_tablerec, huggingface.co/datalab-to/line_detector0, https://huggingface.co/tarun-menta/ocr_error_detection/blob/main/config.json]
owner: research-semmyk
#---
#[Project]
#---
#short_description: PDF & HTML parser to markdown
version: 0.2.8.6
readme: README.md
requires-python: ">=3.12"
#dependencies: []
#---
---
# LightRAG Gradio App
A modern, modular Gradio app for knowledge graph-based Retrieval-Augmented Generation (RAG) using [LightRAG][1]. Supports OpenAI and Ollama LLM backends, markdown document ingestion, and interactive knowledge graph visualisation. Our ParserPDF pipeline ([GitHub][3] | [HF Space][4]) generates markdown from documents (PDF, Word, HTML).
## Features
- LightRAG for Dual-level RAG and knowledge graph (KG)
- Ingest markdown files from a folder (default: `dataset/data/docs`)
- Query with OpenAI or Ollama backend (user-selectable)
- Visualise KG interactively in-browser
- Deployable to venv, Colab, or HuggingFace Spaces
- Robust, pythonic, modular code (UK English)
## Setup
### 1. Clone and create venv
```bash
git clone https://github.com/semmyk-research/semmyKG
cd semmyKG
uv venv .venv  # requires the uv package manager
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
uv pip sync requirements.txt

# or, with the standard tooling:
python -m venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
```
### 2. Configure environment
Copy `.env.example` to `.env` and fill in your keys:
```bash
OPENAI_API_KEY=your-openai-api-key
LLM_MODEL=your-LLM-model-Name
##(in the format: provider/model-identifier)
OPENAI_API_BASE=your-LLM-inference-provider-endpoint
##(for locally hosted llm inference server like LMStudio or Jan.ai, follow ollama host adding /v1: http://localhost:1234/v1)
OPENAI_API_EMBED_BASE=your-embedding-provider-endpoint
##(for locally hosted, do not include /embedding)
LLM_MODEL_EMBED=your-embedding-model ##(in the format: provider/embedding-name)
OLLAMA_HOST=http://localhost:11434
OLLAMA_API_KEY= ##(include if required)
```
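As a sketch of how these variables might be consumed in Python (the helper name and fallbacks here are illustrative, not the app's actual code):

```python
import os

def load_llm_settings() -> dict:
    """Illustrative helper: read the LLM settings from the environment,
    falling back to values the web UI would otherwise supply."""
    return {
        "api_key": os.getenv("OPENAI_API_KEY", ""),       # may be blank; web UI input overrides
        "model": os.getenv("LLM_MODEL", ""),              # format: provider/model-identifier
        "base_url": os.getenv("OPENAI_API_BASE", ""),     # e.g. http://localhost:1234/v1
        "embed_base": os.getenv("OPENAI_API_EMBED_BASE", ""),
        "embed_model": os.getenv("LLM_MODEL_EMBED", ""),
        "ollama_host": os.getenv("OLLAMA_HOST", "http://localhost:11434"),
    }
```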
If `.env` is not set, you can enter the values directly in the web UI. Likewise, values entered in the web UI override `.env`.
### 3. Run the app
```bash
python app_gradio_lightrag.py
```
For faster development with hot reload (see the [Gradio reload-mode guide](https://www.gradio.app/guides/developing-faster-with-reload-mode)):
```bash
gradio app_gradio_lightrag.py --demo-name=gradio_ui
```
### 4. Colab/Spaces
- For HuggingFace Spaces: ensure all dependencies are in `requirements.txt` and `.env` is set via the web UI or Space secret.
- For Colab: install requirements and run the app cell.
## Usage
- Browse/Select your data folder (default: `dataset/data/docs`)
- Choose LLM backend (OpenAI or Ollama). [Known issue: GenAI has a bug yielding role: 'assistant' instead of 'user' when updating history.]
- Activate the RAG constructor
- Click 'Index Documents' to build the KG entities
- Click 'Query' to get answers
  - Enter your query and select a query mode
- Click 'Show Knowledge Graph' to visualise the KG
NB: If using HuggingFace, log in first before browsing/selecting/uploading files and setting LLM parameters.
## Notes
- Only markdown files are supported for ingestion (images in `/images` subfolders are ignored for now). Other formats (PDF, TXT, HTML) will be enabled later.
- To generate markdown from documents (PDF, Word, HTML), use our ParserPDF tool: [GitHub][3] | [HF Space][4].
- All user-facing text is in UK English
- For advanced configuration, see LightRAG documentation
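A minimal sketch of the markdown-only ingestion walk described in the notes above, assuming the default folder layout (`find_markdown_files` is an illustrative name, not part of the app's API):

```python
from pathlib import Path

def find_markdown_files(root: str = "dataset/data/docs") -> list[Path]:
    """Illustrative: collect markdown files for indexing, skipping any
    `images` subfolders, which the app currently ignores."""
    return sorted(
        p for p in Path(root).rglob("*.md")
        if "images" not in p.parts
    )
```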
## Roadmap (no defined timeline)
- HuggingFace log in
- [ParserPDF][3] integration
## License
[MIT][2]
[1]: https://github.com/HKUDS/LightRAG "LightRAG GitHub"
[2]: https://opensource.org/license/mit "MIT License"
[3]: https://github.com/semmyk-research/parserPDF "ParserPDF (GitHub)"
[4]: https://huggingface.co/spaces/semmyk/parserPDF "ParserPDF (HF Space)"