---
# metadata
title: semmyKG - Knowledge Graph visualiser toolkit (builder from markdown)
emoji: 🕸️
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
python_version: 3.12
#command: python app_gradio_lightrag.py
app_file: app.py #app_gradio_lightrag.py
hf_oauth: true
oauth_scopes: [read-access]
hf_oauth_scopes: [inference-api]
license: mit
pinned: true
short_description: semmyKG - Knowledge Graph toolkit |
#models: [meta-llama/Llama-4-Maverick-17B-128E-Instruct, openai/gpt-oss-120b, openai/gpt-oss-20b]
models:
  - meta-llama/Llama-4-Maverick-17B-128E-Instruct
  - openai/gpt-oss-120b
  - openai/gpt-oss-20b
tags: [knowledge graph, markdown, RAG, domain]
#preload_from_hub: [https://huggingface.co/datalab-to/surya_layout, https://huggingface.co/datalab-to/surya_tablerec, huggingface.co/datalab-to/line_detector0, https://huggingface.co/tarun-menta/ocr_error_detection/blob/main/config.json]
owner: research-semmyk

#---
#[Project]
#---
#short_description: PDF & HTML parser to markdown
version: 0.2.8.6
readme: README.md
requires-python: ">=3.12"
#dependencies: []
#---
---

# LightRAG Gradio App

A modern, modular Gradio app for knowledge-graph-based Retrieval-Augmented Generation (RAG) using [LightRAG][1]. It supports OpenAI and Ollama LLM backends, markdown document ingestion, and interactive knowledge graph visualisation. Our ParserPDF ([GitHub][3] | [HF Space][4]) pipeline generates markdown from documents (PDF, Word, HTML).

## Features

- LightRAG for dual-level RAG and knowledge graph (KG) construction
- Ingest markdown files from a folder (default: `dataset/data/docs`)
- Query with an OpenAI or Ollama backend (user-selectable)
- Visualise the KG interactively in-browser
- Deployable to a venv, Colab, or HuggingFace Spaces
- Robust, pythonic, modular code (UK English)

## Setup

### 1. Clone and create a venv

```bash
git clone https://github.com/semmyk-research/semmyKG
cd semmyKG

# With uv (ensure the uv package manager is installed)
uv venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
uv pip sync                 # or: uv pip sync requirements.txt

# Or with the standard library venv
python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
```

### 2. Configure environment

Copy `.env.example` to `.env` and fill in your keys:

```markdown
OPENAI_API_KEY=your-openai-api-key
LLM_MODEL=your-LLM-model-name  ## in the format: provider/model-identifier
OPENAI_API_BASE=your-LLM-inference-provider-endpoint  ## for a locally hosted LLM inference server such as LMStudio or Jan.ai, follow the Ollama host and append /v1, e.g. http://localhost:1234/v1
OPENAI_API_EMBED_BASE=your-embedding-provider-endpoint  ## for locally hosted, do not include /embedding
LLM_MODEL_EMBED=your-embedding-model  ## in the format: provider/embedding-name
OLLAMA_HOST=http://localhost:11434
OLLAMA_API_KEY=  ## include if required
```

If `.env` is not set, you can enter the values directly in the web UI.
Likewise, values entered directly in the web UI override `.env`.

### 3. Run the app

```bash
python app_gradio_lightrag.py
```

For faster development, use Gradio's reload mode:

```bash
##SMY: assist: https://www.gradio.app/guides/developing-faster-with-reload-mode
gradio app_gradio_lightrag.py --demo-name=gradio_ui
```

### 4. Colab/Spaces

- For HuggingFace Spaces: ensure all dependencies are in `requirements.txt` and `.env` is set via the web UI or a Space secret.
- For Colab: install the requirements and run the app cell.

## Usage

- Browse/select your data folder (default: `dataset/data/docs`)
- Choose the LLM backend (OpenAI or Ollama). [fix: GenAI has a bug yielding the error role 'assistant' instead of 'user' when updating history]
- Activate the RAG constructor
- Click 'Index Documents' to build the KG entities
- Click 'Query' to get answers: enter your query and select a query mode
- Click 'Show Knowledge Graph' to visualise the KG

NB: If using HuggingFace, log in first before browsing/selecting/uploading files and setting LLM parameters.

## Notes

- Only markdown files are supported for ingestion (images in the `/images` subfolder are ignored for now). NB: other formats will be enabled later: pdf, txt, html...
- To generate markdown from documents (PDF, Word, HTML), use our ParserPDF tool: [GitHub][3] | [HF Space][4].
- All user-facing text is in UK English.
- For advanced configuration, see the LightRAG documentation.

## Roadmap (no defined timeline)

- HuggingFace log-in
- [ParserPDF][3] integration

## License

[MIT][2]

[1]: https://github.com/HKUDS/LightRAG "LightRAG GitHub"
[2]: https://opensource.org/license/mit "MIT License"
[3]: https://github.com/semmyk-research/parserPDF "ParserPDF (GitHub)"
[4]: https://huggingface.co/spaces/semmyk/parserPDF "ParserPDF (HF Space)"
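As a footnote, the markdown-only ingestion rule described in the Notes can be sketched as below. This is an illustrative helper, not the app's actual code: the function name `find_markdown_docs` is hypothetical, and the folder layout is assumed from the default `dataset/data/docs`.

```python
from pathlib import Path


def find_markdown_docs(folder: str) -> list[Path]:
    """Collect .md files for ingestion, skipping any 'images' subfolder.

    Other formats (pdf, txt, html, ...) are excluded implicitly by the
    *.md glob, matching the current markdown-only ingestion rule.
    """
    root = Path(folder)
    return sorted(
        p for p in root.rglob("*.md")
        if "images" not in p.parts  # /images subfolders are ignored for now
    )
```

Calling `find_markdown_docs("dataset/data/docs")` would return only the `.md` files outside any `images` subfolder, ready to be passed to the indexing step.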