---
# metadata
title: semmyKG - Knowledge Graph visualiser toolkit (builder from markdown)
emoji: 🕸️
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
python_version: 3.12
#command: python app_gradio_lightrag.py
app_file: app.py    #app_gradio_lightrag.py
hf_oauth: true
oauth_scopes: [read-access]
hf_oauth_scopes: [inference-api]
license: mit
pinned: true
short_description: semmyKG - Knowledge Graph toolkit
#models: [meta-llama/Llama-4-Maverick-17B-128E-Instruct, openai/gpt-oss-120b, openai/gpt-oss-20b, ]
models:
  - meta-llama/Llama-4-Maverick-17B-128E-Instruct
  - openai/gpt-oss-120b
  - openai/gpt-oss-20b
tags: [knowledge graph, markdown, RAG, domain]
#preload_from_hub: [https://huggingface.co/datalab-to/surya_layout, https://huggingface.co/datalab-to/surya_tablerec, huggingface.co/datalab-to/line_detector0, https://huggingface.co/tarun-menta/ocr_error_detection/blob/main/config.json]
owner: research-semmyk
#---
#[Project]
#---

#short_description: PDF & HTML parser to markdown
version: 0.2.8.6
readme: README.md
requires-python: ">=3.12"
#dependencies: []
#---
---

# LightRAG Gradio App

A modern, modular Gradio app for knowledge graph-based Retrieval-Augmented Generation (RAG) using [LightRAG][1]. It supports OpenAI and Ollama LLM backends, markdown document ingestion, and interactive knowledge graph visualisation. Our ParserPDF pipeline ([GitHub][3] | [HF Space][4]) generates markdown from documents (PDF, Word, HTML).

## Features
- LightRAG for dual-level RAG and knowledge graph (KG) construction
- Ingest markdown files from a folder (default: `dataset/data/docs`)
- Query with a user-selectable OpenAI or Ollama backend
- Visualise the KG interactively in-browser
- Deployable to a venv, Colab, or HuggingFace Spaces
- Robust, pythonic, modular code (UK English)

## Setup

### 1. Clone and create venv
```bash
git clone https://github.com/semmyk-research/semmyKG
cd semmyKG

# Option A: uv
uv venv .venv              # ensure the uv package is installed
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
uv pip sync requirements.txt

# Option B: venv + pip
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
```

### 2. Configure environment
Copy `.env.example` to `.env` and fill in your keys:
```bash
OPENAI_API_KEY=your-openai-api-key
LLM_MODEL=your-LLM-model-name          ## format: provider/model-identifier
OPENAI_API_BASE=your-LLM-inference-provider-endpoint
    ## for a locally hosted inference server (e.g. LM Studio or Jan.ai), append /v1 as with the Ollama host: http://localhost:1234/v1
OPENAI_API_EMBED_BASE=your-embedding-provider-endpoint
    ## for locally hosted servers, do not include /embedding
LLM_MODEL_EMBED=your-embedding-model   ## format: provider/embedding-name
OLLAMA_HOST=http://localhost:11434
OLLAMA_API_KEY=   ## include if required
```
If `.env` is not set, you can enter the values directly in the web UI. Likewise, values entered in the web UI override `.env`.
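At start-up, the app can read these variables from the environment (assuming the `.env` file has been loaded, e.g. via python-dotenv). A minimal standard-library sketch — `load_settings` is a hypothetical helper, not the app's actual code; variable names follow the `.env` template above:

```python
import os

def load_settings() -> dict:
    """Collect LLM settings from the environment, falling back to defaults.

    Missing values are left empty so that web UI inputs can override them.
    """
    return {
        "api_key": os.getenv("OPENAI_API_KEY", ""),
        "llm_model": os.getenv("LLM_MODEL", ""),    # provider/model-identifier
        "api_base": os.getenv("OPENAI_API_BASE", ""),
        "embed_base": os.getenv("OPENAI_API_EMBED_BASE", ""),
        "embed_model": os.getenv("LLM_MODEL_EMBED", ""),
        "ollama_host": os.getenv("OLLAMA_HOST", "http://localhost:11434"),
    }
```

Empty strings (rather than exceptions) for missing keys keep the web UI usable when no `.env` exists.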

### 3. Run the app
```bash
python app_gradio_lightrag.py
```  
For faster development, use Gradio's [reload mode](https://www.gradio.app/guides/developing-faster-with-reload-mode):

```bash
gradio app_gradio_lightrag.py --demo-name=gradio_ui
```

### 4. Colab/Spaces
- For HuggingFace Spaces: ensure all dependencies are in `requirements.txt` and that `.env` values are set via the web UI or Space secrets.
- For Colab: install requirements and run the app cell.

## Usage
- Browse/select your data folder (default: `dataset/data/docs`)
- Choose the LLM backend (OpenAI or Ollama). [Known issue: the GenAI backend yields a role error ('assistant' instead of 'user') when updating history.]
- Activate the RAG constructor
- Click 'Index Documents' to build the KG entities
- Click 'Query' to get answers
  - Enter your query and select a query mode
- Click 'Show Knowledge Graph' to visualise the KG

NB: If using HuggingFace, log in first before browsing/selecting/uploading files and setting LLM parameters.

## Notes
- Only markdown files are supported for ingestion (images in the `/images` subfolder are ignored for now). <br>NB: other formats (PDF, txt, HTML, ...) will be enabled later.
- To generate markdown from documents (PDF, Word, HTML), use our ParserPDF tool: [GitHub][3] | [HF Space][4].
- All user-facing text is in UK English
- For advanced configuration, see LightRAG documentation
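The markdown-only ingestion rule above can be sketched as a simple folder scan. This is an illustrative helper (`find_markdown_files` is not the app's actual function), assuming the default `dataset/data/docs` layout with an optional `images` subfolder:

```python
from pathlib import Path

def find_markdown_files(docs_dir: str) -> list[Path]:
    """Return markdown files under docs_dir, skipping any 'images' subfolder."""
    root = Path(docs_dir)
    return sorted(
        p for p in root.rglob("*.md")
        if "images" not in p.parts  # images subfolders are ignored for now
    )
```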

## Roadmap (no defined timeline)
- HuggingFace log in
- [ParserPDF][3] integration

## License
[MIT][2] 

[1]: https://github.com/HKUDS/LightRAG "LightRAG GitHub"
[2]: https://opensource.org/license/mit "MIT License"
[3]: https://github.com/semmyk-research/parserPDF "ParserPDF (GitHub)"
[4]: https://huggingface.co/spaces/semmyk/parserPDF "ParserPDF (HF Space)"