---
# metadata
title: semmyKG - Knowledge Graph visualiser toolkit (builder from markdown)
emoji: 🕸️
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
python_version: 3.12
#command: python app_gradio_lightrag.py
app_file: app.py  #app_gradio_lightrag.py
hf_oauth: true
oauth_scopes: [read-access]
hf_oauth_scopes: [inference-api]
license: mit
pinned: true
short_description: semmyKG - Knowledge Graph toolkit
#models: [meta-llama/Llama-4-Maverick-17B-128E-Instruct, openai/gpt-oss-120b, openai/gpt-oss-20b]
models:
  - meta-llama/Llama-4-Maverick-17B-128E-Instruct
  - openai/gpt-oss-120b
  - openai/gpt-oss-20b
tags: [knowledge graph, markdown, RAG, domain]
#preload_from_hub: [https://huggingface.co/datalab-to/surya_layout, https://huggingface.co/datalab-to/surya_tablerec, huggingface.co/datalab-to/line_detector0, https://huggingface.co/tarun-menta/ocr_error_detection/blob/main/config.json]
owner: research-semmyk

#[Project]
#short_description: PDF & HTML parser to markdown
version: 0.2.8.6
readme: README.md
requires-python: ">=3.12"
#dependencies: []
---
# LightRAG Gradio App
A modern, modular Gradio app for knowledge-graph-based Retrieval-Augmented Generation (RAG) using [LightRAG][1]. It supports OpenAI and Ollama LLM backends, markdown document ingestion, and interactive knowledge graph visualisation. Our ParserPDF ([GitHub][3] | [HF Space][4]) pipeline generates markdown from documents (PDF, Word, HTML).
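To make the flow concrete, here is a minimal, hedged sketch of the LightRAG round trip the app builds on: index some markdown, then query it. Import paths and initialisation details differ between LightRAG releases, and `WORKING_DIR` plus the sample text are placeholders, so treat this as illustrative rather than the app's actual code.

```python
# Minimal LightRAG sketch: index one markdown string, then query it.
# Assumes OPENAI_API_KEY is set; import paths vary across LightRAG versions.
import os
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete  # lives elsewhere in older releases

WORKING_DIR = "./rag_storage"  # where LightRAG keeps its KG and vector stores
os.makedirs(WORKING_DIR, exist_ok=True)

rag = LightRAG(working_dir=WORKING_DIR, llm_model_func=gpt_4o_mini_complete)

rag.insert("# Sample note\nLightRAG extracts entities and relations from this text.")
print(rag.query("What does LightRAG extract?", param=QueryParam(mode="hybrid")))
```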
## Features
- LightRAG for dual-level RAG and knowledge graph (KG) construction
- Ingest markdown files from a folder (default: `dataset/data/docs`)
- Query with OpenAI or Ollama backend (user-selectable)
- Visualise the KG interactively in-browser (see the sketch after this list)
- Deployable to a venv, Colab, or HuggingFace Spaces
- Robust, pythonic, modular code (UK English)
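For the in-browser visualisation, a common approach (used in LightRAG's own examples) is to load the GraphML file that LightRAG writes into its working directory and render it with pyvis. The file name and working directory below are assumptions; check them against your setup.

```python
# Render LightRAG's stored knowledge graph as an interactive HTML page.
import networkx as nx
from pyvis.network import Network

# Assumed path: LightRAG conventionally writes this GraphML file into its working dir.
graph_path = "./rag_storage/graph_chunk_entity_relation.graphml"
G = nx.read_graphml(graph_path)

net = Network(height="750px", width="100%", notebook=False)
net.from_nx(G)                           # copy nodes/edges (with attributes) into pyvis
net.save_graph("knowledge_graph.html")   # open this file in a browser
```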
## Setup
### 1. Clone and create venv
```bash
git clone https://github.com/semmyk-research/semmyKG
cd semmyKG

# Option 1: uv (ensure the uv package is installed)
uv venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
uv pip sync requirements.txt

# Option 2: standard venv + pip
python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
```
### 2. Configure environment
Copy `.env.example` to `.env` and fill in your keys:
```bash
OPENAI_API_KEY=your-openai-api-key
LLM_MODEL=your-LLM-model-name              # format: provider/model-identifier
OPENAI_API_BASE=your-LLM-inference-provider-endpoint
# for a locally hosted LLM inference server (e.g. LMStudio or Jan.ai), append /v1 to the host, e.g. http://localhost:1234/v1
OPENAI_API_EMBED_BASE=your-embedding-provider-endpoint
# for a locally hosted server, do not include /embedding
LLM_MODEL_EMBED=your-embedding-model       # format: provider/embedding-name
OLLAMA_HOST=http://localhost:11434
OLLAMA_API_KEY=                            # include if required
```
If `.env` is not set, you can enter the values directly in the web UI. <br>
Conversely, values entered in the web UI override those in `.env`.
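A sketch of how such settings are typically resolved, with web-UI input taking precedence over `.env`; the helper and defaults below are illustrative, not the app's actual code:

```python
# Resolve a setting: prefer the value typed into the web UI, fall back to .env,
# then to a default. Variable names mirror the .env example above.
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current directory, if present

def resolve(ui_value: str | None, env_key: str, default: str = "") -> str:
    """Web-UI input overrides .env, which overrides the default."""
    return ui_value or os.getenv(env_key, default)

llm_model = resolve(None, "LLM_MODEL", "openai/gpt-4o-mini")
api_base  = resolve(None, "OPENAI_API_BASE", "https://api.openai.com/v1")
ollama    = resolve(None, "OLLAMA_HOST", "http://localhost:11434")
```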
### 3. Run the app
```bash
python app_gradio_lightrag.py
```
For faster development, use Gradio's reload mode:
```bash
# See https://www.gradio.app/guides/developing-faster-with-reload-mode
gradio app_gradio_lightrag.py --demo-name=gradio_ui
```
### 4. Colab/Spaces
- For HuggingFace Spaces: ensure all dependencies are in `requirements.txt` and set `.env` values via the web UI or Space secrets.
- For Colab: install the requirements and run the app cell, as in the sketch below.
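For Colab, a single cell along these lines mirrors the venv setup above (notebook `!`/`%` syntax; whether a public share link is produced depends on how the app calls `demo.launch()`):

```python
# Colab cell: clone, install pinned dependencies, launch the app.
!git clone https://github.com/semmyk-research/semmyKG
%cd semmyKG
!pip install -r requirements.txt
!python app_gradio_lightrag.py
```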
## Usage
- Browse/select your data folder (default: `dataset/data/docs`)
- Choose the LLM backend (OpenAI or Ollama). [Fix pending: GenAI has a bug yielding role 'assistant' instead of 'user' when updating history.]
- Activate the RAG constructor
- Click 'Index Documents' to build the KG entities
- Click 'Query' to get answers
  - Enter your query and select the query mode (see the sketch after this list)
- Click 'Show Knowledge Graph' to visualise the KG

NB: If using HuggingFace, log in first before browsing/selecting/uploading files and setting LLM parameters.
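The query modes offered in the UI correspond to LightRAG's `QueryParam` modes. A hedged sketch, reusing the `rag` instance from the setup sketch above (the question string is a placeholder):

```python
# LightRAG query modes: "naive" (plain chunk retrieval), "local" (entity-centred),
# "global" (relation/community-centred) and "hybrid" (local + global combined).
from lightrag import QueryParam

question = "Which entities are linked to LightRAG?"  # placeholder query
for mode in ("naive", "local", "global", "hybrid"):
    answer = rag.query(question, param=QueryParam(mode=mode))
    print(f"--- {mode} ---\n{answer}\n")
```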
## Notes
- Only markdown files are supported for ingestion (images in the `/images` subfolder are ignored for now; see the sketch after this list). <br>NB: other formats (PDF, TXT, HTML, ...) will be enabled later.
- To generate markdown from documents (PDF, Word, HTML), use our ParserPDF tool: [GitHub][3] | [HF Space][4].
- All user-facing text is in UK English
- For advanced configuration, see the LightRAG documentation
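As an illustration of the markdown-only rule, a small filter like the one below collects `.md` files from the data folder while skipping anything under an `images` subfolder; the folder names follow the defaults mentioned above, and the snippet is not the app's actual ingestion code:

```python
# Collect markdown files for ingestion, ignoring any images/ subfolder.
from pathlib import Path

def collect_markdown(data_dir: str = "dataset/data/docs") -> list[Path]:
    """Return all .md files under data_dir, skipping 'images' subfolders."""
    root = Path(data_dir)
    return [p for p in root.rglob("*.md") if "images" not in p.parts]

docs = [p.read_text(encoding="utf-8") for p in collect_markdown()]
# rag.insert(docs)  # hand the documents to the indexing step
```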
## Roadmap (no defined timeline)
- HuggingFace log in
- [ParserPDF][3] integration

## License
[MIT][2]
[1]: https://github.com/HKUDS/LightRAG "LightRAG GitHub"
[2]: https://opensource.org/license/mit "MIT License"
[3]: https://github.com/semmyk-research/parserPDF "ParserPDF (GitHub)"
[4]: https://huggingface.co/spaces/semmyk/parserPDF "ParserPDF (HF Space)"