Spaces:

build-small-hackathon
/

KnowledgeMesh

Running on Zero

App Files Files Community

KnowledgeMesh / README.md

pkheria

Update README.md

69a2232 verified 12 days ago

preview code

Raw

History Blame Contribute Delete

3.64 kB

	---
	title: BuildSmall KnowledgeHub
	emoji: 📚
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	app_file: app.py
	pinned: true
	license: mit
	short_description: AI knowledge hub for groups, powered by Nvidia

	tags:
	- track:backyard
	- sponsor:openai
	- sponsor:nvidia
	- achievement:offbrand
	- achievement:sharing
	- achievement:fieldnotes

	---

	# BuildSmall KnowledgeHub - https://huggingface.co/pkheria

	BuildSmall KnowledgeHub is a modular Gradio app for loading knowledge from:

	- Medium article links through Freedium
	- arXiv links or IDs
	- PDF documents

	It extracts text, captures Medium image references/captions when available, chunks the content, embeds chunks locally with the configured NVIDIA Nemotron embedding model, uploads vectors into Qdrant, and generates grounded answers with NVIDIA's OpenAI-compatible chat API.

	## 🔗 Resources & Links

	- Demo Video: [Watch the Product Demo]([YOUR_DEMO_VIDEO_LINK_HERE](https://youtu.be/aDlKNW10pnw))
	- Blog Post: [Read the Full Write-up](https://huggingface.co/blog/pkheria/knowledgemesh)
	- Social Post : [Linkedin Post](https://www.linkedin.com/posts/piyushkheria7_buildsmall-generativeai-rag-share-7472326307721437184-pFrz/)
	## NVIDIA Usage

	This project explicitly uses NVIDIA in two places:

	- Local retrieval embedding model: `nvidia/llama-nemotron-colembed-vl-3b-v2`
	- NVIDIA API chat model: `nvidia/nvidia-nemotron-nano-9b-v2`

	The chat client calls:

	```text
	https://integrate.api.nvidia.com/v1
	```

	You must provide `NVIDIA_API_KEY` as a Hugging Face Space secret or in your local `.env`.

	## Hugging Face Spaces Deployment

	For ZeroGPU Spaces, add these Space variables:

	```bash
	ENABLE_ZEROGPU=true
	EMBEDDING_DEVICE=cuda
	ZEROGPU_DURATION_SECONDS=180
	```

	For local Apple Silicon development, keep:

	```bash
	EMBEDDING_DEVICE=cpu
	```

	The Gradio ingest, search, and answer callbacks are decorated with `spaces.GPU` when running on Hugging Face Spaces. Locally, the decorator becomes a no-op.

	## Hugging Face Secrets

	Add these in your Space settings under Settings → Variables and secrets.

	Required secrets:

	```bash
	NVIDIA_API_KEY=<your-nvidia-api-key>
	QDRANT_URL=<your-qdrant-url>
	QDRANT_API_KEY=<your-qdrant-api-key>
	```

	Optional variables:

	```bash
	QDRANT_COLLECTION_NAME=knowledge_base
	NVIDIA_API_URL=https://integrate.api.nvidia.com/v1
	NVIDIA_CHAT_MODEL=nvidia/nvidia-nemotron-nano-9b-v2
	NEMOTRON_EMBED_MODEL=nvidia/llama-nemotron-colembed-vl-3b-v2
	NEMOTRON_PARSE_MODEL=Qwen/Qwen2-VL-2B-Instruct
	HF_TOKEN=<token-if-needed-for-gated-model-downloads>
	```

	Use a hosted Qdrant instance for Hugging Face Spaces. `localhost:6333` only works for local development.

	## Qdrant Collection Name

	The Ingest and Retrieve tabs each have their own collection-name field. Set both to the same Qdrant collection when you want to search what you just ingested. The fields are intentionally not auto-synced because auto-sync can cause continuous refreshes in hosted Gradio Spaces.

	## Setup

	```bash
	python3 -m venv .venv
	source .venv/bin/activate
	pip install -r requirements.txt
	cp .env.example .env
	```

	Add `NVIDIA_API_KEY` to `.env` for chat completions. Start Qdrant locally or point `QDRANT_URL` to your hosted instance.

	The default model split is:

	- Local parsing model: `Qwen/Qwen2-VL-2B-Instruct`
	- Local embedding model: `nvidia/llama-nemotron-colembed-vl-3b-v2`
	- NVIDIA API chat model: `nvidia/nvidia-nemotron-nano-9b-v2`

	## Run

	```bash
	python app.py
	```

	Open the local Gradio URL printed in the terminal, usually `http://127.0.0.1:7860`.

	The app binds to `0.0.0.0:7860`, which is suitable for Hugging Face Spaces and container deployments.