KnowledgeMesh / README.md
pkheria's picture
Update README.md
69a2232 verified
|
Raw
History Blame Contribute Delete
3.64 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade
metadata
title: BuildSmall KnowledgeHub
emoji: πŸ“š
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: true
license: mit
short_description: AI knowledge hub for groups, powered by Nvidia
tags:
  - track:backyard
  - sponsor:openai
  - sponsor:nvidia
  - achievement:offbrand
  - achievement:sharing
  - achievement:fieldnotes

BuildSmall KnowledgeHub - https://huggingface.co/pkheria

BuildSmall KnowledgeHub is a modular Gradio app for loading knowledge from:

  • Medium article links through Freedium
  • arXiv links or IDs
  • PDF documents

It extracts text, captures Medium image references/captions when available, chunks the content, embeds chunks locally with the configured NVIDIA Nemotron embedding model, uploads vectors into Qdrant, and generates grounded answers with NVIDIA's OpenAI-compatible chat API.

πŸ”— Resources & Links

NVIDIA Usage

This project explicitly uses NVIDIA in two places:

  • Local retrieval embedding model: nvidia/llama-nemotron-colembed-vl-3b-v2
  • NVIDIA API chat model: nvidia/nvidia-nemotron-nano-9b-v2

The chat client calls:

https://integrate.api.nvidia.com/v1

You must provide NVIDIA_API_KEY as a Hugging Face Space secret or in your local .env.

Hugging Face Spaces Deployment

For ZeroGPU Spaces, add these Space variables:

ENABLE_ZEROGPU=true
EMBEDDING_DEVICE=cuda
ZEROGPU_DURATION_SECONDS=180

For local Apple Silicon development, keep:

EMBEDDING_DEVICE=cpu

The Gradio ingest, search, and answer callbacks are decorated with spaces.GPU when running on Hugging Face Spaces. Locally, the decorator becomes a no-op.

Hugging Face Secrets

Add these in your Space settings under Settings β†’ Variables and secrets.

Required secrets:

NVIDIA_API_KEY=<your-nvidia-api-key>
QDRANT_URL=<your-qdrant-url>
QDRANT_API_KEY=<your-qdrant-api-key>

Optional variables:

QDRANT_COLLECTION_NAME=knowledge_base
NVIDIA_API_URL=https://integrate.api.nvidia.com/v1
NVIDIA_CHAT_MODEL=nvidia/nvidia-nemotron-nano-9b-v2
NEMOTRON_EMBED_MODEL=nvidia/llama-nemotron-colembed-vl-3b-v2
NEMOTRON_PARSE_MODEL=Qwen/Qwen2-VL-2B-Instruct
HF_TOKEN=<token-if-needed-for-gated-model-downloads>

Use a hosted Qdrant instance for Hugging Face Spaces. localhost:6333 only works for local development.

Qdrant Collection Name

The Ingest and Retrieve tabs each have their own collection-name field. Set both to the same Qdrant collection when you want to search what you just ingested. The fields are intentionally not auto-synced because auto-sync can cause continuous refreshes in hosted Gradio Spaces.

Setup

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env

Add NVIDIA_API_KEY to .env for chat completions. Start Qdrant locally or point QDRANT_URL to your hosted instance.

The default model split is:

  • Local parsing model: Qwen/Qwen2-VL-2B-Instruct
  • Local embedding model: nvidia/llama-nemotron-colembed-vl-3b-v2
  • NVIDIA API chat model: nvidia/nvidia-nemotron-nano-9b-v2

Run

python app.py

Open the local Gradio URL printed in the terminal, usually http://127.0.0.1:7860.

The app binds to 0.0.0.0:7860, which is suitable for Hugging Face Spaces and container deployments.