File size: 3,639 Bytes
236ae36
 
 
b80dca8
 
236ae36
 
7dcc090
236ae36
c9b7fb8
c84ee79
8e21552
c84ee79
 
 
 
 
 
8e21552
236ae36
 
69a2232
236ae36
 
b5e0c74
9707a84
b5e0c74
 
 
9707a84
236ae36
8e21552
 
 
c97012e
 
236ae36
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
b5e0c74
33c0d0b
 
 
 
b5e0c74
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
---
title: BuildSmall KnowledgeHub
emoji: πŸ“š
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: true
license: mit
short_description: AI knowledge hub for groups, powered by Nvidia

tags:
  - track:backyard
  - sponsor:openai
  - sponsor:nvidia
  - achievement:offbrand
  - achievement:sharing
  - achievement:fieldnotes

---

# BuildSmall KnowledgeHub - https://huggingface.co/pkheria

BuildSmall KnowledgeHub is a modular Gradio app for loading knowledge from:

- Medium article links through Freedium
- arXiv links or IDs
- PDF documents

It extracts text, captures Medium image references/captions when available, chunks the content, embeds chunks locally with the configured NVIDIA Nemotron embedding model, uploads vectors into Qdrant, and generates grounded answers with NVIDIA's OpenAI-compatible chat API.

## πŸ”— Resources & Links

- **Demo Video:** [Watch the Product Demo]([YOUR_DEMO_VIDEO_LINK_HERE](https://youtu.be/aDlKNW10pnw))
- **Blog Post:** [Read the Full Write-up](https://huggingface.co/blog/pkheria/knowledgemesh)
- **Social Post :** [Linkedin Post](https://www.linkedin.com/posts/piyushkheria7_buildsmall-generativeai-rag-share-7472326307721437184-pFrz/)
## NVIDIA Usage

This project explicitly uses NVIDIA in two places:

- Local retrieval embedding model: `nvidia/llama-nemotron-colembed-vl-3b-v2`
- NVIDIA API chat model: `nvidia/nvidia-nemotron-nano-9b-v2`

The chat client calls:

```text
https://integrate.api.nvidia.com/v1
```

You must provide `NVIDIA_API_KEY` as a Hugging Face Space secret or in your local `.env`.

## Hugging Face Spaces Deployment

For ZeroGPU Spaces, add these Space variables:

```bash
ENABLE_ZEROGPU=true
EMBEDDING_DEVICE=cuda
ZEROGPU_DURATION_SECONDS=180
```

For local Apple Silicon development, keep:

```bash
EMBEDDING_DEVICE=cpu
```

The Gradio ingest, search, and answer callbacks are decorated with `spaces.GPU` when running on Hugging Face Spaces. Locally, the decorator becomes a no-op.

## Hugging Face Secrets

Add these in your Space settings under **Settings β†’ Variables and secrets**.

Required secrets:

```bash
NVIDIA_API_KEY=<your-nvidia-api-key>
QDRANT_URL=<your-qdrant-url>
QDRANT_API_KEY=<your-qdrant-api-key>
```

Optional variables:

```bash
QDRANT_COLLECTION_NAME=knowledge_base
NVIDIA_API_URL=https://integrate.api.nvidia.com/v1
NVIDIA_CHAT_MODEL=nvidia/nvidia-nemotron-nano-9b-v2
NEMOTRON_EMBED_MODEL=nvidia/llama-nemotron-colembed-vl-3b-v2
NEMOTRON_PARSE_MODEL=Qwen/Qwen2-VL-2B-Instruct
HF_TOKEN=<token-if-needed-for-gated-model-downloads>
```

Use a hosted Qdrant instance for Hugging Face Spaces. `localhost:6333` only works for local development.

## Qdrant Collection Name

The Ingest and Retrieve tabs each have their own collection-name field. Set both to the same Qdrant collection when you want to search what you just ingested. The fields are intentionally not auto-synced because auto-sync can cause continuous refreshes in hosted Gradio Spaces.

## Setup

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
```

Add `NVIDIA_API_KEY` to `.env` for chat completions. Start Qdrant locally or point `QDRANT_URL` to your hosted instance.

The default model split is:

- Local parsing model: `Qwen/Qwen2-VL-2B-Instruct`
- Local embedding model: `nvidia/llama-nemotron-colembed-vl-3b-v2`
- NVIDIA API chat model: `nvidia/nvidia-nemotron-nano-9b-v2`

## Run

```bash
python app.py
```

Open the local Gradio URL printed in the terminal, usually `http://127.0.0.1:7860`.

The app binds to `0.0.0.0:7860`, which is suitable for Hugging Face Spaces and container deployments.