Spaces:
Running on Zero
Running on Zero
File size: 3,639 Bytes
236ae36 b80dca8 236ae36 7dcc090 236ae36 c9b7fb8 c84ee79 8e21552 c84ee79 8e21552 236ae36 69a2232 236ae36 b5e0c74 9707a84 b5e0c74 9707a84 236ae36 8e21552 c97012e 236ae36 b5e0c74 33c0d0b b5e0c74 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 | ---
title: BuildSmall KnowledgeHub
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: true
license: mit
short_description: AI knowledge hub for groups, powered by Nvidia
tags:
- track:backyard
- sponsor:openai
- sponsor:nvidia
- achievement:offbrand
- achievement:sharing
- achievement:fieldnotes
---
# BuildSmall KnowledgeHub - https://huggingface.co/pkheria
BuildSmall KnowledgeHub is a modular Gradio app for loading knowledge from:
- Medium article links through Freedium
- arXiv links or IDs
- PDF documents
It extracts text, captures Medium image references/captions when available, chunks the content, embeds chunks locally with the configured NVIDIA Nemotron embedding model, uploads vectors into Qdrant, and generates grounded answers with NVIDIA's OpenAI-compatible chat API.
## π Resources & Links
- **Demo Video:** [Watch the Product Demo]([YOUR_DEMO_VIDEO_LINK_HERE](https://youtu.be/aDlKNW10pnw))
- **Blog Post:** [Read the Full Write-up](https://huggingface.co/blog/pkheria/knowledgemesh)
- **Social Post :** [Linkedin Post](https://www.linkedin.com/posts/piyushkheria7_buildsmall-generativeai-rag-share-7472326307721437184-pFrz/)
## NVIDIA Usage
This project explicitly uses NVIDIA in two places:
- Local retrieval embedding model: `nvidia/llama-nemotron-colembed-vl-3b-v2`
- NVIDIA API chat model: `nvidia/nvidia-nemotron-nano-9b-v2`
The chat client calls:
```text
https://integrate.api.nvidia.com/v1
```
You must provide `NVIDIA_API_KEY` as a Hugging Face Space secret or in your local `.env`.
## Hugging Face Spaces Deployment
For ZeroGPU Spaces, add these Space variables:
```bash
ENABLE_ZEROGPU=true
EMBEDDING_DEVICE=cuda
ZEROGPU_DURATION_SECONDS=180
```
For local Apple Silicon development, keep:
```bash
EMBEDDING_DEVICE=cpu
```
The Gradio ingest, search, and answer callbacks are decorated with `spaces.GPU` when running on Hugging Face Spaces. Locally, the decorator becomes a no-op.
## Hugging Face Secrets
Add these in your Space settings under **Settings β Variables and secrets**.
Required secrets:
```bash
NVIDIA_API_KEY=<your-nvidia-api-key>
QDRANT_URL=<your-qdrant-url>
QDRANT_API_KEY=<your-qdrant-api-key>
```
Optional variables:
```bash
QDRANT_COLLECTION_NAME=knowledge_base
NVIDIA_API_URL=https://integrate.api.nvidia.com/v1
NVIDIA_CHAT_MODEL=nvidia/nvidia-nemotron-nano-9b-v2
NEMOTRON_EMBED_MODEL=nvidia/llama-nemotron-colembed-vl-3b-v2
NEMOTRON_PARSE_MODEL=Qwen/Qwen2-VL-2B-Instruct
HF_TOKEN=<token-if-needed-for-gated-model-downloads>
```
Use a hosted Qdrant instance for Hugging Face Spaces. `localhost:6333` only works for local development.
## Qdrant Collection Name
The Ingest and Retrieve tabs each have their own collection-name field. Set both to the same Qdrant collection when you want to search what you just ingested. The fields are intentionally not auto-synced because auto-sync can cause continuous refreshes in hosted Gradio Spaces.
## Setup
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
```
Add `NVIDIA_API_KEY` to `.env` for chat completions. Start Qdrant locally or point `QDRANT_URL` to your hosted instance.
The default model split is:
- Local parsing model: `Qwen/Qwen2-VL-2B-Instruct`
- Local embedding model: `nvidia/llama-nemotron-colembed-vl-3b-v2`
- NVIDIA API chat model: `nvidia/nvidia-nemotron-nano-9b-v2`
## Run
```bash
python app.py
```
Open the local Gradio URL printed in the terminal, usually `http://127.0.0.1:7860`.
The app binds to `0.0.0.0:7860`, which is suitable for Hugging Face Spaces and container deployments.
|