Spaces:
Running on Zero
Running on Zero
| title: BuildSmall KnowledgeHub | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: gradio | |
| app_file: app.py | |
| pinned: true | |
| license: mit | |
| short_description: AI knowledge hub for groups, powered by Nvidia | |
| tags: | |
| - track:backyard | |
| - sponsor:openai | |
| - sponsor:nvidia | |
| - achievement:offbrand | |
| - achievement:sharing | |
| - achievement:fieldnotes | |
| # BuildSmall KnowledgeHub - https://huggingface.co/pkheria | |
| BuildSmall KnowledgeHub is a modular Gradio app for loading knowledge from: | |
| - Medium article links through Freedium | |
| - arXiv links or IDs | |
| - PDF documents | |
| It extracts text, captures Medium image references/captions when available, chunks the content, embeds chunks locally with the configured NVIDIA Nemotron embedding model, uploads vectors into Qdrant, and generates grounded answers with NVIDIA's OpenAI-compatible chat API. | |
| ## π Resources & Links | |
| - **Demo Video:** [Watch the Product Demo]([YOUR_DEMO_VIDEO_LINK_HERE](https://youtu.be/aDlKNW10pnw)) | |
| - **Blog Post:** [Read the Full Write-up](https://huggingface.co/blog/pkheria/knowledgemesh) | |
| - **Social Post :** [Linkedin Post](https://www.linkedin.com/posts/piyushkheria7_buildsmall-generativeai-rag-share-7472326307721437184-pFrz/) | |
| ## NVIDIA Usage | |
| This project explicitly uses NVIDIA in two places: | |
| - Local retrieval embedding model: `nvidia/llama-nemotron-colembed-vl-3b-v2` | |
| - NVIDIA API chat model: `nvidia/nvidia-nemotron-nano-9b-v2` | |
| The chat client calls: | |
| ```text | |
| https://integrate.api.nvidia.com/v1 | |
| ``` | |
| You must provide `NVIDIA_API_KEY` as a Hugging Face Space secret or in your local `.env`. | |
| ## Hugging Face Spaces Deployment | |
| For ZeroGPU Spaces, add these Space variables: | |
| ```bash | |
| ENABLE_ZEROGPU=true | |
| EMBEDDING_DEVICE=cuda | |
| ZEROGPU_DURATION_SECONDS=180 | |
| ``` | |
| For local Apple Silicon development, keep: | |
| ```bash | |
| EMBEDDING_DEVICE=cpu | |
| ``` | |
| The Gradio ingest, search, and answer callbacks are decorated with `spaces.GPU` when running on Hugging Face Spaces. Locally, the decorator becomes a no-op. | |
| ## Hugging Face Secrets | |
| Add these in your Space settings under **Settings β Variables and secrets**. | |
| Required secrets: | |
| ```bash | |
| NVIDIA_API_KEY=<your-nvidia-api-key> | |
| QDRANT_URL=<your-qdrant-url> | |
| QDRANT_API_KEY=<your-qdrant-api-key> | |
| ``` | |
| Optional variables: | |
| ```bash | |
| QDRANT_COLLECTION_NAME=knowledge_base | |
| NVIDIA_API_URL=https://integrate.api.nvidia.com/v1 | |
| NVIDIA_CHAT_MODEL=nvidia/nvidia-nemotron-nano-9b-v2 | |
| NEMOTRON_EMBED_MODEL=nvidia/llama-nemotron-colembed-vl-3b-v2 | |
| NEMOTRON_PARSE_MODEL=Qwen/Qwen2-VL-2B-Instruct | |
| HF_TOKEN=<token-if-needed-for-gated-model-downloads> | |
| ``` | |
| Use a hosted Qdrant instance for Hugging Face Spaces. `localhost:6333` only works for local development. | |
| ## Qdrant Collection Name | |
| The Ingest and Retrieve tabs each have their own collection-name field. Set both to the same Qdrant collection when you want to search what you just ingested. The fields are intentionally not auto-synced because auto-sync can cause continuous refreshes in hosted Gradio Spaces. | |
| ## Setup | |
| ```bash | |
| python3 -m venv .venv | |
| source .venv/bin/activate | |
| pip install -r requirements.txt | |
| cp .env.example .env | |
| ``` | |
| Add `NVIDIA_API_KEY` to `.env` for chat completions. Start Qdrant locally or point `QDRANT_URL` to your hosted instance. | |
| The default model split is: | |
| - Local parsing model: `Qwen/Qwen2-VL-2B-Instruct` | |
| - Local embedding model: `nvidia/llama-nemotron-colembed-vl-3b-v2` | |
| - NVIDIA API chat model: `nvidia/nvidia-nemotron-nano-9b-v2` | |
| ## Run | |
| ```bash | |
| python app.py | |
| ``` | |
| Open the local Gradio URL printed in the terminal, usually `http://127.0.0.1:7860`. | |
| The app binds to `0.0.0.0:7860`, which is suitable for Hugging Face Spaces and container deployments. | |