---
title: BuildSmall KnowledgeHub
emoji: 📚
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: true
license: mit
short_description: AI knowledge hub for groups, powered by Nvidia
tags:
  - track:backyard
  - sponsor:openai
  - sponsor:nvidia
  - achievement:offbrand
  - achievement:sharing
  - achievement:fieldnotes
---

# BuildSmall KnowledgeHub - https://huggingface.co/pkheria

BuildSmall KnowledgeHub is a modular Gradio app for loading knowledge from:

- Medium article links through Freedium
- arXiv links or IDs
- PDF documents

It extracts text, captures Medium image references/captions when available, chunks the content, embeds chunks locally with the configured NVIDIA Nemotron embedding model, uploads vectors into Qdrant, and generates grounded answers with NVIDIA's OpenAI-compatible chat API.

## 🔗 Resources & Links

- **Demo Video:** [Watch the Product Demo](https://youtu.be/aDlKNW10pnw)
- **Blog Post:** [Read the Full Write-up](https://huggingface.co/blog/pkheria/knowledgemesh)
- **Social Post:** [LinkedIn Post](https://www.linkedin.com/posts/piyushkheria7_buildsmall-generativeai-rag-share-7472326307721437184-pFrz/)

## NVIDIA Usage

This project explicitly uses NVIDIA in two places:

- Local retrieval embedding model: `nvidia/llama-nemotron-colembed-vl-3b-v2`
- NVIDIA API chat model: `nvidia/nvidia-nemotron-nano-9b-v2`

The chat client calls:

```text
https://integrate.api.nvidia.com/v1
```

You must provide `NVIDIA_API_KEY` as a Hugging Face Space secret or in your local `.env`.

## Hugging Face Spaces Deployment

For ZeroGPU Spaces, add these Space variables:

```bash
ENABLE_ZEROGPU=true
EMBEDDING_DEVICE=cuda
ZEROGPU_DURATION_SECONDS=180
```

For local Apple Silicon development, keep:

```bash
EMBEDDING_DEVICE=cpu
```

The Gradio ingest, search, and answer callbacks are decorated with `spaces.GPU` when running on Hugging Face Spaces. Locally, the decorator becomes a no-op.

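
The conditional `spaces.GPU` decoration described above could be wired up roughly as follows. This is a minimal sketch, not the app's actual code: the helper name `zero_gpu`, the placeholder callback `embed_chunks`, and the choice to gate on `ENABLE_ZEROGPU` are assumptions made for illustration.

```python
import os


def zero_gpu(enabled: bool, duration: int = 180):
    """Return spaces.GPU(duration=...) when enabled, else a pass-through decorator.

    Hypothetical helper: the real app may gate the decorator differently.
    """
    if enabled:
        try:
            import spaces  # provided on Hugging Face Spaces; usually absent locally
        except ImportError:
            pass  # fall through to the no-op decorator below
        else:
            return spaces.GPU(duration=duration)
    # Local development: hand the function back unchanged.
    return lambda fn: fn


# Assumption: these two variables control the behavior, as listed above.
ENABLED = os.getenv("ENABLE_ZEROGPU", "false").lower() == "true"
DURATION = int(os.getenv("ZEROGPU_DURATION_SECONDS", "180"))


@zero_gpu(ENABLED, DURATION)
def embed_chunks(chunks):
    """Placeholder callback standing in for the real embedding step."""
    return [len(chunk) for chunk in chunks]
```

With `ENABLE_ZEROGPU` unset, `embed_chunks` is the plain undecorated function, so the same module can be imported on a laptop without the `spaces` package installed.
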
## Hugging Face Secrets

Add these in your Space settings under **Settings → Variables and secrets**.

Required secrets:

```bash
NVIDIA_API_KEY=
QDRANT_URL=
QDRANT_API_KEY=
```

Optional variables:

```bash
QDRANT_COLLECTION_NAME=knowledge_base
NVIDIA_API_URL=https://integrate.api.nvidia.com/v1
NVIDIA_CHAT_MODEL=nvidia/nvidia-nemotron-nano-9b-v2
NEMOTRON_EMBED_MODEL=nvidia/llama-nemotron-colembed-vl-3b-v2
NEMOTRON_PARSE_MODEL=Qwen/Qwen2-VL-2B-Instruct
HF_TOKEN=
```

Use a hosted Qdrant instance for Hugging Face Spaces. `localhost:6333` only works for local development.

## Qdrant Collection Name

The Ingest and Retrieve tabs each have their own collection-name field. Set both to the same Qdrant collection when you want to search what you just ingested. The fields are intentionally not auto-synced because auto-sync can cause continuous refreshes in hosted Gradio Spaces.

## Setup

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
```

Add `NVIDIA_API_KEY` to `.env` for chat completions. Start Qdrant locally or point `QDRANT_URL` to your hosted instance.

The default model split is:

- Local parsing model: `Qwen/Qwen2-VL-2B-Instruct`
- Local embedding model: `nvidia/llama-nemotron-colembed-vl-3b-v2`
- NVIDIA API chat model: `nvidia/nvidia-nemotron-nano-9b-v2`

## Run

```bash
python app.py
```

Open the local Gradio URL printed in the terminal, usually `http://127.0.0.1:7860`. The app binds to `0.0.0.0:7860`, which is suitable for Hugging Face Spaces and container deployments.
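
Before running the app, it can help to check that the required secrets are present. The sketch below shows one way to validate the variables listed under Hugging Face Secrets. The function name `load_settings` is hypothetical, and using the values from the optional-variables block as fallbacks is an assumption; the app's own configuration code may differ.

```python
import os

# Names taken from the "Required secrets" block of this README.
REQUIRED = ("NVIDIA_API_KEY", "QDRANT_URL", "QDRANT_API_KEY")

# Assumption: the values shown under "Optional variables" act as fallbacks.
DEFAULTS = {
    "QDRANT_COLLECTION_NAME": "knowledge_base",
    "NVIDIA_API_URL": "https://integrate.api.nvidia.com/v1",
    "NVIDIA_CHAT_MODEL": "nvidia/nvidia-nemotron-nano-9b-v2",
    "NEMOTRON_EMBED_MODEL": "nvidia/llama-nemotron-colembed-vl-3b-v2",
    "NEMOTRON_PARSE_MODEL": "Qwen/Qwen2-VL-2B-Instruct",
}


def load_settings(env=None):
    """Return a settings dict, raising if any required secret is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required secrets: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        # An empty value is treated the same as an unset one.
        settings[name] = env.get(name) or default
    return settings
```

Calling `load_settings()` with no arguments reads from the process environment, so it fails fast with a list of the missing names instead of surfacing an authentication error on the first chat or Qdrant request.
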