BuildSmall KnowledgeHub
AI knowledge hub for groups, powered by Nvidia
We all have too many tabs open. Between insightful Medium articles hidden behind paywalls, dense arXiv papers, and a folder full of PDFs, keeping track of knowledge is a mess. What if you could throw all of these sources into a single application and just... chat with them?
Enter BuildSmall KnowledgeHubβan open-source AI knowledge management tool built for Hugging Face Spaces. It acts as a modular, local-first (where it counts) Retrieval-Augmented Generation (RAG) pipeline powered by Gradio, Qdrant, and NVIDIA's Nemotron models.
Here is a look at how we built it, the tech stack, and how you can deploy your own instance.
BuildSmall KnowledgeHub is designed to ingest multi-modal, real-world data sources seamlessly:
Once ingested, the app chunks the content, embeds it locally, stores it in a Qdrant vector database, and uses an LLM to generate highly accurate, grounded answers to your queries.
To make this app fast and accurate, we split the workload between local models running on Hugging Face's ZeroGPU infrastructure and cloud APIs.
1. Embedding Pipeline (Local on ZeroGPU)
We use nvidia/llama-nemotron-colembed-vl-3b-v2 for generating embeddings. Because Hugging Face Spaces offers ZeroGPU support, we wrap our Gradio ingestion callbacks with the @spaces.GPU decorator. This dynamically allocates GPU resources exactly when the embedding model needs them, keeping the app efficient and cost-effective.
2. Visual Parsing
For handling complex documents, we rely on the lightweight but powerful Qwen/Qwen2-VL-2B-Instruct model to parse visual and text elements.
3. Chat & Generation (NVIDIA API)
Once the relevant chunks are retrieved from our Qdrant database, we pass the context to the nvidia/nvidia-nemotron-nano-9b-v2 model via NVIDIA's OpenAI-compatible API (integrate.api.nvidia.com). This ensures high-speed generation without needing massive VRAM on the Space itself.
One of the coolest features of this hub is the Medium extraction. Writing a scraper for Medium is notoriously difficult due to paywalls and dynamic content.
Instead of reinventing the wheel, we integrated Freedium (freedium-mirror.cfd). When a user inputs a Medium URL, the app translates it into a Freedium mirror link, scrapes the clean HTML, and intelligently extracts not just the text, but the alt tags and image URLs. This means the LLM actually knows what images were in the article, preserving crucial context that standard text scrapers lose.
Deploying this to your own Hugging Face Space is incredibly straightforward.
1. Create a Space Create a new Gradio Space (select ZeroGPU if you have access, or standard CPU/GPU).
2. Add Your Secrets In your Space settings, add the following under Variables and secrets:
NVIDIA_API_KEY: Your key from the NVIDIA developer portal.QDRANT_URL: Your hosted Qdrant cluster URL.QDRANT_API_KEY: Your Qdrant API key.(For ZeroGPU spaces, ensure ENABLE_ZEROGPU=true and EMBEDDING_DEVICE=cuda are set).
3. Push the Code
git remote add space https://huggingface.co/spaces/build-small-hackathon/KnowledgeMesh
git push space main
Thatβs it! You now have a personal, multi-modal research assistant.
Built for the BuildSmall Hackathon / Backyard AI Track.
AI knowledge hub for groups, powered by Nvidia
More from this author
This is a really thoughtful and useful project. Loved the idea and how it makes managing knowledge feel much simpler and more accessible!