🧩 vROM Hub Backend
Python backend for the vROM ecosystem: it accepts documentation pages, chunks them, embeds them, builds HNSW indexes, and publishes vROM packages to the HF Hub registry.
What is vROM?
vROM (Vector Read-Only Memory) is a pre-computed, portable vector knowledge base. Like a game ROM cartridge, you "slot it in" and the knowledge is instantly available for semantic search, entirely client-side in the browser via WebAssembly.
This backend handles the build pipeline: turning documentation into vROM packages that the browser runtime (vecdb-wasm) can load.
Architecture
Documentation Pages (Markdown/URLs)
           │
           ▼
┌──────────────────────┐
│ DocFetcher           │  Fetch from URLs, HF docs, or raw markdown
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│ SectionAwareChunker  │  ~256-token chunks, code-block-preserving,
│                      │  doubly-linked list for context expansion
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│ ChunkEmbedder        │  sentence-transformers/all-MiniLM-L6-v2
│                      │  384d, L2-normalized, cosine metric
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│ HnswBuilder          │  Pure-Python HNSW construction
│                      │  Produces index.json compatible with
│                      │  Rust/WASM VectorDB.load()
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│ VromHubBackend       │  Packages manifest + chunks + index
│                      │  Uploads to HF Hub registry dataset
└──────────┬───────────┘
           │
           ▼
HF Hub Dataset (philipp-zettl/vrom-registry)
  registry.json        ← master index
  vroms/{id}/
    manifest.json      ← metadata
    index.json         ← HNSW graph (VectorDB.load() compatible)
    chunks.json        ← chunk metadata
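To make the chunking stage more concrete, here is a minimal sketch of a section-aware, code-block-preserving chunker that links chunks into a doubly-linked list. It is not the actual `SectionAwareChunker` implementation: the `Chunk` class, its `prev_id`/`next_id` fields, and the use of whitespace-separated words as a stand-in for tokens are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    id: int
    heading: str
    text: str
    prev_id: Optional[int] = None  # hypothetical pointer field
    next_id: Optional[int] = None  # hypothetical pointer field


def split_blocks(markdown: str) -> List[str]:
    """Split markdown into blocks; a fenced code block is never split."""
    blocks: List[str] = []
    current: List[str] = []
    in_fence = False

    def flush() -> None:
        if current:
            blocks.append("\n".join(current))
            current.clear()

    for line in markdown.splitlines():
        is_fence = line.lstrip().startswith("```")
        if is_fence and not in_fence:
            flush()                      # prose before the code block
            current.append(line)
            in_fence = True
        elif is_fence and in_fence:
            current.append(line)
            in_fence = False
            flush()                      # the whole code block is one unit
        elif in_fence:
            current.append(line)         # blank lines inside code are kept
        elif line.startswith("#"):
            flush()
            blocks.append(line)          # a heading is its own block
        elif not line.strip():
            flush()                      # blank line ends a prose block
        else:
            current.append(line)
    flush()
    return blocks


def chunk_markdown(markdown: str, max_tokens: int = 256) -> List[Chunk]:
    """Group blocks under their heading, starting a new chunk at each
    heading or when the rough token budget would be exceeded."""
    chunks: List[Chunk] = []
    heading = ""
    buf: List[str] = []
    size = 0

    def emit() -> None:
        nonlocal buf, size
        if buf:
            chunks.append(Chunk(len(chunks), heading, "\n\n".join(buf)))
        buf, size = [], 0

    for block in split_blocks(markdown):
        if block.startswith("#"):
            emit()
            heading = block.lstrip("#").strip()
            continue
        n = len(block.split())           # word count as a token proxy
        if buf and size + n > max_tokens:
            emit()
        buf.append(block)
        size += n
    emit()

    # Link neighbours so a search hit can be expanded with context.
    for i, chunk in enumerate(chunks):
        chunk.prev_id = i - 1 if i > 0 else None
        chunk.next_id = i + 1 if i < len(chunks) - 1 else None
    return chunks
```

A block larger than the budget (for example a long code listing) becomes an oversized chunk rather than being cut, which is one plausible reading of "code-block-preserving"; the real chunker may handle that case differently.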
Installation
pip install sentence-transformers huggingface_hub numpy requests
Quick Start
Python API
from vrom_hub import VromHubBackend

hub = VromHubBackend(registry_repo="philipp-zettl/vrom-registry")

# Build from raw markdown pages
result = hub.submit_project(
    vrom_id="my-project-docs",
    name="My Project Documentation",
    description="Complete API reference and guides",
    version="1.0.0",
    pages=[
        {
            "content": "# Getting Started\n\nInstall with `pip install mylib`...",
            "url": "https://myproject.dev/docs/getting-started",
            "title": "Getting Started",
        },
        {
            "content": "# API Reference\n\n## MyClass\n\n```python\nclass MyClass:...",
            "url": "https://myproject.dev/docs/api",
            "title": "API Reference",
        },
    ],
    tags=["my-project", "api", "docs"],
)

print(f"Published: {result['hub_url']}")
print(f"Vectors: {result['stats']['vectors']}")
Build from URLs
result = hub.submit_project(
    vrom_id="my-docs",
    name="My Docs",
    description="Fetched from live URLs",
    urls=["https://example.com/docs/page1", "https://example.com/docs/page2"],
    tags=["docs"],
)
Build locally (no upload)
result = hub.build_vrom(
    pages=[...],
    vrom_id="local-test",
    output_dir="./my_vrom",
)
# Produces: ./my_vrom/{index.json, chunks.json, manifest.json}
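After a local build it can be useful to check that all three files were written before publishing. The helper below is a hypothetical sketch (`inspect_vrom` is not part of `vrom_hub`); it assumes only the three file names listed above and makes no assumption about the JSON schema inside them.

```python
import json
from pathlib import Path

EXPECTED_FILES = ("index.json", "chunks.json", "manifest.json")


def inspect_vrom(output_dir):
    """Return a per-file summary, raising if a package file is missing."""
    root = Path(output_dir)
    summary = {}
    for name in EXPECTED_FILES:
        path = root / name
        if not path.is_file():
            raise FileNotFoundError(f"missing {name} in {root}")
        data = json.loads(path.read_text(encoding="utf-8"))
        # No schema is assumed: report only the top-level JSON type and size.
        entries = len(data) if isinstance(data, (list, dict)) else 1
        summary[name] = {
            "type": type(data).__name__,
            "entries": entries,
            "bytes": path.stat().st_size,
        }
    return summary


if __name__ == "__main__":
    for name, info in inspect_vrom("./my_vrom").items():
        print(name, info)
```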
CLI
# Build from local markdown files
python -m vrom_hub.cli build my-docs \
    --name "My Docs" \
    --files "docs/*.md" \
    --output ./vrom_output

# Build + publish to registry
python -m vrom_hub.cli submit my-docs \
    --name "My Docs" \
    --description "Project documentation" \
    --files "docs/*.md" \
    --tags my-project api

# List all vROMs in registry
python -m vrom_hub.cli list

# Show details for a specific vROM
python -m vrom_hub.cli info hf-transformers-docs
Output Format
Each vROM package consists of 3 files, all 100% compatible with VectorDB.load():
| File | Description |
|---|---|
| index.json | Serialized HNSW graph (nodes, neighbors, vectors, metadata) |
| chunks.json | Chunk metadata array (text, source, headings, linked-list pointers) |
| manifest.json | Package metadata (embedding spec, HNSW config, stats) |
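The linked-list pointers in chunks.json exist so that a search hit can be widened with its neighbouring chunks. The sketch below shows one way that expansion could work. The field names `text`, `prev`, and `next`, and the use of array indices as pointers, are assumptions for illustration; the actual chunks.json schema may differ.

```python
from typing import List


def expand_context(chunks: List[dict], hit: int, window: int = 1) -> str:
    """Join the hit chunk with up to `window` neighbours on each side,
    following the (hypothetical) prev/next pointers."""
    before: List[str] = []
    cur = chunks[hit].get("prev")
    while cur is not None and len(before) < window:
        before.append(chunks[cur]["text"])
        cur = chunks[cur].get("prev")

    after: List[str] = []
    cur = chunks[hit].get("next")
    while cur is not None and len(after) < window:
        after.append(chunks[cur]["text"])
        cur = chunks[cur].get("next")

    # `before` was collected walking backwards, so restore reading order.
    return "\n\n".join(before[::-1] + [chunks[hit]["text"]] + after)


if __name__ == "__main__":
    demo = [
        {"text": "Install the package.", "prev": None, "next": 1},
        {"text": "Configure the client.", "prev": 0, "next": 2},
        {"text": "Run a first query.", "prev": 1, "next": None},
    ]
    print(expand_context(demo, hit=1))
```

Following pointers rather than adding or subtracting from an index keeps the expansion correct even if chunk IDs are not contiguous.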
Modules
| Module | Description |
|---|---|
| vrom_hub.chunker | Section-aware document chunker with linked-list pointers |
| vrom_hub.embedder | Chunk embedding via sentence-transformers (384d, cosine) |
| vrom_hub.hnsw | Pure-Python HNSW builder (Rust/WASM-compatible output) |
| vrom_hub.fetcher | Doc page fetcher (URLs, HF docs, raw markdown) |
| vrom_hub.hub | Main orchestrator: chunk → embed → build → upload → register |
| vrom_hub.cli | Command-line interface |
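The embedder produces L2-normalized vectors and the index uses a cosine metric. These two choices fit together: once vectors have unit length, cosine similarity reduces to a plain dot product. The pure-Python sketch below illustrates that identity on toy 2-d vectors; the function names are illustrative and not part of `vrom_hub`.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity computed from scratch, without normalizing first."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


if __name__ == "__main__":
    a, b = [3.0, 4.0], [4.0, 3.0]
    # Both expressions give the same value up to floating-point rounding.
    print(cosine(a, b))
    print(dot(l2_normalize(a), l2_normalize(b)))
```

Normalizing once at build time means the search side only needs dot products, which is cheaper than recomputing norms per query.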
Registry
The vROM Hub Backend publishes to philipp-zettl/vrom-registry. A browser client can then mount a published vROM from that registry and search it:
import { AgentMemory } from 'vrom.js';

const memory = new AgentMemory({
  registryUrl: 'https://huggingface.co/datasets/philipp-zettl/vrom-registry/resolve/main/registry.json',
});
await memory.init();
await memory.mount('my-project-docs');
const results = await memory.search('how to install');
License
MIT