🧩 vROM Hub Backend

Python backend for the vROM ecosystem: it accepts documentation pages, chunks them, embeds them, builds HNSW indexes, and publishes vROM packages to the HF Hub registry.

What is vROM?

vROM (Vector Read-Only Memory) is a pre-computed, portable vector knowledge base. Like a game ROM cartridge, you "slot it in" and the knowledge is instantly available for semantic search, entirely client-side in the browser via WebAssembly.

This backend handles the build pipeline: turning documentation into vROM packages that the browser runtime (vecdb-wasm) can load.

Architecture

 Documentation Pages (Markdown/URLs)
            │
            ▼
 ┌──────────────────────┐
 │   DocFetcher         │  Fetch from URLs, HF docs, or raw markdown
 └──────────┬───────────┘
            │
            ▼
 ┌──────────────────────┐
 │  SectionAwareChunker │  ~256-token chunks, code-block-preserving,
 │                      │  doubly-linked list for context expansion
 └──────────┬───────────┘
            │
            ▼
 ┌──────────────────────┐
 │   ChunkEmbedder      │  sentence-transformers/all-MiniLM-L6-v2
 │                      │  384d, L2-normalized, cosine metric
 └──────────┬───────────┘
            │
            ▼
 ┌──────────────────────┐
 │   HnswBuilder        │  Pure-Python HNSW construction
 │                      │  Produces index.json compatible with
 │                      │  Rust/WASM VectorDB.load()
 └──────────┬───────────┘
            │
            ▼
 ┌──────────────────────┐
 │   VromHubBackend     │  Packages manifest + chunks + index
 │                      │  Uploads to HF Hub registry dataset
 └──────────────────────┘
            │
            ▼
 HF Hub Dataset (philipp-zettl/vrom-registry)
   registry.json           ← master index
   vroms/{id}/
     manifest.json         ← metadata
     index.json            ← HNSW graph (VectorDB.load() compatible)
     chunks.json           ← chunk metadata
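
To make the chunking stage more concrete, here is a minimal sketch of a section-aware chunker that keeps fenced code blocks whole and links chunks into a doubly-linked list. It is not the actual SectionAwareChunker: the names `Chunk`, `split_blocks`, and `chunk_markdown` are hypothetical, and tokens are approximated by whitespace-separated words rather than a real tokenizer.

```python
from dataclasses import dataclass
from typing import List, Optional

FENCE = "`" * 3  # Markdown code-fence marker


@dataclass
class Chunk:
    text: str
    heading: str
    prev: Optional[int] = None  # index of the previous chunk, if any
    next: Optional[int] = None  # index of the next chunk, if any


def split_blocks(markdown: str) -> List[str]:
    """Split on blank lines, but never inside a fenced code block."""
    blocks: List[str] = []
    current: List[str] = []
    in_code = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_code = not in_code
        if not in_code and not line.strip():
            # A blank line outside a code fence closes the current block.
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def chunk_markdown(markdown: str, max_tokens: int = 256) -> List[Chunk]:
    """Group blocks into chunks of roughly max_tokens, split at headings."""
    chunks: List[Chunk] = []
    heading = ""
    buffer: List[str] = []
    size = 0

    def flush() -> None:
        nonlocal buffer, size
        if buffer:
            chunks.append(Chunk(text="\n\n".join(buffer), heading=heading))
            buffer, size = [], 0

    for block in split_blocks(markdown):
        if block.startswith("#"):
            # A heading starts a new section, so close the running chunk.
            flush()
            heading = block.splitlines()[0].lstrip("#").strip()
        tokens = len(block.split())  # crude word count, not a real tokenizer
        if buffer and size + tokens > max_tokens:
            flush()
        buffer.append(block)
        size += tokens
    flush()

    # Wire up the doubly-linked list used for context expansion.
    for i, chunk in enumerate(chunks):
        chunk.prev = i - 1 if i > 0 else None
        chunk.next = i + 1 if i < len(chunks) - 1 else None
    return chunks
```

Because a code block is treated as a single unit, a block larger than `max_tokens` becomes an oversized chunk instead of being cut in half. That trade-off is a choice of this sketch, not a documented behaviour of the backend.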

Installation

pip install sentence-transformers huggingface_hub numpy requests

Quick Start

Python API

from vrom_hub import VromHubBackend

hub = VromHubBackend(registry_repo="philipp-zettl/vrom-registry")

# Build from raw markdown pages
result = hub.submit_project(
    vrom_id="my-project-docs",
    name="My Project Documentation",
    description="Complete API reference and guides",
    version="1.0.0",
    pages=[
        {
            "content": "# Getting Started\n\nInstall with `pip install mylib`...",
            "url": "https://myproject.dev/docs/getting-started",
            "title": "Getting Started",
        },
        {
            "content": "# API Reference\n\n## MyClass\n\n```python\nclass MyClass:...",
            "url": "https://myproject.dev/docs/api",
            "title": "API Reference",
        },
    ],
    tags=["my-project", "api", "docs"],
)

print(f"Published: {result['hub_url']}")
print(f"Vectors: {result['stats']['vectors']}")

Build from URLs

result = hub.submit_project(
    vrom_id="my-docs",
    name="My Docs",
    description="Fetched from live URLs",
    urls=["https://example.com/docs/page1", "https://example.com/docs/page2"],
    tags=["docs"],
)

Build locally (no upload)

result = hub.build_vrom(
    pages=[...],
    vrom_id="local-test",
    output_dir="./my_vrom",
)
# Produces: ./my_vrom/{index.json, chunks.json, manifest.json}
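
After a local build, it can be useful to confirm that the output directory holds all three files before loading it elsewhere. The helper below is a small sketch using only the standard library. `missing_vrom_files` is a hypothetical name and not part of vrom_hub.

```python
from pathlib import Path
from typing import List

# File names taken from the "Produces" comment above.
EXPECTED_FILES = ("index.json", "chunks.json", "manifest.json")


def missing_vrom_files(output_dir: str) -> List[str]:
    """Return the expected package files that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_vrom_files("./my_vrom")
    print("complete" if not missing else f"missing: {missing}")
```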

CLI

# Build from local markdown files
python -m vrom_hub.cli build my-docs \
    --name "My Docs" \
    --files "docs/*.md" \
    --output ./vrom_output

# Build + publish to registry
python -m vrom_hub.cli submit my-docs \
    --name "My Docs" \
    --description "Project documentation" \
    --files "docs/*.md" \
    --tags my-project api

# List all vROMs in registry
python -m vrom_hub.cli list

# Show details for a specific vROM
python -m vrom_hub.cli info hf-transformers-docs

Output Format

Each vROM package consists of three files, all compatible with VectorDB.load():

| File | Description |
|------|-------------|
| index.json | Serialized HNSW graph (nodes, neighbors, vectors, metadata) |
| chunks.json | Chunk metadata array (text, source, headings, linked-list pointers) |
| manifest.json | Package metadata (embedding spec, HNSW config, stats) |
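
The linked-list pointers in chunks.json allow a search hit to be widened with its neighbouring chunks. The sketch below shows one way that expansion could work. The exact chunks.json schema is not given here, so the field names `text`, `prev`, and `next` (integer indices or `None`) are assumptions, and `expand_context` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def _walk(chunks: List[Dict], start: Optional[int], key: str, steps: int) -> List[str]:
    """Follow `key` pointers from `start`, collecting up to `steps` texts."""
    texts: List[str] = []
    cursor = start
    for _ in range(steps):
        if cursor is None:
            break
        texts.append(chunks[cursor]["text"])
        cursor = chunks[cursor].get(key)
    return texts


def expand_context(chunks: List[Dict], hit: int, window: int = 1) -> str:
    """Join a hit chunk with up to `window` neighbours on each side."""
    # Assumed schema: each chunk has "text", "prev", and "next" fields.
    before = _walk(chunks, chunks[hit].get("prev"), "prev", window)
    after = _walk(chunks, chunks[hit].get("next"), "next", window)
    # `before` was collected walking backwards, so restore document order.
    return "\n\n".join(before[::-1] + [chunks[hit]["text"]] + after)
```

Following pointers instead of using `hit - 1` and `hit + 1` keeps the expansion correct even if chunks from several pages share one array, provided the pointers stop at page boundaries.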

Modules

| Module | Description |
|--------|-------------|
| vrom_hub.chunker | Section-aware document chunker with linked-list pointers |
| vrom_hub.embedder | Chunk embedding via sentence-transformers (384d, cosine) |
| vrom_hub.hnsw | Pure-Python HNSW builder (Rust/WASM-compatible output) |
| vrom_hub.fetcher | Doc page fetcher (URLs, HF docs, raw markdown) |
| vrom_hub.hub | Main orchestrator: chunk → embed → build → upload → register |
| vrom_hub.cli | Command-line interface |
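
Since the embedder produces L2-normalized vectors with a cosine metric, the similarity of two vectors reduces to a dot product. The sketch below shows this with an exact brute-force search, which can serve as a reference when checking approximate HNSW results on a small index. It is not the vrom_hub.hnsw implementation, and `l2_normalize` and `top_k` are hypothetical helpers written with the standard library only.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def top_k(query: Sequence[float],
          vectors: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Exact nearest neighbours by cosine similarity, best match first."""
    q = l2_normalize(query)
    scored = []
    for index, vector in enumerate(vectors):
        v = l2_normalize(vector)
        # For unit vectors, the dot product equals the cosine similarity.
        scored.append((index, sum(a * b for a, b in zip(q, v))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

An exact scan costs time linear in the number of vectors, which is the cost an HNSW graph is meant to avoid on larger collections.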

Registry

The vROM Hub Backend publishes to philipp-zettl/vrom-registry. A browser client can then mount a published vROM by pointing vrom.js at the registry index:

import { AgentMemory } from 'vrom.js';

const memory = new AgentMemory({
  registryUrl: 'https://huggingface.co/datasets/philipp-zettl/vrom-registry/resolve/main/registry.json',
});

await memory.init();
await memory.mount('my-project-docs');
const results = await memory.search('how to install');
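
For tooling outside the browser, the download URLs of a package's files can be derived from the registry layout. The sketch below assumes the `vroms/{id}/` layout shown in the Architecture section and the same `resolve/main` URL form as the `registryUrl` above. `vrom_file_urls` is a hypothetical helper, and whether vrom.js resolves files this way internally is not confirmed here.

```python
from typing import Dict

PACKAGE_FILES = ("manifest.json", "index.json", "chunks.json")


def vrom_file_urls(vrom_id: str,
                   repo: str = "philipp-zettl/vrom-registry",
                   revision: str = "main") -> Dict[str, str]:
    """Map each package file name to its assumed download URL."""
    base = f"https://huggingface.co/datasets/{repo}/resolve/{revision}"
    # Assumed layout: vroms/{id}/{file}, as drawn in the architecture diagram.
    return {name: f"{base}/vroms/{vrom_id}/{name}" for name in PACKAGE_FILES}


if __name__ == "__main__":
    for name, url in vrom_file_urls("my-project-docs").items():
        print(f"{name}: {url}")
```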

License

MIT
