
indexing

Local semantic + lexical search over a folder of text and images.

Modalities:

  • semantic-text: dense embeddings over .txt / .md (sentence-transformers)
  • bm25: lexical ranking over .txt / .md
  • image: dense CLIP embeddings over images

Group aliases:

  • text → semantic-text + bm25

Install

pip install -r requirements.txt

Build the index

python index.py /path/to/folder

Writes index_data/semantic-text.faiss, index_data/bm25.pkl, index_data/image.faiss, plus *_meta.json.

Query (CLI)

The CLI is a thin HTTP client of backend/server.py so the model weights stay loaded in one process. Start the backend in another terminal first:

indexing/.env/bin/python backend/server.py

Then:

python query.py "your query"                  # all modalities
python query.py "your query" 10               # top_k = 10
python query.py "your query" -m text          # semantic-text + bm25
python query.py "your query" -m semantic-text # dense text only
python query.py "your query" -m bm25          # lexical text only
python query.py "your query" -m image         # images only
python query.py "your query" -m text,image    # everything (same as default)

Set RAGSTUDIO_URL to point at a non-default backend (default http://127.0.0.1:8000).
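
As an illustration of the override described above, here is a minimal sketch of how a client could resolve the backend address. It assumes only the RAGSTUDIO_URL variable and the default stated above; the helper name backend_url is hypothetical and is not taken from query.py, whose actual logic may differ.

```python
import os

DEFAULT_URL = "http://127.0.0.1:8000"  # default backend address stated in this README


def backend_url(env=None):
    """Return the backend base URL, preferring RAGSTUDIO_URL when it is set.

    `env` is any mapping of environment variables; it defaults to os.environ
    and exists only so the function is easy to test (an assumption of this sketch).
    """
    env = os.environ if env is None else env
    # Treat an empty value as unset, and drop a trailing slash so that
    # request paths can be appended without producing a double slash.
    return (env.get("RAGSTUDIO_URL") or DEFAULT_URL).rstrip("/")


if __name__ == "__main__":
    print(backend_url())
```

Running the sketch with RAGSTUDIO_URL exported in the shell prints that address; otherwise it prints the default.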

Query (Python, in-process)

The searchers can still be imported directly if you don't want to run the backend, but each fresh Python process reloads the model weights:

from searchers import SEARCHERS

SEARCHERS["semantic-text"]("your query", top_k=5)  # -> [(score, path), ...]
SEARCHERS["bm25"]("your query", top_k=5)
SEARCHERS["image"]("your query", top_k=5)

Add a modality

  1. Create searchers/<name>.py exposing search_<name>(query: str, top_k: int) -> list[tuple[float, str]].

  2. Register it in searchers/__init__.py:

    from .audio import search_audio
    SEARCHERS["audio"] = search_audio
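
To make the expected function shape concrete, here is a placeholder searchers/audio.py. It is only a sketch: it matches query words against file names instead of doing real audio retrieval, and the AUDIO_PATHS list and its entries are made up for illustration. The one thing it is meant to show is the signature and the (score, path) return shape described in step 1.

```python
# searchers/audio.py (hypothetical placeholder, not a real audio searcher)
from pathlib import Path

# Made-up example paths; a real searcher would load these from its index.
AUDIO_PATHS = ["clips/rain_on_roof.wav", "clips/dog_bark.wav"]


def search_audio(query: str, top_k: int = 5) -> list[tuple[float, str]]:
    """Score each path by the fraction of query words found in its file name."""
    words = query.lower().split()
    if not words:
        return []
    scored = []
    for path in AUDIO_PATHS:
        name = Path(path).stem.lower()
        score = sum(word in name for word in words) / len(words)
        if score > 0:
            scored.append((score, path))
    # Highest score first; the sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

A real modality would replace the scoring loop with a lookup against its own index while keeping the same signature.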
    

To group several modalities under one alias, add to GROUPS in the same file:

GROUPS["av"] = ("audio", "image")

Both SEARCHERS keys and GROUPS keys work as CLI -m values.
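
One way such a -m value could be expanded is sketched below. The SEARCHERS and GROUPS tables here are stand-ins that mirror the names in this README, and resolve_modalities is a hypothetical helper; the actual resolution in query.py or the backend may work differently.

```python
# Stand-ins mirroring the names in this README; the real tables live in
# searchers/__init__.py and map names to search functions.
SEARCHERS = {"semantic-text": None, "bm25": None, "image": None}
GROUPS = {"text": ("semantic-text", "bm25")}


def resolve_modalities(spec: str) -> list[str]:
    """Expand a comma-separated -m value into a de-duplicated list of SEARCHERS keys."""
    resolved = []
    for name in spec.split(","):
        name = name.strip()
        if name in GROUPS:
            members = GROUPS[name]      # alias: expand to its member modalities
        elif name in SEARCHERS:
            members = (name,)           # plain modality name
        else:
            raise ValueError(f"unknown modality: {name!r}")
        for member in members:
            if member not in resolved:  # keep first occurrence, preserve order
                resolved.append(member)
    return resolved
```

With these stand-in tables, "text,image" expands to all three modalities, which matches the "same as default" note in the CLI examples.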