ontocord/riverbed

huu-ontocord committed on Aug 30, 2024

Commit e12990a (verified) · 1 Parent(s): 4e96485

Update README.md

Files changed (1)

README.md CHANGED

@@ -34,3 +34,34 @@ textbook_model = fasttext.load_model("model_textbook_quality.bin")
 ```
 See the files here: https://huggingface.co/ontocord/riverbed/tree/main

+This includes a small Whoosh search index of Wikidata, useful for supplying background knowledge to LLMs.
+Installation:
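+```python
+import os
+
+# Notebook cell: lines starting with "!" are IPython shell escapes.
+# Clone the repo only if it is not already present locally.
+bm25_dir = "./riverbed"
+if not os.path.exists(bm25_dir):
+  !git clone https://huggingface.co/ontocord/riverbed
+!pip install -q whoosh
+
+import whoosh.index as whoosh_index
+from whoosh.qparser import QueryParser
+from whoosh.analysis import StemmingAnalyzer, Filter
+
+# Presumably the stored index schema refers to this filter,
+# so it is defined before the index is opened.
+class MyFilter(Filter):
+  """Lowercase each token; for tokens longer than 5 characters, also emit a 5-character prefix."""
+  def __call__(self, tokens):
+    for t in tokens:
+      t.text = t.text.lower()
+      if len(t.text) > 5:
+        yield t
+        t.text = t.text[:5]
+      yield t
+
+# Open the index once; skip if `qp` already exists from an earlier run of this cell.
+if globals().get("qp") is None:
+  index = whoosh_index.open_dir(bm25_dir)
+  searcher = index.searcher()
+  qp = QueryParser("content", schema=index.schema)
+```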
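
The effect of `MyFilter` on tokens may be easier to see outside Whoosh. The sketch below mirrors its logic on plain strings instead of Whoosh token objects; `expand_tokens` and `PREFIX_LEN` are hypothetical names introduced for illustration and are not part of the repository or of Whoosh.

```python
from typing import Iterable, Iterator

PREFIX_LEN = 5  # same cutoff that MyFilter uses (assumed name, for illustration)


def expand_tokens(tokens: Iterable[str], prefix_len: int = PREFIX_LEN) -> Iterator[str]:
    """Mirror MyFilter on plain strings: lowercase, and for long tokens also emit a prefix."""
    for text in tokens:
        text = text.lower()
        if len(text) > prefix_len:
            yield text                # full lowercased token
            text = text[:prefix_len]  # then fall through to emit its prefix
        yield text                    # short token, or the prefix of a long one


print(list(expand_tokens(["Wikidata", "LLM", "Search"])))
# ['wikidata', 'wikid', 'llm', 'search', 'searc']
```

Emitting both the full token and a short prefix lets a query term match on either form, which acts as a crude alternative to stemming.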