metadata
title: Substack Semantic Search
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
pinned: false
Semantic Search over Substack Posts
This Space hosts a semantic search engine built over a collection of Substack HTML posts.
It uses SentenceTransformers, FAISS, and Gradio to provide fast, offline semantic similarity search.
How It Works
1. Chunk + Embed
HTML posts from the posts/ directory are:
- parsed with BeautifulSoup
- split into manageable text chunks
- embedded using all-MiniLM-L6-v2
- stored in a FAISS vector index
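The steps above can be sketched as follows. This is a minimal illustration, not the contents of src/build_index.py: the names chunk_text and build_index, the word-window sizes, the metadata layout, and the choice of a flat inner-product index are all assumptions.

```python
from pathlib import Path


def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word windows (sizes are illustrative)."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def build_index(posts_dir: str = "posts"):
    """Parse, chunk, embed, and index every .html file in posts_dir."""
    # Heavy dependencies are imported lazily so chunk_text stays usable alone.
    import faiss
    import numpy as np
    from bs4 import BeautifulSoup
    from sentence_transformers import SentenceTransformer

    chunks, meta = [], []
    for path in sorted(Path(posts_dir).glob("*.html")):
        html = path.read_text(encoding="utf-8", errors="ignore")
        text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
        for i, chunk in enumerate(chunk_text(text)):
            chunks.append(chunk)
            # Assumed metadata layout: one dict per vector, in index order.
            meta.append({"source": path.name, "chunk_id": i, "text": chunk})
    if not chunks:
        raise ValueError(f"no text chunks found under {posts_dir!r}")

    model = SentenceTransformer("all-MiniLM-L6-v2")
    # Normalized vectors make inner product equivalent to cosine similarity.
    vectors = np.asarray(model.encode(chunks, normalize_embeddings=True), dtype="float32")
    index = faiss.IndexFlatIP(vectors.shape[1])
    index.add(vectors)
    return index, meta
```

The index and metadata could then be saved with faiss.write_index(index, "faiss_index.bin") and pickle.dump(meta, ...) to produce the two files the app loads.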
2. Vector Search
At runtime, the app:
- loads faiss_index.bin and faiss_meta.pkl
- embeds the user query
- retrieves the most semantically relevant chunks
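A minimal sketch of the runtime lookup, assuming faiss_meta.pkl holds a list with one entry per indexed vector, in the same order as the index rows. The names load_resources and search and the top_k parameter are hypothetical, and the actual app.py may differ.

```python
import pickle


def load_resources(index_path="faiss_index.bin", meta_path="faiss_meta.pkl"):
    """Load the FAISS index, the chunk metadata, and the embedding model."""
    import faiss
    from sentence_transformers import SentenceTransformer

    index = faiss.read_index(index_path)
    # Only unpickle files you built yourself; pickle can execute arbitrary code.
    with open(meta_path, "rb") as f:
        meta = pickle.load(f)
    model = SentenceTransformer("all-MiniLM-L6-v2")
    return index, meta, model


def search(query, index, meta, model, top_k=5):
    """Return (score, metadata) pairs for the chunks closest to the query."""
    query_vec = model.encode([query], normalize_embeddings=True)
    scores, ids = index.search(query_vec, top_k)
    results = []
    for score, idx in zip(scores[0], ids[0]):
        if idx < 0:  # FAISS pads with -1 when fewer than top_k hits exist
            continue
        results.append((float(score), meta[idx]))
    return results
```

Loading the resources once at startup and reusing them for every query avoids reloading the model on each search.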
3. Gradio App
The search UI is powered by Gradio and runs fully offline inside this Space.
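As a rough illustration of how such a UI could be wired up, the sketch below wraps a search function in a gr.Interface. The names format_results and build_demo, the (score, text) result shape, and the widget labels are assumptions rather than the actual contents of app.py.

```python
def format_results(results):
    """Render (score, text) pairs as a numbered list for display."""
    if not results:
        return "No results found."
    lines = [
        f"{rank}. ({score:.3f}) {text}"
        for rank, (score, text) in enumerate(results, start=1)
    ]
    return "\n".join(lines)


def build_demo(search_fn):
    """Build a Gradio interface around search_fn(query) -> [(score, text), ...]."""
    import gradio as gr  # imported lazily so format_results has no dependencies

    def run(query):
        return format_results(search_fn(query))

    return gr.Interface(
        fn=run,
        inputs=gr.Textbox(label="Query"),
        outputs=gr.Markdown(),
        title="Substack Semantic Search",
    )
```

Calling build_demo(my_search).launch() would start the app, where my_search stands in for whatever search function the app defines.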
Local Usage
To rebuild the FAISS index and run the app locally:
pip install -r requirements.txt
python src/build_index.py
python app.py
Ensure your .html files live in:
posts/
Make sure these files are at the repository root:
faiss_index.bin
faiss_meta.pkl
app.py
requirements.txt
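A quick way to verify this layout is a small check script. This is an optional sketch, not part of the project; the name missing_files is hypothetical.

```python
from pathlib import Path

# Files the README expects at the repository root.
REQUIRED_FILES = ["faiss_index.bin", "faiss_meta.pkl", "app.py", "requirements.txt"]


def missing_files(root="."):
    """Return the required files that are absent from root."""
    root_path = Path(root)
    return [name for name in REQUIRED_FILES if not (root_path / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing at root:", ", ".join(missing))
    else:
        print("All required files are present.")
```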
Once your local run works:
python app.py