---
title: Substack Semantic Search
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "6.0.0"
app_file: app.py
pinned: false
---
# Semantic Search over Substack Posts

This Space hosts a semantic search engine built over a collection of Substack HTML posts. It uses **SentenceTransformers**, **FAISS**, and **Gradio** to provide fast, offline semantic similarity search.

---

## How It Works

### 1. **Chunk + Embed**

HTML posts from the `posts/` directory are:

- parsed with BeautifulSoup
- split into manageable text chunks
- embedded using `all-MiniLM-L6-v2`
- stored in a FAISS vector index
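
The build step above can be sketched roughly as follows. This is a minimal sketch, not the contents of `src/build_index.py`: the function names, the word-based window size and overlap, the metadata layout (a list of `{"source", "text"}` dicts pickled to `faiss_meta.pkl`, one per vector, in index order) and the choice of an exact inner-product index over normalized embeddings are all assumptions.

```python
import pickle
from pathlib import Path


def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of at most `max_words` words.

    Consecutive windows share `overlap` words, so text near a boundary
    appears in both neighbouring chunks. Sizes are illustrative assumptions.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def build_index(posts_dir: str = "posts", out_dir: str = ".") -> int:
    """Embed every chunk of every .html post; write the FAISS index and metadata."""
    # Heavy dependencies are imported here so chunk_words() needs none of them.
    import faiss
    from bs4 import BeautifulSoup
    from sentence_transformers import SentenceTransformer

    records = []  # assumed layout: records[i] describes vector i in the index
    for path in sorted(Path(posts_dir).glob("*.html")):
        html = path.read_text(encoding="utf-8", errors="ignore")
        text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
        for chunk in chunk_words(text):
            records.append({"source": path.name, "text": chunk})
    if not records:
        raise FileNotFoundError(f"no .html text found under {posts_dir}/")

    model = SentenceTransformer("all-MiniLM-L6-v2")
    # Unit-length vectors make inner product equal to cosine similarity.
    embeddings = model.encode(
        [r["text"] for r in records],
        convert_to_numpy=True,
        normalize_embeddings=True,
    ).astype("float32")

    index = faiss.IndexFlatIP(embeddings.shape[1])  # exact (brute-force) search
    index.add(embeddings)
    faiss.write_index(index, str(Path(out_dir) / "faiss_index.bin"))
    with open(Path(out_dir) / "faiss_meta.pkl", "wb") as f:
        pickle.dump(records, f)
    return index.ntotal
```

A flat index keeps the sketch simple and exact; for a few thousand chunks brute-force search is typically fast enough, and an approximate FAISS index would only pay off at much larger scale.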

### 2. **Vector Search**

At runtime, the app:

- loads `faiss_index.bin` and `faiss_meta.pkl`
- embeds the user query
- retrieves the most semantically relevant chunks
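
At query time, the flow above amounts to little more than the sketch below. It assumes the index was built from normalized embeddings and that `faiss_meta.pkl` holds a list whose i-th entry describes vector i; `load_artifacts` and `search` are hypothetical names, not functions taken from `app.py`.

```python
import pickle


def load_artifacts(index_path: str = "faiss_index.bin", meta_path: str = "faiss_meta.pkl"):
    """Load the prebuilt FAISS index and its metadata list (assumed layout)."""
    import faiss  # imported lazily; only needed when reading the real index

    index = faiss.read_index(index_path)
    with open(meta_path, "rb") as f:
        meta = pickle.load(f)
    return index, meta


def search(query: str, model, index, meta, k: int = 5):
    """Return up to k (score, record) pairs for `query`, best match first."""
    if not query.strip():
        return []
    # Encode the query the same way the chunks were encoded (assumed: normalized),
    # so the inner product returned by the index is a cosine similarity.
    query_vec = model.encode([query], convert_to_numpy=True, normalize_embeddings=True)
    scores, ids = index.search(query_vec, k)
    # FAISS pads results with id -1 when the index holds fewer than k vectors.
    return [(float(s), meta[i]) for s, i in zip(scores[0], ids[0]) if i != -1]
```

The model and index would normally be loaded once at startup (for example `model = SentenceTransformer("all-MiniLM-L6-v2")` plus `index, meta = load_artifacts()`) and reused for every query. If the real index was built as an L2 index instead, the returned scores are distances and lower means closer.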

### 3. **Gradio App**

The search UI is powered by Gradio and runs fully offline inside this Space.
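
A minimal Gradio front end could look like the sketch below. It is not the actual `app.py`: `format_results`, `build_demo`, the slider range, and the record layout (dicts with `source` and `text` keys) are hypothetical, and the search function is passed in rather than defined here.

```python
def format_results(results) -> str:
    """Render (score, record) pairs as Markdown; records assumed to be dicts."""
    if not results:
        return "_No results._"
    blocks = []
    for rank, (score, record) in enumerate(results, start=1):
        blocks.append(f"**{rank}. {record['source']}** (score {score:.3f})\n\n{record['text']}")
    return "\n\n---\n\n".join(blocks)


def build_demo(search_fn):
    """Wrap search_fn(query, k) -> [(score, record), ...] in a simple Gradio UI."""
    import gradio as gr  # imported lazily so format_results() works without Gradio

    def run(query, k):
        # Slider values may arrive as floats, so cast before passing k on.
        return format_results(search_fn(query, int(k)))

    return gr.Interface(
        fn=run,
        inputs=[
            gr.Textbox(label="Query", placeholder="Ask about a topic from the posts"),
            gr.Slider(minimum=1, maximum=20, value=5, step=1, label="Number of results"),
        ],
        outputs=gr.Markdown(),
        title="Substack Semantic Search",
    )
```

Calling `build_demo(my_search).launch()`, where `my_search` is any function with the signature above, would start the UI locally; on Spaces, the file named by `app_file` in the front matter is what runs.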

---

## Local Usage

To rebuild the FAISS index locally and start the app:

```
pip install -r requirements.txt
python src/build_index.py
python app.py
```

Before building, ensure your `.html` files live in:

```
posts/
```

Make sure these files are at the repository root:

```
faiss_index.bin
faiss_meta.pkl
app.py
requirements.txt
```
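
A small pre-flight check can confirm this layout before running or pushing. This is a hypothetical helper: `missing_files` and its `need_posts` flag are illustrative names, and since the HTML files under `posts/` are only needed when rebuilding the index, the flag lets the check skip them.

```python
from pathlib import Path

# Files listed above that must sit at the repository root.
REQUIRED_ROOT_FILES = ("faiss_index.bin", "faiss_meta.pkl", "app.py", "requirements.txt")


def missing_files(root: str = ".", need_posts: bool = True) -> list[str]:
    """Return the expected paths that are absent under `root` (empty list means OK)."""
    base = Path(root)
    missing = [name for name in REQUIRED_ROOT_FILES if not (base / name).is_file()]
    # posts/*.html is only needed when rebuilding the index.
    if need_posts and not any((base / "posts").glob("*.html")):
        missing.append("posts/*.html")
    return missing


if __name__ == "__main__":
    problems = missing_files()
    print("Layout OK" if not problems else "Missing: " + ", ".join(problems))
```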

Once the index files exist, you can start the app without rebuilding:

```
python app.py
```