DTanzillo committed on
Commit 7565d4e · verified · 1 Parent(s): 382c248

Update README.md

Files changed (1)
  1. README.md +56 -6
README.md CHANGED
@@ -1,15 +1,65 @@
- # Semantic Search over Substack Posts

- This project builds a semantic search engine over a collection of HTML posts.

- ## Steps

- 1. Place all .html files into a folder named posts/
- 2. Run:

  ```
  pip install -r requirements.txt
  python src/build_index.py
  python app.py
  ```
- 3. The app will load the FAISS database and start a Gradio interface.
+ ---
+ title: Substack Semantic Search
+ emoji: 🔎
+ colorFrom: blue
+ colorTo: green
+ sdk: gradio
+ sdk_version: "4.0"
+ app_file: app.py
+ pinned: false
+ ---

+ # 🔎 Semantic Search over Substack Posts

+ This Space hosts a semantic search engine built over a collection of Substack HTML posts.
+ It uses **SentenceTransformers**, **FAISS**, and **Gradio** to provide fast, offline semantic similarity search.

+ ---
+
+ ## 🚀 How It Works
+
+ ### 1. **Chunk + Embed**
+ HTML posts from the `posts/` directory are:
+ - parsed with BeautifulSoup
+ - split into manageable text chunks
+ - embedded using `all-MiniLM-L6-v2`
+ - stored in a FAISS vector index
+
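Reviewer note: the commit does not show how posts are "split into manageable text chunks". The sketch below is one possible approach, assuming fixed-size word windows with a small overlap. The function name `chunk_text` and the parameters `max_words` and `overlap` are hypothetical and are not taken from `src/build_index.py`.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping windows of at most max_words words.

    Hypothetical helper: the real chunking logic in this repo may differ.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "a b c d e f g h i j"
    for chunk in chunk_text(sample, max_words=4, overlap=1):
        print(chunk)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of a slightly larger index. Each chunk would then be embedded and added to the index, as the list above describes.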
+ ### 2. **Vector Search**
+ At runtime, the app:
+ - loads `faiss_index.bin` and `faiss_meta.pkl`
+ - embeds the user query
+ - retrieves the most semantically relevant chunks
+
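Reviewer note: to make the retrieval step concrete, here is a pure-Python stand-in for a nearest-neighbour lookup. It is not the FAISS code used by `app.py`. It assumes cosine similarity as the relevance measure (the commit does not say which metric or index type the project uses), and `cosine` and `top_k` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, chunk_texts, k=3):
    """Return the k (score, text) pairs most similar to the query vector.

    Brute-force scan, assumed here for illustration only; a vector
    index exists to do this lookup efficiently over many vectors.
    """
    scored = [(cosine(query_vec, vec), text)
              for vec, text in zip(chunk_vecs, chunk_texts)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real ones come from the embedding model.
    vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    texts = ["chunk a", "chunk b", "chunk c"]
    for score, text in top_k([1.0, 0.0], vectors, texts, k=2):
        print(f"{score:.3f}  {text}")
```

In the app, the query vector would come from the same embedding model used at index time, and the metadata file would map each returned position back to its chunk text.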
+ ### 3. **Gradio App**
+ The search UI is powered by Gradio and runs fully offline inside this Space.
+
+ ---
+
+ ## Local Usage
+
+ To rebuild the FAISS index locally:

  ```
  pip install -r requirements.txt
  python src/build_index.py
  python app.py
+ ````
+ Ensure your `.html` files live in:
+ ```
+ posts/
+ ```
+
+ Make sure these files are at the repository root:
+
+ ```
+ faiss_index.bin
+ faiss_meta.pkl
+ app.py
+ requirements.txt
+ ```
+
+ Once your local run works:
+ ```
+ python app.py
  ```
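Reviewer note: the README asks for four files at the repository root before launching the app. A small pre-flight check along these lines could catch a missing index early. This is only a sketch; `missing_files` and `REQUIRED_FILES` are hypothetical names, and nothing like this is part of the commit.

```python
from pathlib import Path

# File names taken from the README's list of files expected at the root.
REQUIRED_FILES = ("faiss_index.bin", "faiss_meta.pkl", "app.py", "requirements.txt")


def missing_files(root=".", required=REQUIRED_FILES):
    """Return the names from `required` that are not regular files under `root`."""
    root_path = Path(root)
    return [name for name in required if not (root_path / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing at repository root:", ", ".join(missing))
    else:
        print("All required files are present.")
```

Running it from the repository root before `python app.py` would report a missing `faiss_index.bin` or `faiss_meta.pkl`, which suggests `python src/build_index.py` still needs to be run.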