ninooo96 committed
Commit bbb806b · 1 Parent(s): b840569

upload gradio project to HuggingFace Spaces

Files changed (3)
  1. README.md +43 -8
  2. app.py +284 -0
  3. requirements.txt +5 -0
README.md CHANGED
@@ -1,14 +1,49 @@
 ---
-title: SemanticBookSearch
-emoji: 🌖
-colorFrom: indigo
-colorTo: blue
+title: Semantic Book Search (2.4M)
+emoji: 📚
+colorFrom: blue
+colorTo: indigo
 sdk: gradio
-sdk_version: 6.0.2
+sdk_version: 4.0.0
 app_file: app.py
 pinned: false
-license: apache-2.0
-short_description: 'An AI-powered semantic search engine for 2.4M+ books. '
+license: mit
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# 📚 Semantic Book Search Engine
+
+Welcome to the **AI-powered Book Search Engine**.
+Stop searching by exact keywords. This tool allows you to search for books by **describing the plot, the atmosphere, or the emotions** you are looking for.
+
+The system indexes over **2.4 million books**, allowing you to uncover hidden gems using state-of-the-art Natural Language Processing.
+
+## 🚀 How to use it
+
+### 1. 🔎 Search by Plot (Semantic Search)
+Can't remember the title? Looking for a specific vibe?
+* Try: *"A dystopian novel where books are banned and burned by firemen"*
+* Try: *"A psychological thriller set in Victorian London with a plot twist"*
+* The model understands the **concept** and retrieves the most semantically similar books.
+
+### 2. 📖 I liked... (Recommendation)
+Did you love a specific book?
+* Switch to the second tab.
+* Search for a title (e.g., *"Harry Potter"*).
+* The system retrieves the existing vector from the database and recommends books that are mathematically closest in the latent space (similar style, genre, and plot).
+
+---
+
+## 🛠️ Under the Hood (Technical Architecture)
+
+This project is a showcase of **End-to-End AI Engineering**, designed to handle large-scale datasets in a **Low-Resource Environment**.
+
+* **Dataset:** ~2.4 Million books processed and indexed.
+* **AI Embedding Model:** `intfloat/multilingual-e5-small`.
+* **Hybrid Retrieval Architecture:**
+    * 🧠 **Qdrant (Vector DB):** Handles semantic similarity search. Vectors are compressed using **INT8 Scalar Quantization**.
+    * 🗄️ **Turso (LibSQL):** Relational database for low-latency metadata retrieval (Title, Author, Year, Rating), keeping the vector payload lightweight.
+
+### 👨‍💻 Author
+**Antonio Gagliostro**
+* [GitHub Profile](https://github.com/ninooo96)
+* [LinkedIn](https://www.linkedin.com/in/antonio-gagliostro-1b4751121)
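Review note: the README states that the Qdrant vectors are compressed with INT8 Scalar Quantization. The block below is a minimal pure-Python sketch of the general idea (one byte per dimension instead of a 4-byte float). It is not Qdrant's implementation: the helper names `quantize_int8` and `dequantize_int8` are hypothetical, and the plain min/max range used here is an assumption made for illustration.

```python
from typing import List, Tuple


def quantize_int8(vector: List[float]) -> Tuple[List[int], float, float]:
    """Map each float onto one of 256 integer levels in [-128, 127]."""
    lo, hi = min(vector), max(vector)
    # Guard against a constant vector, where the range would be zero.
    scale = (hi - lo) / 255.0 or 1.0
    codes = [round((x - lo) / scale) - 128 for x in vector]
    return codes, lo, scale


def dequantize_int8(codes: List[int], lo: float, scale: float) -> List[float]:
    """Recover approximate floats from the stored integer codes."""
    return [(c + 128) * scale + lo for c in codes]


# Toy 4-dimensional vector; real embeddings have hundreds of dimensions.
vector = [0.12, -0.40, 0.33, 0.05]
codes, lo, scale = quantize_int8(vector)
restored = dequantize_int8(codes, lo, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`). That small loss is why app.py asks Qdrant to rescore the candidates (`rescore=True`) after the quantized search.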
app.py ADDED
@@ -0,0 +1,284 @@
+import gradio as gr
+import os
+import libsql_experimental as libsql
+from qdrant_client import QdrantClient, models
+from sentence_transformers import SentenceTransformer
+import time
+
+# --- SETUP ---
+model = SentenceTransformer("intfloat/multilingual-e5-small", device="cpu")
+
+QDRANT_URL = os.environ.get("QDRANT_URL")
+QDRANT_API_KEY = os.environ.get("QDRANT_API_KEY")
+TURSO_URL = os.environ.get("TURSO_URL")
+TURSO_TOKEN = os.environ.get("TURSO_TOKEN")
+
+try:
+    client = QdrantClient(
+        url=QDRANT_URL,
+        api_key=QDRANT_API_KEY
+    )
+except Exception as e:
+    print(f"Qdrant error: {e}")
+
+def get_turso_conn():
+    return libsql.connect(TURSO_URL, auth_token=TURSO_TOKEN)
+
+COLLECTION_NAME = "books_collection"
+VECTOR_SIZE = 256
+
+# --- GLOBAL CSS ---
+GLOBAL_CSS = """
+/* Spinner animation */
+.loader {
+    border: 6px solid #f3f3f3;
+    border-radius: 50%;
+    border-top: 6px solid #3498db;
+    border-bottom: 6px solid #e74c3c;
+    width: 40px;
+    height: 40px;
+    -webkit-animation: spin 1s linear infinite;
+    animation: spin 1s linear infinite;
+    margin: 0 auto;
+}
+
+@keyframes spin {
+    0% { transform: rotate(0deg); }
+    100% { transform: rotate(360deg); }
+}
+
+#book_cards button {
+    background-color: #f0f2f5 !important;
+    color: #1f2937 !important;
+    border: 1px solid #d1d5db !important;
+    text-align: left;
+    padding: 10px !important;
+}
+#book_cards button:hover {
+    background-color: #e5e7eb !important;
+    border-color: #3b82f6 !important;
+}
+#book_cards .text-sm {
+    color: #4b5563 !important;
+}
+
+.card-force-dark {
+    color: #000000 !important;
+}
+.card-force-dark h3 {
+    color: #1f2937 !important;
+    margin-top: 0 !important;
+}
+.card-force-dark p,
+.card-force-dark b,
+.card-force-dark span,
+.card-force-dark div {
+    color: #000000 !important; }
+.card-force-dark summary {
+    color: #007bff !important;
+}
+"""
+
+# Spinner HTML
+LOADING_HTML = """
+<div style="display: flex; justify-content: center; align-items: center; height: 100px; width: 100%; flex-direction: column;">
+    <div class="loader"></div>
+    <p style="margin-top: 10px; color: #666; font-size: 0.9em;">Searching books...</p>
+</div>
+"""
+
+# --- HELPER FUNCTIONS ---
+
+def render_results_from_ids(ids, scores):
+    """Build the results HTML from the matched IDs."""
+    if not ids: return "No results found."
+
+    conn = None
+    ordered_books = []
+    try:
+        conn = get_turso_conn()
+        placeholders = ", ".join(["?"] * len(ids))
+        sql_query = f"SELECT id, title, author, year, rating, summary FROM books WHERE id IN ({placeholders})"
+        cursor = conn.execute(sql_query, tuple(ids))
+        rows = cursor.fetchall()
+        books_map = {row[0]: row for row in rows}
+
+        for uid in ids:
+            if uid in books_map:
+                ordered_books.append(books_map[uid])
+    except Exception as e:
+        return f"Turso database error: {str(e)}"
+    finally:
+        if conn: conn.close()
+
+    html_output = "<div style='font-family: sans-serif; gap: 10px; display: flex; flex-direction: column;'>"
+    for row in ordered_books:
+        score = scores.get(row[0], 0.0)
+
+        autore_clean = str(row[2]).replace('"', '').replace("[","").replace("]", "")
+
+        html_output += f"""
+        <div class="card-force-dark" style="border: 1px solid #ddd; padding: 15px; border-radius: 8px; background-color: #ffffff;">
+            <h3>{row[1]}</h3>
+
+            <p style="font-size: 0.9em; margin-bottom: 5px;">
+                <b>✍️ {autore_clean}</b> | 📅 {row[3]} | ⭐ {row[4]}
+            </p>
+
+            <p style="margin-top: 0;">
+                <b>🎯 Similarity:</b> {score:.3f}
+            </p>
+
+            <details>
+                <summary style="cursor: pointer;">Read plot</summary>
+                <p style="margin-top: 5px;">{row[5]}</p>
+            </details>
+        </div>
+        """
+    html_output += "</div>"
+    return html_output
+
+def search_free_text_animated(query_text, max_results):
+    yield gr.update(visible=True), gr.update(visible=False)
+    time.sleep(0.2)
+
+    if not query_text:
+        yield gr.update(visible=False), "Please enter a query!"
+        return
+
+    try:
+        vec = model.encode(f"query: {query_text}")[:VECTOR_SIZE]
+        hits_response = client.query_points(
+            collection_name=COLLECTION_NAME,
+            query=vec,
+            limit=int(max_results),
+            search_params=models.SearchParams(quantization=models.QuantizationSearchParams(rescore=True))
+        )
+        hits = hits_response.points
+        ids = [hit.id for hit in hits]
+        scores = {hit.id: hit.score for hit in hits}
+
+        final_html = render_results_from_ids(ids, scores)
+
+        yield gr.update(visible=False), gr.update(value=final_html, visible=True)
+
+    except Exception as e:
+        yield gr.update(visible=False), f"Error: {e}"
+
+
+def find_book_cards_animated(partial_title):
+    # PHASE 1: spinner on, dataset off
+    yield gr.update(visible=True), gr.update(visible=False), []
+    time.sleep(0.3)
+
+    if not partial_title or len(partial_title) < 2:
+        yield gr.update(visible=False), gr.update(samples=[], visible=False), []
+        return
+
+    conn = None
+    try:
+        conn = get_turso_conn()
+        query = "SELECT id, title, author, year FROM books WHERE title LIKE ? LIMIT 10"
+        rows = conn.execute(query, (f"%{partial_title}%",)).fetchall()  # bound parameter avoids SQL injection
+
+        card_data = [[str(row[1]), str(row[2]).replace('"', '').replace("'", "").replace("[","").replace("]",""), str(row[3]).split('.')[0]] for row in rows]
+        full_data_state = [{"id": row[0], "title": row[1]} for row in rows]
+
+        # PHASE 2: spinner off, dataset on
+        yield gr.update(visible=False), gr.update(samples=card_data, visible=True), full_data_state
+
+    except Exception as e:
+        print(f"Error: {e}")
+        yield gr.update(visible=False), gr.update(visible=False), []
+    finally:
+        if conn: conn.close()
+
+
+def on_card_click_animated(selected_index, books_state_list, max_results):
+    yield gr.update(visible=True), gr.update(visible=False)
+    time.sleep(0.2)
+
+    if selected_index >= len(books_state_list):
+        yield gr.update(visible=False), "Selection error."
+        return
+
+    book_obj = books_state_list[selected_index]
+    source_id = book_obj["id"]
+
+    try:
+        points = client.retrieve(collection_name=COLLECTION_NAME, ids=[source_id], with_vectors=True)
+        if not points:
+            yield gr.update(visible=False), "ID not found."
+            return
+
+        existing_vector = points[0].vector
+        hits_response = client.query_points(
+            collection_name=COLLECTION_NAME,
+            query=existing_vector,
+            limit=int(max_results),
+            query_filter=models.Filter(must_not=[models.HasIdCondition(has_id=[source_id])]),
+            search_params=models.SearchParams(quantization=models.QuantizationSearchParams(rescore=True))
+        )
+        ids = [hit.id for hit in hits_response.points]
+        scores = {hit.id: hit.score for hit in hits_response.points}
+        final_html = render_results_from_ids(ids, scores)
+
+        yield gr.update(visible=False), gr.update(value=final_html, visible=True)
+
+    except Exception as e:
+        yield gr.update(visible=False), f"Backend error: {e}"
+
+
+# --- INTERFACE ---
+with gr.Blocks(theme=gr.themes.Soft(), css=GLOBAL_CSS) as demo:
+    gr.Markdown("# 📚 AI Book Finder")
+
+    books_state = gr.State([])
+
+    with gr.Row():
+        num_results = gr.Slider(1, 10, value=5, step=1, label="How many recommendations do you want?")
+
+    with gr.Tabs():
+        # TAB 1: free-text search
+        with gr.Tab("🔎 Search by Plot"):
+            with gr.Row():
+                txt_input = gr.Textbox(placeholder="Describe the plot, the atmosphere, or the emotions you are looking for...", show_label=False, scale=4)
+                btn_search = gr.Button("Search", variant="primary", scale=1)
+
+        # TAB 2: search by book
+        with gr.Tab("📖 I liked..."):
+            with gr.Row():
+                txt_title = gr.Textbox(placeholder="Type the title, even a partial one", show_label=False, scale=4)
+                btn_find = gr.Button("Find", scale=1)
+
+            loader_cards = gr.HTML(value=LOADING_HTML, visible=False)
+
+            # DATASET
+            cards_view = gr.Dataset(
+                elem_id="book_cards",
+                label="Select the correct book:",
+                components=[gr.Textbox(visible=False), gr.Textbox(visible=False), gr.Textbox(visible=False)],
+                headers=["Title", "Author", "Year"],
+                samples=[],
+                visible=False,
+                type="index"
+            )
+
+    loader_results = gr.HTML(value=LOADING_HTML, visible=False)
+    out_results = gr.HTML(label="Recommendations", visible=True)
+
+    # EVENTS
+    btn_search.click(fn=search_free_text_animated, inputs=[txt_input, num_results], outputs=[loader_results, out_results])
+    txt_input.submit(fn=search_free_text_animated, inputs=[txt_input, num_results], outputs=[loader_results, out_results])
+
+    btn_find.click(fn=find_book_cards_animated, inputs=[txt_title], outputs=[loader_cards, cards_view, books_state])
+    txt_title.submit(fn=find_book_cards_animated, inputs=[txt_title], outputs=[loader_cards, cards_view, books_state])
+
+    cards_view.click(
+        fn=on_card_click_animated,
+        inputs=[cards_view, books_state, num_results],
+        outputs=[loader_results, out_results]
+    )
+
+print("Starting Gradio...")
+demo.launch(share=True, debug=True)
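Review note: the title lookup in `find_book_cards_animated` builds a `LIKE` pattern from text typed by the user. The block below is a minimal sketch of that lookup with the pattern passed as a bound parameter. It uses the standard-library `sqlite3` module as a stand-in for Turso, on the assumption that libSQL accepts the same `?` placeholders (the commit's own `render_results_from_ids` already relies on them). The helper `find_titles` and the three sample rows are hypothetical.

```python
import sqlite3


def find_titles(conn: sqlite3.Connection, partial_title: str, limit: int = 10):
    """Return (id, title) rows whose title contains partial_title."""
    sql = "SELECT id, title FROM books WHERE title LIKE ? LIMIT ?"
    # The user text travels as a bound value, never as SQL source.
    return conn.execute(sql, (f"%{partial_title}%", limit)).fetchall()


# Hypothetical in-memory table standing in for the Turso `books` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO books (id, title) VALUES (?, ?)",
    [(1, "Fahrenheit 451"), (2, "Brave New World"), (3, "O'Brien's War")],
)

matches = find_titles(conn, "brave")
quoted = find_titles(conn, "O'Brien")
hostile = find_titles(conn, "x'; DROP TABLE books; --")
```

With a bound parameter, a title containing an apostrophe is matched literally and a hostile string is just a pattern that matches nothing. Note that `%` and `_` typed by the user still act as `LIKE` wildcards; escaping them would need an `ESCAPE` clause, which this sketch leaves out.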
requirements.txt ADDED
@@ -0,0 +1,5 @@
+gradio
+pandas
+qdrant-client
+sentence-transformers
+libsql-experimental