Nirman Patel committed · Commit f632dba · verified · 1 Parent(s): 8847646

Upload folder using huggingface_hub

.DS_Store ADDED
Binary file (8.2 kB).
 
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Nirman Patel
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -1,12 +1,388 @@
  ---
- title: Semantic Book Recommender
- emoji: 🐠
- colorFrom: blue
- colorTo: green
+ title: semantic-book-recommender
+ app_file: gradio_dashboard.py
  sdk: gradio
  sdk_version: 5.38.0
- app_file: app.py
- pinned: false
  ---
+ # 📚 Semantic Book Recommendation System
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ [![Python](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
+ [![Transformers](https://img.shields.io/badge/transformers-4.21.0-orange.svg)](https://huggingface.co/transformers/)
+ [![Gradio](https://img.shields.io/badge/gradio-5.38.0-green.svg)](https://gradio.app/)
+ [![LangChain](https://img.shields.io/badge/langchain-0.1.0-red.svg)](https://langchain.readthedocs.io/)
+ [![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
+
+ A book recommendation system that combines semantic search with emotion analysis. It uses vector embeddings, zero-shot classification, and emotion detection to match a free-text query against book descriptions and to rank the results by content similarity and emotional tone.
+
+ ## 🌟 Features
+
+ - **Semantic Search**: Uses HuggingFace embeddings and ChromaDB for vector-based similarity search
+ - **Emotion Analysis**: Analyzes book descriptions for emotional content (joy, sadness, anger, fear, surprise, disgust, neutral)
+ - **Zero-Shot Classification**: Automatically categorizes books into Fiction/Non-Fiction using BART-large-MNLI
+ - **Interactive Dashboard**: Gradio-based web interface for easy book discovery
+ - **Advanced Filtering**: Filter by category, emotional tone, and rating
+ - **Data Visualization**: Statistical insights and data exploration tools
+
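The vector-based similarity search listed under Features can be illustrated without any ML dependency. The sketch below is a minimal, standard-library illustration of cosine-similarity ranking; the function names and the three-dimensional toy vectors are hypothetical, and it says nothing about how ChromaDB computes distances internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs, top_k=3):
    """Return (doc_id, score) pairs, most similar first, limited to top_k."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy "embeddings" (hypothetical values, not produced by any real model)
toy_docs = {
    "mystery": [0.9, 0.1, 0.0],
    "romance": [0.1, 0.9, 0.0],
    "cookbook": [0.0, 0.1, 0.9],
}
ranked = rank_by_similarity([1.0, 0.2, 0.0], toy_docs, top_k=2)
```

In the real project the vectors would come from the embedding model and the ranking from the vector store; the sketch only shows what "most similar" means numerically.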
+ ## 🏗️ System Architecture
+
+ ```
+ books.csv → Data Cleaning → Category Classification → Emotion Analysis → Vector Database → Gradio UI
+ ```
+
+ ### Pipeline Components:
+
+ 1. **Data Exploration & Cleaning** (`data_exploration.py`)
+    - Handles missing values and data quality issues
+    - Keeps only books whose descriptions have 25 or more words
+    - Creates correlation analysis and visualizations
+
+ 2. **Text Classification** (`text_classification.py`)
+    - Zero-shot classification for Fiction/Non-Fiction categorization
+    - Uses Facebook's BART-large-MNLI model
+    - Reported accuracy of about 85% on Fiction/Non-Fiction labels (see Model Performance)
+
+ 3. **Sentiment Analysis** (`sentiment_analysis.py`)
+    - Emotion detection using DistilRoBERTa model
+    - Analyzes 7 emotions: anger, disgust, fear, joy, sadness, surprise, neutral
+    - Sentence-level emotion scoring with max aggregation
+
+ 4. **Vector Search** (`vector_search.py`)
+    - Creates embeddings using HuggingFace sentence-transformers
+    - Implements ChromaDB for efficient similarity search
+    - Supports semantic book discovery
+
+ 5. **Gradio Dashboard** (`gradio_dashboard.py`)
+    - Interactive web interface for book recommendations
+    - Real-time filtering and visualization
+    - Statistical dashboards and data insights
+
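"Sentence-level emotion scoring with max aggregation" in the Sentiment Analysis step can be read as: score every sentence of a description, then keep the highest score per emotion. The sketch below illustrates that reading on hand-written scores; `aggregate_max` and the sample numbers are hypothetical, and `sentiment_analysis.py` itself is not shown in this commit.

```python
EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]


def aggregate_max(sentence_scores):
    """For each emotion, keep the highest score found in any sentence.

    sentence_scores is a list of dicts mapping emotion name to score;
    emotions missing from a sentence count as 0.0.
    """
    return {
        emotion: max((scores.get(emotion, 0.0) for scores in sentence_scores), default=0.0)
        for emotion in EMOTIONS
    }


# Hypothetical per-sentence classifier output for one book description
sentence_scores = [
    {"joy": 0.10, "fear": 0.70, "neutral": 0.20},
    {"joy": 0.65, "fear": 0.05, "neutral": 0.30},
]
book_scores = aggregate_max(sentence_scores)
```

Taking the maximum rather than the mean lets a single strongly emotional sentence define the book's score for that emotion, which suits the tone-based sorting in the dashboard.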
+ ## 📁 Project Structure
+
+ ```
+ semantic-book-recommender/
+ ├── 📄 Core Files
+ │   ├── .env.example               # Template for environment variables
+ │   ├── .gitignore                 # Git ignore file (IMPORTANT!)
+ │   ├── README.md                  # This file
+ │   └── requirements.txt           # Python dependencies
+
+ ├── 🐍 Python Scripts
+ │   ├── data_exploration.py        # Data cleaning and exploration
+ │   ├── text_classification.py     # Zero-shot classification
+ │   ├── sentiment_analysis.py      # Emotion analysis
+ │   ├── vector_search.py           # Vector database operations
+ │   └── gradio_dashboard.py        # Web interface
+
+ ├── 📊 Data Files (Generated/Input)
+ │   ├── books.csv                  # Input dataset (not included in repo)
+ │   ├── books_cleaned.csv          # Cleaned dataset
+ │   ├── books_with_categories.csv  # Dataset with categories
+ │   ├── books_with_emotions.csv    # Final dataset with emotions
+ │   ├── tagged_description.txt     # Generated text file for embeddings
+ │   └── predictions_results.csv    # Classification results
+
+ ├── 🖼️ Assets
+ │   └── cover-not-found.jpg        # Default book cover image
+
+ ├── 🗄️ Vector Databases (Auto-generated)
+ │   ├── chroma_db_books/           # OpenAI embeddings vector DB
+ │   └── chroma_db_books_hf/        # HuggingFace embeddings vector DB
+
+ └── 🔧 Environment (Ignored)
+     ├── .env                       # Your API keys (NEVER commit!)
+     └── .venv/                     # Virtual environment (ignored)
+ ```
+
+ ### 📋 File Descriptions
+
+ | File | Purpose | Generated By |
+ |------|---------|--------------|
+ | `data_exploration.py` | Data cleaning, missing value analysis, correlation heatmaps | Manual |
+ | `text_classification.py` | Zero-shot classification (Fiction/Non-Fiction) | Manual |
+ | `sentiment_analysis.py` | Emotion analysis (7 emotions) | Manual |
+ | `vector_search.py` | Vector embeddings and similarity search | Manual |
+ | `gradio_dashboard.py` | Interactive web interface | Manual |
+ | `books.csv` | Original dataset | User provided |
+ | `books_cleaned.csv` | Cleaned dataset (25+ word descriptions) | `data_exploration.py` |
+ | `books_with_categories.csv` | Dataset with Fiction/Non-Fiction labels | `text_classification.py` |
+ | `books_with_emotions.csv` | Final dataset with emotion scores | `sentiment_analysis.py` |
+ | `tagged_description.txt` | Text file for vector embeddings | `vector_search.py` |
+ | `predictions_results.csv` | Classification accuracy results | `text_classification.py` |
+
+ ### 🔄 Processing Pipeline
+
+ ```
+ books.csv
+     ↓ (data_exploration.py)
+ books_cleaned.csv
+     ↓ (text_classification.py)
+ books_with_categories.csv
+     ↓ (sentiment_analysis.py)
+ books_with_emotions.csv
+     ↓ (vector_search.py)
+ tagged_description.txt + Vector DB
+     ↓ (gradio_dashboard.py)
+ 📱 Web Interface
+ ```
+
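Because each stage in the processing pipeline writes the file the next stage reads, it is easy to work out which scripts still need to run. The sketch below is one hedged way to do that; the script and file names come from the table above, while `pending_stages` and `existing_outputs` are hypothetical helpers that are not part of the repository.

```python
from pathlib import Path

# (script, output file) pairs, in the order given by the processing pipeline
STAGES = [
    ("data_exploration.py", "books_cleaned.csv"),
    ("text_classification.py", "books_with_categories.csv"),
    ("sentiment_analysis.py", "books_with_emotions.csv"),
    ("vector_search.py", "tagged_description.txt"),
]


def pending_stages(existing_files):
    """Return the scripts to run, starting at the first stage whose output is missing."""
    for index, (_, output) in enumerate(STAGES):
        if output not in existing_files:
            # Later stages depend on this one, so they must be rerun as well.
            return [script for script, _ in STAGES[index:]]
    return []


def existing_outputs(directory="."):
    """Names of the regular files currently present in the given directory."""
    return {path.name for path in Path(directory).iterdir() if path.is_file()}
```

A caller could then loop over `pending_stages(existing_outputs())` and launch each script with `subprocess.run([sys.executable, script], check=True)` before starting the dashboard.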
+ ## 🔒 Security Setup (IMPORTANT!)
+
+ ### Before uploading to GitHub:
+
+ 1. **Create a `.gitignore` file** (copy the one provided below)
+ 2. **Never commit `.env` files** - they contain your API keys
+ 3. **Use `.env.example`** as a template for others
+ 4. **Remove any API keys** from code files
+
+ ### Required `.gitignore` file:
+ ```gitignore
+ # Environment variables (NEVER commit these!)
+ .env
+ .env.local
+ .env.development.local
+ .env.test.local
+ .env.production.local
+
+ # Virtual environment
+ venv/
+ .venv/
+ env/
+ ENV/
+
+ # Python cache
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # Vector databases (large files)
+ chroma_db_books/
+ chroma_db_books_hf/
+ *.db
+ *.sqlite
+
+ # Data files (keep these ignored if the data is large or sensitive)
+ books.csv
+ books_cleaned.csv
+ books_with_categories.csv
+ books_with_emotions.csv
+ tagged_description.txt
+ predictions_results.csv
+
+ # IDE files
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS files
+ .DS_Store
+ .DS_Store?
+ ._*
+ .Spotlight-V100
+ .Trashes
+ ehthumbs.db
+ Thumbs.db
+
+ # Jupyter Notebook checkpoints
+ .ipynb_checkpoints
+
+ # PyTorch model files
+ *.pth
+ *.pt
+
+ # Logs
+ *.log
+ logs/
+ ```
+
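Keeping keys out of code files means reading them from the environment at run time. The sketch below shows one way to do that and to fall back to the HuggingFace embeddings when no key is set; the variable name `OPENAI_API_KEY` and both helper functions are assumptions, since `.env.example` is not shown in this commit.

```python
import os


def get_api_key(name="OPENAI_API_KEY", environ=None):
    """Return the key from the environment, or None when it is unset or blank.

    The variable name is an assumption; use whatever .env.example defines.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    return value or None


def choose_embedding_backend(environ=None):
    """Pick "openai" only when a key is available, otherwise "huggingface"."""
    return "openai" if get_api_key(environ=environ) else "huggingface"
```

With `python-dotenv`, calling `load_dotenv()` first copies the entries of `.env` into `os.environ`, so the same lookup works locally and on a host that injects secrets directly.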
+ ## 🚀 Quick Start
+
+ ### Prerequisites
+
+ - Python 3.10 or higher (Gradio 5.x does not support older versions)
+ - Virtual environment (recommended)
+
+ ### Installation
+
+ 1. Clone the repository:
+ ```bash
+ git clone https://github.com/yourusername/semantic-book-recommender.git
+ cd semantic-book-recommender
+ ```
+
+ 2. Create and activate a virtual environment:
+ ```bash
+ python -m venv .venv
+ source .venv/bin/activate # On Windows: .venv\Scripts\activate
+ ```
+
+ 3. Install dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 4. Set up environment variables:
+ ```bash
+ cp .env.example .env
+ # Edit .env with your OpenAI API key (optional, for OpenAI embeddings)
+ ```
+
+ ### Running the System
+
+ 1. **Data Processing Pipeline**:
+ ```bash
+ # Step 1: Clean and explore data
+ python data_exploration.py
+
+ # Step 2: Classify books into categories
+ python text_classification.py
+
+ # Step 3: Analyze emotions in book descriptions
+ python sentiment_analysis.py
+
+ # Step 4: Create vector database
+ python vector_search.py
+ ```
+
+ 2. **Launch Dashboard**:
+ ```bash
+ python gradio_dashboard.py
+ ```
+
+ Access the dashboard at `http://localhost:7860`
+
+ ## 📊 Data Requirements
+
+ The system expects a `books.csv` file with the following columns:
+ - `isbn13`: Unique book identifier
+ - `title`: Book title
+ - `subtitle`: Book subtitle (optional)
+ - `authors`: Author names (semicolon-separated)
+ - `categories`: Book categories
+ - `description`: Book description
+ - `num_pages`: Number of pages
+ - `average_rating`: Average rating (1-5 scale)
+ - `published_year`: Publication year
+ - `thumbnail`: Book cover image URL
+
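A quick check of the input file before running the pipeline can catch a missing column early. The sketch below validates the header against the columns listed above and applies the 25-word description rule described for `data_exploration.py`; it uses only the standard library, and its function names are hypothetical rather than taken from the repository.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "isbn13", "title", "subtitle", "authors", "categories",
    "description", "num_pages", "average_rating", "published_year", "thumbnail",
]
MIN_DESCRIPTION_WORDS = 25  # threshold stated in the README


def missing_columns(header):
    """Required column names that are absent from the CSV header."""
    return [column for column in REQUIRED_COLUMNS if column not in header]


def has_substantial_description(row):
    """True when the description has at least MIN_DESCRIPTION_WORDS words."""
    return len((row.get("description") or "").split()) >= MIN_DESCRIPTION_WORDS


def load_usable_rows(csv_text):
    """Parse CSV text, validate the header, and keep rows with long enough descriptions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = missing_columns(reader.fieldnames or [])
    if missing:
        raise ValueError(f"books.csv is missing columns: {missing}")
    return [row for row in reader if has_substantial_description(row)]
```

For a file on disk, pass `Path("books.csv").read_text(encoding="utf-8")` to `load_usable_rows`.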
+ ## 🎯 Usage Examples
+
+ ### Semantic Search
+ ```python
+ from vector_search import retrieve_semantic_recommendations
+
+ # Find books similar to a query
+ results = retrieve_semantic_recommendations(
+     "A mystery novel about redemption and forgiveness",
+     category="All", final_top_k=10,
+ )
+ ```
+
+ ### Emotion-Based Filtering
+ ```python
+ # Get happy books in fiction category
+ recommendations = retrieve_semantic_recommendations(
+     query="adventure story",
+     category="Fiction",
+     tone="Happy"
+ )
+ ```
+
+ ## 🔧 Configuration
+
+ ### Model Settings
+ - **Embedding Model**: `sentence-transformers/all-MiniLM-L6-v2` (384 dimensions)
+ - **Classification Model**: `facebook/bart-large-mnli`
+ - **Emotion Model**: `j-hartmann/emotion-english-distilroberta-base`
+
+ ### Performance Tuning
+ - Adjust `initial_top_k` and `final_top_k` in recommendation functions
+ - Modify chunk size and overlap in text splitting
+ - Configure vector database persistence settings
+
+ ## 📈 Model Performance
+
+ - **Zero-Shot Classification Accuracy**: ~85% on Fiction/Non-Fiction categorization
+ - **Emotion Detection**: 7-class emotion classification with confidence scores
+ - **Semantic Search**: Cosine similarity-based ranking with embedding vectors
+
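The classification accuracy quoted above can be recomputed from `predictions_results.csv`, whose header in this commit is `actual_categories,predicted_categories`. The sketch below is a minimal way to do so with the standard library; `accuracy_report` is a hypothetical helper, and the four sample rows are made up for illustration rather than taken from the file.

```python
import csv
import io
from collections import Counter


def accuracy_report(csv_text):
    """Return (accuracy, confusion counts) for actual/predicted label pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    pairs = Counter(
        (row["actual_categories"], row["predicted_categories"]) for row in reader
    )
    total = sum(pairs.values())
    correct = sum(count for (actual, predicted), count in pairs.items() if actual == predicted)
    accuracy = correct / total if total else 0.0
    return accuracy, pairs


# Illustrative rows only; not the contents of predictions_results.csv
sample = """actual_categories,predicted_categories
Fiction,Fiction
Fiction,Nonfiction
Nonfiction,Nonfiction
Nonfiction,Nonfiction
"""
accuracy, confusion = accuracy_report(sample)
```

The confusion counts make it visible whether errors are symmetric or concentrated in one direction, for example Fiction descriptions predicted as Nonfiction.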
+ ## 🛠️ Technical Details
+
+ ### Dependencies
+ - **Core ML**: `transformers`, `torch`, `sentence-transformers`
+ - **Vector Database**: `chromadb`, `langchain`
+ - **Data Processing**: `pandas`, `numpy`
+ - **Visualization**: `matplotlib`, `seaborn`, `gradio`
+ - **Utilities**: `tqdm`, `tabulate`, `python-dotenv`
+
+ ### Hardware Requirements
+ - **RAM**: 8GB+ recommended for model loading
+ - **GPU**: Optional, supports CUDA/MPS for faster inference
+ - **Storage**: 2GB+ for model weights and vector database
+
+ ## 📝 API Reference
+
+ ### Main Functions
+
+ #### `retrieve_semantic_recommendations(query, category, tone, initial_top_k, final_top_k)`
+ Returns book recommendations based on semantic similarity and filters.
+
+ **Parameters:**
+ - `query` (str): Search query describing the desired book
+ - `category` (str): Book category filter ("All", "Fiction", "Non-Fiction", etc.)
+ - `tone` (str): Emotional tone filter ("All", "Happy", "Surprising", "Angry", "Suspenseful", "Sad")
+ - `initial_top_k` (int): Number of candidates to retrieve from the vector store (default 50)
+ - `final_top_k` (int): Number of recommendations to return (default 16)
+
+ **Returns:**
+ - `pandas.DataFrame`: Filtered book recommendations with metadata
+
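The filtering and sorting behind these parameters can be shown on plain dictionaries. The sketch below mirrors the category filter and tone-to-emotion mapping used in `gradio_dashboard.py` from this commit, but it is a simplified stand-in: `filter_and_rank` is a hypothetical name, and the real function works on a pandas DataFrame after a vector-store lookup.

```python
# Dashboard tone labels mapped to emotion score columns (as in gradio_dashboard.py)
TONE_TO_COLUMN = {
    "Happy": "joy",
    "Surprising": "surprise",
    "Angry": "anger",
    "Suspenseful": "fear",
    "Sad": "sadness",
}


def filter_and_rank(candidates, category="All", tone="All", final_top_k=16):
    """Filter candidate books by category, sort by the tone's emotion score, then truncate."""
    rows = list(candidates)
    if category and category != "All":
        rows = [row for row in rows if row["simple_categories"] == category]
    column = TONE_TO_COLUMN.get(tone)
    if column:
        # Highest emotion score first; rows lacking the column sort last.
        rows = sorted(rows, key=lambda row: row.get(column, 0.0), reverse=True)
    return rows[:final_top_k]
```

Note that the tone only reorders the candidates; it never removes a book, so a query with `tone="Sad"` can still return cheerful titles when few candidates match.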
+ ## 🤝 Contributing
+
+ 1. Fork the repository
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+ 3. Commit changes (`git commit -m 'Add amazing feature'`)
+ 4. Push to branch (`git push origin feature/amazing-feature`)
+ 5. Open a Pull Request
+
+ ## 📄 License
+
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+ ## 🙏 Acknowledgments
+
+ - HuggingFace for providing pre-trained models
+ - OpenAI for embedding models
+ - ChromaDB for vector database functionality
+ - Gradio for the intuitive web interface
+ - The open-source community for various Python libraries
+
+ ## 📚 References
+
+ - [Sentence Transformers Documentation](https://www.sbert.net/)
+ - [LangChain Documentation](https://python.langchain.com/)
+ - [Gradio Documentation](https://gradio.app/docs/)
+ - [ChromaDB Documentation](https://docs.trychroma.com/)
+
+ ---
+
+ **Note**: This system is designed for educational and research purposes. Ensure compliance with data usage policies and model licenses when deploying in production environments.
books.csv ADDED
The diff for this file is too large to render. See raw diff
 
books_cleaned.csv ADDED
The diff for this file is too large to render. See raw diff
 
books_with_categories.csv ADDED
The diff for this file is too large to render. See raw diff
 
books_with_emotions.csv ADDED
The diff for this file is too large to render. See raw diff
 
cover-not-found.jpg ADDED
gradio_dashboard.py ADDED
@@ -0,0 +1,219 @@
+ import pandas as pd
+ import numpy as np
+ from dotenv import load_dotenv
+ from langchain_community.document_loaders import TextLoader
+ from langchain_openai import OpenAIEmbeddings
+ from langchain_huggingface import HuggingFaceEmbeddings
+ from langchain_text_splitters import CharacterTextSplitter
+ from langchain_chroma import Chroma
+ import matplotlib.pyplot as plt
+ import seaborn as sns
+ import io
+ import gradio as gr
+ from PIL import Image
+
+ load_dotenv()
+
+ books = pd.read_csv("books_with_emotions.csv")
+ books["large_thumbnail"] = books["thumbnail"] + "&fife=w800"
+ books["large_thumbnail"] = np.where(
+     books["large_thumbnail"].isna(),
+     "cover-not-found.jpg",
+     books["large_thumbnail"],
+ )
+
+ raw_documents = TextLoader("tagged_description.txt").load()
+ text_splitter = CharacterTextSplitter(separator="\n", chunk_size=0, chunk_overlap=0)
+ documents = text_splitter.split_documents(raw_documents)
+ db_books = Chroma.from_documents(documents, HuggingFaceEmbeddings())
+
+ def retrieve_semantic_recommendations(
+     query: str,
+     category: str | None = None,
+     tone: str | None = None,
+     initial_top_k: int = 50,
+     final_top_k: int = 16,
+ ) -> pd.DataFrame:
+     recs = db_books.similarity_search_with_score(query, k=initial_top_k)
+     books_list = [
+         int(rec[0].page_content.strip('"').split()[0]) if isinstance(rec, tuple)
+         else int(rec.page_content.strip('"').split()[0])
+         for rec in recs]
+     book_recs = books.loc[books["isbn13"].isin(books_list)]
+     if category and category != "All":  # None (the default) means no category filter
+         book_recs = book_recs[book_recs["simple_categories"] == category]
+
+     # Tone-based emotion sorting
+     if tone and tone != "All":
+         tone_column_map = {
+             "Happy": "joy",
+             "Surprising": "surprise",
+             "Angry": "anger",
+             "Suspenseful": "fear",
+             "Sad": "sadness"
+         }
+         tone_col = tone_column_map.get(tone)
+         if tone_col and tone_col in book_recs.columns:
+             book_recs = book_recs.sort_values(by=tone_col, ascending=False)
+
+     return book_recs.head(final_top_k)
+
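The list comprehension in `retrieve_semantic_recommendations` relies on every line of `tagged_description.txt` starting with the book's ISBN-13, optionally wrapped in double quotes. The sketch below isolates that parsing step with a guard for malformed lines; `parse_isbn` is a hypothetical helper, and the sample line is invented for illustration.

```python
def parse_isbn(page_content):
    """Extract the leading ISBN-13 from a tagged description line.

    Returns None when the line is empty or does not start with digits,
    instead of raising as a bare int(...) would.
    """
    tokens = page_content.strip().strip('"').split()
    if not tokens or not tokens[0].isdigit():
        return None
    return int(tokens[0])


# Invented example line in the assumed "<isbn13> <description>" layout
example_line = '"9780000000001 A quiet novel about a family and a small town."'
example_isbn = parse_isbn(example_line)
```

Returning `None` for unparsable lines lets the caller skip them rather than fail the whole request on one bad document.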
+ def recommend_books(
+     query: str,
+     category: str,
+     tone: str
+ ):
+     recommendations = retrieve_semantic_recommendations(query, category, tone)
+     results = []
+
+     for _, row in recommendations.iterrows():
+         description = row["description"]
+         truncated_desc_split = description.split()
+         truncated_description = " ".join(truncated_desc_split[:30]) + "..."
+
+         authors_split = row["authors"].split(";")
+         if len(authors_split) == 2:
+             authors_str = f"{authors_split[0]} and {authors_split[1]}"
+         elif len(authors_split) > 2:
+             authors_str = f"{', '.join(authors_split[:-1])}, and {authors_split[-1]}"
+         else:
+             authors_str = row["authors"]
+
+         caption = f"{row['title']} by {authors_str}: {truncated_description}"
+         results.append((row["large_thumbnail"], caption))
+
+     return results
+
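The caption logic in `recommend_books` can be factored into two small pure functions, which makes it testable without loading the dataset. The sketch below is a hedged refactoring rather than code from the commit: both function names are hypothetical, and unlike the original, `truncate_words` appends "..." only when text was actually cut.

```python
def format_authors(authors):
    """Turn a semicolon-separated author string into a readable list."""
    names = [name.strip() for name in authors.split(";") if name.strip()]
    if len(names) == 2:
        return f"{names[0]} and {names[1]}"
    if len(names) > 2:
        return f"{', '.join(names[:-1])}, and {names[-1]}"
    return names[0] if names else authors


def truncate_words(text, limit=30):
    """Keep the first `limit` words; add an ellipsis only if words were dropped."""
    words = text.split()
    if len(words) <= limit:
        return text
    return " ".join(words[:limit]) + "..."


def build_caption(title, authors, description):
    """Compose the gallery caption in the same "title by authors: text" shape."""
    return f"{title} by {format_authors(authors)}: {truncate_words(description)}"
```

Stripping whitespace around each name also avoids doubled spaces when the source data uses "; " as the separator.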
+ # --- Functions for Visuals and Stats --- #
+ def plot_pie(column):
+     fig, ax = plt.subplots(figsize=(6, 6))
+     books[column].fillna("Unknown").value_counts().head(5).plot.pie(autopct="%1.1f%%", startangle=90, ax=ax)
+     ax.set_ylabel("")
+     ax.set_title(f"Top 5 {column} Distribution")
+     buf = io.BytesIO()
+     fig.savefig(buf, format="png", bbox_inches="tight")
+     buf.seek(0)
+     plt.close(fig)
+     return Image.open(buf)
+
+ def get_missing_df():
+     return books.isnull().sum().reset_index().rename(columns={"index": "Column", 0: "Missing Values"})
+
+ def get_summary_df():
+     return books.describe(include="all").T.fillna("").reset_index().rename(columns={"index": "Column"})
+
+ def filter_by_rating(min_rating):
+     filtered = books[books["average_rating"] >= min_rating]
+     return filtered[["title", "average_rating", "authors"]].head(20)
+
+ def plot_author_boxplot():
+     fig, ax = plt.subplots(figsize=(8, 4))
+     top_categories = books["simple_categories"].value_counts().index[:5]
+     data = books[books["simple_categories"].isin(top_categories)].copy()  # copy avoids SettingWithCopyWarning
+     data["num_authors"] = data["authors"].fillna("").apply(lambda x: len(str(x).split(";")))
+     sns.boxplot(data=data, x="simple_categories", y="num_authors", hue="simple_categories", palette="Set2", ax=ax, legend=False)
+     ax.set_title("Number of Authors per Category")
+     ax.set_ylabel("Number of Authors")
+     ax.set_xlabel("Category")
+     buf = io.BytesIO()
+     fig.savefig(buf, format="png", bbox_inches="tight")
+     buf.seek(0)
+     plt.close(fig)
+     return Image.open(buf)
+
+ def get_thumbnails(category):
+     df = books[books["simple_categories"] == category].dropna(subset=["thumbnail"]).head(8)
+     return list(df["thumbnail"])
+
+ # Category & tone setup
+ categories = ["All"] + sorted(books["simple_categories"].dropna().unique())  # NaN would break sorted()
+ tones = ["All", "Happy", "Surprising", "Angry", "Suspenseful", "Sad"]
+
+ # Custom theme
+ custom_theme = gr.themes.Base(
+     primary_hue="violet",
+     secondary_hue="stone",
+     font=["Plus-jakarta-sans", "sans-serif"]
+ )
+
+ # Gradio UI
+ with gr.Blocks(theme=custom_theme) as dashboard:
+     with gr.Tab("🔍 Recommender"):
+         gr.Markdown("""
+         <style>
+         #form-section {
+             padding: 18px;
+             border: 1px solid #dcdcdc;
+             border-radius: 15px;
+             background: #fdfdfd;
+             margin-bottom: 1.5rem;
+         }
+         .title {
+             font-size: 32px;
+             font-weight: bold;
+             text-align: center;
+             margin-bottom: 1em;
+         }
+         </style>
+         """)
+
+         gr.Markdown("# 📚 Semantic Book Recommender", elem_classes="title")
+         gr.Markdown("Describe your ideal book and get smart recommendations based on semantics and emotions 🎯")
+
+         with gr.Group(elem_id="form-section"):
+             with gr.Row():
+                 user_query = gr.Textbox(
+                     label="🔍 Book Description",
+                     placeholder="e.g., A story about forgiveness, mystery, and redemption",
+                     lines=2
+                 )
+             with gr.Row():
+                 category_dropdown = gr.Dropdown(choices=categories, label="📂 Category", value="All")
+                 tone_dropdown = gr.Dropdown(choices=tones, label="🎭 Emotional Tone", value="All")
+             with gr.Row():
+                 submit_button = gr.Button("🚀 Find Recommendations", variant="primary")
+
+         gr.Markdown("## 🧠 Smart Recommendations")
+         output = gr.Gallery(label="📚 Recommended Books", columns=4, rows=2, height="auto", preview=False)
+
+         submit_button.click(
+             fn=recommend_books,
+             inputs=[user_query, category_dropdown, tone_dropdown],
+             outputs=output
+         )
+
+     with gr.Tab("📊 Dataset Statistics"):
+         gr.Markdown("## 🧮 Dataset Summary Table")
+         gr.Dataframe(value=get_summary_df(), interactive=False)
+
+         gr.Markdown("## ❓ Missing Values Table")
+         gr.Dataframe(value=get_missing_df(), interactive=False)
+
+         gr.Markdown("## 🧁 Pie Chart Visualization")
+         categorical_cols = books.select_dtypes(include=["object", "category"]).columns.tolist()
+         col_dropdown = gr.Dropdown(
+             choices=categorical_cols,
+             value=categorical_cols[0] if categorical_cols else None,
+             label="Select Column"
+         )
+         pie_img = gr.Image(type="pil", label="Pie Chart")
+         col_dropdown.change(fn=plot_pie, inputs=col_dropdown, outputs=pie_img)
+
+         gr.Markdown("## 🌡️ Histogram Filter by Rating")
+         rating_slider = gr.Slider(minimum=0, maximum=5, step=0.1, value=3.5, label="Minimum Rating")
+         rating_table = gr.Dataframe(label="Books Above Rating", interactive=False)
+         rating_slider.change(fn=filter_by_rating, inputs=rating_slider, outputs=rating_table)
+
+         gr.Markdown("## 📦 Boxplot: Authors per Category")
+         box_img = gr.Image(type="pil", value=plot_author_boxplot, label="Author Count Boxplot")
+
+         gr.Markdown("## 🖼️ Top Book Covers by Category")
+         cat_dropdown = gr.Dropdown(choices=books["simple_categories"].dropna().unique().tolist(), label="Select Category")
+         gallery = gr.Gallery(label="Thumbnails", columns=4, height="auto")
+         cat_dropdown.change(fn=get_thumbnails, inputs=cat_dropdown, outputs=gallery)
+
+
+ # Run app
+ if __name__ == "__main__":
+     dashboard.launch()
+
predictions_results.csv ADDED
@@ -0,0 +1,601 @@
1
+ actual_categories,predicted_categories
2
+ Fiction,Fiction
3
+ Fiction,Fiction
4
+ Fiction,Fiction
5
+ Fiction,Nonfiction
6
+ Fiction,Fiction
7
+ Fiction,Fiction
8
+ Fiction,Fiction
9
+ Fiction,Fiction
10
+ Fiction,Fiction
11
+ Fiction,Fiction
12
+ Fiction,Fiction
13
+ Fiction,Fiction
14
+ Fiction,Fiction
15
+ Fiction,Fiction
16
+ Fiction,Fiction
17
+ Fiction,Fiction
18
+ Fiction,Fiction
19
+ Fiction,Fiction
20
+ Fiction,Fiction
21
+ Fiction,Nonfiction
22
+ Fiction,Fiction
23
+ Fiction,Nonfiction
24
+ Fiction,Nonfiction
25
+ Fiction,Fiction
26
+ Fiction,Fiction
27
+ Fiction,Nonfiction
28
+ Fiction,Fiction
29
+ Fiction,Nonfiction
30
+ Fiction,Nonfiction
31
+ Fiction,Fiction
32
+ Fiction,Fiction
33
+ Fiction,Fiction
34
+ Fiction,Fiction
35
+ Fiction,Fiction
36
+ Fiction,Nonfiction
37
+ Fiction,Nonfiction
38
+ Fiction,Fiction
39
+ Fiction,Fiction
40
+ Fiction,Nonfiction
41
+ Fiction,Fiction
42
+ Fiction,Nonfiction
43
+ Fiction,Nonfiction
44
+ Fiction,Fiction
45
+ Fiction,Fiction
46
+ Fiction,Fiction
47
+ Fiction,Fiction
48
+ Fiction,Fiction
49
+ Fiction,Nonfiction
50
+ Fiction,Fiction
51
+ Fiction,Fiction
52
+ Fiction,Fiction
53
+ Fiction,Fiction
54
+ Fiction,Nonfiction
55
+ Fiction,Fiction
56
+ Fiction,Fiction
57
+ Fiction,Nonfiction
58
+ Fiction,Fiction
59
+ Fiction,Fiction
60
+ Fiction,Fiction
61
+ Fiction,Fiction
62
+ Fiction,Fiction
63
+ Fiction,Fiction
64
+ Fiction,Fiction
65
+ Fiction,Fiction
66
+ Fiction,Fiction
67
+ Fiction,Nonfiction
68
+ Fiction,Fiction
69
+ Fiction,Nonfiction
70
+ Fiction,Nonfiction
71
+ Fiction,Nonfiction
72
+ Fiction,Fiction
73
+ Fiction,Fiction
74
+ Fiction,Nonfiction
75
+ Fiction,Nonfiction
76
+ Fiction,Nonfiction
77
+ Fiction,Nonfiction
78
+ Fiction,Fiction
79
+ Fiction,Fiction
80
+ Fiction,Fiction
81
+ Fiction,Fiction
82
+ Fiction,Fiction
83
+ Fiction,Nonfiction
84
+ Fiction,Nonfiction
85
+ Fiction,Nonfiction
86
+ Fiction,Nonfiction
87
+ Fiction,Nonfiction
88
+ Fiction,Nonfiction
89
+ Fiction,Fiction
90
+ Fiction,Fiction
91
+ Fiction,Fiction
92
+ Fiction,Fiction
93
+ Fiction,Fiction
94
+ Fiction,Nonfiction
95
+ Fiction,Fiction
96
+ Fiction,Fiction
97
+ Fiction,Fiction
98
+ Fiction,Fiction
99
+ Fiction,Fiction
100
+ Fiction,Fiction
101
+ Fiction,Fiction
102
+ Fiction,Nonfiction
103
+ Fiction,Fiction
104
+ Fiction,Fiction
105
+ Fiction,Nonfiction
106
+ Fiction,Fiction
107
+ Fiction,Fiction
108
+ Fiction,Fiction
109
+ Fiction,Fiction
110
+ Fiction,Fiction
111
+ Fiction,Nonfiction
112
+ Fiction,Fiction
113
+ Fiction,Fiction
114
+ Fiction,Nonfiction
115
+ Fiction,Fiction
116
+ Fiction,Fiction
117
+ Fiction,Fiction
118
+ Fiction,Nonfiction
119
+ Fiction,Fiction
120
+ Fiction,Nonfiction
121
+ Fiction,Nonfiction
122
+ Fiction,Fiction
123
+ Fiction,Fiction
124
+ Fiction,Nonfiction
125
+ Fiction,Fiction
126
+ Fiction,Fiction
127
+ Fiction,Nonfiction
128
+ Fiction,Fiction
129
+ Fiction,Fiction
130
+ Fiction,Fiction
131
+ Fiction,Fiction
132
+ Fiction,Fiction
133
+ Fiction,Fiction
134
+ Fiction,Fiction
135
+ Fiction,Fiction
136
+ Fiction,Fiction
137
+ Fiction,Fiction
138
+ Fiction,Fiction
139
+ Fiction,Nonfiction
140
+ Fiction,Fiction
141
+ Fiction,Fiction
142
+ Fiction,Fiction
143
+ Fiction,Fiction
144
+ Fiction,Fiction
145
+ Fiction,Nonfiction
146
+ Fiction,Fiction
147
+ Fiction,Fiction
148
+ Fiction,Nonfiction
149
+ Fiction,Nonfiction
150
+ Fiction,Fiction
151
+ Fiction,Nonfiction
152
+ Fiction,Fiction
153
+ Fiction,Fiction
154
+ Fiction,Fiction
155
+ Fiction,Fiction
156
+ Fiction,Fiction
157
+ Fiction,Fiction
158
+ Fiction,Nonfiction
159
+ Fiction,Fiction
160
+ Fiction,Fiction
161
+ Fiction,Nonfiction
162
+ Fiction,Fiction
163
+ Fiction,Fiction
164
+ Fiction,Nonfiction
165
+ Fiction,Nonfiction
166
+ Fiction,Fiction
167
+ Fiction,Nonfiction
168
+ Fiction,Fiction
169
+ Fiction,Nonfiction
170
+ Fiction,Fiction
171
+ Fiction,Fiction
172
+ Fiction,Fiction
173
+ Fiction,Fiction
174
+ Fiction,Nonfiction
175
+ Fiction,Nonfiction
176
+ Fiction,Nonfiction
177
+ Fiction,Fiction
178
+ Fiction,Fiction
179
+ Fiction,Nonfiction
180
+ Fiction,Nonfiction
181
+ Fiction,Fiction
182
+ Fiction,Fiction
183
+ Fiction,Fiction
184
+ Fiction,Fiction
185
+ Fiction,Nonfiction
186
+ Fiction,Fiction
187
+ Fiction,Nonfiction
188
+ Fiction,Fiction
189
+ Fiction,Nonfiction
190
+ Fiction,Nonfiction
191
+ Fiction,Nonfiction
192
+ Fiction,Fiction
193
+ Fiction,Fiction
194
+ Fiction,Fiction
195
+ Fiction,Fiction
196
+ Fiction,Fiction
197
+ Fiction,Nonfiction
198
+ Fiction,Fiction
199
+ Fiction,Fiction
200
+ Fiction,Fiction
201
+ Fiction,Fiction
202
+ Fiction,Nonfiction
203
+ Fiction,Fiction
204
+ Fiction,Nonfiction
205
+ Fiction,Fiction
206
+ Fiction,Nonfiction
207
+ Fiction,Fiction
208
+ Fiction,Fiction
209
+ Fiction,Nonfiction
210
+ Fiction,Fiction
211
+ Fiction,Fiction
212
+ Fiction,Nonfiction
213
+ Fiction,Nonfiction
214
+ Fiction,Fiction
215
+ Fiction,Nonfiction
216
+ Fiction,Fiction
217
+ Fiction,Fiction
218
+ Fiction,Fiction
219
+ Fiction,Fiction
220
+ Fiction,Nonfiction
221
+ Fiction,Fiction
222
+ Fiction,Fiction
223
+ Fiction,Nonfiction
224
+ Fiction,Fiction
225
+ Fiction,Fiction
226
+ Fiction,Fiction
227
+ Fiction,Fiction
228
+ Fiction,Fiction
229
+ Fiction,Nonfiction
230
+ Fiction,Fiction
231
+ Fiction,Nonfiction
232
+ Fiction,Fiction
233
+ Fiction,Fiction
234
+ Fiction,Fiction
235
+ Fiction,Fiction
236
+ Fiction,Nonfiction
237
+ Fiction,Fiction
238
+ Fiction,Nonfiction
239
+ Fiction,Fiction
240
+ Fiction,Fiction
241
+ Fiction,Fiction
242
+ Fiction,Fiction
243
+ Fiction,Fiction
244
+ Fiction,Fiction
245
+ Fiction,Fiction
246
+ Fiction,Fiction
247
+ Fiction,Nonfiction
248
+ Fiction,Fiction
249
+ Fiction,Fiction
250
+ Fiction,Fiction
251
+ Fiction,Fiction
252
+ Fiction,Nonfiction
253
+ Fiction,Nonfiction
254
+ Fiction,Nonfiction
255
+ Fiction,Nonfiction
256
+ Fiction,Fiction
257
+ Fiction,Fiction
258
+ Fiction,Nonfiction
259
+ Fiction,Fiction
260
+ Fiction,Nonfiction
261
+ Fiction,Fiction
262
+ Fiction,Nonfiction
263
+ Fiction,Nonfiction
264
+ Fiction,Nonfiction
265
+ Fiction,Fiction
266
+ Fiction,Nonfiction
267
+ Fiction,Nonfiction
268
+ Fiction,Fiction
269
+ Fiction,Fiction
270
+ Fiction,Fiction
271
+ Fiction,Fiction
272
+ Fiction,Fiction
273
+ Fiction,Fiction
274
+ Fiction,Fiction
275
+ Fiction,Nonfiction
276
+ Fiction,Nonfiction
277
+ Fiction,Fiction
278
+ Fiction,Nonfiction
279
+ Fiction,Fiction
280
+ Fiction,Fiction
281
+ Fiction,Fiction
282
+ Fiction,Fiction
283
+ Fiction,Nonfiction
284
+ Fiction,Fiction
285
+ Fiction,Fiction
286
+ Fiction,Nonfiction
287
+ Fiction,Fiction
288
+ Fiction,Fiction
289
+ Fiction,Fiction
290
+ Fiction,Fiction
291
+ Fiction,Fiction
292
+ Fiction,Nonfiction
293
+ Fiction,Fiction
294
+ Fiction,Nonfiction
295
+ Fiction,Fiction
296
+ Fiction,Nonfiction
297
+ Fiction,Nonfiction
298
+ Fiction,Fiction
299
+ Fiction,Nonfiction
300
+ Fiction,Fiction
301
+ Fiction,Fiction
302
+ Nonfiction,Nonfiction
303
+ Nonfiction,Nonfiction
304
+ Nonfiction,Nonfiction
305
+ Nonfiction,Fiction
306
+ Nonfiction,Nonfiction
307
+ Nonfiction,Nonfiction
308
+ Nonfiction,Fiction
309
+ Nonfiction,Nonfiction
310
+ Nonfiction,Nonfiction
311
+ Nonfiction,Nonfiction
312
+ Nonfiction,Nonfiction
313
+ Nonfiction,Nonfiction
314
+ Nonfiction,Nonfiction
315
+ Nonfiction,Nonfiction
316
+ Nonfiction,Nonfiction
317
+ Nonfiction,Nonfiction
318
+ Nonfiction,Fiction
319
+ Nonfiction,Nonfiction
320
+ Nonfiction,Nonfiction
321
+ Nonfiction,Fiction
322
+ Nonfiction,Nonfiction
323
+ Nonfiction,Nonfiction
324
+ Nonfiction,Nonfiction
325
+ Nonfiction,Nonfiction
326
+ Nonfiction,Nonfiction
327
+ Nonfiction,Nonfiction
328
+ Nonfiction,Nonfiction
329
+ Nonfiction,Nonfiction
330
+ Nonfiction,Nonfiction
331
+ Nonfiction,Nonfiction
332
+ Nonfiction,Nonfiction
333
+ Nonfiction,Nonfiction
334
+ Nonfiction,Nonfiction
335
+ Nonfiction,Fiction
336
+ Nonfiction,Nonfiction
337
+ Nonfiction,Nonfiction
338
+ Nonfiction,Nonfiction
339
+ Nonfiction,Nonfiction
340
+ Nonfiction,Nonfiction
341
+ Nonfiction,Nonfiction
342
+ Nonfiction,Nonfiction
343
+ Nonfiction,Nonfiction
344
+ Nonfiction,Nonfiction
345
+ Nonfiction,Nonfiction
346
+ Nonfiction,Nonfiction
347
+ Nonfiction,Nonfiction
348
+ Nonfiction,Nonfiction
349
+ Nonfiction,Nonfiction
350
+ Nonfiction,Nonfiction
351
+ Nonfiction,Fiction
352
+ Nonfiction,Nonfiction
353
+ Nonfiction,Nonfiction
354
+ Nonfiction,Nonfiction
355
+ Nonfiction,Nonfiction
356
+ Nonfiction,Nonfiction
357
+ Nonfiction,Nonfiction
358
+ Nonfiction,Nonfiction
359
+ Nonfiction,Nonfiction
360
+ Nonfiction,Nonfiction
361
+ Nonfiction,Nonfiction
362
+ Nonfiction,Nonfiction
363
+ Nonfiction,Nonfiction
364
+ Nonfiction,Nonfiction
365
+ Nonfiction,Nonfiction
366
+ Nonfiction,Nonfiction
367
+ Nonfiction,Nonfiction
368
+ Nonfiction,Nonfiction
369
+ Nonfiction,Fiction
370
+ Nonfiction,Nonfiction
371
+ Nonfiction,Nonfiction
372
+ Nonfiction,Nonfiction
373
+ Nonfiction,Nonfiction
374
+ Nonfiction,Nonfiction
375
+ Nonfiction,Nonfiction
376
+ Nonfiction,Nonfiction
377
+ Nonfiction,Fiction
378
+ Nonfiction,Nonfiction
379
+ Nonfiction,Nonfiction
380
+ Nonfiction,Nonfiction
381
+ Nonfiction,Nonfiction
382
+ Nonfiction,Nonfiction
383
+ Nonfiction,Nonfiction
384
+ Nonfiction,Nonfiction
385
+ Nonfiction,Nonfiction
386
+ Nonfiction,Nonfiction
387
+ Nonfiction,Nonfiction
388
+ Nonfiction,Nonfiction
389
+ Nonfiction,Nonfiction
390
+ Nonfiction,Nonfiction
391
+ Nonfiction,Nonfiction
392
+ Nonfiction,Nonfiction
393
+ Nonfiction,Nonfiction
394
+ Nonfiction,Fiction
395
+ Nonfiction,Nonfiction
396
+ Nonfiction,Nonfiction
397
+ Nonfiction,Nonfiction
398
+ Nonfiction,Nonfiction
399
+ Nonfiction,Nonfiction
400
+ Nonfiction,Nonfiction
401
+ Nonfiction,Nonfiction
402
+ Nonfiction,Nonfiction
403
+ Nonfiction,Nonfiction
404
+ Nonfiction,Fiction
405
+ Nonfiction,Fiction
406
+ Nonfiction,Nonfiction
407
+ Nonfiction,Nonfiction
408
+ Nonfiction,Nonfiction
409
+ Nonfiction,Nonfiction
410
+ Nonfiction,Nonfiction
411
+ Nonfiction,Nonfiction
412
+ Nonfiction,Nonfiction
413
+ Nonfiction,Nonfiction
414
+ Nonfiction,Nonfiction
415
+ Nonfiction,Nonfiction
416
+ Nonfiction,Nonfiction
417
+ Nonfiction,Nonfiction
418
+ Nonfiction,Nonfiction
419
+ Nonfiction,Nonfiction
420
+ Nonfiction,Nonfiction
421
+ Nonfiction,Nonfiction
422
+ Nonfiction,Nonfiction
423
+ Nonfiction,Fiction
424
+ Nonfiction,Nonfiction
425
+ Nonfiction,Fiction
426
+ Nonfiction,Nonfiction
427
+ Nonfiction,Nonfiction
428
+ Nonfiction,Nonfiction
429
+ Nonfiction,Nonfiction
430
+ Nonfiction,Nonfiction
431
+ Nonfiction,Fiction
432
+ Nonfiction,Fiction
433
+ Nonfiction,Nonfiction
434
+ Nonfiction,Nonfiction
435
+ Nonfiction,Nonfiction
436
+ Nonfiction,Nonfiction
437
+ Nonfiction,Fiction
438
+ Nonfiction,Nonfiction
439
+ Nonfiction,Fiction
440
+ Nonfiction,Nonfiction
441
+ Nonfiction,Nonfiction
442
+ Nonfiction,Fiction
443
+ Nonfiction,Fiction
444
+ Nonfiction,Fiction
445
+ Nonfiction,Nonfiction
446
+ Nonfiction,Nonfiction
447
+ Nonfiction,Nonfiction
448
+ Nonfiction,Nonfiction
449
+ Nonfiction,Nonfiction
450
+ Nonfiction,Nonfiction
451
+ Nonfiction,Nonfiction
452
+ Nonfiction,Nonfiction
453
+ Nonfiction,Nonfiction
454
+ Nonfiction,Nonfiction
455
+ Nonfiction,Nonfiction
456
+ Nonfiction,Fiction
457
+ Nonfiction,Nonfiction
458
+ Nonfiction,Nonfiction
459
+ Nonfiction,Nonfiction
460
+ Nonfiction,Nonfiction
461
+ Nonfiction,Nonfiction
462
+ Nonfiction,Nonfiction
463
+ Nonfiction,Nonfiction
464
+ Nonfiction,Nonfiction
465
+ Nonfiction,Nonfiction
466
+ Nonfiction,Nonfiction
467
+ Nonfiction,Nonfiction
468
+ Nonfiction,Nonfiction
469
+ Nonfiction,Nonfiction
470
+ Nonfiction,Nonfiction
471
+ Nonfiction,Nonfiction
472
+ Nonfiction,Nonfiction
473
+ Nonfiction,Nonfiction
474
+ Nonfiction,Nonfiction
475
+ Nonfiction,Nonfiction
476
+ Nonfiction,Nonfiction
477
+ Nonfiction,Nonfiction
478
+ Nonfiction,Nonfiction
479
+ Nonfiction,Nonfiction
480
+ Nonfiction,Nonfiction
481
+ Nonfiction,Nonfiction
482
+ Nonfiction,Nonfiction
483
+ Nonfiction,Nonfiction
484
+ Nonfiction,Nonfiction
485
+ Nonfiction,Nonfiction
486
+ Nonfiction,Nonfiction
487
+ Nonfiction,Nonfiction
488
+ Nonfiction,Nonfiction
489
+ Nonfiction,Nonfiction
490
+ Nonfiction,Nonfiction
491
+ Nonfiction,Nonfiction
492
+ Nonfiction,Fiction
493
+ Nonfiction,Nonfiction
494
+ Nonfiction,Nonfiction
495
+ Nonfiction,Nonfiction
496
+ Nonfiction,Nonfiction
497
+ Nonfiction,Nonfiction
498
+ Nonfiction,Nonfiction
499
+ Nonfiction,Fiction
500
+ Nonfiction,Fiction
501
+ Nonfiction,Nonfiction
502
+ Nonfiction,Fiction
503
+ Nonfiction,Nonfiction
504
+ Nonfiction,Nonfiction
505
+ Nonfiction,Nonfiction
506
+ Nonfiction,Nonfiction
507
+ Nonfiction,Nonfiction
508
+ Nonfiction,Nonfiction
509
+ Nonfiction,Nonfiction
510
+ Nonfiction,Nonfiction
511
+ Nonfiction,Nonfiction
512
+ Nonfiction,Nonfiction
513
+ Nonfiction,Fiction
514
+ Nonfiction,Nonfiction
515
+ Nonfiction,Nonfiction
516
+ Nonfiction,Nonfiction
517
+ Nonfiction,Nonfiction
518
+ Nonfiction,Nonfiction
519
+ Nonfiction,Nonfiction
520
+ Nonfiction,Nonfiction
521
+ Nonfiction,Nonfiction
522
+ Nonfiction,Nonfiction
523
+ Nonfiction,Nonfiction
524
+ Nonfiction,Nonfiction
525
+ Nonfiction,Nonfiction
526
+ Nonfiction,Nonfiction
527
+ Nonfiction,Nonfiction
528
+ Nonfiction,Nonfiction
529
+ Nonfiction,Nonfiction
530
+ Nonfiction,Nonfiction
531
+ Nonfiction,Nonfiction
532
+ Nonfiction,Nonfiction
533
+ Nonfiction,Nonfiction
534
+ Nonfiction,Fiction
535
+ Nonfiction,Nonfiction
536
+ Nonfiction,Nonfiction
537
+ Nonfiction,Nonfiction
538
+ Nonfiction,Nonfiction
539
+ Nonfiction,Nonfiction
540
+ Nonfiction,Nonfiction
541
+ Nonfiction,Nonfiction
542
+ Nonfiction,Nonfiction
543
+ Nonfiction,Nonfiction
544
+ Nonfiction,Fiction
545
+ Nonfiction,Nonfiction
546
+ Nonfiction,Nonfiction
547
+ Nonfiction,Nonfiction
548
+ Nonfiction,Nonfiction
549
+ Nonfiction,Nonfiction
550
+ Nonfiction,Nonfiction
551
+ Nonfiction,Nonfiction
552
+ Nonfiction,Nonfiction
553
+ Nonfiction,Fiction
554
+ Nonfiction,Fiction
555
+ Nonfiction,Nonfiction
556
+ Nonfiction,Nonfiction
557
+ Nonfiction,Nonfiction
558
+ Nonfiction,Nonfiction
559
+ Nonfiction,Nonfiction
560
+ Nonfiction,Fiction
561
+ Nonfiction,Nonfiction
562
+ Nonfiction,Nonfiction
563
+ Nonfiction,Fiction
564
+ Nonfiction,Nonfiction
565
+ Nonfiction,Nonfiction
566
+ Nonfiction,Nonfiction
567
+ Nonfiction,Nonfiction
568
+ Nonfiction,Nonfiction
569
+ Nonfiction,Nonfiction
570
+ Nonfiction,Nonfiction
571
+ Nonfiction,Nonfiction
572
+ Nonfiction,Fiction
573
+ Nonfiction,Fiction
574
+ Nonfiction,Nonfiction
575
+ Nonfiction,Nonfiction
576
+ Nonfiction,Nonfiction
577
+ Nonfiction,Fiction
578
+ Nonfiction,Nonfiction
579
+ Nonfiction,Nonfiction
580
+ Nonfiction,Nonfiction
581
+ Nonfiction,Nonfiction
582
+ Nonfiction,Nonfiction
583
+ Nonfiction,Nonfiction
584
+ Nonfiction,Nonfiction
585
+ Nonfiction,Nonfiction
586
+ Nonfiction,Nonfiction
587
+ Nonfiction,Nonfiction
588
+ Nonfiction,Nonfiction
589
+ Nonfiction,Nonfiction
590
+ Nonfiction,Nonfiction
591
+ Nonfiction,Nonfiction
592
+ Nonfiction,Nonfiction
593
+ Nonfiction,Nonfiction
594
+ Nonfiction,Nonfiction
595
+ Nonfiction,Nonfiction
596
+ Nonfiction,Nonfiction
597
+ Nonfiction,Nonfiction
598
+ Nonfiction,Fiction
599
+ Nonfiction,Nonfiction
600
+ Nonfiction,Nonfiction
601
+ Nonfiction,Fiction
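The rows above pair one category label with another. The following is a minimal sketch of how such pairs could be summarised, assuming the first column is a reference label and the second a predicted one (the column names are not visible in this part of the diff). The `SAMPLE` data and the `agreement_summary` helper are hypothetical and are not part of the commit.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the same two-column shape as the rows above;
# the real file and its header are not reproduced here.
SAMPLE = """Fiction,Nonfiction
Fiction,Fiction
Nonfiction,Nonfiction
Nonfiction,Fiction
"""


def agreement_summary(text):
    """Count (first, second) label pairs and the share of rows where they match."""
    pairs = Counter()
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        pairs[(row[0].strip(), row[1].strip())] += 1
    total = sum(pairs.values())
    matches = sum(n for (a, b), n in pairs.items() if a == b)
    return pairs, (matches / total if total else 0.0)


pairs, rate = agreement_summary(SAMPLE)
print(dict(pairs))
print(f"agreement rate: {rate:.2f}")
```

Counting pairs rather than only matches keeps the off-diagonal cases (for example `Fiction` labelled as `Nonfiction`) visible, which is usually what one inspects first in a two-class comparison.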
requirements.txt ADDED
@@ -0,0 +1,38 @@
+ # Core ML and NLP libraries
+ transformers>=4.21.0
+ torch>=1.12.0
+ sentence-transformers>=2.2.0
+ tokenizers>=0.13.0
+
+ # Vector database and search
+ chromadb>=0.4.0
+ langchain>=0.1.0
+ langchain-community>=0.0.20
+ langchain-chroma>=0.1.0
+ langchain-openai>=0.0.5
+ langchain-huggingface>=0.0.1
+ langchain-text-splitters>=0.0.1
+
+ # Data processing and analysis
+ pandas>=1.5.0
+ numpy>=1.21.0
+ scikit-learn>=1.1.0
+
+ # Visualization and UI
+ matplotlib>=3.5.0
+ seaborn>=0.11.0
+ gradio>=3.40.0
+ plotly>=5.10.0
+ Pillow>=9.0.0
+
+ # Utilities
+ tqdm>=4.64.0
+ tabulate>=0.8.0
+ python-dotenv>=0.19.0
+
+ # OpenAI (optional, for OpenAI embeddings)
+ openai>=1.0.0
+
+ # Additional dependencies for specific functionalities
+ datasets>=2.5.0
+ accelerate>=0.21.0
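Every entry in the list above uses the `name>=minimum` form. The following is a minimal sketch, using only the standard library, of how such lines could be parsed and compared against an installed version. The `REQUIREMENTS` excerpt and the `parse_minimums` and `satisfies` helpers are hypothetical. The comparison handles plain dotted numeric versions only, not the full version grammar that pip accepts.

```python
import re

# Hypothetical excerpt in the same "name>=minimum" form as the list above.
REQUIREMENTS = """\
# Core ML and NLP libraries
transformers>=4.21.0
torch>=1.12.0

gradio>=3.40.0
"""

# Package name, ">=", then a dotted numeric version.
LINE = re.compile(r"^([A-Za-z0-9_.\-]+)\s*>=\s*([0-9]+(?:\.[0-9]+)*)$")


def parse_minimums(text):
    """Map each package name to its minimum version as a tuple of ints."""
    minimums = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported requirement line: {raw!r}")
        version = tuple(int(part) for part in match.group(2).split("."))
        minimums[match.group(1).lower()] = version
    return minimums


def satisfies(installed, minimum):
    """Compare dotted numeric versions; pre-release tags are not handled."""
    have = tuple(int(part) for part in installed.split("."))
    return have >= minimum


mins = parse_minimums(REQUIREMENTS)
print(mins["gradio"])
print(satisfies("5.38.0", mins["gradio"]))
```

As a usage note, the README front matter in this commit sets `sdk_version: 5.38.0` for Gradio, which passes the `gradio>=3.40.0` floor under this comparison. For real dependency checks a dedicated version parser is the safer choice.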
tagged_description.txt ADDED
The diff for this file is too large to render. See raw diff