	# 📚 Semantic Library Search

	An AI-powered library search engine that finds books by meaning, not just keywords. Built as a sample project in AI/ML.

	## What Is Semantic Search?

	Traditional library search looks for exact keyword matches. If you search for "books about the cosmos", you might miss books that use the word "universe" or "astronomy" instead.

	Semantic search understands meaning. It knows that "cosmos", "universe", "space", and "astronomy" are all related concepts — and returns relevant results even when the exact words don't match.
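
To make the contrast concrete, here is a toy sketch in plain Python. The three-dimensional vectors are invented for illustration (a real model such as all-MiniLM-L6-v2 produces 384-dimensional embeddings), and `keyword_match` and `cosine_similarity` are hypothetical helper names, not functions from this project.

```python
import math

def keyword_match(query: str, text: str) -> bool:
    """Return True only if the query and the text share at least one exact word."""
    return bool(set(query.lower().split()) & set(text.lower().split()))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-d "embeddings" (assumption); real ones come from a trained model.
toy_vectors = {
    "cosmos": [0.9, 0.1, 0.0],
    "universe": [0.8, 0.2, 0.1],
    "cooking": [0.0, 0.1, 0.9],
}

# Keyword search finds nothing in common between the query and the title.
print(keyword_match("cosmos", "a history of the universe"))  # False

# Vector comparison scores "universe" far closer to "cosmos" than "cooking".
print(cosine_similarity(toy_vectors["cosmos"], toy_vectors["universe"]))
print(cosine_similarity(toy_vectors["cosmos"], toy_vectors["cooking"]))
```

The keyword check fails because no word is shared, while the vector comparison still ranks "universe" as close to "cosmos".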

	## How It Works

	1. Catalog Loading — Book metadata is loaded from a CSV catalog file
	2. Embedding Generation — Each book description is converted into a mathematical "meaning fingerprint" using a Sentence Transformer AI model
	3. Vector Indexing — These fingerprints are stored in a FAISS index for fast similarity searching
	4. Query Processing — When a user searches, their query is converted into a fingerprint and compared against all books in the index
	5. Results Returned — The closest matching books are returned ranked by semantic similarity
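
Steps 2 through 5 can be sketched as follows, assuming the `sentence-transformers` and `faiss-cpu` packages are installed. This is not the project's actual `build_index.py` or `search.py`; the function names, the choice of an exact inner-product index (`IndexFlatIP`), and passing titles as a plain list are assumptions made for illustration.

```python
def build_index(descriptions, model_name="all-MiniLM-L6-v2"):
    """Embed every description and store the vectors in a FAISS index."""
    # Imported lazily so this module loads even before the dependencies are installed.
    import faiss
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # Unit-length vectors make the inner product equal to cosine similarity.
    embeddings = model.encode(
        descriptions, normalize_embeddings=True, convert_to_numpy=True
    ).astype("float32")
    index = faiss.IndexFlatIP(embeddings.shape[1])  # exact (brute-force) search
    index.add(embeddings)
    return model, index

def rank_hits(titles, scores, ids):
    """Pair each returned row id with its title, skipping FAISS's -1 padding."""
    return [(titles[i], float(s)) for s, i in zip(scores, ids) if i != -1]

def search(query, model, index, titles, k=5):
    """Embed the query the same way as the books and return the k closest titles."""
    query_vec = model.encode(
        [query], normalize_embeddings=True, convert_to_numpy=True
    ).astype("float32")
    scores, ids = index.search(query_vec, k)  # both arrays have shape (1, k)
    return rank_hits(titles, scores[0], ids[0])
```

A call would look like `model, index = build_index(descriptions)` followed by `search("books about the cosmos", model, index, titles)`. FAISS returns hits ordered from highest to lowest score, so the list is already ranked by similarity.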

	## Technologies Used

	- Python 3.14 — Core programming language
	- Sentence Transformers — AI model for generating semantic embeddings (all-MiniLM-L6-v2)
	- FAISS — Facebook AI Similarity Search for fast vector search
	- FastAPI — Modern Python web framework for the search API
	- Uvicorn — ASGI server for running the API
	- Pandas — Data manipulation and catalog management
	- HTML/CSS/JavaScript — Frontend search interface

	## Project Structure

	```
	semantic-library-search/
	├── catalog.csv # Library catalog with 200 books
	├── build_index.py # Builds the AI search index
	├── search.py # Command line search interface
	├── app.py # FastAPI search API
	├── index.html # Web search interface
	├── library.index # Generated FAISS vector index
	├── catalog_processed.csv # Processed catalog data
	└── embeddings.pkl # Saved book embeddings
	```
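
The last three files in the tree appear to be generated rather than written by hand, so checking for them before starting the API can save a confusing startup error. A minimal sketch using only the standard library; `missing_artifacts` is a hypothetical helper, and it assumes the generated files sit in the project root as shown above.

```python
from pathlib import Path

# File names taken from the project tree above.
GENERATED_FILES = ("library.index", "catalog_processed.csv", "embeddings.pkl")

def missing_artifacts(project_root="."):
    """Return the generated files that are not present in project_root."""
    root = Path(project_root)
    return [name for name in GENERATED_FILES if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing:", ", ".join(missing), "- try running build_index.py first")
    else:
        print("All generated files are present.")
```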

	## Getting Started

	### Prerequisites
	- Python 3.9 or higher
	- pip package manager

	### Installation

	1. Clone the repository:
	```
	git clone https://github.com/angelacolmen/semantic-library-search.git
	cd semantic-library-search
	```

	2. Create and activate a virtual environment:
	```
	python -m venv venv
	# Windows:
	venv\Scripts\activate
	# Mac/Linux:
	source venv/bin/activate
	```

	3. Install dependencies:
	```
	pip install sentence-transformers faiss-cpu pandas fastapi uvicorn python-multipart
	```

	4. Build the search index:
	```
	python build_index.py
	```

	5. Start the API:
	```
	uvicorn app:app --reload
	```

	6. Open your browser and go to:
	```
	http://127.0.0.1:8000
	```

	## Example Searches

	Try these searches to see semantic search in action:

	- `books about space and the universe`
	- `stories about race and justice in America`
	- `women who made a difference in science`
	- `how governments control people`
	- `survival against the odds`

	Notice that relevant results come back even when the exact search words don't appear in the book titles or descriptions!
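
The same queries can also be sent from a script once the API is running. The sketch below uses only the standard library. The `/search` path and the `q` parameter are assumptions, since the actual routes are defined in `app.py` and not shown here; by default, FastAPI lists the real ones at `/docs`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"

EXAMPLE_QUERIES = [
    "books about space and the universe",
    "stories about race and justice in America",
    "women who made a difference in science",
    "how governments control people",
    "survival against the odds",
]

def build_search_url(query, base_url=BASE_URL):
    """Build the request URL. The /search route and q parameter are hypothetical."""
    return f"{base_url}/search?{urlencode({'q': query})}"

def run_query(query):
    """Send one query to the running API and decode the JSON response."""
    with urlopen(build_search_url(query), timeout=10) as response:
        return json.load(response)

def main():
    # Requires the API to be running (uvicorn app:app --reload).
    for query in EXAMPLE_QUERIES:
        print(query, "->", run_query(query))
```

Call `main()` after starting the server, and adjust the route and parameter name to match whatever `app.py` exposes.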

	## Library Science Applications

	This project demonstrates several real-world library applications:

	- Reference Services — Patrons can describe their research need in plain language and receive relevant resource recommendations
	- Collection Development — Identify gaps in a collection by searching for topics and seeing what's missing
	- Catalog Enhancement — Improve discoverability of items that may be poorly described in traditional catalog records
	- Accessibility — Helps patrons who don't know the exact terminology used in library classification systems
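
For the collection development case, one rough way to spot gaps is to run a list of topics through the search and flag those whose best match scores poorly. The sketch below works on scores that have already been computed. `find_gaps`, the sample scores, and the 0.4 cutoff are all hypothetical, and a sensible threshold would need tuning against a real catalog.

```python
def find_gaps(best_scores, threshold=0.4):
    """Return (topic, score) pairs whose best similarity is below threshold, weakest first."""
    gaps = [(topic, score) for topic, score in best_scores.items() if score < threshold]
    return sorted(gaps, key=lambda pair: pair[1])

# Invented scores for illustration only: topic -> similarity of its best-matching book.
sample_scores = {"astronomy": 0.71, "marine biology": 0.22, "labor history": 0.35}

print(find_gaps(sample_scores))
# [('marine biology', 0.22), ('labor history', 0.35)]
```

A low best score only suggests a gap; it may also mean the topic is described in terms the model handles poorly, so flagged topics are worth a manual check.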

	## Future Enhancements

	- Connect to a live library catalog via API (e.g. WorldCat, Open Library)
	- Add Library of Congress Subject Heading suggestions
	- Implement user feedback to improve search results over time
	- Scale to larger collections using cloud-based vector databases
	- Add multilingual search support

	## Author

	Built by Angela Colmenares as a sample project for AI/ML.