| |
|
|
| An AI-powered library search engine that finds books by meaning, not just keywords. Built as a sample project in AI/ML. |
|
|
| |
|
|
| Traditional library search looks for exact keyword matches. If you search for "books about the cosmos", you might miss books that use the word "universe" or "astronomy" instead. |
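
To make the keyword limitation concrete, here is a minimal sketch of exact-word matching. The two-book catalog, the stopword list, and the function names are made up for illustration and are not part of this project's code.

```python
import re

# Hypothetical two-book catalog used only for illustration.
CATALOG = [
    {"title": "A Brief Tour of the Universe",
     "description": "An introduction to astronomy and the universe."},
    {"title": "Cosmos Gardening",
     "description": "Growing cosmos flowers in a home garden."},
]

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = frozenset({"books", "about", "the", "a", "an", "and", "of", "in", "to"})


def tokenize(text):
    """Lowercase the text and return its alphabetic words as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(query, catalog):
    """Return titles that share at least one non-stopword with the query."""
    terms = tokenize(query) - STOPWORDS
    hits = []
    for book in catalog:
        words = tokenize(book["title"] + " " + book["description"])
        if terms & words:
            hits.append(book["title"])
    return hits


results = keyword_search("books about the cosmos", CATALOG)
print(results)
```

In this toy catalog the astronomy book is missed because it never uses the word "cosmos", while the gardening book matches on the word alone.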
|
|
| Semantic search understands *meaning*. It treats "cosmos", "universe", "space", and "astronomy" as related concepts and returns relevant results even when the exact words don't match. |
|
|
| |
|
|
| 1. **Catalog Loading**: Book metadata is loaded from a CSV catalog file |
| 2. **Embedding Generation**: Each book description is converted into a mathematical "meaning fingerprint" using a Sentence Transformer AI model |
| 3. **Vector Indexing**: These fingerprints are stored in a FAISS index for fast similarity searching |
| 4. **Query Processing**: When a user searches, their query is converted into a fingerprint and compared against all books in the index |
| 5. **Results Returned**: The closest matching books are returned ranked by semantic similarity |
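
The comparison and ranking in steps 4 and 5 can be sketched in plain Python. This is a simplified stand-in, not the project's implementation: the three-dimensional vectors are hand-made for illustration (all-MiniLM-L6-v2 produces 384-dimensional embeddings), the book titles are invented, and a brute-force loop takes the place of the FAISS index.

```python
import math

# Hand-made toy "embeddings" (rough axes: space, justice, survival).
# A real model learns these vectors from text; these values are assumptions.
TOY_INDEX = {
    "Guide to the Night Sky": [0.9, 0.1, 0.1],
    "Courtroom Stories": [0.1, 0.9, 0.2],
    "Alone on the Ice": [0.2, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=2):
    """Score every book against the query and return the top_k best matches."""
    scored = [
        (title, cosine_similarity(query_vector, vector))
        for title, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Pretend embedding of "books about space and the universe".
query = [0.8, 0.0, 0.3]
results = search(query, TOY_INDEX)
for title, score in results:
    print(f"{score:.3f}  {title}")
```

The query shares no words with any title, yet the astronomy book ranks first because its vector points in nearly the same direction as the query vector.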
|
|
| |
|
|
| - **Python 3.14**: Core programming language |
| - **Sentence Transformers**: AI model for generating semantic embeddings (all-MiniLM-L6-v2) |
| - **FAISS**: Facebook AI Similarity Search for fast vector search |
| - **FastAPI**: Modern Python web framework for the search API |
| - **Uvicorn**: ASGI server for running the API |
| - **Pandas**: Data manipulation and catalog management |
| - **HTML/CSS/JavaScript**: Frontend search interface |
|
|
| |
|
|
| ``` |
| semantic-library-search/ |
| ├── catalog.csv |
| ├── build_index.py |
| ├── search.py |
| ├── app.py |
| ├── index.html |
| ├── library.index |
| ├── catalog_processed.csv |
| └── embeddings.pkl |
| ``` |
|
|
| |
|
|
| |
| - Python 3.9 or higher |
| - pip package manager |
|
|
| |
|
|
| 1. Clone the repository: |
| ``` |
| git clone https://github.com/angelacolmen/semantic-library-search.git |
| cd semantic-library-search |
| ``` |
|
|
| 2. Create and activate a virtual environment: |
| ``` |
| python -m venv venv |
| # Windows: |
| venv\Scripts\activate |
| # macOS/Linux: |
| source venv/bin/activate |
| ``` |
|
|
| 3. Install dependencies: |
| ``` |
| pip install sentence-transformers faiss-cpu pandas fastapi uvicorn python-multipart |
| ``` |
|
|
| 4. Build the search index: |
| ``` |
| python build_index.py |
| ``` |
|
|
| 5. Start the API: |
| ``` |
| uvicorn app:app |
| ``` |
|
|
| 6. Open your browser and go to: |
| ``` |
| http://127.0.0.1:8000 |
| ``` |
|
|
| |
|
|
| Try these searches to see semantic search in action: |
|
|
| - `books about space and the universe` |
| - `stories about race and justice in America` |
| - `women who made a difference in science` |
| - `how governments control people` |
| - `survival against the odds` |
|
|
| Notice how results appear even when the exact search words don't appear in the book titles or descriptions! |
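
If you would rather script these searches than type them into the browser, something like the sketch below could work. It rests on assumptions this README does not confirm: the `/search` path, the `q` and `k` parameter names, and a JSON response are all hypothetical, so check `app.py` for the actual route before using it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # address from the setup steps

SAMPLE_QUERIES = [
    "books about space and the universe",
    "stories about race and justice in America",
    "women who made a difference in science",
    "how governments control people",
    "survival against the odds",
]


def build_search_url(query, top_k=5, base_url=BASE_URL):
    """Build a URL for a HYPOTHETICAL GET /search endpoint (q = query, k = result count)."""
    return f"{base_url}/search?{urlencode({'q': query, 'k': top_k})}"


def run_query(query, top_k=5):
    """Call the API and decode the body, assuming it returns JSON."""
    with urlopen(build_search_url(query, top_k), timeout=10) as response:
        return json.load(response)


# Only the URLs are built here; call run_query() once the server is running.
urls = [build_search_url(q) for q in SAMPLE_QUERIES]
for url in urls:
    print(url)
```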
|
|
| |
|
|
| This project demonstrates several real-world library applications: |
|
|
| - **Reference Services**: Patrons can describe their research need in plain language and receive relevant resource recommendations |
| - **Collection Development**: Identify gaps in a collection by searching for topics and seeing what's missing |
| - **Catalog Enhancement**: Improve discoverability of items that may be poorly described in traditional catalog records |
| - **Accessibility**: Helps patrons who don't know the exact terminology used in library classification systems |
|
|
| |
|
|
| - Connect to a live library catalog via API (e.g., WorldCat, Open Library) |
| - Add Library of Congress Subject Heading suggestions |
| - Implement user feedback to improve search results over time |
| - Scale to larger collections using cloud-based vector databases |
| - Add multilingual search support |
|
|
| |
|
|
| Built by Angela Colmenares as a sample project for AI/ML. |