Spaces: AngelaColmen/semantic-library-search

AngelaColmen committed on Apr 12

Commit 369a077 (verified) · 1 parent: f80bc88

Initial Commit

Files changed (7)

README +115 -0
app.py +66 -0
build_index.py +33 -0
catalog.csv +21 -0
index.html +135 -0
requirements.txt +5 -0
search.py +31 -0

README ADDED

	@@ -0,0 +1,115 @@

+# 📚 Semantic Library Search
+An AI-powered library search engine that finds books by meaning, not just keywords. Built as a sample AI/ML project.
+## What Is Semantic Search?
+Traditional library search looks for exact keyword matches. If you search for "books about the cosmos", you might miss books that use the word "universe" or "astronomy" instead.
+Semantic search understands *meaning*. It knows that "cosmos", "universe", "space", and "astronomy" are all related concepts — and returns relevant results even when the exact words don't match.
+## How It Works
+1. **Catalog Loading** — Book metadata is loaded from a CSV catalog file
+2. **Embedding Generation** — Each book's title, author, and description are combined into one text and converted into a mathematical "meaning fingerprint" (a 384-dimensional vector) using a Sentence Transformer AI model
+3. **Vector Indexing** — These fingerprints are stored in a FAISS index for fast similarity searching
+4. **Query Processing** — When a user searches, their query is converted into a fingerprint and compared against all books in the index
+5. **Results Returned** — The closest matching books are returned ranked by semantic similarity
+## Technologies Used
+- **Python 3.14** — Core programming language
+- **Sentence Transformers** — AI model for generating semantic embeddings (all-MiniLM-L6-v2)
+- **FAISS** — Facebook AI Similarity Search for fast vector search
+- **Gradio** — Web interface for the search demo (`app.py`)
+- **FastAPI / Uvicorn** — API backend that the standalone `index.html` frontend expects (not included in this commit)
+- **Pandas** — Data manipulation and catalog management
+- **HTML/CSS/JavaScript** — Frontend search interface
+## Project Structure
+```
+semantic-library-search/
+├── catalog.csv              # Library catalog with 20 books
+├── build_index.py           # Builds the AI search index
+├── search.py                # Command line search interface
+├── app.py                   # Gradio web search app
+├── requirements.txt         # Python dependencies
+├── index.html               # Web search interface
+├── library.index            # Generated FAISS vector index
+├── catalog_processed.csv    # Processed catalog data
+└── embeddings.pkl           # Saved book embeddings
+```
+## Getting Started
+### Prerequisites
+- Python 3.9 or higher
+- pip package manager
+### Installation
+1. Clone the repository:
+```
+git clone https://github.com/angelacolmen/semantic-library-search.git
+cd semantic-library-search
+```
+2. Create and activate a virtual environment:
+```
+python -m venv venv
+venv\Scripts\activate  # Windows
+source venv/bin/activate  # Mac/Linux
+```
+3. Install dependencies:
+```
+pip install -r requirements.txt
+```
+4. Build the search index:
+```
+python build_index.py
+```
+5. Start the app (it also rebuilds the index on startup):
+```
+python app.py
+```
+6. Open your browser at the address Gradio prints, by default:
+```
+http://127.0.0.1:7860
+```
+## Example Searches
+Try these searches to see semantic search in action:
+- `books about space and the universe`
+- `stories about race and justice in America`
+- `women who made a difference in science`
+- `how governments control people`
+- `survival against the odds`
+Notice how relevant results can appear even when the exact search words don't appear in the book titles or descriptions!
+## Library Science Applications
+This project demonstrates several real-world library applications:
+- **Reference Services** — Patrons can describe their research need in plain language and receive relevant resource recommendations
+- **Collection Development** — Identify gaps in a collection by searching for topics and seeing what's missing
+- **Catalog Enhancement** — Improve discoverability of items that may be poorly described in traditional catalog records
+- **Accessibility** — Helps patrons who don't know the exact terminology used in library classification systems
+## Future Enhancements
+- Connect to a live library catalog via API (e.g. WorldCat, Open Library)
+- Add Library of Congress Subject Heading suggestions
+- Implement user feedback to improve search results over time
+- Scale to larger collections using cloud-based vector databases
+- Add multilingual search support
+## Author
+Built by Angela Colmenares as a sample AI/ML project.
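The ranking behind the README's How It Works steps (embed every book once, embed the query, return the closest books) can be sketched in plain Python. This is a toy illustration under stated assumptions: the 3-dimensional vectors are hand-made hypothetical values, not all-MiniLM-L6-v2 output (which is 384-dimensional), and the brute-force loop computes the same squared Euclidean distance that `faiss.IndexFlatL2` uses, without FAISS's optimized implementation.

```python
from math import fsum

# Hypothetical toy "meaning fingerprints"; real ones come from the model.
# Made-up axes for illustration: space, social justice, biology
BOOKS = {
    "Cosmos": [0.9, 0.0, 0.1],
    "Just Mercy": [0.0, 0.95, 0.05],
    "The Selfish Gene": [0.05, 0.0, 0.95],
    "The Martian": [0.8, 0.0, 0.4],
}


def squared_l2(a, b):
    """Squared Euclidean distance, the metric faiss.IndexFlatL2 reports."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec, books, k=3):
    """Score every book exhaustively and return the k closest (title, distance) pairs."""
    scored = [(title, squared_l2(query_vec, vec)) for title, vec in books.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]


# A hypothetical query vector that lands near the "space" axis,
# standing in for an embedded query such as "books about the cosmos"
query = [0.85, 0.05, 0.2]
for title, dist in top_k(query, BOOKS, k=2):
    print(f"{title}: {dist:.3f}")
```

Exhaustive scoring is fine for a 20-book catalog; FAISS's other index types only become worthwhile when the collection is far larger.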

app.py ADDED

	@@ -0,0 +1,66 @@

+import subprocess
+import sys
+import gradio as gr
+from sentence_transformers import SentenceTransformer
+import faiss
+import numpy as np
+import pandas as pd
+print("Loading search engine...")
+# Rebuild the index on startup with the same interpreter running this app,
+# stopping with an error if build_index.py fails
+subprocess.run([sys.executable, "build_index.py"], check=True)
+# Load the model, the FAISS index and the processed catalog
+model = SentenceTransformer("all-MiniLM-L6-v2")
+index = faiss.read_index("library.index")
+df = pd.read_csv("catalog_processed.csv")
+print("Ready!")
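+def search(query, num_results=3):
+    """Return the top matching books for a query as Markdown."""
+    if not query or not query.strip():
+        return "Please enter a search query."
+    # Gradio sliders may deliver floats; FAISS needs an integer k
+    k = int(num_results)
+    query_embedding = model.encode([query])
+    distances, indices = index.search(np.asarray(query_embedding, dtype="float32"), k)
+    results = ""
+    for i, idx in enumerate(indices[0]):
+        book = df.iloc[idx]
+        # Blank lines keep each field on its own line in rendered Markdown
+        results += f"### {i+1}. {book['title']}\n\n"
+        results += f"**Author:** {book['author']}\n\n"
+        results += f"**Subject:** {book['subject']}\n\n"
+        results += f"{book['description']}\n\n"
+    return results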
+# Create Gradio interface
+demo = gr.Interface(
+    fn=search,
+    inputs=[
+        gr.Textbox(
+            label="Search Query",
+            placeholder="e.g. books about space and the universe...",
+            lines=2
+        ),
+        gr.Slider(
+            minimum=1,
+            maximum=5,
+            value=3,
+            step=1,
+            label="Number of Results"
+        )
+    ],
+    outputs=gr.Markdown(label="Search Results"),
+    title="📚 Semantic Library Search",
+    description="Search by meaning, not just keywords. Try searching for 'books about space and the universe' or 'stories about race and justice in America'.",
+    examples=[
+        ["books about space and the universe", 3],
+        ["stories about race and justice in America", 3],
+        ["women who made a difference in science", 3],
+        ["how governments control people", 3],
+        ["survival against the odds", 3]
+    ]
+)
+if __name__ == "__main__":
+    demo.launch()
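`app.py` caps the slider at 5 results, well under the 20 books in `catalog.csv`, so every index FAISS returns is valid. If the maximum were ever raised above the number of indexed vectors, FAISS pads the missing slots with `-1`, and `df.iloc[-1]` would silently return the last book. A minimal guard, sketched as a hypothetical helper `format_results` that takes one row of FAISS indices plus a plain list of book dicts standing in for the DataFrame:

```python
def format_results(indices_row, books):
    """Render search hits as Markdown, skipping FAISS's -1 padding.

    indices_row: one row of the `indices` array returned by index.search
    books: list of dicts with title/author/subject/description keys
           (a stand-in for the catalog DataFrame)
    """
    blocks = []
    rank = 0
    for idx in indices_row:
        if idx < 0:  # FAISS uses -1 when it has fewer than k results
            continue
        rank += 1
        book = books[idx]
        blocks.append(f"### {rank}. {book['title']}")
        blocks.append(f"**Author:** {book['author']}")
        blocks.append(f"**Subject:** {book['subject']}")
        blocks.append(book["description"])
    # Blank lines between blocks so each field renders on its own line
    return "\n\n".join(blocks) if blocks else "No matching books found."


books = [
    {"title": "Cosmos", "author": "Carl Sagan", "subject": "Science",
     "description": "A journey through the universe exploring astronomy and the nature of space"},
]
print(format_results([0, -1, -1], books))
```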

build_index.py ADDED

	@@ -0,0 +1,33 @@

+import pandas as pd
+from sentence_transformers import SentenceTransformer
+import faiss
+import numpy as np
+import pickle
+# Step 1: Load your catalog
+print("Loading catalog...")
+df = pd.read_csv("catalog.csv")
+# Step 2: Combine the important fields into one sentence per book
+df["combined"] = df["title"] + " by " + df["author"] + ". " + df["description"]
+# Step 3: Load the AI model
+print("Loading AI model (this may take a minute the first time)...")
+model = SentenceTransformer("all-MiniLM-L6-v2")
+# Step 4: Turn each book's combined text into a vector (list of numbers)
+print("Creating embeddings...")
+embeddings = model.encode(df["combined"].tolist())
+# Step 5: Build the search index
+print("Building search index...")
+index = faiss.IndexFlatL2(embeddings.shape[1])
+index.add(np.array(embeddings))
+# Step 6: Save everything for later
+faiss.write_index(index, "library.index")
+df.to_csv("catalog_processed.csv", index=False)
+with open("embeddings.pkl", "wb") as f:
+    pickle.dump(embeddings, f)
+print("Done! Your library search index is ready.")
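`build_index.py` uses `IndexFlatL2`, which ranks by squared Euclidean distance rather than cosine similarity. For unit-length vectors the two rankings agree, because ‖a − b‖² = ‖a‖² + ‖b‖² − 2·a·b = 2 − 2·cos(a, b). all-MiniLM-L6-v2 is distributed with a normalization layer; passing `normalize_embeddings=True` to `model.encode` would make that explicit rather than relying on it. The check below verifies the identity on two arbitrary, hypothetical vectors and shows how a returned distance could be turned back into a cosine score (`distance_to_cosine` is an illustrative helper, not part of FAISS).

```python
import math


def normalize(v):
    """Scale a vector to unit length, as sentence embeddings usually are."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity; a plain dot product because a and b are unit-length."""
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    """Squared Euclidean distance, the value IndexFlatL2 returns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distance_to_cosine(d2):
    """Convert a squared L2 distance between unit vectors to cosine similarity."""
    return 1 - d2 / 2


# Two arbitrary (hypothetical) vectors, normalized like sentence embeddings
a = normalize([0.3, -1.2, 0.5, 2.0])
b = normalize([1.1, 0.4, -0.7, 1.5])

d2 = squared_l2(a, b)
print(f"squared L2 = {d2:.6f}, 2 - 2*cos = {2 - 2 * cosine(a, b):.6f}")
print(f"cosine recovered from distance = {distance_to_cosine(d2):.6f}")
```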

catalog.csv ADDED

	@@ -0,0 +1,21 @@

+id,title,author,subject,description
+1,The Great Gatsby,F. Scott Fitzgerald,Fiction,A story about wealth and the American dream in the 1920s
+2,A Brief History of Time,Stephen Hawking,Science,An introduction to cosmology and the nature of the universe
+3,To Kill a Mockingbird,Harper Lee,Fiction,A lawyer defends a Black man accused of a crime in the American South
+4,The Selfish Gene,Richard Dawkins,Science,An exploration of evolution from the perspective of genes
+5,Sapiens,Yuval Noah Harari,History,A sweeping history of humankind from prehistoric times to the present
+6,Cosmos,Carl Sagan,Science,A journey through the universe exploring astronomy and the nature of space
+7,The Immortal Life of Henrietta Lacks,Rebecca Skloot,Science,The story of a Black woman whose cancer cells were taken without consent and used for medical research
+8,Just Mercy,Bryan Stevenson,Law,A lawyer fights for wrongly condemned prisoners on death row in America
+9,The Origin of Species,Charles Darwin,Science,The foundational text of evolutionary biology explaining natural selection
+10,Guns Germs and Steel,Jared Diamond,History,An explanation of why some civilizations came to dominate others throughout history
+11,The Color of Law,Richard Rothstein,History,How the American government segregated the country through housing policies
+12,Hidden Figures,Margot Lee Shetterly,History,The story of Black female mathematicians who worked at NASA during the space race
+13,The Martian,Andy Weir,Fiction,An astronaut is stranded alone on Mars and must use science to survive
+14,Astrophysics for People in a Hurry,Neil deGrasse Tyson,Science,A quick guide to the biggest ideas in astrophysics and the cosmos
+15,The New Jim Crow,Michelle Alexander,Law,How mass incarceration functions as a system of racial control in America
+16,A Short History of Nearly Everything,Bill Bryson,Science,A journey through the history of science and how humans came to understand the world
+17,The Warmth of Other Suns,Isabel Wilkerson,History,The story of the Great Migration of Black Americans from the South to the North
+18,Contact,Carl Sagan,Fiction,A scientist receives a mysterious signal from deep space and must decide how humanity should respond
+19,The Sixth Extinction,Elizabeth Kolbert,Science,An investigation into how human activity is causing a mass extinction of species on Earth
+20,Between the World and Me,Ta-Nehisi Coates,History,A father's letter to his son about the history and reality of being Black in America
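`build_index.py` builds the text it embeds as `title + " by " + author + ". " + description` using pandas string concatenation. The current catalog has no blank cells, but if a future row were missing an author or description, pandas would produce `NaN` instead of a string for that row, which the model's tokenizer is likely to reject. A minimal sketch of a more tolerant builder, reading the same column layout with the standard `csv` module; `combined_text` is a hypothetical helper and the second sample row is invented for illustration:

```python
import csv
import io


def combined_text(row):
    """Build the text to embed, following build_index.py's
    'title by author. description' pattern but tolerating blank fields."""
    title = (row.get("title") or "").strip()
    author = (row.get("author") or "").strip()
    description = (row.get("description") or "").strip()
    head = f"{title} by {author}" if author else title
    return f"{head}. {description}" if description else head


# catalog.csv's columns; the second row is hypothetical and has no author
sample = io.StringIO(
    "id,title,author,subject,description\n"
    "6,Cosmos,Carl Sagan,Science,A journey through the universe exploring astronomy and the nature of space\n"
    "99,Untitled Zine,,Art,Hand-printed collages\n"
)
for row in csv.DictReader(sample):
    print(combined_text(row))
```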

index.html ADDED

	@@ -0,0 +1,135 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Semantic Library Search</title>
+    <style>
+        body {
+            font-family: Arial, sans-serif;
+            max-width: 800px;
+            margin: 40px auto;
+            padding: 20px;
+            background-color: #f5f5f5;
+        }
+        h1 {
+            color: #2c3e50;
+            text-align: center;
+        }
+        p.subtitle {
+            text-align: center;
+            color: #7f8c8d;
+            margin-bottom: 30px;
+        }
+        .search-box {
+            display: flex;
+            gap: 10px;
+            margin-bottom: 30px;
+        }
+        input {
+            flex: 1;
+            padding: 12px;
+            font-size: 16px;
+            border: 2px solid #bdc3c7;
+            border-radius: 6px;
+        }
+        button {
+            padding: 12px 24px;
+            background-color: #2980b9;
+            color: white;
+            border: none;
+            border-radius: 6px;
+            font-size: 16px;
+            cursor: pointer;
+        }
+        button:hover {
+            background-color: #2471a3;
+        }
+        .result {
+            background: white;
+            padding: 20px;
+            margin-bottom: 15px;
+            border-radius: 8px;
+            border-left: 5px solid #2980b9;
+            box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+        }
+        .result h3 {
+            margin: 0 0 5px 0;
+            color: #2c3e50;
+        }
+        .result .author {
+            color: #7f8c8d;
+            font-style: italic;
+            margin-bottom: 8px;
+        }
+        .result .subject {
+            display: inline-block;
+            background: #eaf4fb;
+            color: #2980b9;
+            padding: 3px 10px;
+            border-radius: 12px;
+            font-size: 13px;
+            margin-bottom: 8px;
+        }
+        .result p {
+            color: #555;
+            margin: 0;
+        }
+        #status {
+            text-align: center;
+            color: #7f8c8d;
+            font-style: italic;
+        }
+    </style>
+</head>
+<body>
+    <h1>📚 Semantic Library Search</h1>
+    <p class="subtitle">Search by meaning, not just keywords</p>
+    <div class="search-box">
+        <input type="text" id="query" placeholder="e.g. books about space and the universe..." />
+        <button onclick="search()">Search</button>
+    </div>
+    <div id="status"></div>
+    <div id="results"></div>
+    <script>
+        document.getElementById('query').addEventListener('keypress', function(e) {
+            if (e.key === 'Enter') search();
+        });
+        async function search() {
+            const query = document.getElementById('query').value.trim();
+            if (!query) return;
+            document.getElementById('status').textContent = 'Searching...';
+            document.getElementById('results').innerHTML = '';
+            try {
+                const response = await fetch('http://127.0.0.1:8000/search', {
+                    method: 'POST',
+                    headers: { 'Content-Type': 'application/json' },
+                    body: JSON.stringify({ query: query, num_results: 3 })
+                });
+                if (!response.ok) throw new Error(`HTTP ${response.status}`);
+                const data = await response.json();
+                document.getElementById('status').textContent = '';
+                data.results.forEach(book => {
+                    document.getElementById('results').innerHTML += `
+                        <div class="result">
+                            <h3>${book.title}</h3>
+                            <div class="author">by ${book.author}</div>
+                            <span class="subject">${book.subject}</span>
+                            <p>${book.description}</p>
+                        </div>
+                    `;
+                });
+            } catch (error) {
+                document.getElementById('status').textContent = 'Error connecting to search engine. Make sure the API is running!';
+            }
+        }
+    </script>
+</body>
+</html>
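`index.html` sends `POST http://127.0.0.1:8000/search` with a JSON body `{"query": ..., "num_results": 3}` and renders `data.results`, a list of objects with `title`, `author`, `subject` and `description`, but this commit ships a Gradio `app.py` rather than that API. The sketch below is a stand-in built only on the standard library's `http.server`; that is a swapped-in choice, not the FastAPI setup the README names. `rank_books` is a hypothetical placeholder that scores by shared words; a real backend would call `model.encode` and `index.search` as `app.py` does. Because the page is usually opened from another origin (a `file://` URL or another port), the browser sends a CORS preflight first, so the handler also answers `OPTIONS`.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical two-book catalog; a real server would load catalog_processed.csv
BOOKS = [
    {"title": "Cosmos", "author": "Carl Sagan", "subject": "Science",
     "description": "A journey through the universe exploring astronomy and the nature of space"},
    {"title": "Just Mercy", "author": "Bryan Stevenson", "subject": "Law",
     "description": "A lawyer fights for wrongly condemned prisoners on death row in America"},
]


def rank_books(query, k):
    """Placeholder ranker counting shared words; swap in model.encode + index.search."""
    words = set(query.lower().split())

    def overlap(book):
        return len(words & set(book["description"].lower().split()))

    return sorted(BOOKS, key=overlap, reverse=True)[:k]


class SearchHandler(BaseHTTPRequestHandler):
    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Access-Control-Allow-Origin", "*")
        self.end_headers()
        self.wfile.write(body)

    def do_OPTIONS(self):
        # CORS preflight, sent because index.html posts JSON from another origin
        self.send_response(204)
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "POST, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")
        self.end_headers()

    def do_POST(self):
        if self.path != "/search":
            self._send_json(404, {"error": "not found"})
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            query = str(payload.get("query", "")).strip()
            # Clamp num_results to the catalog size
            k = max(1, min(int(payload.get("num_results", 3)), len(BOOKS)))
        except (ValueError, TypeError, AttributeError):
            self._send_json(400, {"error": "expected a JSON object with query and num_results"})
            return
        results = rank_books(query, k) if query else []
        self._send_json(200, {"results": results})


def serve(host="127.0.0.1", port=8000):
    """Block forever, serving the endpoint that index.html calls."""
    ThreadingHTTPServer((host, port), SearchHandler).serve_forever()
```

Calling `serve()` from a small script and then searching from `index.html` should exercise the page's existing fetch and rendering code unchanged.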

requirements.txt ADDED

	@@ -0,0 +1,5 @@

+sentence-transformers
+faiss-cpu
+pandas
+gradio
+numpy

search.py ADDED

	@@ -0,0 +1,31 @@

+import pandas as pd
+from sentence_transformers import SentenceTransformer
+import faiss
+import numpy as np
+# Load everything we built
+print("Loading search engine...")
+model = SentenceTransformer("all-MiniLM-L6-v2")
+index = faiss.read_index("library.index")
+df = pd.read_csv("catalog_processed.csv")
+print("Ready! Type your search question below.")
+print("Type 'quit' to exit\n")
+# Search loop
+while True:
+    query = input("Search: ").strip()
+    # Ignore empty input instead of searching for it
+    if not query:
+        continue
+    if query.lower() == "quit":
+        break
+    # Turn your search query into a meaning fingerprint
+    query_embedding = model.encode([query])
+    # Find the 3 most similar books
+    distances, indices = index.search(np.array(query_embedding), 3)
+    print("\nTop results:")
+    for i, idx in enumerate(indices[0]):
+        book = df.iloc[idx]
+        print(f"{i+1}. {book['title']} by {book['author']}")
+        print(f"   {book['description']}\n")
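`search.py`'s loop ends only on `quit`; pressing Ctrl-D (Ctrl-Z then Enter on Windows) raises `EOFError`, and Ctrl-C raises `KeyboardInterrupt`, both ending in a traceback. A sketch of a refactor that exits cleanly and takes the input and search functions as parameters, so it can be exercised without a terminal or the model. `run_loop`, `read` and `search_fn` are hypothetical names; in `search.py`, `read` would be `input` and `search_fn` would wrap `model.encode` and `index.search`.

```python
def run_loop(read, search_fn, write=print):
    """Prompt until 'quit' or end of input, skipping blank queries.

    read: callable taking a prompt and returning a line (e.g. input);
          may raise EOFError or KeyboardInterrupt
    search_fn: callable mapping a query string to a list of result strings
    write: callable used for output (print by default)
    """
    while True:
        try:
            query = read("Search: ").strip()
        except (EOFError, KeyboardInterrupt):
            write("")  # move past the prompt before exiting
            break
        if not query:
            continue
        if query.lower() == "quit":
            break
        write("\nTop results:")
        for i, line in enumerate(search_fn(query), start=1):
            write(f"{i}. {line}")


# Scripted input instead of a real terminal: a blank line, one query, then quit
scripted = iter(["", "space", "quit"])
output = []
run_loop(lambda prompt: next(scripted), lambda q: [f"result for {q}"], write=output.append)
print(output)
```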