---
title: Related Work Finder
emoji: 📚
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
license: cc-by-nc-4.0
---
# Related Work Finder

Live Demo: [Related Work Finder](https://huggingface.co/spaces/upk/Related-Work-Finder)

A search engine for finding relevant papers in the ACL Anthology.

## Current Features

- Keyword Search: Full-text search using SQLite's FTS5 engine, with support for boolean logic (FTS5's `AND`, `OR`, and `NOT` operators) and for queries containing special characters.
- Abstract Search: Uses KeyBERT, a keyword extraction library built on BERT embeddings, to extract keywords from a user-provided abstract and then runs a targeted search with them.
- Dataset Parsing: On startup, automatically downloads the `anthology+abstracts.bib.gz` dataset from the ACL servers and parses the BibTeX entries, keeping only papers from 2022 to 2026.
- Web Interface: A responsive UI that displays each paper's title, authors, venue, year, and abstract. Results can be sorted by relevance or by year.
- Testing and Deployment: Includes a `pytest` suite that runs via GitHub Actions, and a `Dockerfile` configured for deployment to Hugging Face Spaces.
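The keyword search can be pictured with a minimal sketch like the one below. It assumes a single FTS5 table named `papers` with `title`, `authors`, `venue`, and `abstract` columns plus an unindexed `year`; these names, the `build_index`/`search` helpers, and the toy rows are hypothetical and not taken from the backend. It relies only on Python's built-in `sqlite3` module (which needs an SQLite build with FTS5 enabled, as most are) and FTS5's built-in `bm25()` ranking function.

```python
import sqlite3

# Hypothetical schema for illustration; the real backend's names may differ.
# `year` is UNINDEXED: it is stored and returned, but not searched by MATCH.
SCHEMA = (
    "CREATE VIRTUAL TABLE papers USING fts5("
    "title, authors, venue, abstract, year UNINDEXED)"
)


def build_index(rows):
    """Create an in-memory FTS5 index from (title, authors, venue, abstract, year) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO papers (title, authors, venue, abstract, year) VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def search(conn, query, sort_by="relevance", limit=10):
    """Run an FTS5 MATCH query and return (title, year) rows.

    bm25() returns lower values for better matches, so ascending order puts the
    most relevant papers first. ORDER BY is picked from two fixed strings, never
    from user input, so the f-string below does not open an injection risk.
    """
    order = "bm25(papers)" if sort_by == "relevance" else "CAST(year AS INTEGER) DESC"
    sql = f"SELECT title, year FROM papers WHERE papers MATCH ? ORDER BY {order} LIMIT ?"
    return conn.execute(sql, (query, limit)).fetchall()


# Toy rows invented for this example.
rows = [
    ("Neural Machine Translation Survey", "A. Author", "ACL",
     "We survey neural machine translation.", 2022),
    ("Keyword Extraction with BERT", "B. Author", "EMNLP",
     "Keyword extraction using embeddings.", 2024),
    ("Translation Quality Estimation", "C. Author", "NAACL",
     "Estimating machine translation quality.", 2023),
]
conn = build_index(rows)
print(search(conn, "machine AND translation", sort_by="year"))
print(search(conn, "translation NOT survey"))
```

Note that FTS5 operators such as `AND`, `OR`, and `NOT` must be written in uppercase; lowercase `and` is treated as an ordinary search term.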
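For the abstract search, one plausible way to turn extracted keywords into a "targeted search" is to OR together quoted phrases. This is a sketch under assumptions, not the project's confirmed method: it takes input in the shape KeyBERT's `extract_keywords` returns for a single document (a list of `(keyphrase, score)` tuples), while the helper names `fts5_phrase` and `keywords_to_query` and the `min_score` cutoff are hypothetical. Quoting each keyphrase is also a simple way to let FTS5 accept characters such as `-` or `.`, which are not allowed in unquoted FTS5 terms.

```python
def fts5_phrase(text):
    """Quote text as an FTS5 string; embedded double quotes are escaped by doubling them."""
    return '"' + text.replace('"', '""') + '"'


def keywords_to_query(keywords, min_score=0.0):
    """Build an FTS5 OR-query from (keyphrase, score) pairs.

    Returns "" when no keyphrase survives the score filter; callers should skip
    the search in that case rather than pass an empty MATCH string.
    """
    phrases = [fts5_phrase(kw) for kw, score in keywords if score >= min_score and kw.strip()]
    return " OR ".join(phrases)


# Stand-in for KeyBERT output; the keyphrases and scores are invented for illustration.
keywords = [("machine translation", 0.71), ("bert-based", 0.55), ("low resource", 0.12)]
print(keywords_to_query(keywords, min_score=0.2))
# → "machine translation" OR "bert-based"
```

In a real pipeline the `keywords` list would come from something like `KeyBERT().extract_keywords(abstract, keyphrase_ngram_range=(1, 2), top_n=5)`, and the resulting string would be passed as the `MATCH` parameter.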
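The year filter applied during dataset parsing could look roughly like this stdlib-only sketch. It assumes entries store the year as `year = "2023"` or `year = {2023}` and uses a regex shortcut rather than a full BibTeX parser; the function names are hypothetical, and the download step is left out because the exact source URL is not given here.

```python
import gzip
import re

# Simplified field match: accepts `year = "2023"` and `year = {2023}`.
# A regex shortcut for illustration, not a full BibTeX parser.
YEAR_RE = re.compile(r'^\s*year\s*=\s*["{]?(\d{4})', re.IGNORECASE | re.MULTILINE)


def iter_entries(text):
    """Split .bib text into raw entries at every line that starts with '@'."""
    chunks = re.split(r"^(?=@)", text, flags=re.MULTILINE)
    return [c.strip() for c in chunks if c.strip().startswith("@")]


def filter_by_year(text, start=2022, end=2026):
    """Keep entries whose year lies in [start, end]; entries without a year are dropped."""
    kept = []
    for entry in iter_entries(text):
        match = YEAR_RE.search(entry)
        if match and start <= int(match.group(1)) <= end:
            kept.append(entry)
    return kept


def load_bib_gz(path):
    """Decompress a local .bib.gz file (e.g. a downloaded anthology+abstracts.bib.gz) to text."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.read()


sample = """@inproceedings{a-2021-old,
    title = "Old Paper",
    year = "2021",
}
@inproceedings{b-2023-new,
    title = "New Paper",
    year = "2023",
}
"""
print(len(filter_by_year(sample)))  # → 1
```

A dedicated BibTeX parser is more robust for fields that span lines or contain nested braces; the regex approach only trades that robustness for having no dependencies.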

## Tech Stack

- Frontend: HTML5, CSS3, JavaScript
- Backend: Python 3.10, FastAPI, Uvicorn
- Database: SQLite (FTS5 virtual tables)
- NLP: KeyBERT

## Local Setup

1. Create a virtual environment and install the dependencies:
   ```bash
   python -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```
2. Start the backend server:
   ```bash
   cd backend
   uvicorn main:app --reload
   ```
3. Open your web browser and navigate to `http://127.0.0.1:8000/frontend/index.html`.

## Planned for Phase 2

- Fuzzy Searching: Add robust typo tolerance and spelling correction (e.g., using `pyspellchecker` or trigram matching) so that queries with slight misspellings still return relevant results.
- Semantic Scholar Integration: Pull citation counts so that papers can be sorted by impact (citation count) rather than only by basic relevance.
- Advanced PDF Parsing: Extract text and insights directly from full-paper PDFs or images instead of relying on abstracts alone.
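Trigram matching, one of the fuzzy-search options listed above, can be sketched in a few lines of pure Python. This is an illustration of the general technique, not a committed design: it pads words the way PostgreSQL's `pg_trgm` does (two leading spaces, one trailing), scores candidates by Jaccard similarity of their trigram sets, and the names `trigrams`, `similarity`, `correct`, and the `0.3` threshold are hypothetical. The vocabulary is assumed to come from the indexed terms, which SQLite's `fts5vocab` virtual table could supply.

```python
def trigrams(word):
    """Character trigrams of a lowercased word, padded pg_trgm-style (two spaces before, one after)."""
    padded = f"  {word.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Jaccard similarity of two words' trigram sets, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def correct(term, vocabulary, threshold=0.3):
    """Return the most similar vocabulary word, or term unchanged if nothing reaches threshold."""
    best = max(vocabulary, key=lambda w: similarity(term, w), default=None)
    if best is not None and similarity(term, best) >= threshold:
        return best
    return term


vocabulary = ["translation", "transformer", "tokenization"]
print(correct("transaltion", vocabulary))  # → translation
print(correct("xyz", vocabulary))          # → xyz
```

Here "transaltion" shares 8 of 16 distinct trigrams with "translation" (similarity 0.5) but scores below 0.3 against the other words. A linear scan like this is fine for a small vocabulary; a larger one would typically use an inverted index from trigrams to words.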