	---
	title: Focal
	emoji: 📰
	colorFrom: blue
	colorTo: gray
	python_version: 3.9.23
	sdk: docker
	app_file: app/main.py
	---

	# Focal: AI-Powered Multi-Source News Summarizer

	<p align="center">
	<a href="https://huggingface.co/spaces/michaelkri/focal">
	<strong>View live demo >></strong>
	</a>
	<img src="assets/demo.gif" alt="Demo video" />
	</p>


	### A web application that aggregates current news from RSS feeds and searches the web for related articles to create a single coherent summary of each topic

	<hr />
	<p align="center">
	<img src="assets/screenshot.png" alt="Screenshot" />
	</p>


	## Architecture
	<p align="center">
	<img src="assets/diagram.png" alt="Diagram" />
	</p>

	### Data Flow
	1. A background service periodically reads the latest headlines from multiple RSS feeds (defined in `rss_feeds.txt`). The headlines from all feeds are then grouped based on semantic similarity (see point 3).
	2. A web search is performed to find the top articles about each topic. The contents of these articles are then scraped.
	3. The articles about each topic are split into individual sentences and combined into a single collection. An embedding is created for each sentence using `sentence-transformers/all-MiniLM-L6-v2`. These embeddings are then clustered with the HDBSCAN algorithm, so that sentences with similar meaning end up in the same group. Only the most populous groups of sentences are kept.
	4. The most representative sentences from the top groups are selected and fed to `facebook/bart-large-cnn` for summarization. Summaries (along with their sources) are saved in an SQLite database hosted on Turso.
	5. A FastAPI server exposes endpoints to retrieve the news from the database, displaying the articles to the user on a simple webpage.
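
The selection in steps 3 and 4 can be illustrated with a minimal, standard-library sketch. It is not the project's actual code: the function name `representative_sentences`, the parameters `top_groups` and `per_group`, and the "closest to the cluster centroid by cosine similarity" criterion are all assumptions, since this README does not say how representativeness is measured. The sketch assumes the embeddings and the cluster labels have already been computed, with `-1` marking noise as HDBSCAN does.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def representative_sentences(sentences, embeddings, labels, top_groups=3, per_group=1):
    """Keep the largest clusters and return their most central sentences.

    Hypothetical helper: `labels` holds one cluster id per sentence,
    and -1 marks sentences that the clustering step treated as noise.
    """
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        if label != -1:  # skip noise points
            groups[label].append(idx)

    # Most populous groups first; keep only the top few.
    largest = sorted(groups.values(), key=len, reverse=True)[:top_groups]

    selected = []
    for members in largest:
        dim = len(embeddings[members[0]])
        # Centroid = component-wise mean of the cluster's embeddings.
        centroid = [sum(embeddings[i][d] for i in members) / len(members) for d in range(dim)]
        # Assumption: "most representative" means closest to the centroid.
        ranked = sorted(members, key=lambda i: cosine(embeddings[i], centroid), reverse=True)
        selected.extend(sentences[i] for i in ranked[:per_group])
    return selected
```

In a pipeline like the one described above, the returned sentences would be joined into one text and passed to the summarization model.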


	## Tech Stack
	- Backend: FastAPI, Uvicorn
	- ML/NLP: Hugging Face Transformers, Sentence Transformers, Scikit-learn, NLTK, NumPy
	- Web Scraping: Trafilatura, DDGS (DuckDuckGo search), feedparser
	- Database: Turso (remote SQLite), SQLAlchemy
	- Deployment: Docker, GitHub Actions (CI/CD), Hugging Face Spaces


	## Local Setup
	To run the project locally:

	1. Clone the repository:
	```sh
	git clone https://github.com/michaelkri/focal.git
	```
	2. _Optional:_ To store summaries in a Turso database, create a `.env` file and add your database URL and auth token as follows:
	```
	USE_TURSO=true
	TURSO_DATABASE_URL=libsql://...
	TURSO_AUTH_TOKEN=...
	```
	3. Build and run the Docker container:
	```sh
	docker build -t focal .
	docker run -p 8000:8000 focal
	```
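
As a rough illustration of how the three variables from step 2 might be interpreted, here is a small sketch. It is not taken from the project's code: the function `load_db_settings`, the returned dictionary layout, and the fallback to a local SQLite file named `focal.db` when Turso is disabled are all assumptions made for the example.

```python
import os


def load_db_settings(env=None):
    """Return database settings derived from environment-style variables.

    Hypothetical helper; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    use_turso = env.get("USE_TURSO", "").strip().lower() == "true"
    if not use_turso:
        # Assumption: without Turso, summaries go to a local SQLite file.
        return {"backend": "sqlite", "url": "sqlite:///focal.db", "auth_token": None}

    url = env.get("TURSO_DATABASE_URL", "")
    token = env.get("TURSO_AUTH_TOKEN", "")
    if not url or not token:
        # Fail early instead of connecting with incomplete credentials.
        raise ValueError("USE_TURSO=true requires TURSO_DATABASE_URL and TURSO_AUTH_TOKEN")
    return {"backend": "turso", "url": url, "auth_token": token}
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to test without touching the real environment.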