	---
	title: Entity Sentiment Classification
	emoji: 📊
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	app_port: 7860
	pinned: false
	---

	# Entity Sentiment Classification

	Classify sentiment (positive, neutral, negative) for named entities in news article text using fine-tuned DistilBERT models.

	Three classification modes:
	- marker — entity wrapped in `[E]...[/E]` special tokens, single-sequence input
	- qa_m — question-answering multi-class: "What do you think of the sentiment of {entity}?"
	- qa_b — question-answering binary: three hypotheses per entity, argmax of P(yes)

	Plus a fastText baseline using marker-mode text.
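
The three input formats can be sketched in a few lines of Python. This is an illustration only, not the repository's preprocessing code: the function names are hypothetical, and the QA-B hypothesis wording is an assumption, since only the QA-M question template is given above.

```python
LABELS = ("positive", "neutral", "negative")


def mark_entity(text, positions):
    """marker mode: wrap every mention in [E]...[/E].

    Mentions are processed right to left so that inserting the tokens
    does not shift the offsets of mentions that come earlier in the text.
    """
    for pos in sorted(positions, key=lambda p: p["offset"], reverse=True):
        start = pos["offset"]
        end = start + pos["length"]
        text = text[:start] + "[E]" + text[start:end] + "[/E]" + text[end:]
    return text


def qa_m_question(entity):
    """qa_m mode: one question per entity, answered with one of three classes."""
    return f"What do you think of the sentiment of {entity}?"


def qa_b_hypotheses(entity):
    """qa_b mode: one yes/no hypothesis per label.

    The sentence template here is an assumption made for illustration.
    """
    return {label: f"The sentiment of {entity} is {label}." for label in LABELS}


def qa_b_decide(p_yes):
    """qa_b mode: pick the label whose hypothesis has the highest P(yes)."""
    return max(p_yes, key=p_yes.get)
```

For example, `qa_b_decide({"positive": 0.7, "neutral": 0.2, "negative": 0.1})` selects `"positive"`.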

	Report: [report.pdf](report.pdf).

	Live demo: https://huggingface.co/spaces/lamossta/entity-sentiment-classification

	Logs: https://telemetry.betterstack.com/team/t529434/tail?s=2383648

	## Setup

	### 1. Environment Variables

	Create a `.env` file in the project root:

	```
	# HF_TOKEN is not needed for inference
	HF_TOKEN=<your huggingface token>
	BETTERSTACK_SOURCE_TOKEN=<your betterstack token>
	```

	### 2. Run

	```bash
	docker compose up
	```

	This will install dependencies, download models from HuggingFace, and start the backend and frontend. Everything is accessible on port 7860:
	- Frontend: http://localhost:7860
	- API: http://localhost:7860/api/

	## API Endpoints

	| Endpoint | Method | Description |
	|--------------------|---|---|
	| `/predict` | POST | Classify entities using the marker model |
	| `/predict-all-models` | POST | Classify entities using all available models |
	| `/health` | GET | Health check |
	| `/docs` | GET | Interactive Swagger UI API docs |
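
A small readiness check can be useful while the containers are still downloading models. The sketch below uses only the standard library. It assumes that locally the endpoints sit under the `/api` prefix (as in the local `curl` examples) and that `/health` answers with HTTP 200 when the service is up. The response body is not documented here, so it is ignored. The helper names are hypothetical.

```python
import urllib.request


def endpoint_url(base_url, path):
    """Join a base URL and an endpoint path with exactly one slash between them."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def is_healthy(base_url="http://localhost:7860/api", timeout=5.0):
    """Return True if GET <base_url>/health answers with HTTP 200.

    Connection errors, timeouts and HTTP error statuses all surface as
    OSError subclasses in urllib and are reported as "not healthy".
    """
    try:
        with urllib.request.urlopen(endpoint_url(base_url, "/health"), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```

Calling `is_healthy()` in a retry loop after `docker compose up` is one way to wait for the backend before sending requests.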


	## Sending Requests

	### Request Format

	```json
	[
	  {
	    "id": 0,
	    "text": "Google had solid Q4 2025 earnings but Microsoft's were not great.",
	    "entities": [
	      {
	        "entity_id": 0,
	        "entity_text": "Google",
	        "entity_type": "company",
	        "positions": [
	          {"position_text": "Google", "length": 6, "offset": 0}
	        ]
	      },
	      {
	        "entity_id": 1,
	        "entity_text": "Microsoft",
	        "entity_type": "company",
	        "positions": [
	          {"position_text": "Microsoft", "length": 9, "offset": 38}
	        ]
	      }
	    ]
	  }
	]
	```
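
Hand-written offsets are easy to get wrong, so it can help to compute and verify them in code. The sketch below assumes `offset` is a 0-based character index into `text` and `length` is a character count, which matches `"Google"` sitting at offset 0 in the example. The helper names are hypothetical and not part of this project.

```python
def find_positions(text, entity_text):
    """Return one position dict per non-overlapping occurrence of entity_text."""
    if not entity_text:
        return []
    positions = []
    start = text.find(entity_text)
    while start != -1:
        positions.append(
            {"position_text": entity_text, "length": len(entity_text), "offset": start}
        )
        start = text.find(entity_text, start + len(entity_text))
    return positions


def build_entity(text, entity_id, entity_text, entity_type):
    """Build one entry of the "entities" list with positions filled in."""
    return {
        "entity_id": entity_id,
        "entity_text": entity_text,
        "entity_type": entity_type,
        "positions": find_positions(text, entity_text),
    }


def position_errors(item):
    """Return (entity_id, offset) pairs whose slice of text is not position_text."""
    errors = []
    for entity in item["entities"]:
        for pos in entity["positions"]:
            end = pos["offset"] + pos["length"]
            if item["text"][pos["offset"]:end] != pos["position_text"]:
                errors.append((entity["entity_id"], pos["offset"]))
    return errors
```

Running `position_errors` on each item before posting catches mismatched offsets early. Whether the API itself rejects them is not stated here.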

	### Response Format

	```json
	[
	  {
	    "id": 0,
	    "entities": [
	      {"entity_id": 0, "entity_text": "Google", "classification": "positive"},
	      {"entity_id": 1, "entity_text": "Microsoft", "classification": "negative"}
	    ]
	  }
	]
	```
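
A response in this shape is straightforward to flatten for downstream use. The sketch below assumes exactly the fields shown above (`id`, `entities`, `entity_id`, `classification`). The function names are hypothetical.

```python
from collections import Counter


def labels_by_entity(response):
    """Map (item id, entity_id) to the predicted classification."""
    return {
        (item["id"], entity["entity_id"]): entity["classification"]
        for item in response
        for entity in item["entities"]
    }


def label_counts(response):
    """Count how many entities received each classification."""
    return Counter(
        entity["classification"] for item in response for entity in item["entities"]
    )
```

Keying on `(id, entity_id)` rather than `entity_text` keeps entities with the same name in different articles apart.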

	### Local

	```bash
	curl -X POST http://localhost:7860/api/predict \
	-H "Content-Type: application/json" \
	-d @sample_input.json
	```

	```bash
	curl -X POST http://localhost:7860/api/predict-all-models \
	-H "Content-Type: application/json" \
	-d @sample_input.json
	```
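
The same requests can be sent from Python without extra dependencies. This is a minimal sketch using `urllib.request`. The function names are hypothetical, and the default `base_url` assumes the local setup above. For a Space, pass its base URL instead (the Spaces examples use no `/api` prefix).

```python
import json
import urllib.request


def build_request(payload, endpoint="predict", base_url="http://localhost:7860/api"):
    """Prepare a JSON POST request for the given endpoint without sending it."""
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(payload, endpoint="predict", base_url="http://localhost:7860/api", timeout=30):
    """Send the payload and return the decoded JSON response."""
    request = build_request(payload, endpoint=endpoint, base_url=base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running, `post_json(json.load(open("sample_input.json")))` mirrors the first `curl` call, and `endpoint="predict-all-models"` mirrors the second.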

	### HuggingFace Spaces

	```bash
	curl -X POST https://<your-space>.hf.space/predict \
	-H "Content-Type: application/json" \
	-d @sample_input.json
	```

	```bash
	curl -X POST https://<your-space>.hf.space/predict-all-models \
	-H "Content-Type: application/json" \
	-d @sample_input.json
	```

	## Notebooks

	- [`notebooks/data_preprocessing_analysis.ipynb`](notebooks/data_preprocessing_analysis.ipynb) — data hygiene checks on `data/data_raw.json`
	- [`notebooks/data_augmentation_analysis.ipynb`](notebooks/data_augmentation_analysis.ipynb) — article length + label distribution analysis
	- [`notebooks/data_splits_analysis.ipynb`](notebooks/data_splits_analysis.ipynb) — train/val/test splitting strategy
	- [`notebooks/train_marker.ipynb`](notebooks/train_marker.ipynb) — fine-tunes marker mode
	- [`notebooks/train_qa_m.ipynb`](notebooks/train_qa_m.ipynb) — fine-tunes QA-M mode
	- [`notebooks/train_qa_b.ipynb`](notebooks/train_qa_b.ipynb) — fine-tunes QA-B mode
	- [`notebooks/train_fasttext.ipynb`](notebooks/train_fasttext.ipynb) — trains fastText baseline