---
title: Entity Sentiment Classification
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---
# Entity Sentiment Classification
Classify sentiment (positive, neutral, negative) for named entities in news article text using fine-tuned DistilBERT models.
Three classification modes:
- **marker**: the entity is wrapped in `[E]...[/E]` special tokens and the text is passed as a single-sequence input
- **qa_m**: multi-class question answering, using the question "What do you think of the sentiment of {entity}?"
- **qa_b**: binary question answering, with three yes/no hypotheses per entity (one per label); the prediction is the argmax of P(yes)
Plus a **fastText** baseline using marker-mode text.
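The three input formats can be sketched in a few lines of Python. This is an illustrative sketch, not the project's code: the function names, the wording of the QA-B hypotheses, and the (question, text) ordering of each pair are assumptions; only the `[E]...[/E]` markers, the QA-M question, and the argmax over P(yes) come from the description above.

```python
from typing import List, Tuple

LABELS = ["positive", "neutral", "negative"]


def mark_entity(text: str, offset: int, length: int) -> str:
    """Marker mode: wrap the entity span in [E]...[/E]."""
    end = offset + length
    return f"{text[:offset]}[E]{text[offset:end]}[/E]{text[end:]}"


def qa_m_pair(text: str, entity: str) -> Tuple[str, str]:
    """QA-M: one (question, text) pair, classified into three classes."""
    return (f"What do you think of the sentiment of {entity}?", text)


def qa_b_pairs(text: str, entity: str) -> List[Tuple[str, str]]:
    """QA-B: one yes/no hypothesis per label (hypothesis wording is assumed)."""
    return [(f"The sentiment of {entity} is {label}.", text) for label in LABELS]


def qa_b_decide(p_yes: List[float]) -> str:
    """QA-B decision: pick the label whose hypothesis has the highest P(yes)."""
    best = max(range(len(LABELS)), key=lambda i: p_yes[i])
    return LABELS[best]


if __name__ == "__main__":
    text = "Google had solid earnings."
    print(mark_entity(text, 0, 6))
    print(qa_m_pair(text, "Google"))
    print(qa_b_decide([0.1, 0.2, 0.7]))
```

In QA-B the model is run once per hypothesis, so each entity costs three forward passes instead of one.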
**Report:** [report.pdf](report.pdf).
**Live demo:** https://huggingface.co/spaces/lamossta/entity-sentiment-classification
**Logs:** https://telemetry.betterstack.com/team/t529434/tail?s=2383648
## Setup
### 1. Environment Variables
Create a `.env` file in the project root:
```
# HF_TOKEN is not needed for inference
HF_TOKEN=<your huggingface token>
BETTERSTACK_SOURCE_TOKEN=<your betterstack token>
```
### 2. Run
```bash
docker compose up
```
This will install dependencies, download models from HuggingFace, and start the backend and frontend. Everything is accessible on port **7860**:
- Frontend: http://localhost:7860
- API: http://localhost:7860/api/
## API Endpoints
| Endpoint | Method | Description |
|--------------------|---|---|
| `/predict` | POST | Classify entities using the marker model |
| `/predict-all-models` | POST | Classify entities using all available models |
| `/health` | GET | Health check |
| `/docs` | GET | Interactive Swagger UI API docs |
## Sending Requests
### Request Format
```json
[
{
"id": 0,
"text": "Google had solid Q4 2025 earnings but Microsoft's were not great.",
"entities": [
{
"entity_id": 0,
"entity_text": "Google",
"entity_type": "company",
"positions": [
{"position_text": "Google", "length": 6, "offset": 0}
]
},
{
"entity_id": 1,
"entity_text": "Microsoft",
"entity_type": "company",
"positions": [
{"position_text": "Microsoft", "length": 9, "offset": 38}
]
}
]
}
]
```
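Writing `positions` by hand is error-prone, so a small helper can compute them. The sketch below assumes that `offset` is a 0-based character index into `text` and that `length` is the mention length in characters; `find_positions` and `build_item` are hypothetical helper names, not part of the API.

```python
import json
from typing import Dict, List


def find_positions(text: str, mention: str) -> List[Dict]:
    """Return one position entry per non-overlapping occurrence of `mention`."""
    positions = []
    start = text.find(mention)
    while start != -1:
        positions.append(
            {"position_text": mention, "length": len(mention), "offset": start}
        )
        start = text.find(mention, start + len(mention))
    return positions


def build_item(item_id: int, text: str, entities: List[Dict[str, str]]) -> Dict:
    """Build one request item; each entity needs 'entity_text' and 'entity_type'."""
    return {
        "id": item_id,
        "text": text,
        "entities": [
            {
                "entity_id": i,
                "entity_text": e["entity_text"],
                "entity_type": e["entity_type"],
                "positions": find_positions(text, e["entity_text"]),
            }
            for i, e in enumerate(entities)
        ],
    }


if __name__ == "__main__":
    text = "Google had solid Q4 2025 earnings but Microsoft's were not great."
    payload = [
        build_item(
            0,
            text,
            [
                {"entity_text": "Google", "entity_type": "company"},
                {"entity_text": "Microsoft", "entity_type": "company"},
            ],
        )
    ]
    print(json.dumps(payload, indent=2))
```

Plain substring search also matches inside longer words, so real inputs may need word-boundary checks or offsets taken from an upstream NER step.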
### Response Format
```json
[
{
"id": 0,
"entities": [
{"entity_id": 0, "entity_text": "Google", "classification": "positive"},
{"entity_id": 1, "entity_text": "Microsoft", "classification": "negative"}
]
}
]
```
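For downstream use it is often convenient to index the response by item and entity. A minimal sketch, assuming the response has exactly the shape shown above (`index_predictions` is a hypothetical helper name):

```python
import json
from typing import Dict, Tuple


def index_predictions(response_body: str) -> Dict[Tuple[int, int], str]:
    """Map (item id, entity_id) to the predicted classification."""
    result: Dict[Tuple[int, int], str] = {}
    for item in json.loads(response_body):
        for entity in item["entities"]:
            result[(item["id"], entity["entity_id"])] = entity["classification"]
    return result


SAMPLE_RESPONSE = """
[
  {
    "id": 0,
    "entities": [
      {"entity_id": 0, "entity_text": "Google", "classification": "positive"},
      {"entity_id": 1, "entity_text": "Microsoft", "classification": "negative"}
    ]
  }
]
"""

if __name__ == "__main__":
    print(index_predictions(SAMPLE_RESPONSE))
```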
### Local
```bash
curl -X POST http://localhost:7860/api/predict \
-H "Content-Type: application/json" \
-d @sample_input.json
```
```bash
curl -X POST http://localhost:7860/api/predict-all-models \
-H "Content-Type: application/json" \
-d @sample_input.json
```
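The same requests can be sent from Python with only the standard library. This is a sketch under the assumption that the server from the Setup section is running locally; `make_predict_request` and `predict` are hypothetical helper names, and the base URL mirrors the curl commands above.

```python
import json
import urllib.request
from typing import Any, List


def make_predict_request(
    payload: List[Any],
    base_url: str = "http://localhost:7860/api",
    endpoint: str = "predict",
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the given endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload: List[Any], **kwargs: Any) -> Any:
    """Send the request and decode the JSON response."""
    request = make_predict_request(payload, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server up, `predict(json.load(open("sample_input.json")))` corresponds to the first curl command, and passing `endpoint="predict-all-models"` corresponds to the second.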
### HuggingFace Spaces
```bash
curl -X POST https://<your-space>.hf.space/predict \
-H "Content-Type: application/json" \
-d @sample_input.json
```
```bash
curl -X POST https://<your-space>.hf.space/predict-all-models \
-H "Content-Type: application/json" \
-d @sample_input.json
```
## Notebooks
- [`notebooks/data_preprocessing_analysis.ipynb`](notebooks/data_preprocessing_analysis.ipynb): data hygiene checks on `data/data_raw.json`
- [`notebooks/data_augmentation_analysis.ipynb`](notebooks/data_augmentation_analysis.ipynb): article length and label distribution analysis
- [`notebooks/data_splits_analysis.ipynb`](notebooks/data_splits_analysis.ipynb): train/val/test splitting strategy
- [`notebooks/train_marker.ipynb`](notebooks/train_marker.ipynb): fine-tunes marker mode
- [`notebooks/train_qa_m.ipynb`](notebooks/train_qa_m.ipynb): fine-tunes QA-M mode
- [`notebooks/train_qa_b.ipynb`](notebooks/train_qa_b.ipynb): fine-tunes QA-B mode
- [`notebooks/train_fasttext.ipynb`](notebooks/train_fasttext.ipynb): trains fastText baseline