---
title: Entity Sentiment Classification
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---

# Entity Sentiment Classification

Classify sentiment (positive, neutral, negative) for named entities in news article text using fine-tuned DistilBERT models.

Three classification modes:
- **marker** — entity wrapped in `[E]...[/E]` special tokens, single-sequence input
- **qa_m** — question-answering multi-class: "What do you think of the sentiment of {entity}?"
- **qa_b** — question-answering binary: three yes/no hypotheses per entity (one per sentiment label); the predicted label is the one whose hypothesis gets the highest P(yes)

Plus a **fastText** baseline trained on marker-mode text.
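The input formats above can be sketched roughly as follows. This is a minimal illustration, not the project's preprocessing code. The helper names, the exact placement of `[E]`/`[/E]` (directly around each mention, no extra spaces), wrapping every listed mention, the (text, question) pair order, and the QA-B hypothesis wording are all assumptions. The fastText line uses fastText's standard `__label__` prefix for supervised training data.

```python
from typing import Dict, List, Tuple

LABELS = ["positive", "neutral", "negative"]


def marker_text(text: str, positions: List[dict]) -> str:
    """Marker mode: wrap every mention of the entity in [E]...[/E]."""
    # Insert from the rightmost mention first so earlier offsets stay valid.
    for pos in sorted(positions, key=lambda p: p["offset"], reverse=True):
        start, end = pos["offset"], pos["offset"] + pos["length"]
        text = text[:start] + "[E]" + text[start:end] + "[/E]" + text[end:]
    return text


def qa_m_pair(text: str, entity: str) -> Tuple[str, str]:
    """QA-M: one (article text, auxiliary question) sentence pair per entity."""
    return text, f"What do you think of the sentiment of {entity}?"


def qa_b_pairs(text: str, entity: str) -> List[Tuple[str, str]]:
    """QA-B: one (text, hypothesis) pair per label.

    The hypothesis wording here is HYPOTHETICAL.
    """
    return [(text, f"The sentiment of {entity} is {label}.") for label in LABELS]


def qa_b_predict(p_yes: Dict[str, float]) -> str:
    """QA-B decision rule: the label whose hypothesis has the highest P(yes)."""
    return max(p_yes, key=p_yes.get)


def fasttext_line(label: str, marked_text: str) -> str:
    """One line of fastText supervised training data."""
    # fastText reads one example per line, so collapse newlines and whitespace.
    return f"__label__{label} " + " ".join(marked_text.split())
```

For the sample sentence used further down, `marker_text(text, [{"offset": 38, "length": 9}])` yields `Google had solid Q4 2025 earnings but [E]Microsoft[/E]'s were not great.`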

**Report:** [report.pdf](report.pdf).

**Live demo:** https://huggingface.co/spaces/lamossta/entity-sentiment-classification

**Logs:** https://telemetry.betterstack.com/team/t529434/tail?s=2383648

## Setup

### 1. Environment Variables

Create a `.env` file in the project root:

```
# Optional: not needed for inference
HF_TOKEN=<your huggingface token>
BETTERSTACK_SOURCE_TOKEN=<your betterstack token>
```

### 2. Run

```bash
docker compose up
```

This will install dependencies, download models from HuggingFace, and start the backend and frontend. Everything is accessible on port **7860**:
- Frontend: http://localhost:7860
- API: http://localhost:7860/api/

## API Endpoints

Paths are relative to the API base URL (`http://localhost:7860/api/` when running locally).

| Endpoint              | Method | Description                              |
|-----------------------|--------|------------------------------------------|
| `/predict`            | POST   | Classify entities using the marker model |
| `/predict-all-models` | POST   | Classify entities using all available models |
| `/health`             | GET    | Health check                             |
| `/docs`               | GET    | Interactive API docs (Swagger UI)        |
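Since startup includes downloading the models from HuggingFace, the API may not answer right away. A client could poll `/health` before sending work, as in the standard-library sketch below. The function name, the polling interval, the assumption that a ready server answers `/health` with HTTP 200, and the assumption that `/health` sits under the same `/api` prefix as the local `curl` examples are all hypothetical.

```python
import time
import urllib.request


def wait_until_healthy(base_url: str, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll GET {base_url}/health until it returns HTTP 200 or timeout_s elapses."""
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeouts and HTTP errors (urllib.error.URLError
            # subclasses OSError) all mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Usage (hypothetical): `wait_until_healthy("http://localhost:7860/api")` returns `True` once the server reports healthy, or `False` after two minutes.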

## Sending Requests

### Request Format

```json
[
  {
    "id": 0,
    "text": "Google had solid Q4 2025 earnings but Microsoft's were not great.",
    "entities": [
      {
        "entity_id": 0,
        "entity_text": "Google",
        "entity_type": "company",
        "positions": [
          {"position_text": "Google", "length": 6, "offset": 0}
        ]
      },
      {
        "entity_id": 1,
        "entity_text": "Microsoft",
        "entity_type": "company",
        "positions": [
          {"position_text": "Microsoft", "length": 9, "offset": 38}
        ]
      }
    ]
  }
]
```
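Offsets in the example are zero-based (`Google` starts at `0`), so `text[offset:offset + length]` should reproduce `position_text`. The sketch below checks that before a request is sent and builds a position from a mention string. It assumes offsets count Python string characters (code points), which may not match the backend for non-ASCII text; both helper names are hypothetical.

```python
from typing import List


def check_positions(article: dict) -> List[str]:
    """Return a description of every position whose offset/length don't match the text."""
    problems = []
    text = article["text"]
    for ent in article["entities"]:
        for pos in ent["positions"]:
            start, length = pos["offset"], pos["length"]
            expected = pos["position_text"]
            if len(expected) != length:
                problems.append(f"entity {ent['entity_id']}: length {length} != len({expected!r})")
            span = text[start:start + length]
            if span != expected:
                problems.append(
                    f"entity {ent['entity_id']}: text[{start}:{start + length}] is {span!r}, expected {expected!r}"
                )
    return problems


def make_position(text: str, mention: str, start: int = 0) -> dict:
    """Build a position dict for the first occurrence of `mention` at or after `start`."""
    offset = text.find(mention, start)
    if offset == -1:
        raise ValueError(f"{mention!r} not found in text")
    return {"position_text": mention, "length": len(mention), "offset": offset}
```

Running `check_positions` over each article loaded from `sample_input.json` and expecting an empty list is a cheap sanity check; `make_position` avoids counting offsets by hand.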

### Response Format

```json
[
  {
    "id": 0,
    "entities": [
      {"entity_id": 0, "entity_text": "Google", "classification": "positive"},
      {"entity_id": 1, "entity_text": "Microsoft", "classification": "negative"}
    ]
  }
]
```
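Each response article echoes the request `id`, and each entity echoes its `entity_id`, so results can be joined back to the input without relying on `entity_text` (the same name can appear in several articles). A small sketch, assuming the `/predict` response shape shown above; the helper names are hypothetical and the `/predict-all-models` response may be shaped differently.

```python
from collections import Counter
from typing import Dict, List, Tuple


def labels_by_entity(response: List[dict]) -> Dict[Tuple[int, int], str]:
    """Map (article id, entity_id) to the predicted classification."""
    return {
        (article["id"], ent["entity_id"]): ent["classification"]
        for article in response
        for ent in article["entities"]
    }


def label_counts(response: List[dict]) -> Counter:
    """Count how often each sentiment label was predicted."""
    return Counter(ent["classification"] for article in response for ent in article["entities"])
```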

### Local

```bash
curl -X POST http://localhost:7860/api/predict \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```

```bash
curl -X POST http://localhost:7860/api/predict-all-models \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```
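The same requests can be sent from Python with only the standard library. This is a sketch, not a client shipped with the project: the function names and the 60-second timeout are assumptions, and the base URL `http://localhost:7860/api` is taken from the local `curl` examples above.

```python
import json
import urllib.request


def build_request(base_url: str, endpoint: str, articles: list) -> urllib.request.Request:
    """Prepare a JSON POST to an endpoint such as 'predict' or 'predict-all-models'."""
    url = base_url.rstrip("/") + "/" + endpoint.lstrip("/")
    body = json.dumps(articles).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def classify(base_url: str, articles: list, endpoint: str = "predict", timeout: float = 60.0) -> list:
    """Send the articles and decode the JSON response body."""
    req = build_request(base_url, endpoint, articles)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Usage (hypothetical): load `sample_input.json` with `json.load` and pass the list to `classify("http://localhost:7860/api", articles)`.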

### HuggingFace Spaces

```bash
curl -X POST https://<your-space>.hf.space/api/predict \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```

```bash
curl -X POST https://<your-space>.hf.space/api/predict-all-models \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```

## Notebooks

- [`notebooks/data_preprocessing_analysis.ipynb`](notebooks/data_preprocessing_analysis.ipynb) — data hygiene checks on `data/data_raw.json`
- [`notebooks/data_augmentation_analysis.ipynb`](notebooks/data_augmentation_analysis.ipynb) — article length + label distribution analysis
- [`notebooks/data_splits_analysis.ipynb`](notebooks/data_splits_analysis.ipynb) — train/val/test splitting strategy
- [`notebooks/train_marker.ipynb`](notebooks/train_marker.ipynb) — fine-tunes marker mode
- [`notebooks/train_qa_m.ipynb`](notebooks/train_qa_m.ipynb) — fine-tunes QA-M mode
- [`notebooks/train_qa_b.ipynb`](notebooks/train_qa_b.ipynb) — fine-tunes QA-B mode
- [`notebooks/train_fasttext.ipynb`](notebooks/train_fasttext.ipynb) — trains fastText baseline