---
title: Entity Sentiment Classification
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---

# Entity Sentiment Classification

Classify sentiment (positive, neutral, or negative) for named entities in news article text using fine-tuned DistilBERT models.

Three classification modes are available:
- **marker**: the entity is wrapped in `[E]...[/E]` special tokens, single-sequence input
- **qa_m**: question-answering, multi-class: "What do you think of the sentiment of {entity}?"
- **qa_b**: question-answering, binary: three hypotheses per entity, with the prediction taken as the argmax of P(yes)

A **fastText** baseline using marker-mode text is also included.
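
The three input formats above can be sketched roughly as follows. This is a minimal illustration, not the project's actual preprocessing: the helper names, the `(question, text)` pairing order, and the qa_b hypothesis wording are assumptions. Only the `[E]...[/E]` markers, the qa_m question, and the argmax over P(yes) come from the description above.

```python
LABELS = ("positive", "neutral", "negative")

# Question text taken from the qa_m description.
QA_M_TEMPLATE = "What do you think of the sentiment of {entity}?"

# Hypothetical wording: the real qa_b hypotheses are not given in this README.
QA_B_TEMPLATE = "The sentiment of {entity} is {label}."


def mark_entity(text, offset, length):
    """marker mode: wrap one entity mention in [E]...[/E]."""
    end = offset + length
    return f"{text[:offset]}[E]{text[offset:end]}[/E]{text[end:]}"


def qa_m_pair(text, entity):
    """qa_m mode: one (question, text) pair per entity (pair order is assumed)."""
    return (QA_M_TEMPLATE.format(entity=entity), text)


def qa_b_hypotheses(entity):
    """qa_b mode: one yes/no hypothesis per label, three in total."""
    return {label: QA_B_TEMPLATE.format(entity=entity, label=label) for label in LABELS}


def qa_b_decide(p_yes):
    """Pick the label whose hypothesis received the highest P(yes)."""
    return max(p_yes, key=p_yes.get)


text = "Google had solid Q4 2025 earnings."
print(mark_entity(text, 0, 6))
print(qa_m_pair(text, "Google"))
print(qa_b_hypotheses("Google"))
```

In this sketch the probabilities passed to `qa_b_decide` would come from three separate binary model calls, one per hypothesis.
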

**Report:** [report.pdf](report.pdf)

**Live demo:** https://huggingface.co/spaces/lamossta/entity-sentiment-classification

**Logs:** https://telemetry.betterstack.com/team/t529434/tail?s=2383648

## Setup

### 1. Environment Variables

Create a `.env` file in the project root. `HF_TOKEN` is not needed for inference.

```
HF_TOKEN=<your huggingface token>
BETTERSTACK_SOURCE_TOKEN=<your betterstack token>
```

### 2. Run

```bash
docker compose up
```

This installs dependencies, downloads the models from HuggingFace, and starts the backend and frontend. Everything is served on port **7860**:
- Frontend: http://localhost:7860
- API: http://localhost:7860/api/

## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/predict` | POST | Classify entities using the marker model |
| `/predict-all-models` | POST | Classify entities using all available models |
| `/health` | GET | Health check |
| `/docs` | GET | Interactive Swagger UI API docs |

## Sending Requests

### Request Format

```json
[
  {
    "id": 0,
    "text": "Google had solid Q4 2025 earnings but Microsoft's were not great.",
    "entities": [
      {
        "entity_id": 0,
        "entity_text": "Google",
        "entity_type": "company",
        "positions": [
          {"position_text": "Google", "length": 6, "offset": 0}
        ]
      },
      {
        "entity_id": 1,
        "entity_text": "Microsoft",
        "entity_type": "company",
        "positions": [
          {"position_text": "Microsoft", "length": 9, "offset": 38}
        ]
      }
    ]
  }
]
```
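
A request body in this shape can be assembled programmatically. The sketch below assumes that `offset` is a 0-based character index into `text` (as the `Google` entry with offset 0 and length 6 suggests) and that every exact substring match should be listed under `positions`. The helper names `find_positions` and `build_request` are hypothetical and not part of the project.

```python
import json


def find_positions(text, entity_text):
    """Return one position dict per exact occurrence of entity_text in text."""
    positions = []
    start = text.find(entity_text)
    while start != -1:
        positions.append(
            {"position_text": entity_text, "length": len(entity_text), "offset": start}
        )
        start = text.find(entity_text, start + len(entity_text))
    return positions


def build_request(doc_id, text, entities):
    """Build a one-document request; entities is a list of (entity_text, entity_type)."""
    return [
        {
            "id": doc_id,
            "text": text,
            "entities": [
                {
                    "entity_id": i,
                    "entity_text": name,
                    "entity_type": entity_type,
                    "positions": find_positions(text, name),
                }
                for i, (name, entity_type) in enumerate(entities)
            ],
        }
    ]


text = "Google had solid Q4 2025 earnings but Microsoft's were not great."
payload = build_request(0, text, [("Google", "company"), ("Microsoft", "company")])
print(json.dumps(payload, indent=2))
```

Computing offsets with `str.find` instead of typing them by hand keeps each `position_text` aligned with the span it points to.
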

### Response Format

```json
[
  {
    "id": 0,
    "entities": [
      {"entity_id": 0, "entity_text": "Google", "classification": "positive"},
      {"entity_id": 1, "entity_text": "Microsoft", "classification": "negative"}
    ]
  }
]
```
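
For downstream use, a response in this shape can be flattened into a lookup table. The following is a small sketch under the assumption that the response always matches the structure shown above; `index_response` is a hypothetical helper name.

```python
from collections import Counter


def index_response(response):
    """Map (document id, entity_id) to the predicted classification."""
    return {
        (doc["id"], ent["entity_id"]): ent["classification"]
        for doc in response
        for ent in doc["entities"]
    }


# Sample copied from the response format shown in this README.
sample_response = [
    {
        "id": 0,
        "entities": [
            {"entity_id": 0, "entity_text": "Google", "classification": "positive"},
            {"entity_id": 1, "entity_text": "Microsoft", "classification": "negative"},
        ],
    }
]

labels = index_response(sample_response)
label_counts = Counter(labels.values())
print(labels)
print(label_counts)
```
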

### Local

```bash
curl -X POST http://localhost:7860/api/predict \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```

```bash
curl -X POST http://localhost:7860/api/predict-all-models \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```
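
The same local calls can be made from Python using only the standard library. This is a sketch rather than an official client: the function names are hypothetical, the 30-second timeout is an arbitrary choice, and the base URL simply mirrors the local `curl` commands above.

```python
import json
import urllib.request


def build_predict_request(payload, base_url="http://localhost:7860/api", endpoint="/predict"):
    """Build (but do not send) a JSON POST request for the given endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload, **kwargs):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(payload, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Usage, with the service running locally:
#   with open("sample_input.json") as f:
#       payload = json.load(f)
#   print(predict(payload))
#   print(predict(payload, endpoint="/predict-all-models"))
```
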

### HuggingFace Spaces

```bash
curl -X POST https://<your-space>.hf.space/predict \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```

```bash
curl -X POST https://<your-space>.hf.space/predict-all-models \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```

## Notebooks

- [`notebooks/data_preprocessing_analysis.ipynb`](notebooks/data_preprocessing_analysis.ipynb): data hygiene checks on `data/data_raw.json`
- [`notebooks/data_augmentation_analysis.ipynb`](notebooks/data_augmentation_analysis.ipynb): article length and label distribution analysis
- [`notebooks/data_splits_analysis.ipynb`](notebooks/data_splits_analysis.ipynb): train/val/test splitting strategy
- [`notebooks/train_marker.ipynb`](notebooks/train_marker.ipynb): fine-tunes marker mode
- [`notebooks/train_qa_m.ipynb`](notebooks/train_qa_m.ipynb): fine-tunes QA-M mode
- [`notebooks/train_qa_b.ipynb`](notebooks/train_qa_b.ipynb): fine-tunes QA-B mode
- [`notebooks/train_fasttext.ipynb`](notebooks/train_fasttext.ipynb): trains fastText baseline