---
title: Entity Sentiment Classification
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---

# Entity Sentiment Classification

Classify sentiment (positive, neutral, negative) for named entities in news article text using fine-tuned DistilBERT models.

Three classification modes:

- **marker**: the entity is wrapped in `[E]...[/E]` special tokens, single-sequence input
- **qa_m**: question-answering multi-class, using the question "What do you think of the sentiment of {entity}?"
- **qa_b**: question-answering binary, with three hypotheses per entity and the label chosen by argmax of P(yes)

Plus a **fastText** baseline using marker-mode text.

**Report:** [report.pdf](report.pdf).

**Live demo:** https://huggingface.co/spaces/lamossta/entity-sentiment-classification

**Logs:** https://telemetry.betterstack.com/team/t529434/tail?s=2383648

## Setup

### 1. Environment Variables

Create a `.env` file in the project root:

```
# HF_TOKEN is not needed for inference
HF_TOKEN=
BETTERSTACK_SOURCE_TOKEN=
```

### 2. Run

```bash
docker compose up
```

This will install dependencies, download models from HuggingFace, and start the backend and frontend.
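
Because startup includes a model download, the API may not answer immediately after `docker compose up`. The following is a minimal sketch of a readiness check that polls the health endpoint. It assumes the health check is reachable locally at `http://localhost:7860/api/health` (the same `/api` prefix as the local `curl` examples) and that a healthy service answers with HTTP 200. The names `http_ok` and `wait_until_healthy` are hypothetical helpers, not part of this project.

```python
import time
import urllib.error
import urllib.request

# Assumed local health URL: app port 7860 plus the /api prefix.
HEALTH_URL = "http://localhost:7860/api/health"


def http_ok(url):
    """Return True if a GET to `url` answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response: not ready yet.
        return False


def wait_until_healthy(url=HEALTH_URL, attempts=30, delay=2.0, probe=http_ok):
    """Call `probe(url)` up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if probe(url):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    print("service ready" if wait_until_healthy() else "service not reachable")
```

The `probe` parameter only exists so the polling loop can be exercised without a running container; in normal use the default is enough.
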
Everything is accessible on port **7860**:

- Frontend: http://localhost:7860
- API: http://localhost:7860/api/

## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/predict` | POST | Classify entities using the marker model |
| `/predict-all-models` | POST | Classify entities using all available models |
| `/health` | GET | Health check |
| `/docs` | GET | Interactive Swagger UI API docs |

## Sending Requests

### Request Format

```json
[
  {
    "id": 0,
    "text": "Google had solid Q4 2025 earnings but Microsoft's were not great.",
    "entities": [
      {
        "entity_id": 0,
        "entity_text": "Google",
        "entity_type": "company",
        "positions": [
          {"position_text": "Google", "length": 6, "offset": 0}
        ]
      },
      {
        "entity_id": 1,
        "entity_text": "Microsoft",
        "entity_type": "company",
        "positions": [
          {"position_text": "Microsoft", "length": 9, "offset": 38}
        ]
      }
    ]
  }
]
```

### Response Format

```json
[
  {
    "id": 0,
    "entities": [
      {"entity_id": 0, "entity_text": "Google", "classification": "positive"},
      {"entity_id": 1, "entity_text": "Microsoft", "classification": "negative"}
    ]
  }
]
```

### Local

```bash
curl -X POST http://localhost:7860/api/predict \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```

```bash
curl -X POST http://localhost:7860/api/predict-all-models \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```

### HuggingFace Spaces

```bash
curl -X POST https://.hf.space/predict \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```

```bash
curl -X POST https://.hf.space/predict-all-models \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```

## Notebooks

- [`notebooks/data_preprocessing_analysis.ipynb`](notebooks/data_preprocessing_analysis.ipynb): data hygiene checks on `data/data_raw.json`
- [`notebooks/data_augmentation_analysis.ipynb`](notebooks/data_augmentation_analysis.ipynb): article length and label distribution analysis
- [`notebooks/data_splits_analysis.ipynb`](notebooks/data_splits_analysis.ipynb):
  train/val/test splitting strategy
- [`notebooks/train_marker.ipynb`](notebooks/train_marker.ipynb): fine-tunes marker mode
- [`notebooks/train_qa_m.ipynb`](notebooks/train_qa_m.ipynb): fine-tunes QA-M mode
- [`notebooks/train_qa_b.ipynb`](notebooks/train_qa_b.ipynb): fine-tunes QA-B mode
- [`notebooks/train_fasttext.ipynb`](notebooks/train_fasttext.ipynb): trains fastText baseline
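
The three modes trained in the notebooks above differ mainly in how the model input is built and how the output is read. The sketch below illustrates this under stated assumptions: that `offset` is a 0-based character index into `text`, that a single mention is wrapped per input in marker mode, and that QA-B yields one P(yes) value per label. The helper names and the probability values are hypothetical; the notebooks may build inputs differently.

```python
LABELS = ("positive", "neutral", "negative")

# Question template quoted in the mode description for qa_m.
QA_M_TEMPLATE = "What do you think of the sentiment of {entity}?"


def mark_entity(text, offset, length):
    """Marker mode: wrap one entity mention in [E]...[/E] markers."""
    end = offset + length
    return f"{text[:offset]}[E]{text[offset:end]}[/E]{text[end:]}"


def qa_m_pair(text, entity):
    """QA-M mode: return a (question, context) pair for a two-sequence input."""
    return QA_M_TEMPLATE.format(entity=entity), text


def qa_b_decide(p_yes):
    """QA-B mode: pick the label whose hypothesis has the highest P(yes)."""
    return max(LABELS, key=lambda label: p_yes[label])


text = "Google had solid Q4 2025 earnings but Microsoft's were not great."
entity = "Microsoft"
offset = text.find(entity)  # character offset computed from the text itself

marked = mark_entity(text, offset, len(entity))
question, context = qa_m_pair(text, entity)

# Illustrative probabilities only, not real model output.
label = qa_b_decide({"positive": 0.10, "neutral": 0.25, "negative": 0.80})

print(marked)
print(question)
print(label)
```

Computing the offset with `str.find` keeps the example independent of any hand-written position values; a real request may list several positions per entity.
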