# Danish Sentiment Analysis
## Information
- Dataset : [DDSC/angry-tweets](https://huggingface.co/datasets/DDSC/angry-tweets)
- Base model : [Danish bert botxo](https://huggingface.co/Maltehb/danish-bert-botxo)

## Approach
- Preprocessing
  - Usernames and links appear in the data as the placeholders @USER and [LINK]; these placeholders are removed
  - Removing hashtags, as they generally do not contribute to sentiment
  - Removing emojis, as the models used in this notebook do not take emojis into consideration (replacing them with their meaning could also be tested)
  - Lowercasing
  - Stopword removal, using the Danish stopword list from NLTK
 
- Training with the HF Trainer
- Training with a PyTorch loop
- Uploading the model to the Hugging Face Hub
- FastAPI endpoint
- Packaging the API service as a Docker container
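
The preprocessing steps listed under Approach could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `preprocess`, the regular expressions, and the emoji ranges are assumptions, and the stopword set is a small hand-written stand-in so the example runs without downloads. With NLTK installed, `nltk.corpus.stopwords.words("danish")` (after `nltk.download("stopwords")`) would supply the full list.

```python
import re

# Assumption: a tiny stand-in for the Danish stopword list. The README uses
# NLTK's list, which could be loaded with
# nltk.corpus.stopwords.words("danish") after nltk.download("stopwords").
DANISH_STOPWORDS = {
    "og", "i", "jeg", "det", "at", "en", "den",
    "til", "er", "som", "på", "de", "med",
}

# Placeholders for usernames and links.
PLACEHOLDER_RE = re.compile(r"@USER|\[LINK\]")

# A hashtag is treated as '#' followed by word characters (assumption).
HASHTAG_RE = re.compile(r"#\w+")

# Rough emoji coverage (assumption): common pictograph blocks, regional
# indicators, variation selector and zero-width joiner. Not exhaustive.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\u2600-\u27BF\U0001F1E6-\U0001F1FF\uFE0F\u200D]+"
)


def preprocess(text: str) -> str:
    """Apply the cleaning steps in the order listed in the README."""
    # Remove placeholders first, while they are still in their original case.
    text = PLACEHOLDER_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = text.lower()
    # Whitespace tokenisation; punctuation is left attached to tokens.
    tokens = [tok for tok in text.split() if tok not in DANISH_STOPWORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess("@USER Jeg er SÅ glad for det 😀 #dkpol [LINK]"))
    # prints: så glad for
```

Placeholders are stripped before lowercasing so that the case-sensitive pattern still matches `@USER` and `[LINK]`. The full NLTK list is larger than the stand-in set, so it would remove more tokens than this sketch does.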