Create README.md
README.md
ADDED
# Danish Sentiment Analysis
## Information
- Dataset: [DDSC/angry-tweets](https://huggingface.co/datasets/DDSC/angry-tweets)
- Base model: [Danish BERT (BotXO)](https://huggingface.co/Maltehb/danish-bert-botxo)

## Approach
- Preprocessing
  - Usernames and links are replaced with the placeholders @USER and [LINK] in the data; these placeholders are removed
  - Removing hashtags, as they generally do not contribute to sentiment
  - Removing emoji, as the models used in this notebook do not take emoji into account (replacing them with their textual meaning could also be tested)
  - Lowercasing
  - Stopword removal, using the Danish stopword list from NLTK

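The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the function name `preprocess`, the regular expressions, and the small `DANISH_STOPWORDS` sample are assumptions made for this example. It also assumes that hashtags are dropped entirely (not just the `#` sign) and that emoji can be approximated by a few Unicode ranges.

```python
import re

# Hypothetical sample of Danish stopwords, hard-coded to keep the sketch
# self-contained. The README names NLTK as the source; the full list is
# available via nltk.corpus.stopwords.words("danish") once the "stopwords"
# corpus has been downloaded.
DANISH_STOPWORDS = {
    "og", "i", "jeg", "det", "at", "en", "den", "til", "er", "som",
    "på", "de", "med", "han", "af", "for", "ikke", "der", "var", "mig",
}

# Placeholders for usernames and links (case-sensitive, so they are
# stripped before lowercasing).
PLACEHOLDER_RE = re.compile(r"@USER|\[LINK\]")
# Assumption: a hashtag is "#" followed by word characters, removed whole.
HASHTAG_RE = re.compile(r"#\w+")
# Rough approximation of common emoji ranges plus the variation selector
# and zero-width joiner; not an exhaustive emoji definition.
EMOJI_RE = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]")


def preprocess(text: str, stopwords=DANISH_STOPWORDS) -> str:
    """Apply the cleaning steps in the order listed in the README."""
    text = PLACEHOLDER_RE.sub(" ", text)  # drop @USER and [LINK]
    text = HASHTAG_RE.sub(" ", text)      # drop hashtags
    text = EMOJI_RE.sub(" ", text)        # drop emoji
    text = text.lower()                   # lowercase
    # Whitespace tokenization; tokens with attached punctuation are kept as-is.
    tokens = [tok for tok in text.split() if tok not in stopwords]
    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess("@USER Jeg er SÅ glad for det 😀 #dkpol [LINK]"))
    # så glad
```

Note that the NLTK Danish list contains the negation "ikke", so stopword removal can change the polarity of a sentence; whether to keep negations is a choice worth evaluating.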
- Training with the Hugging Face `Trainer`
- Training with a custom PyTorch loop
- Uploading the model to the Hugging Face Hub
- FastAPI endpoint
- Packaging the API service as a Docker container