# Disaster Tweet Prediction
Twitter has become an important communication channel in times of emergency.
The ubiquity of smartphones enables people to report an emergency they're observing
in real time. Because of this, more agencies are interested in programmatically monitoring Twitter
(e.g. disaster relief organizations and news agencies). In this task, I am predicting
whether a given tweet is about a real disaster or not. If so, predict a 1; if not, predict a 0.
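Raw tweets are noisy (URLs, @mentions, inconsistent casing), so they are typically normalized before being fed to a classifier. A minimal sketch of that kind of cleanup; the helper name and exact rules are illustrative, not taken from this repository:

```python
import re

def clean_tweet(text: str) -> str:
    """Hypothetical tweet normalizer: lowercase, strip URLs and
    @mentions, and collapse extra whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)  # remove URLs
    text = re.sub(r"@\w+", "", text)          # remove @mentions
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text
```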
## Installation
### Downloading the Data
- Clone this repository to your computer
- Navigate to the project directory from your terminal: `cd twitter-sentiment-analysis`
- Run `mkdir inputs`
- Use `cd inputs` to go into the directory where the data should be stored
- Download the data files from Kaggle
  - The data can be found [here](https://www.kaggle.com/c/nlp-getting-started/data)
  - If you don't have a Kaggle account, you'll need to create one
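If you have the Kaggle CLI installed and configured, the steps above can be scripted from the project root. This is a convenience sketch, not part of the repository:

```shell
# Create the data directory; if the Kaggle CLI is available,
# download and unpack the competition files into it.
mkdir -p inputs
if command -v kaggle >/dev/null 2>&1; then
    kaggle competitions download -c nlp-getting-started -p inputs
    unzip -o inputs/nlp-getting-started.zip -d inputs
else
    echo "Kaggle CLI not found; download the files manually from the link above."
fi
```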
### Installing the requirements
- Install the requirements using `pip install -r requirements`
- This project uses Python 3.8
- Using a virtual environment is recommended
## Usage
- Navigate to the `src` directory using `cd src` from the project folder
- Then run `python train.py`
- This will train an LSTM and create a directory within the `models` directory called `PRETRAIN_WORD2VEC_LSTM`, with
the serialized LSTM and tokenizer inside it.
- Once you've trained the model, you can run your own examples with the `user_interface.py` script in the top-level directory.
This will provide you with a private link; open it and input some text to see whether it's classified as a disaster or not.
- View all explorations in the `notebook` directory
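For orientation, here is a minimal sketch of the kind of model `train.py` builds. The layer sizes and hyperparameters below are assumptions for illustration; the actual script also loads pretrained word2vec vectors into the embedding layer:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE, EMBED_DIM, MAX_LEN = 10_000, 100, 40  # assumed values

model = Sequential([
    Embedding(VOCAB_SIZE, EMBED_DIM),  # word index -> dense vector
    LSTM(64),                          # sequence -> fixed-size summary
    Dense(1, activation="sigmoid"),    # probability that the tweet is a disaster
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# One forward pass on dummy token IDs: one probability per tweet.
probs = model.predict(np.zeros((2, MAX_LEN), dtype="int32"), verbose=0)
```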
## Extending This Work
Some ideas to extend this work:
- Reduce inference time
- Use different word embeddings
- Try an LSTM with attention (see [Attention in Long Short-Term Memory Recurrent Neural Networks](https://machinelearningmastery.com/attention-long-short-term-memory-recurrent-neural-networks/))
- Use a transformer model
- Correct misspelled words
- Address overfitting
## Write Ups about This Project
- [Sentiment Analysis: Predicting Whether A Tweet Is About A Disaster](https://towardsdatascience.com/sentiment-analysis-predicting-whether-a-tweet-is-about-a-disaster-c004d09d7245)
- [Combating Overfitting In Deep Learning](https://towardsdatascience.com/combating-overfitting-in-deep-learning-efb0fdabfccc)
- [Level Up Your Data Science Project With A Graphical Interface](https://towardsdatascience.com/level-up-your-data-science-project-with-a-graphical-interface-cb5704792509)