# Disaster Tweet Prediction

Twitter has become an important communication channel in times of emergency.
The ubiquity of smartphones enables people to announce an emergency they’re observing
in real time. Because of this, more agencies (e.g. disaster relief organizations and news agencies)
are interested in programmatically monitoring Twitter. In this project, I predict
whether a given tweet is about a real disaster or not: the prediction is 1 if it is and 0 if it is not.

## Installation
### Downloading the Data
- Clone this repository to your computer
- From your terminal, navigate to the project directory, e.g. `cd twitter-sentiment-analysis`
- Run `mkdir inputs`
- Use `cd inputs` to move into the directory where the data should be stored
- Download the data files from Kaggle; they can be found [here](https://www.kaggle.com/c/nlp-getting-started/data)
- If you don't have a Kaggle account, you'll have to create one
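Before training, it can help to confirm that the data ended up in the right place. The sketch below is a hypothetical helper (`check_inputs` is not part of this repository). It assumes the standard Kaggle files for this competition (`train.csv`, `test.csv`, `sample_submission.csv`) sit directly in `inputs/` (extract them there first if you downloaded a zip archive), and that `train.csv` has the columns `id`, `keyword`, `location`, `text` and `target`. Run it from the project directory and adjust the names if your download differs.

```python
import csv
from pathlib import Path

# Assumption: the standard Kaggle file names for the "nlp-getting-started" competition.
EXPECTED_FILES = ("train.csv", "test.csv", "sample_submission.csv")
# Assumption: the header row of train.csv.
EXPECTED_TRAIN_COLUMNS = ["id", "keyword", "location", "text", "target"]


def check_inputs(inputs_dir="inputs"):
    """Return a list of problems found in the data directory (empty if it looks fine)."""
    inputs = Path(inputs_dir)
    problems = []
    for name in EXPECTED_FILES:
        if not (inputs / name).is_file():
            problems.append(f"missing file: {inputs / name}")
    train_path = inputs / "train.csv"
    if train_path.is_file():
        with train_path.open(newline="", encoding="utf-8") as f:
            # Read only the first row; an empty file yields an empty header.
            header = next(csv.reader(f), [])
        if header != EXPECTED_TRAIN_COLUMNS:
            problems.append(f"unexpected train.csv header: {header}")
    return problems


if __name__ == "__main__":
    issues = check_inputs()
    print("inputs/ looks OK" if not issues else "\n".join(issues))
```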

### Installing the Requirements
- The project uses Python 3.8
- Using a virtual environment is recommended: create one with `python3.8 -m venv venv` and activate it before installing
- Install the requirements using `pip install -r requirements.txt`
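Both setup recommendations (Python 3.8 and a virtual environment) can be checked from within Python. This is a minimal sketch using only the standard library; the function names are hypothetical and not part of the repository. It relies on the fact that inside a `venv`, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base installation.

```python
import sys
import warnings

# The README targets Python 3.8.
EXPECTED_PYTHON = (3, 8)


def is_expected_python(version_info=sys.version_info, expected=EXPECTED_PYTHON):
    """True if the major.minor version matches the one the project targets."""
    return tuple(version_info[:2]) == tuple(expected)


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """True inside a venv, where sys.prefix differs from sys.base_prefix."""
    return prefix != base_prefix


if __name__ == "__main__":
    if not is_expected_python():
        warnings.warn(
            f"Running Python {sys.version_info.major}.{sys.version_info.minor}; "
            f"the project targets {EXPECTED_PYTHON[0]}.{EXPECTED_PYTHON[1]}."
        )
    if not in_virtualenv():
        warnings.warn("No virtual environment detected; requirements would be installed outside one.")
```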

## Usage

- From the project folder, navigate to the `src` directory using `cd src`
- Then run `python train.py`
- This trains an LSTM and creates a directory called `PRETRAIN_WORD2VEC_LSTM` inside the `models` directory, containing the serialized LSTM and tokenizer.
- Once you've trained the model, you can try your own examples by running the `user_interface.py` script in the top-level directory. This provides a private link; open it and enter some text to find out whether the model considers it to be about a disaster.

- All explorations can be found in the `notebook` directory
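Conceptually, predicting with the trained model means turning text into integer token ids with the saved tokenizer, padding them to the length the LSTM was trained on, and thresholding the model's output probability to get the 1/0 label. The sketch below illustrates that flow in plain Python with a toy tokenizer and a stand-in scoring function. `ToyTokenizer`, `toy_model`, `MAX_LEN` and the 0.5 threshold are all hypothetical; they do not reflect how `train.py` or `user_interface.py` actually load or call the serialized artifacts.

```python
from typing import Callable, Dict, List

MAX_LEN = 8        # hypothetical sequence length used during training
THRESHOLD = 0.5    # hypothetical decision threshold on the predicted probability


class ToyTokenizer:
    """Hypothetical stand-in for the serialized tokenizer: maps words to integer ids."""

    def __init__(self, vocab: Dict[str, int], oov_id: int = 1):
        self.vocab = vocab
        self.oov_id = oov_id  # id for unseen words; 0 is reserved for padding

    def texts_to_ids(self, text: str) -> List[int]:
        return [self.vocab.get(word, self.oov_id) for word in text.lower().split()]


def pad(ids: List[int], max_len: int = MAX_LEN) -> List[int]:
    """Left-pad with 0, or truncate keeping the last tokens, to a fixed length."""
    ids = ids[-max_len:]
    return [0] * (max_len - len(ids)) + ids


def toy_model(padded_ids: List[int]) -> float:
    """Hypothetical stand-in for the LSTM: share of non-padding ids >= 10 ('disaster' words)."""
    tokens = [i for i in padded_ids if i != 0]
    if not tokens:
        return 0.0
    return sum(1 for i in tokens if i >= 10) / len(tokens)


def predict_label(text: str, tokenizer: ToyTokenizer,
                  model: Callable[[List[int]], float]) -> int:
    """Return 1 if the model's disaster probability reaches THRESHOLD, else 0."""
    probability = model(pad(tokenizer.texts_to_ids(text)))
    return int(probability >= THRESHOLD)


if __name__ == "__main__":
    tokenizer = ToyTokenizer({"the": 2, "is": 3, "beautiful": 4, "forest": 10, "fire": 11})
    print(predict_label("forest fire", tokenizer, toy_model))            # 1
    print(predict_label("the view is beautiful", tokenizer, toy_model))  # 0
```

Swapping `toy_model` and `ToyTokenizer` for the real serialized objects would keep the same three steps: tokenize, pad, threshold.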

## Extending This Work
Some ideas to extend this work:
- Reduce inference time
- Use different word embeddings
- Try an LSTM with attention (see [Attention in Long Short-Term Memory Recurrent Neural Networks](https://machinelearningmastery.com/attention-long-short-term-memory-recurrent-neural-networks/))
- Use a transformer model
- Correct misspelled words
- Deal with overfitting
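As a starting point for the "correct misspelled words" idea, one simple option is to snap each out-of-vocabulary token to its closest known word. The sketch below uses the standard library's `difflib.get_close_matches` (a similarity-ratio matcher, not a dedicated spell checker); the vocabulary and the 0.8 cutoff are illustrative assumptions, and in practice the vocabulary might come from the tokenizer or the word embeddings.

```python
import difflib
from typing import Iterable, List


def correct_tokens(tokens: Iterable[str], vocabulary: List[str], cutoff: float = 0.8) -> List[str]:
    """Replace each unknown token with its closest vocabulary word, if one is similar enough."""
    known = set(vocabulary)
    corrected = []
    for token in tokens:
        if token in known:
            corrected.append(token)
            continue
        # get_close_matches ranks candidates by SequenceMatcher ratio and drops those below cutoff
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return corrected


if __name__ == "__main__":
    vocab = ["earthquake", "flood", "fire", "evacuation"]  # illustrative vocabulary
    print(correct_tokens(["earthquak", "evacuaton", "lol"], vocab))
```

A cutoff that is too low will "correct" valid words that are merely missing from the vocabulary, so it is worth inspecting the replacements before using them for training.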

## Write-Ups About This Project
- [Sentiment Analysis: Predicting Whether A Tweet Is About A Disaster](https://towardsdatascience.com/sentiment-analysis-predicting-whether-a-tweet-is-about-a-disaster-c004d09d7245?source=your_stories_page-------------------------------------)
- [Combating Overfitting In Deep Learning](https://towardsdatascience.com/combating-overfitting-in-deep-learning-efb0fdabfccc?source=your_stories_page-------------------------------------)
- [Level Up Your Data Science Project With A Graphical Interface](https://towardsdatascience.com/level-up-your-data-science-project-with-a-graphical-interface-cb5704792509?source=your_stories_page-------------------------------------)