| from classifier import classify | |
| from PIL import Image | |
| import streamlit as st | |
| st.title("Twitter Sentiment Analysis Using a BERT Model") | |
| st.subheader("Motivation") | |
| st.markdown(""" | |
| Social media has made the digital world much smaller, making it easy for fake news to spread like wildfire. | |
| According to official reports [6], 36.7 percent of the total population have felt cyberbullied at some point in their lifetime. | |
| Since the level of offensiveness is subjective, conventional sentiment analysis might not do a perfect job of classifying abusive content. | |
| One way around this is to train on large, diverse datasets that help the model generalize. | |
| Hugging Face Spaces provides an easy interface for testing models before use and for sharing them. | |
| """) | |
| st.subheader("Play with the model") | |
| text = st.text_input("Enter a tweet to classify it as either Normal or Abusive. (Press enter to submit)", | |
| value="I love DCNM course", max_chars=512, key=None, type="default", | |
| help=None, autocomplete=None) | |
| st.markdown(f"The tweet is classified as: **{classify(text)}**") | |
| st.markdown("For an abusive example, try: _Avatar is a crappy movie_") | |
| st.subheader("About the model") | |
| st.markdown(""" | |
| The model was trained on the ENCASEH2020 Twitter dataset from Founta, A.M. et al. (2018) [3]. The BERT-Tiny model [1][2][5] was chosen for this project because, empirically, | |
| it gave better results with the fewest parameters. The model was trained for 10 epochs with a batch size of 32, using the AdamW optimizer with a learning rate of 1e-2 and cross-entropy loss. | |
| """) | |
| st.image("./images/train_val_accuracy.png", caption="Training and validation accuracy - on average, the model reaches 96 percent accuracy", use_column_width=True) | |
| st.image("./images/train_test_scores.png", caption="Classification report - the F1 score is 0.96 for both classes", use_column_width=True) | |
| st.image("./images/confusion_matrix.png", caption="Confusion matrix - only 217 of the 5430 data points in the test dataset are misclassified", use_column_width=True) | |
| st.subheader("References") | |
| st.markdown("1. [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)") | |
| st.markdown("2. [TinyBERT: Distilling BERT for Natural Language Understanding](https://arxiv.org/abs/1909.10351)") | |
| st.markdown("3. [Founta, A.M., Djouvas, C., Chatzakou, D., Leontiadis, I., Blackburn, J., Stringhini, G., Vakali, A., Sirivianos, M., & Kourtellis, N. (2018). Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior. In 11th International Conference on Web and Social Media, ICWSM 2018.](https://arxiv.org/abs/1802.00393)") | |
| st.markdown("4. [Nandagopan D, Kowsik & Dinesh, Navaneeth & S Ram, Ajay. & C N, Amarnath. (2022). End-to-End Messaging System Enhancement using Federated Learning for Cyberbullying Detection. DOI: 10.13140/RG.2.2.35686.70722.](https://github.com/Cubemet/bert-models)") | |
| st.markdown("5. [Base Model from nreimers](https://huggingface.co/nreimers/BERT-Tiny_L-2_H-128_A-2)") | |
| st.markdown("6. [IHPL, Cyberbullying, a Growing Public Health Concern (Aug 2018)](https://ihpl.llu.edu/blog/cyberbullying-growing-public-health-concern)") |