---
language:
- en
- ko
- zh
- ja
- es
- fr
- ru
- hi
base_model:
- distilbert/distilbert-base-multilingual-cased
pipeline_tag: text-classification
tags:
- tourism
- sentiment
- multilingual
---

---

# distilbertTourism-multilingual-sentiment

A fine-tuned DistilBERT model for sentiment analysis of tourism-related text in multiple languages. This model is a key component of the thesis project **"Enhancing Tourist Destination Management through a Multilingual Web-Based Tourist Survey System with Machine Learning."** It is designed to analyze reviews, feedback, and other textual data to improve tourist feedback collection in Panglao.

## Overview

This model builds on the [distilbert-base-multilingual-cased](https://huggingface.co/distilbert/distilbert-base-multilingual-cased) architecture and has been fine-tuned on tourism-specific sentiment data. With support for eight languages, it provides a practical solution for multilingual sentiment classification in the tourism sector.

> **Thesis Context:**
> As part of the thesis project, this model is integrated into a larger natural language processing system. In addition to this DistilBERT-based sentiment analyzer, the system uses BERTopic for topic modeling. The project aims to surpass the 70% accuracy benchmark set by the IPCR while addressing the language barriers and inefficiencies inherent in traditional survey methods.

## Model Details

- **Task:** Text Classification (Sentiment Analysis)
- **Base Model:** [distilbert-base-multilingual-cased](https://huggingface.co/distilbert/distilbert-base-multilingual-cased)
- **Architecture:** DistilBERT
- **Parameters:** 135M
- **Tensor Format:** F32 (Safetensors)
- **Supported Languages:** 8 (English, Korean, Chinese, Japanese, Spanish, French, Russian, Hindi)
- **Training Data:** 160k synthetic tourism reviews
- **Performance:** Achieves over 95% confidence in sentiment classification for tourism-related texts.
- **Fine-tuning:** Adapted to the tourism domain (242 fine-tuning iterations/steps indicated)

## Usage

To integrate this model into your application, you can use the Hugging Face Transformers library. Below is an example in Python:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Define the model repository
model_name = "SCANSKY/distilbertTourism-multilingual-sentiment"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example input text (replace with your own tourism-related text)
text = "I had an amazing experience during my trip!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Perform inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# You can further process the logits to get predicted sentiment labels.
```

### Installation

Ensure you have the required packages installed. The example above returns PyTorch tensors (`return_tensors="pt"`), so `torch` is needed as well:

```bash
pip install transformers safetensors torch
```

## Limitations

- **Domain Specific:** This model is fine-tuned specifically for tourism sentiment analysis and may not perform optimally on texts from other domains.
- **Inference API:** Currently, the model does not support direct deployment to the Hugging Face Inference API because it lacks a library tag.

## Future Work

- **Dataset Expansion:** Incorporating additional data from more tourism sources could further improve performance.
- **Model Optimization:** Experimenting with different fine-tuning strategies or hyperparameters might yield better sentiment classification accuracy.
- **API Integration:** Future updates may include support for direct Inference API deployment.

## Acknowledgements

- This model is based on the [DistilBERT](https://huggingface.co/distilbert/distilbert-base-multilingual-cased) architecture.
- Special thanks to the Hugging Face community for providing the infrastructure that makes deploying and sharing models seamless.
- This work is part of the thesis project **"Enhancing Tourist Destination Management through a Multilingual Web-Based Tourist Survey System with Machine Learning."** The project also utilizes BERTopic for topic modeling, aiming to revolutionize the collection and analysis of tourist feedback by overcoming language barriers and improving upon traditional survey methods.

## Citation

```bibtex
@inproceedings{your_citation,
  title={Distilbert Sentiment for Multilingual Tourism Feedback},
  author={Paul Andre D. Tadiar},
  year={2025}
}
```

---
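## Example: Interpreting the Logits

The Usage example stops at the raw logits. As a minimal sketch of the post-processing step, the snippet below applies a softmax to a list of logits and picks the most likely class together with its probability, which is the kind of confidence score mentioned under Model Details. The label names in `ID2LABEL`, the three-class layout, and the sample logits are assumptions made for illustration only; the real mapping should be read from `model.config.id2label`.

```python
import math

# Hypothetical label mapping, for illustration only. The actual mapping
# (and number of classes) should be read from model.config.id2label.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, id2label=ID2LABEL):
    """Return the most likely label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up logits for a single review (not real model output)
label, confidence = predict_label([-1.2, 0.3, 3.1])
print(f"{label} ({confidence:.2%})")
```

With the real model, the same function could be called as `predict_label(logits[0].tolist(), model.config.id2label)`, which keeps the post-processing independent of any assumption about label order.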