metadata
language: as
tags:
  - sentiment-analysis
  - assamese
  - transformers
  - text-classification
license: apache-2.0
datasets:
  - None
model-index:
  - name: assamese-sentiment-analysis
    results: []

🌟 Assamese Sentiment Analysis with LSTM

Tags: #text-classification #sentiment-analysis #Assamese #LSTM

A deep learning-powered tool to classify Assamese text as Positive, Negative, or Neutral using an LSTM model tailored for the Assamese language.


🚀 Key Features

  • 🔍 Sentiment Analysis for Assamese – Classifies Assamese text as Positive, Neutral, or Negative
  • 🧠 Deep Learning Backbone – Powered by TensorFlow/Keras with a Long Short-Term Memory (LSTM) network
  • Advanced Preprocessing – Includes tokenization, text cleaning, optional stemming, and stopword removal
  • 🧰 Custom Tokenization – Leverages AssameseTokenizer for accurate language handling
  • 📈 Robust Evaluation Metrics – F1-score, precision, recall, and accuracy
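The metrics named in the last bullet can be computed from gold and predicted labels. The sketch below is illustrative only: the card does not say how the scores are averaged, so macro-averaging over the three classes is an assumption, and `classification_metrics` and `LABELS` are hypothetical names, not part of this repository.

```python
from collections import Counter

# Class names taken from the model card; their order is arbitrary here.
LABELS = ["Positive", "Neutral", "Negative"]


def classification_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    def safe_div(num, den):
        # Treat an undefined ratio (zero denominator) as 0.0.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = safe_div(tp[label], tp[label] + fp[label])
        r = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1s.append(safe_div(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": sum(precisions) / n,  # macro average (assumption)
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Macro-averaging weights each class equally, which is often preferred when the sentiment classes are imbalanced; a weighted average would be an equally plausible choice.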

🧠 Model Overview

| Property | Details |
|---|---|
| Model Name | pratyushee/assamese-sentiment-analysis |
| Architecture | Pretrained LSTM-based neural network |
| Language | Assamese (অসমীয়া) |
| Classes | 3 – Positive, Neutral, Negative |
| Use Cases | Customer feedback, social media monitoring, opinion mining |

🧪 Installation & Requirements

Clone the repo and install the requirements:

pip install -r requirements.txt

Install the custom Assamese tokenizer:

git clone https://github.com/KashyapKishore/AssameseTokenizer.git
cd AssameseTokenizer
pip install .

⚙️ Model Description

This model was developed using Assamese text data and trained with a custom tokenizer specifically designed for Assamese script. It uses an LSTM architecture, making it well-suited for capturing the sequence and context of natural language in sentiment classification tasks.
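For orientation, an LSTM classifier of this kind can be sketched in Keras as follows. This is a minimal sketch, not the published configuration: it assumes an Embedding → LSTM → Dropout → Dense (softmax) stack compiled with Adam, and the vocabulary size, embedding width, LSTM units, dropout rate, loss function, and the name `build_lstm_classifier` are all illustrative assumptions.

```python
import tensorflow as tf


def build_lstm_classifier(vocab_size=20000, embed_dim=128, lstm_units=64,
                          max_len=512, num_classes=3, dropout_rate=0.5):
    """Build a small LSTM text classifier (all sizes are assumptions)."""
    model = tf.keras.Sequential([
        # Each input is a fixed-length sequence of integer token ids.
        tf.keras.Input(shape=(max_len,), dtype="int32"),
        # mask_zero=True assumes id 0 is reserved for padding.
        tf.keras.layers.Embedding(vocab_size, embed_dim, mask_zero=True),
        # The LSTM reads the sequence and returns its final hidden state.
        tf.keras.layers.LSTM(lstm_units),
        # Dropout for regularization.
        tf.keras.layers.Dropout(dropout_rate),
        # One probability per sentiment class.
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        # Assumes integer class labels (0, 1, 2) rather than one-hot vectors.
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_lstm_classifier().summary()` prints the layer stack and parameter counts for the assumed sizes.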

  • 📚 Training Data: The dataset was curated from public sources such as news articles, social media comments, and feedback forms, and was manually labeled into three sentiment classes: Positive, Neutral, and Negative.

  • 🏋️ Training Procedure:

  • ✂️ Preprocessing: Text cleaning, tokenization using AssameseTokenizer, optional stemming and stopword removal

  • 🔢 Input Handling: Sequences padded or truncated to a fixed length of 512 tokens

  • 🧠 Architecture: Embedding layer → LSTM → Dense (Softmax)

  • 💧 Regularization: Dropout layers to prevent overfitting

  • ⚙️ Optimizer: Adam

  • 🔁 Epochs: Not reported

  • 📊 Evaluation: Final validation accuracy and F1-score are not reported
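The text-cleaning and input-handling steps above can be sketched in plain Python. This is a hedged illustration, not the repository's code: the cleaning rule (keep only the Bengali-Assamese Unicode block), the padding id of 0, and padding or truncating on the right are all assumptions, and `clean_text` and `pad_or_truncate` are hypothetical helper names. Tokenization with AssameseTokenizer is left out because its API is not described in this card; the helper simply takes a list of token ids.

```python
import re

MAX_LEN = 512  # fixed sequence length stated in the model card
PAD_ID = 0     # assumption: id 0 is reserved for padding

# Assumption: keep characters from the Bengali-Assamese Unicode block
# (U+0980 to U+09FF) and whitespace; anything else, including Latin text
# and punctuation such as the danda, is replaced by a space.
_NON_ASSAMESE = re.compile(r"[^\u0980-\u09FF\s]+")


def clean_text(text):
    """Drop non-Assamese characters and collapse repeated whitespace."""
    text = _NON_ASSAMESE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Return exactly max_len ids: cut extra tokens, right-pad short inputs."""
    ids = list(token_ids)[:max_len]
    return ids + [pad_id] * (max_len - len(ids))
```

Note that Keras's own `pad_sequences` utility pads and truncates at the start of the sequence by default, so the side used here should be matched to whatever the training code did.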


📦 Intended Usage

Ideal for:

  • 🗨️ Social media sentiment tracking in Assamese

  • 📢 Public opinion & brand monitoring

  • 📚 Research on low-resource NLP in Indic languages

⚠️ Limitations / Not Recommended For:

  • Code-mixed Assamese-English input

  • Domain-specific texts (e.g., legal, medical) without additional fine-tuning


🧪 Quickstart: Using the Model

You can load and run the model easily via Hugging Face's transformers pipeline:

from transformers import pipeline

model_name = "pratyushee/assamese-sentiment-analysis"
pipe = pipeline("text-classification", model=model_name, tokenizer=model_name)

result = pipe("এই খাবাৰটা একদম ভালো আছিল!")  # Sample Assamese sentence
print(result)
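A text-classification pipeline returns a list of dictionaries with `label` and `score` keys. The helper below is a small sketch for picking the top prediction from such a result; `top_prediction` is a hypothetical name, and the label strings in the example are assumptions, since the actual names depend on the model's configuration (they may appear as `LABEL_0`, `LABEL_1`, and so on).

```python
def top_prediction(result):
    """Return (label, score) for the highest-scoring entry of one input."""
    # Some pipeline settings return a nested list (one inner list per input);
    # in that case, look at the first input only.
    if result and isinstance(result[0], list):
        result = result[0]
    best = max(result, key=lambda item: item["score"])
    return best["label"], best["score"]


# Example with a hand-written result in the pipeline's output format.
example = [
    {"label": "Negative", "score": 0.2},
    {"label": "Neutral", "score": 0.7},
    {"label": "Positive", "score": 0.1},
]
label, score = top_prediction(example)
print(f"{label}: {score:.2f}")
```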
