assamese-sentiment-analysis / README.md

pratyushee

language: as
tags:
  - sentiment-analysis
  - assamese
  - transformers
  - text-classification
license: apache-2.0
datasets:
  - None
model-index:
  - name: assamese-sentiment-analysis
    results: []

🌟 Assamese Sentiment Analysis with LSTM

Tags: #text-classification #sentiment-analysis #Assamese #LSTM

A deep learning-powered tool to classify Assamese text as Positive, Negative, or Neutral using an LSTM model tailored for the Assamese language.

🚀 Key Features

🔍 Sentiment Analysis for Assamese – Classifies Assamese text as Positive, Negative, or Neutral
🧠 Deep Learning Backbone – Built on TensorFlow/Keras with a Long Short-Term Memory (LSTM) network
✨ Preprocessing – Includes text cleaning, tokenization, optional stemming, and stopword removal
🧰 Custom Tokenization – Uses AssameseTokenizer to handle Assamese script
📈 Evaluation Metrics – F1-score, precision, recall, and accuracy
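
To make the metrics listed above concrete, here is a minimal pure-Python sketch of accuracy and macro-averaged precision, recall, and F1 over the three classes. The label strings, the helper name `macro_report`, and the choice of macro averaging are assumptions for illustration; this README does not state which averaging the model's evaluation uses.

```python
from collections import Counter

LABELS = ("Positive", "Neutral", "Negative")  # assumed label strings


def macro_report(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # the true class was missed

    def safe_div(num, den):
        # Treat an undefined ratio (zero denominator) as 0.0
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = safe_div(tp[label], tp[label] + fp[label])
        r = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1s.append(safe_div(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels, made up for the example
y_true = ["Positive", "Neutral", "Negative", "Positive"]
y_pred = ["Positive", "Negative", "Negative", "Positive"]
report = macro_report(y_true, y_pred)
print(report)
```

Macro averaging weights each class equally, which is often preferred when the Neutral class is rarer than the other two.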

🧠 Model Overview

| Property | Details |
|---|---|
| Model Name | `pratyushee/assamese-sentiment-analysis` |
| Architecture | Pretrained LSTM-based neural network |
| Language | Assamese (অসমীয়া) |
| Classes | 3 – Positive, Neutral, Negative |
| Use Cases | Customer feedback, social media monitoring, opinion mining |

🧪 Installation & Requirements

Clone the repo and install the requirements:

pip install -r requirements.txt

Install the custom Assamese tokenizer:

git clone https://github.com/KashyapKishore/AssameseTokenizer.git
cd AssameseTokenizer
pip install .

⚙️ Model Description

This model was trained on Assamese text tokenized with a custom tokenizer designed for the Assamese script. Its LSTM architecture reads tokens in order, which helps it capture the sequence and context that sentiment classification depends on.

📚 Training Data

The dataset was curated from public sources such as news articles, social media comments, and feedback forms, and was manually labeled into three sentiment classes: Positive, Neutral, and Negative.

🏋️ Training Procedure

✂️ Preprocessing: Text cleaning, tokenization using AssameseTokenizer, optional stemming and stopword removal
🔢 Input Handling: Sequences padded or truncated to a fixed length of 512 tokens
🧠 Architecture: Embedding layer → LSTM → Dense (Softmax)
💧 Regularization: Dropout layers to prevent overfitting
⚙️ Optimizer: Adam
🔁 Epochs: Not specified in this README
📊 Evaluation: Final validation accuracy and F1-score are not reported in this README
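
The preprocessing and input-handling steps above could be sketched in plain Python as follows. This is an illustration only: whitespace splitting stands in for AssameseTokenizer (whose API is not shown in this README), the stopword set, vocabulary, and reserved ids are hypothetical, and only the fixed length of 512 comes from the list above.

```python
import re

MAX_LEN = 512          # fixed sequence length stated in the training procedure
PAD_ID, UNK_ID = 0, 1  # assumed reserved ids for padding and unknown tokens

# Hypothetical stopword list; the real one is not given in this README.
STOPWORDS = {"আৰু", "এই", "যে"}

# Keep the Bengali-Assamese Unicode block (U+0980 to U+09FF), zero-width
# joiners used inside conjuncts, and whitespace; replace everything else.
NON_ASSAMESE = re.compile(r"[^\u0980-\u09FF\u200c\u200d\s]")


def clean_text(text):
    """Drop characters outside the Assamese script and collapse whitespace."""
    text = NON_ASSAMESE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text, remove_stopwords=True):
    """Whitespace tokenization as a stand-in for AssameseTokenizer."""
    tokens = clean_text(text).split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def encode(tokens, vocab, max_len=MAX_LEN):
    """Map tokens to ids, then truncate or right-pad to max_len."""
    ids = [vocab.get(t, UNK_ID) for t in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


vocab = {"খাবাৰটা": 2, "ভাল": 3}  # toy vocabulary for the example
tokens = tokenize("এই খাবাৰটা ভাল আছিল!")
ids = encode(tokens, vocab)
print(tokens, ids[:5], len(ids))
```

Every encoded sequence has exactly 512 ids, so batches can be stacked into a single fixed-shape array before they reach the embedding layer.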

📦 Intended Usage

Ideal for:

🗨️ Social media sentiment tracking in Assamese
📢 Public opinion & brand monitoring
📚 Research on low-resource NLP in Indic languages

⚠️ Limitations / Not Recommended For:

Code-mixed Assamese-English input
Domain-specific texts (e.g., legal, medical) without additional fine-tuning

🧪 Quickstart: Using the Model

Load and run the model with the Hugging Face transformers pipeline:

from transformers import pipeline

model_name = "pratyushee/assamese-sentiment-analysis"
pipe = pipeline("text-classification", model=model_name, tokenizer=model_name)

result = pipe("এই খাবাৰটা একদম ভালো আছিল!")  # Sample Assamese sentence
print(result)
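
A text-classification pipeline returns a list with one dict per input, each holding a `label` and a `score`. A small helper like the one below can turn that into a readable result. The `ID2LABEL` mapping and the `summarize` helper are assumptions for illustration; check the model's configuration for the actual label names and their order before relying on them.

```python
# Assumed mapping from generic ids to class names; verify against the model's config.
ID2LABEL = {"LABEL_0": "Negative", "LABEL_1": "Neutral", "LABEL_2": "Positive"}


def summarize(results):
    """Convert pipeline output dicts into (label, rounded score) pairs."""
    summary = []
    for item in results:
        # Labels not found in the mapping are passed through unchanged
        label = ID2LABEL.get(item["label"], item["label"])
        summary.append((label, round(item["score"], 3)))
    return summary


# Input shaped like the pipeline's output; the values are made up.
fake_result = [{"label": "LABEL_2", "score": 0.9731}]
print(summarize(fake_result))
```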
