🎬 KoELECTRA Korean Sentiment Analyzer

Fine-tuned Korean sentiment classification model based on KoELECTRA, trained on the NSMC dataset (Naver Sentiment Movie Corpus).

Classifies Korean movie reviews as positive (1) or negative (0).

GitHub: cringepnh/korean-sentiment-analyzer

📊 Results

Evaluated on the full NSMC test set (49,157 samples after cleaning):

Metric      Score
Accuracy    90.2%
Precision   90.2%
Recall      90.3%
F1 Score    90.3%
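
For reference, the four metrics above follow the standard binary-classification definitions, with positive (1) treated as the target class. The sketch below is a minimal illustration of those definitions, not the evaluation script used for this model; the function name `binary_metrics` and the toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels, treating 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (not NSMC data)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which is consistent with the scores reported above.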

🚀 How to Use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "cringepnh/koelectra-korean-sentiment"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def predict(text):
    """Classify one Korean review and return its sentiment with a confidence score."""
    # Tokenize with the same 128-token limit used during training
    inputs = tokenizer(text, return_tensors="pt", padding="max_length",
                       truncation=True, max_length=128)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)  # shape: (1, 2)
    label = torch.argmax(probs, dim=-1).item()  # 0 = negative, 1 = positive
    confidence = probs[0][label].item()
    sentiment = "Positive ✅" if label == 1 else "Negative ❌"
    return {"sentiment": sentiment, "confidence": f"{confidence:.1%}"}

# Examples
# "This movie is really fun! The acting is great too!"
print(predict("이 영화 정말 재미있어요! 배우들 연기도 최고!"))
# {'sentiment': 'Positive ✅', 'confidence': '99.4%'}

# "Totally bad... I wasted my time."
print(predict("완전 별로... 시간 낭비했다."))
# {'sentiment': 'Negative ❌', 'confidence': '99.5%'}
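
To make the post-processing inside `predict` concrete, the following sketch reproduces the softmax and argmax steps in plain Python. The logit values are made up for illustration and are not outputs of this model; the helper name `softmax` is likewise hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits in the order [negative (0), positive (1)]
logits = [-2.5, 2.6]
probs = softmax(logits)

# argmax: index of the largest probability is the predicted label
label = max(range(len(probs)), key=probs.__getitem__)
confidence = probs[label]
print(label, f"{confidence:.1%}")
```

The reported confidence is simply the softmax probability of the winning class, so with two classes it is never below 50%.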

πŸ‹οΈ Training Details

  • Base model: monologg/koelectra-base-finetuned-sentiment
  • Dataset: NSMC (146,182 training samples after cleaning)
  • Epochs: Up to 10 with early stopping (patience=2 on eval_loss)
  • Batch size: 32
  • Learning rate: 2e-5
  • Max token length: 128
  • Hardware: CPU (laptop)
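
The early-stopping rule listed above (patience=2 on eval_loss) can be sketched in plain Python as follows. This is a simplified illustration of the general technique, not the training script for this model: the function name `epochs_until_stop` and the loss values are hypothetical. In Hugging Face Transformers, similar behavior is available through `EarlyStoppingCallback(early_stopping_patience=2)`; whether that callback was used here is an assumption.

```python
def epochs_until_stop(eval_losses, patience=2):
    """Return (epoch at which training stops, epoch with the best eval loss).

    Epochs are 1-based. Training stops once eval loss has failed to improve
    for `patience` consecutive evaluations.
    """
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(eval_losses, start=1):
        if loss < best_loss:
            # Improvement: remember it and reset the patience counter
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    # Never triggered: training ran for every available epoch
    return len(eval_losses), best_epoch


# Made-up eval_loss values for up to 10 epochs (illustration only)
losses = [0.31, 0.27, 0.26, 0.28, 0.29, 0.30, 0.32, 0.33, 0.35, 0.36]
stopped_at, best = epochs_until_stop(losses, patience=2)
print(stopped_at, best)
```

With these toy values the loss bottoms out at epoch 3 and training halts two epochs later, so the 10-epoch limit acts only as an upper bound.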

📁 Dataset

NSMC (Naver Sentiment Movie Corpus)

  • 150,000 training / 50,000 test Korean movie reviews
  • Binary labels: 0 (negative), 1 (positive)
  • Source: github.com/e9t/nsmc
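
The NSMC files (`ratings_train.txt`, `ratings_test.txt`) are tab-separated with the columns `id`, `document`, and `label`. The sketch below shows one plausible way to load and clean them. The exact cleaning steps behind the sample counts on this page are not documented here, so dropping empty and duplicate reviews is an assumption, and the helper name `load_clean` and the sample rows are hypothetical.

```python
import csv
import io

# Tiny in-memory stand-in for ratings_train.txt (made-up rows, not NSMC data)
sample = (
    "id\tdocument\tlabel\n"
    "1\t재미있어요\t1\n"   # "It's fun"
    "2\t별로예요\t0\n"     # "Not great"
    "3\t재미있어요\t1\n"   # duplicate review text
    "4\t\t0\n"             # empty review
)


def load_clean(handle):
    """Read NSMC-style TSV rows, dropping empty and duplicate reviews (assumed cleaning)."""
    # QUOTE_NONE: reviews may contain stray quote characters that are plain text
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    seen, rows = set(), []
    for row in reader:
        text = (row["document"] or "").strip()
        if not text or text in seen:
            continue
        seen.add(text)
        rows.append((text, int(row["label"])))
    return rows


rows = load_clean(io.StringIO(sample))
print(rows)
```

For the real corpus, the same function could be called on `open("ratings_train.txt", encoding="utf-8", newline="")` after downloading the file from the source repository.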

📄 License

MIT
