---
language:
  - ja
license: mit
base_model: tohoku-nlp/bert-base-japanese-v3
tags:
  - japanese
  - keigo
  - text-classification
  - omotenashi
  - hospitality
  - bert
pipeline_tag: text-classification
---

# Keigo Evaluator – 敬語レベル分類モデル

A fine-tuned Japanese BERT model that classifies the politeness level (敬語レベル) of Japanese speech into four levels. It is designed to evaluate whether an employee speaks to the keigo (敬語) and omotenashi (おもてなし) standards expected in hospitality and service contexts.


## Intended Use

This model is the NLP component of an AI-powered service quality evaluation pipeline:

```
Voice Recording → Whisper ASR → Transcribed Text → This Model → Keigo Verdict
```

It is intended for:

- Evaluating employee speech quality in hospitality and customer service settings
- Automated keigo compliance checking in call centres or hotel/restaurant environments
- Quality assurance systems for Japanese service staff training

## Labels

The model predicts one of four classes:

| Label | Level | Name | Description | Service Verdict |
|-------|-------|------|-------------|-----------------|
| LABEL_0 | 1 | 最高敬語 | Highest honorific; sonkeigo dominant | ✅ Pass |
| LABEL_1 | 2 | 敬語 | Standard honorific; appropriate for most service contexts | ✅ Pass |
| LABEL_2 | 3 | 丁寧語 | Polite but not honorific; insufficient for hospitality | ❌ Fail |
| LABEL_3 | 4 | 普通語 | Casual / plain speech; inappropriate in service contexts | ❌ Fail |

## How to Use

### Installation

```bash
pip install transformers torch fugashi unidic-lite
```

Note: `unidic-lite` is required (not `ipadic`); this model uses the UniDic dictionary for MeCab tokenization.

### Basic Usage

```python
from transformers import pipeline
import torch

classifier = pipeline(
    'text-classification',
    model='ishraq/keigo-evaluator',
    device=0 if torch.cuda.is_available() else -1
)

LEVEL_MAP = {
    'LABEL_0': {'level': 1, 'name': '最高敬語', 'passed': True},
    'LABEL_1': {'level': 2, 'name': '敬語',     'passed': True},
    'LABEL_2': {'level': 3, 'name': '丁寧語',   'passed': False},
    'LABEL_3': {'level': 4, 'name': '普通語',   'passed': False},
}

def evaluate_keigo(text: str) -> dict:
    result = classifier(text)[0]
    info   = LEVEL_MAP[result['label']]
    return {
        'text':       text,
        'level':      info['level'],
        'level_name': info['name'],
        'confidence': round(result['score'], 3),
        'passed':     info['passed'],
        'verdict':    '✅ 適切な敬語です' if info['passed'] else '❌ 敬語レベルが不足しています'
    }

print(evaluate_keigo('いらっしゃいませ。本日はどのようなご用件でございましょうか？'))
# {'level': 1, 'level_name': '最高敬語', 'confidence': 0.91, 'passed': True, 'verdict': '✅ 適切な敬語です'}

print(evaluate_keigo('ちょっと待って。'))
# {'level': 4, 'level_name': '普通語', 'confidence': 0.99, 'passed': False, 'verdict': '❌ 敬語レベルが不足しています'}
```

### Full Voice Pipeline (Whisper + Keigo Evaluator)

```python
import whisper
from transformers import pipeline
import torch

asr        = whisper.load_model('medium')
classifier = pipeline(
    'text-classification',
    model='ishraq/keigo-evaluator',
    device=0 if torch.cuda.is_available() else -1
)

def evaluate_recording(audio_path: str) -> dict:
    # LEVEL_MAP is defined in the Basic Usage example above
    transcript = asr.transcribe(audio_path, language='ja')['text']
    result     = classifier(transcript)[0]
    info       = LEVEL_MAP[result['label']]
    return {
        'transcript': transcript,
        'level':      info['level'],
        'level_name': info['name'],
        'confidence': round(result['score'], 3),
        'passed':     info['passed'],
        'verdict':    '✅ 適切な敬語です' if info['passed'] else '❌ 敬語レベルが不足しています'
    }

result = evaluate_recording('employee_call.mp3')
print(result)
```
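In batch QA use, many recordings are scored and rolled up into a report. Below is a minimal, model-free sketch of that roll-up; `summarize_results` is a hypothetical helper (not part of this model) that assumes a list of dicts shaped like the output of `evaluate_recording`.

```python
from statistics import mean

def summarize_results(results: list[dict]) -> dict:
    """Aggregate per-recording keigo verdicts into a simple QA summary."""
    total = len(results)
    passed = sum(1 for r in results if r['passed'])
    return {
        'total': total,
        'pass_rate': round(passed / total, 3) if total else 0.0,
        'mean_confidence': round(mean(r['confidence'] for r in results), 3) if total else 0.0,
    }

# Example with hand-written results (real values come from the classifier)
demo = [
    {'passed': True,  'confidence': 0.91},
    {'passed': True,  'confidence': 0.55},
    {'passed': False, 'confidence': 0.99},
]
print(summarize_results(demo))
# {'total': 3, 'pass_rate': 0.667, 'mean_confidence': 0.817}
```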

## Training Details

### Dataset

KeiCO Corpus: a Japanese keigo classification corpus of 10,002 sentences labelled by politeness level and keigo type (sonkeigo / kenjōgo / teineigo) across a wide range of service situations, including greetings (挨拶), apologies (謝る), meetings (会う), and seasonal expressions (季節).

| Level | Count | % |
|-------|-------|---|
| 1 - 最高敬語 | 2,584 | 25.8% |
| 2 - 敬語 | 2,044 | 20.4% |
| 3 - 丁寧語 | 2,692 | 26.9% |
| 4 - 普通語 | 2,682 | 26.8% |

The dataset is well-balanced. No oversampling or class weighting was applied.
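The balance claim is easy to verify from the raw counts above (a quick sanity check, not training code):

```python
# Class counts from the KeiCO corpus table above
counts = {'最高敬語': 2584, '敬語': 2044, '丁寧語': 2692, '普通語': 2682}
total = sum(counts.values())
print(total)  # 10002

for name, n in counts.items():
    print(f'{name}: {100 * n / total:.1f}%')
```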

### Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 5 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 10% |
| Max sequence length | 128 |
| Optimizer | AdamW |
| Scheduler | Linear warmup + decay |
| Gradient clipping | 1.0 |
| Loss | Cross-entropy |
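The linear warmup-plus-decay schedule can be sketched as a pure function of the step count. The step numbers below are only derived estimates from the table and the 85% train split (about 8,501 of 10,002 sentences, batch size 32, so roughly 266 steps per epoch, 1,330 total, 133 warmup); the actual run may differ.

```python
def linear_schedule_lr(step: int, base_lr: float = 2e-5,
                       warmup_steps: int = 133, total_steps: int = 1330) -> float:
    """Linear warmup to base_lr, then linear decay to 0, matching the
    shape of the standard 'linear' scheduler in Transformers."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

print(linear_schedule_lr(0))     # 0.0
print(linear_schedule_lr(133))   # ~2e-05 (peak, end of warmup)
print(linear_schedule_lr(1330))  # 0.0
```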

### Training Infrastructure

- Hardware: NVIDIA T4 GPU (Google Colab)
- Framework: PyTorch + Hugging Face Transformers
- Train / val split: 85% / 15%, stratified by label

## Evaluation Results

Sample inference results on held-out test sentences:

| Input | Predicted Level | Confidence | Verdict |
|-------|-----------------|------------|---------|
| 本日はお早いのですね、お散歩ですか？ | 2 - 敬語 | 0.598 | ✅ Pass |
| ご多用中にもかかわらず、よくお出でくださいました。 | 2 - 敬語 | 0.557 | ✅ Pass |
| お問い合わせをいただいた商品が、本日入荷しました。 | 3 - 丁寧語 | 0.740 | ❌ Fail |
| 今日はうどんにする。 | 4 - 普通語 | 0.993 | ❌ Fail |
| 忙しいのに、よく来たね。 | 4 - 普通語 | 0.996 | ❌ Fail |

Casual speech (Level 4) is detected with near-perfect confidence. Borderline honorific sentences show appropriately lower confidence scores.


## Limitations

- The model evaluates transcribed text, not raw audio. Whisper transcription quality directly affects evaluation accuracy; Whisper `medium` or `large` is recommended for Japanese.
- Confidence scores below 0.60 on a passing result indicate borderline speech; consider flagging such cases for human review.
- The model classifies overall politeness level and does not identify specific keigo errors (e.g. incorrect verb conjugation).
- Accuracy may be lower for highly domain-specific speech, such as medical or legal Japanese.
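The review guidance above can be expressed as a small gate. `needs_review` is a hypothetical helper (not part of the model) that assumes result dicts shaped like the output of `evaluate_keigo`:

```python
def needs_review(result: dict, threshold: float = 0.60) -> bool:
    """Flag passing results whose confidence is below the threshold,
    so a human can double-check borderline speech."""
    return result['passed'] and result['confidence'] < threshold

print(needs_review({'passed': True,  'confidence': 0.55}))  # True
print(needs_review({'passed': True,  'confidence': 0.91}))  # False
print(needs_review({'passed': False, 'confidence': 0.99}))  # False (already failed)
```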

## Citation

If you use this model, please cite the KeiCO corpus and the base model:

- Base model: Tohoku NLP Lab, BERT-base Japanese v3
- Dataset: KeiCO Corpus (Japanese Keigo Classification Corpus)
- Fine-tuned by: Ishraq (B-JET Ideathon 2026, Smart Service Evaluator)