---
language:
- ja
license: mit
base_model: tohoku-nlp/bert-base-japanese-v3
tags:
- japanese
- keigo
- text-classification
- omotenashi
- hospitality
- bert
pipeline_tag: text-classification
---
# Keigo Evaluator – 敬語レベル分類モデル

A fine-tuned Japanese BERT model that classifies the politeness level (敬語レベル) of Japanese speech into four levels. Designed to evaluate whether an employee is speaking with appropriate keigo (敬語) and omotenashi (おもてなし) standards in a hospitality or service context.
## Intended Use

This model is the NLP component of an AI-powered service quality evaluation pipeline:

Voice Recording → Whisper ASR → Transcribed Text → This Model → Keigo Verdict
It is intended for:
- Evaluating employee speech quality in hospitality and customer service settings
- Automated Keigo compliance checking in call centres or hotel/restaurant environments
- Quality assurance systems for Japanese service staff training
## Labels

The model predicts one of four classes:

| Label | Level | Name | Description | Service Verdict |
|---|---|---|---|---|
| LABEL_0 | 1 | 最高敬語 | Highest honorific – sonkeigo dominant | ✅ Pass |
| LABEL_1 | 2 | 敬語 | Standard honorific – appropriate for most service contexts | ✅ Pass |
| LABEL_2 | 3 | 丁寧語 | Polite but not honorific – insufficient for hospitality | ❌ Fail |
| LABEL_3 | 4 | 普通語 | Casual / plain speech – inappropriate in service contexts | ❌ Fail |
## How to Use

### Installation

```shell
pip install transformers torch fugashi unidic-lite
```

Note: `unidic-lite` is required (not `ipadic`) – this model uses the UniDic dictionary for MeCab tokenization.
### Basic Usage

```python
from transformers import pipeline
import torch

classifier = pipeline(
    'text-classification',
    model='ishraq/keigo-evaluator',
    device=0 if torch.cuda.is_available() else -1
)

LEVEL_MAP = {
    'LABEL_0': {'level': 1, 'name': '最高敬語', 'passed': True},
    'LABEL_1': {'level': 2, 'name': '敬語', 'passed': True},
    'LABEL_2': {'level': 3, 'name': '丁寧語', 'passed': False},
    'LABEL_3': {'level': 4, 'name': '普通語', 'passed': False},
}

def evaluate_keigo(text: str) -> dict:
    result = classifier(text)[0]
    info = LEVEL_MAP[result['label']]
    return {
        'text': text,
        'level': info['level'],
        'level_name': info['name'],
        'confidence': round(result['score'], 3),
        'passed': info['passed'],
        'verdict': '✅ 適切な敬語です' if info['passed'] else '❌ 敬語レベルが不足しています'
    }

print(evaluate_keigo('いらっしゃいませ。本日はどのようなご用件でございますでしょうか？'))
# {'level': 1, 'level_name': '最高敬語', 'confidence': 0.91, 'passed': True, 'verdict': '✅ 適切な敬語です'}

print(evaluate_keigo('ちょっと待って'))
# {'level': 4, 'level_name': '普通語', 'confidence': 0.99, 'passed': False, 'verdict': '❌ 敬語レベルが不足しています'}
```
### Full Voice Pipeline (Whisper + Keigo Evaluator)

```python
import whisper
from transformers import pipeline
import torch

asr = whisper.load_model('medium')
classifier = pipeline(
    'text-classification',
    model='ishraq/keigo-evaluator',
    device=0 if torch.cuda.is_available() else -1
)

# LEVEL_MAP as defined in Basic Usage above.

def evaluate_recording(audio_path: str) -> dict:
    transcript = asr.transcribe(audio_path, language='ja')['text']
    result = classifier(transcript)[0]
    info = LEVEL_MAP[result['label']]
    return {
        'transcript': transcript,
        'level': info['level'],
        'level_name': info['name'],
        'confidence': round(result['score'], 3),
        'passed': info['passed'],
        'verdict': '✅ 適切な敬語です' if info['passed'] else '❌ 敬語レベルが不足しています'
    }

result = evaluate_recording('employee_call.mp3')
print(result)
```
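For QA reporting, per-utterance verdicts can be rolled up into a session-level summary. A minimal sketch, assuming each utterance has already been scored into result dicts like those returned by `evaluate_keigo` above (the `summarize_session` helper and the 0.60 review threshold are illustrative, not part of the model API):

```python
def summarize_session(results: list, review_threshold: float = 0.60) -> dict:
    """Aggregate per-utterance keigo verdicts into a session-level report.

    Each item in `results` is expected to look like the output of
    evaluate_keigo(): {'level': int, 'confidence': float, 'passed': bool, ...}.
    """
    total = len(results)
    passed = sum(1 for r in results if r['passed'])
    # Passing utterances with low confidence are borderline -> flag for human review.
    flagged = [r for r in results if r['passed'] and r['confidence'] < review_threshold]
    return {
        'utterances': total,
        'pass_rate': round(passed / total, 3) if total else 0.0,
        'worst_level': max((r['level'] for r in results), default=None),
        'needs_review': len(flagged),
    }

# Example with precomputed, illustrative per-utterance results:
session = [
    {'level': 1, 'confidence': 0.91, 'passed': True},
    {'level': 2, 'confidence': 0.55, 'passed': True},   # borderline pass
    {'level': 4, 'confidence': 0.99, 'passed': False},
]
print(summarize_session(session))
# {'utterances': 3, 'pass_rate': 0.667, 'worst_level': 4, 'needs_review': 1}
```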
## Training Details

### Dataset

KeiCO Corpus – a Japanese keigo classification corpus of 10,002 sentences labelled by politeness level and keigo type (sonkeigo / kenjōgo / teineigo) across a wide range of service situations, including greetings, apologies, meetings, and seasonal expressions.
| Level | Count | % |
|---|---|---|
| 1 – 最高敬語 | 2,584 | 25.8% |
| 2 – 敬語 | 2,044 | 20.4% |
| 3 – 丁寧語 | 2,692 | 26.9% |
| 4 – 普通語 | 2,682 | 26.8% |
The class distribution is close to balanced, so no oversampling or class weighting was applied.
### Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 5 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 10% |
| Max sequence length | 128 |
| Optimizer | AdamW |
| Scheduler | Linear warmup + decay |
| Gradient clipping | 1.0 |
| Loss | Cross-entropy |
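The warmup ratio translates into a concrete step count. A quick back-of-the-envelope check, assuming the 85% train split of the 10,002-sentence corpus (exact counts depend on how the last partial batch is handled, so these figures are illustrative):

```python
import math

corpus_size = 10_002
train_size = int(corpus_size * 0.85)          # 85% train split -> 8501 examples
steps_per_epoch = math.ceil(train_size / 32)  # batch size 32 -> 266 steps
total_steps = steps_per_epoch * 5             # 5 epochs -> 1330 optimizer steps
warmup_steps = int(total_steps * 0.10)        # 10% warmup ratio -> 133 warmup steps

print(steps_per_epoch, total_steps, warmup_steps)
# 266 1330 133
```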
### Training Infrastructure
- Hardware: NVIDIA T4 GPU (Google Colab)
- Framework: PyTorch + Hugging Face Transformers
- Train / Val split: 85% / 15% stratified by label
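A stratified split keeps the four label proportions identical across train and validation. The card does not say which tooling performed the split, so this is a plain-Python sketch of the idea (in practice something like `sklearn.model_selection.train_test_split` with `stratify=` would do the same):

```python
import random
from collections import defaultdict

def stratified_split(examples, val_ratio=0.15, seed=42):
    """Split (text, label) pairs so each label keeps the same train/val ratio."""
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[1]].append(ex)

    rng = random.Random(seed)
    train, val = [], []
    for label, items in by_label.items():
        rng.shuffle(items)                      # shuffle within each label
        n_val = round(len(items) * val_ratio)   # 15% of THIS label goes to val
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val

# Tiny illustrative dataset: 20 sentences per level (labels 1-4).
data = [(f'sentence_{label}_{i}', label) for label in range(1, 5) for i in range(20)]
train, val = stratified_split(data)
print(len(train), len(val))  # 68 12 -> exactly 3 of each label held out
```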
## Evaluation Results
Sample inference results on held-out test sentences:
| Input | Predicted Level | Confidence | Verdict |
|---|---|---|---|
| 本日はお早いのですね。お散歩ですか？ | 2 – 敬語 | 0.598 | ✅ Pass |
| ご多用中にもかかわらず、お出でくださいました。 | 2 – 敬語 | 0.557 | ✅ Pass |
| お問い合わせいただいた商品が本日入荷しました。 | 3 – 丁寧語 | 0.740 | ❌ Fail |
| 今日は、どこに行く？ | 4 – 普通語 | 0.993 | ❌ Fail |
| 忙しいのに、よく来たね。 | 4 – 普通語 | 0.996 | ❌ Fail |
Casual speech (Level 4) is detected with near-perfect confidence. Borderline honorific sentences show appropriately lower confidence scores.
## Limitations

- The model evaluates transcribed text, not raw audio. Whisper transcription quality directly affects evaluation accuracy; the `medium` or `large` Whisper models are recommended for Japanese.
- Confidence scores below 0.60 on a passing result indicate borderline speech; consider flagging such cases for human review.
- The model classifies overall politeness level and does not identify specific keigo errors (e.g. incorrect verb conjugation).
- Accuracy may be lower for highly domain-specific speech such as medical or legal Japanese.
## Citation

If you use this model, please cite the KeiCO corpus and the base model:

- Base model: Tohoku NLP Lab, BERT-base Japanese v3
- Dataset: KeiCO Corpus – Japanese Keigo Classification Corpus
- Fine-tuned by: Ishraq (B-JET Ideathon 2026 – Smart Service Evaluator)