---
language:
- ja
license: mit
base_model: tohoku-nlp/bert-base-japanese-v3
tags:
- japanese
- keigo
- text-classification
- omotenashi
- hospitality
- bert
pipeline_tag: text-classification
---

# Keigo Evaluator — 敬語レベル分類モデル

A fine-tuned Japanese BERT model that classifies the politeness level (敬語レベル) of Japanese speech into four levels. Designed to evaluate whether an employee is speaking with appropriate **Keigo (敬語)** and **Omotenashi (おもてなし)** standards in a hospitality or service context.

---

## Intended Use

This model is the NLP component of an AI-powered service quality evaluation pipeline:

```
Voice Recording → Whisper ASR → Transcribed Text → This Model → Keigo Verdict
```

It is intended for:

- Evaluating employee speech quality in hospitality and customer service settings
- Automated Keigo compliance checking in call centres or hotel/restaurant environments
- Quality assurance systems for Japanese service staff training

---

## Labels

The model predicts one of four classes:

| Label | Level | Name | Description | Service Verdict |
|-------|-------|------|-------------|-----------------|
| LABEL_0 | 1 | 最高敬語 | Highest honorific — sonkeigo dominant | ✅ Pass |
| LABEL_1 | 2 | 敬語 | Standard honorific — appropriate for most service contexts | ✅ Pass |
| LABEL_2 | 3 | 丁寧語 | Polite but not honorific — insufficient for hospitality | ❌ Fail |
| LABEL_3 | 4 | 普通語 | Casual / plain speech — inappropriate in service contexts | ❌ Fail |

---

## How to Use

### Installation

```bash
pip install transformers torch fugashi unidic-lite
```

> **Note**: `unidic-lite` is required (not `ipadic`) — this model uses the UniDic dictionary for MeCab tokenization.
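Since Levels 1 and 2 pass while Levels 3 and 4 fail (see the Labels table above), the four class scores can also be collapsed into a single binary pass probability. A minimal sketch, assuming the classifier is called with `top_k=None` so the Transformers pipeline returns scores for all four labels (the mocked score list below is illustrative, not real model output):

```python
# Levels 1-2 pass, Levels 3-4 fail, per the Labels table above.
PASS_LABELS = {'LABEL_0', 'LABEL_1'}

def pass_probability(scores: list[dict]) -> float:
    """Sum the probability mass of the passing classes.

    `scores` is assumed to be the output of classifier(text, top_k=None):
    a list of {'label': ..., 'score': ...} dicts covering all four classes.
    """
    return sum(s['score'] for s in scores if s['label'] in PASS_LABELS)

# Mocked score distribution for illustration:
mock_scores = [
    {'label': 'LABEL_0', 'score': 0.05},
    {'label': 'LABEL_1', 'score': 0.55},
    {'label': 'LABEL_2', 'score': 0.30},
    {'label': 'LABEL_3', 'score': 0.10},
]
print(round(pass_probability(mock_scores), 3))  # 0.6
```

This gives a smoother signal than the top-1 label alone: a sentence scored 0.45 / 0.40 across a passing and a failing class is clearly borderline even though its top-1 label passes.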
### Basic Usage

```python
from transformers import pipeline
import torch

classifier = pipeline(
    'text-classification',
    model='ishraq/keigo-evaluator',
    device=0 if torch.cuda.is_available() else -1
)

LEVEL_MAP = {
    'LABEL_0': {'level': 1, 'name': '最高敬語', 'passed': True},
    'LABEL_1': {'level': 2, 'name': '敬語', 'passed': True},
    'LABEL_2': {'level': 3, 'name': '丁寧語', 'passed': False},
    'LABEL_3': {'level': 4, 'name': '普通語', 'passed': False},
}

def evaluate_keigo(text: str) -> dict:
    result = classifier(text)[0]
    info = LEVEL_MAP[result['label']]
    return {
        'text': text,
        'level': info['level'],
        'level_name': info['name'],
        'confidence': round(result['score'], 3),
        'passed': info['passed'],
        'verdict': '✅ 適切な敬語です' if info['passed'] else '❌ 敬語レベルが不足しています'
    }

print(evaluate_keigo('いらっしゃいませ。本日はどのようなご用件でございましょうか?'))
# {'level': 1, 'level_name': '最高敬語', 'confidence': 0.91, 'passed': True, 'verdict': '✅ 適切な敬語です'}

print(evaluate_keigo('ちょっと待って。'))
# {'level': 4, 'level_name': '普通語', 'confidence': 0.99, 'passed': False, 'verdict': '❌ 敬語レベルが不足しています'}
```

### Full Voice Pipeline (Whisper + Keigo Evaluator)

```python
import whisper
from transformers import pipeline
import torch

asr = whisper.load_model('medium')
classifier = pipeline(
    'text-classification',
    model='ishraq/keigo-evaluator',
    device=0 if torch.cuda.is_available() else -1
)

# LEVEL_MAP as defined in Basic Usage above

def evaluate_recording(audio_path: str) -> dict:
    # Transcribe the audio with Whisper, then classify the transcript
    transcript = asr.transcribe(audio_path, language='ja')['text']
    result = classifier(transcript)[0]
    info = LEVEL_MAP[result['label']]
    return {
        'transcript': transcript,
        'level': info['level'],
        'level_name': info['name'],
        'confidence': round(result['score'], 3),
        'passed': info['passed'],
        'verdict': '✅ 適切な敬語です' if info['passed'] else '❌ 敬語レベルが不足しています'
    }

result = evaluate_recording('employee_call.mp3')
print(result)
```

---

## Training Details

### Dataset

**KeiCO Corpus** — a Japanese keigo classification corpus of 10,002 sentences labelled by politeness level and keigo type (sonkeigo / kenjōgo / teineigo) across a
wide range of service situations including greetings (挨拶), apologies (謝る), meetings (会う), and seasonal expressions (季節).

| Level | Count | % |
|-------|-------|---|
| 1 — 最高敬語 | 2,584 | 25.8% |
| 2 — 敬語 | 2,044 | 20.4% |
| 3 — 丁寧語 | 2,692 | 26.9% |
| 4 — 普通語 | 2,682 | 26.8% |

The dataset is reasonably balanced across levels, so no oversampling or class weighting was applied.

### Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 5 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 10% |
| Max sequence length | 128 |
| Optimizer | AdamW |
| Scheduler | Linear warmup + decay |
| Gradient clipping | 1.0 |
| Loss | Cross-entropy |

### Training Infrastructure

- **Hardware**: NVIDIA T4 GPU (Google Colab)
- **Framework**: PyTorch + Hugging Face Transformers
- **Train / Val split**: 85% / 15%, stratified by label

---

## Evaluation Results

Sample inference results on held-out test sentences:

| Input | Predicted Level | Confidence | Verdict |
|-------|----------------|------------|---------|
| 本日はお早いのですね、お散歩ですか? | 2 — 敬語 | 0.598 | ✅ Pass |
| ご多用中にもかかわらず、よくお出くださいました。 | 2 — 敬語 | 0.557 | ✅ Pass |
| お問い合わせをいただいた商品が、本日入荷しました。 | 3 — 丁寧語 | 0.740 | ❌ Fail |
| 今日はうどんにする。 | 4 — 普通語 | 0.993 | ❌ Fail |
| 忙しいのに、よく来たね。 | 4 — 普通語 | 0.996 | ❌ Fail |

Casual speech (Level 4) is detected with near-perfect confidence, while borderline honorific sentences receive appropriately lower confidence scores.

---

## Limitations

- The model evaluates **transcribed text**, not raw audio. Whisper transcription quality directly affects evaluation accuracy — `whisper medium` or `whisper large` is recommended for Japanese.
- Confidence scores below **0.60** on a passing result indicate borderline speech — consider flagging such cases for human review.
- The model classifies the overall politeness level and does **not** identify specific keigo errors (e.g. incorrect verb conjugation).
- Accuracy may be lower for highly domain-specific speech such as medical or legal Japanese.
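The human-review recommendation above can be operationalised with a small triage helper that routes borderline passing results to a reviewer. A sketch over the dict shape returned by `evaluate_keigo` / `evaluate_recording` earlier in this card, using the 0.60 threshold from the limitation above:

```python
# 0.60 threshold from the limitation above: passing results below it are borderline.
REVIEW_THRESHOLD = 0.60

def triage(evaluation: dict, threshold: float = REVIEW_THRESHOLD) -> str:
    """Route an evaluation dict (as returned by evaluate_keigo) to
    'pass', 'fail', or 'review' for a human-in-the-loop QA workflow."""
    if evaluation['passed'] and evaluation['confidence'] < threshold:
        return 'review'
    return 'pass' if evaluation['passed'] else 'fail'

print(triage({'passed': True, 'confidence': 0.91}))   # pass
print(triage({'passed': True, 'confidence': 0.557}))  # review
print(triage({'passed': False, 'confidence': 0.99}))  # fail
```

Only low-confidence *passes* are routed to review here; low-confidence fails are still treated as fails, which keeps the reviewer queue small while catching the cases the model is least sure about.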
---

## Citation

If you use this model, please cite the KeiCO corpus and the base model:

```
Base model: Tohoku NLP Lab, BERT-base Japanese v3
Dataset: KeiCO Corpus — Japanese Keigo Classification Corpus
Fine-tuned by: Ishraq (B-JET Ideathon 2026 — Smart Service Evaluator)
```