---
language:
- ja
license: mit
base_model: tohoku-nlp/bert-base-japanese-v3
tags:
- japanese
- keigo
- text-classification
- omotenashi
- hospitality
- bert
pipeline_tag: text-classification
---

# Keigo Evaluator – 敬語レベル分類モデル

A fine-tuned Japanese BERT model that classifies the politeness level (敬語レベル) of Japanese speech into four levels. Designed to evaluate whether an employee is speaking with appropriate **Keigo (敬語)** and **Omotenashi (おもてなし)** standards in a hospitality or service context.

---

## Intended Use

This model is the NLP component of an AI-powered service quality evaluation pipeline:

```
Voice Recording → Whisper ASR → Transcribed Text → This Model → Keigo Verdict
```

It is intended for:
- Evaluating employee speech quality in hospitality and customer service settings
- Automated Keigo compliance checking in call centres or hotel/restaurant environments
- Quality assurance systems for Japanese service staff training

---

## Labels

The model predicts one of four classes:

| Label | Level | Name | Description | Service Verdict |
|-------|-------|------|-------------|-----------------|
| LABEL_0 | 1 | 最高敬語 | Highest honorific – sonkeigo dominant | ✅ Pass |
| LABEL_1 | 2 | 敬語 | Standard honorific – appropriate for most service contexts | ✅ Pass |
| LABEL_2 | 3 | 丁寧語 | Polite but not honorific – insufficient for hospitality | ❌ Fail |
| LABEL_3 | 4 | 普通語 | Casual / plain speech – inappropriate in service contexts | ❌ Fail |

---

## How to Use

### Installation

```bash
pip install transformers torch fugashi unidic-lite
```

> **Note**: `unidic-lite` is required (not `ipadic`): this model uses the UniDic dictionary for MeCab tokenization.

### Basic Usage

```python
from transformers import pipeline
import torch

classifier = pipeline(
    'text-classification',
    model='ishraq/keigo-evaluator',
    device=0 if torch.cuda.is_available() else -1
)

LEVEL_MAP = {
    'LABEL_0': {'level': 1, 'name': '最高敬語', 'passed': True},
    'LABEL_1': {'level': 2, 'name': '敬語',     'passed': True},
    'LABEL_2': {'level': 3, 'name': '丁寧語',   'passed': False},
    'LABEL_3': {'level': 4, 'name': '普通語',   'passed': False},
}

def evaluate_keigo(text: str) -> dict:
    result = classifier(text)[0]
    info   = LEVEL_MAP[result['label']]
    return {
        'text':       text,
        'level':      info['level'],
        'level_name': info['name'],
        'confidence': round(result['score'], 3),
        'passed':     info['passed'],
        'verdict':    '✅ 適切な敬語です' if info['passed'] else '❌ 敬語レベルが不足しています'
    }

print(evaluate_keigo('いらっしゃいませ。本日はどのようなご用件でございましょうか？'))
# {'level': 1, 'level_name': '最高敬語', 'confidence': 0.91, 'passed': True, 'verdict': '✅ 適切な敬語です'}

print(evaluate_keigo('ちょっと待って。'))
# {'level': 4, 'level_name': '普通語', 'confidence': 0.99, 'passed': False, 'verdict': '❌ 敬語レベルが不足しています'}
```

### Full Voice Pipeline (Whisper + Keigo Evaluator)

```python
import whisper
from transformers import pipeline
import torch

asr        = whisper.load_model('medium')
classifier = pipeline(
    'text-classification',
    model='ishraq/keigo-evaluator',
    device=0 if torch.cuda.is_available() else -1
)

# Reuses LEVEL_MAP from the Basic Usage example above.
def evaluate_recording(audio_path: str) -> dict:
    transcript = asr.transcribe(audio_path, language='ja')['text']
    result     = classifier(transcript)[0]
    info       = LEVEL_MAP[result['label']]
    return {
        'transcript': transcript,
        'level':      info['level'],
        'level_name': info['name'],
        'confidence': round(result['score'], 3),
        'passed':     info['passed'],
        'verdict':    '✅ 適切な敬語です' if info['passed'] else '❌ 敬語レベルが不足しています'
    }

result = evaluate_recording('employee_call.mp3')
print(result)
```
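In a QA setting, many recordings are typically evaluated in one pass and summarized per shift or per employee. A minimal aggregation sketch, assuming a list of dicts shaped like `evaluate_recording`'s return value (the summary format here is an illustration, not part of the model):

```python
from collections import Counter

def summarize(results: list[dict]) -> dict:
    """Aggregate per-recording keigo verdicts into a QA summary."""
    total = len(results)
    passed = sum(1 for r in results if r['passed'])
    level_counts = Counter(r['level'] for r in results)
    return {
        'total': total,
        'passed': passed,
        'pass_rate': round(passed / total, 3) if total else 0.0,
        'level_counts': dict(level_counts),
    }

# Example with illustrative verdicts (not actual model outputs).
batch = [
    {'level': 1, 'passed': True},
    {'level': 2, 'passed': True},
    {'level': 4, 'passed': False},
]
summary = summarize(batch)
# {'total': 3, 'passed': 2, 'pass_rate': 0.667, 'level_counts': {1: 1, 2: 1, 4: 1}}
```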

---

## Training Details

### Dataset

**KeiCO Corpus** – a Japanese keigo classification corpus of 10,002 sentences labelled by politeness level and keigo type (sonkeigo / kenjōgo / teineigo) across a wide range of service situations including greetings (挨拶), apologies (謝る), meetings (会う), and seasonal expressions (季節).

| Level | Count | % |
|-------|-------|---|
| 1 – 最高敬語 | 2,584 | 25.8% |
| 2 – 敬語     | 2,044 | 20.4% |
| 3 – 丁寧語   | 2,692 | 26.9% |
| 4 – 普通語   | 2,682 | 26.8% |

The classes are reasonably balanced (Level 2 is slightly under-represented at 20.4%), so no oversampling or class weighting was applied.

### Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 5 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 10% |
| Max sequence length | 128 |
| Optimizer | AdamW |
| Scheduler | Linear warmup + decay |
| Gradient clipping | 1.0 |
| Loss | Cross-entropy |
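The scheduler row corresponds to a linear ramp from 0 up to the peak learning rate over the first 10% of steps, then a linear decay back to 0. A minimal pure-Python sketch of that schedule, using the peak LR and warmup ratio from the table (the step counts are illustrative):

```python
def linear_warmup_decay_lr(step, total_steps, peak_lr=2e-5, warmup_ratio=0.10):
    """Learning rate at `step` for a linear warmup + linear decay schedule.

    Rises linearly from 0 to `peak_lr` over the first `warmup_ratio`
    fraction of training, then decays linearly back to 0 at `total_steps`.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    # Linear decay from peak_lr (end of warmup) down to 0 (end of training).
    remaining = total_steps - step
    decay_steps = total_steps - warmup_steps
    return peak_lr * max(0.0, remaining / max(1, decay_steps))

# Example: 1,000 optimizer steps -> 100 warmup steps, peak at step 100.
schedule = [linear_warmup_decay_lr(s, 1000) for s in range(1001)]
```

In practice this is what `get_linear_schedule_with_warmup` from Transformers computes; the sketch just makes the shape explicit.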

### Training Infrastructure

- **Hardware**: NVIDIA T4 GPU (Google Colab)
- **Framework**: PyTorch + Hugging Face Transformers
- **Train / Val split**: 85% / 15% stratified by label
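The 85% / 15% stratified split above can be sketched in pure Python; in practice one would more likely use scikit-learn's `train_test_split(..., stratify=labels)`, and the label counts below are illustrative, not the corpus:

```python
import random
from collections import defaultdict

def stratified_split(labels, val_frac=0.15, seed=42):
    """Split indices into train/val so each label keeps ~val_frac in val."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    rng = random.Random(seed)
    train, val = [], []
    for label, indices in by_label.items():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_frac)
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)

# Example with the four keigo levels, 200 sentences each (illustrative).
labels = [0] * 200 + [1] * 200 + [2] * 200 + [3] * 200
train_idx, val_idx = stratified_split(labels)
```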

---

## Evaluation Results

Sample inference results on held-out test sentences:

| Input | Predicted Level | Confidence | Verdict |
|-------|----------------|------------|---------|
| 本日はお早いのですね、お散歩ですか？ | 2 – 敬語 | 0.598 | ✅ Pass |
| ご多用中にもかかわらず、よくお出くださいました。 | 2 – 敬語 | 0.557 | ✅ Pass |
| お問い合わせをいただいた商品が、本日入荷しました。 | 3 – 丁寧語 | 0.740 | ❌ Fail |
| 今日はうどんにする。 | 4 – 普通語 | 0.993 | ❌ Fail |
| 忙しいのに、よく来たね。 | 4 – 普通語 | 0.996 | ❌ Fail |

Casual speech (Level 4) is detected with near-perfect confidence. Borderline honorific sentences show appropriately lower confidence scores.

---

## Limitations

- The model evaluates **transcribed text**, not raw audio. Whisper transcription quality directly affects evaluation accuracy; `whisper medium` or `whisper large` is recommended for Japanese.
- Confidence scores below **0.60** on a passing result indicate borderline speech; consider flagging such results for human review.
- The model classifies overall politeness level and does **not** identify specific keigo errors (e.g. incorrect verb conjugation).
- Accuracy may be lower for highly domain-specific speech such as medical or legal Japanese.
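The 0.60 borderline threshold above is straightforward to wire into the evaluation output. A minimal sketch over result dicts shaped like `evaluate_keigo`'s return value (the threshold is this card's suggestion, not a property of the model):

```python
REVIEW_THRESHOLD = 0.60  # suggested cutoff from the limitations above

def needs_human_review(result: dict, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Flag passing results whose confidence falls below the threshold.

    Failing results are not flagged: a low-confidence fail is still a fail,
    while a low-confidence pass may be borderline honorific speech.
    """
    return result['passed'] and result['confidence'] < threshold

# Example results (illustrative, not actual model outputs).
confident_pass  = {'passed': True,  'confidence': 0.91}
borderline_pass = {'passed': True,  'confidence': 0.56}
clear_fail      = {'passed': False, 'confidence': 0.99}
```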

---

## Citation

If you use this model, please cite the KeiCO corpus and the base model:

```
Base model: Tohoku NLP Lab, BERT-base Japanese v3
Dataset: KeiCO Corpus – Japanese Keigo Classification Corpus
Fine-tuned by: Ishraq (B-JET Ideathon 2026 – Smart Service Evaluator)
```