---
language:
- ja
license: mit
base_model: tohoku-nlp/bert-base-japanese-v3
tags:
- japanese
- keigo
- text-classification
- omotenashi
- hospitality
- bert
pipeline_tag: text-classification
---
# Keigo Evaluator – 敬語レベル分類モデル (Japanese Keigo Level Classifier)
A fine-tuned Japanese BERT model that classifies the politeness level (敬語レベル) of Japanese speech into four levels. It is designed to evaluate whether an employee's speech meets **keigo (敬語, honorific speech)** and **omotenashi (おもてなし, hospitality)** standards in a service context.
---
## Intended Use
This model is the NLP component of an AI-powered service quality evaluation pipeline:
```
Voice Recording → Whisper ASR → Transcribed Text → This Model → Keigo Verdict
```
It is intended for:
- Evaluating employee speech quality in hospitality and customer service settings
- Automated Keigo compliance checking in call centres or hotel/restaurant environments
- Quality assurance systems for Japanese service staff training
---
## Labels
The model predicts one of four classes:
| Label | Level | Name | Description | Service Verdict |
|-------|-------|------|-------------|-----------------|
| LABEL_0 | 1 | 最高敬語 | Highest honorific – sonkeigo dominant | ✅ Pass |
| LABEL_1 | 2 | 敬語 | Standard honorific – appropriate for most service contexts | ✅ Pass |
| LABEL_2 | 3 | 丁寧語 | Polite but not honorific – insufficient for hospitality | ❌ Fail |
| LABEL_3 | 4 | 普通語 | Casual / plain speech – inappropriate in service contexts | ❌ Fail |
---
## How to Use
### Installation
```bash
pip install transformers torch fugashi unidic-lite
```
> **Note**: `unidic-lite` is required (not `ipadic`), because this model uses the UniDic dictionary for MeCab tokenization.
### Basic Usage
```python
from transformers import pipeline
import torch

classifier = pipeline(
    'text-classification',
    model='ishraq/keigo-evaluator',
    device=0 if torch.cuda.is_available() else -1
)

LEVEL_MAP = {
    'LABEL_0': {'level': 1, 'name': '最高敬語', 'passed': True},
    'LABEL_1': {'level': 2, 'name': '敬語', 'passed': True},
    'LABEL_2': {'level': 3, 'name': '丁寧語', 'passed': False},
    'LABEL_3': {'level': 4, 'name': '普通語', 'passed': False},
}

def evaluate_keigo(text: str) -> dict:
    result = classifier(text)[0]
    info = LEVEL_MAP[result['label']]
    return {
        'text': text,
        'level': info['level'],
        'level_name': info['name'],
        'confidence': round(result['score'], 3),
        'passed': info['passed'],
        'verdict': '✅ 適切な敬語です' if info['passed'] else '❌ 敬語レベルが不足しています'
    }

print(evaluate_keigo('いらっしゃいませ。本日はどのようなご用件でございましょうか？'))
# {'level': 1, 'level_name': '最高敬語', 'confidence': 0.91, 'passed': True, 'verdict': '✅ 適切な敬語です'}

print(evaluate_keigo('ちょっと待って。'))
# {'level': 4, 'level_name': '普通語', 'confidence': 0.99, 'passed': False, 'verdict': '❌ 敬語レベルが不足しています'}
```
### Full Voice Pipeline (Whisper + Keigo Evaluator)
```python
import whisper
from transformers import pipeline
import torch

# LEVEL_MAP is defined in the Basic Usage snippet above.

asr = whisper.load_model('medium')
classifier = pipeline(
    'text-classification',
    model='ishraq/keigo-evaluator',
    device=0 if torch.cuda.is_available() else -1
)

def evaluate_recording(audio_path: str) -> dict:
    transcript = asr.transcribe(audio_path, language='ja')['text']
    result = classifier(transcript)[0]
    info = LEVEL_MAP[result['label']]
    return {
        'transcript': transcript,
        'level': info['level'],
        'level_name': info['name'],
        'confidence': round(result['score'], 3),
        'passed': info['passed'],
        'verdict': '✅ 適切な敬語です' if info['passed'] else '❌ 敬語レベルが不足しています'
    }

result = evaluate_recording('employee_call.mp3')
print(result)
```
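Whisper returns one continuous transcript, but the classifier was fine-tuned with a 128-token maximum sequence length (see Training Hyperparameters), so long recordings are best split into sentences and scored one at a time. A minimal stdlib sketch; the punctuation-based splitting rule is an illustrative assumption, not part of the released pipeline:

```python
import re

def split_sentences(transcript: str) -> list[str]:
    # Split after Japanese sentence-ending punctuation (。！？),
    # keeping the punctuation attached, and drop empty fragments.
    parts = re.split(r'(?<=[。！？])', transcript)
    return [p.strip() for p in parts if p.strip()]
```

Each piece can then be passed to `evaluate_keigo` and the per-sentence verdicts aggregated, for example failing the recording if any sentence fails.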
---
## Training Details
### Dataset
**KeiCO Corpus** – a Japanese keigo classification corpus of 10,002 sentences labelled by politeness level and keigo type (sonkeigo / kenjōgo / teineigo) across a wide range of service situations, including greetings (挨拶), apologies (謝り), meetings (会い), and seasonal expressions (季節).
| Level | Count | % |
|-------|-------|---|
| 1 – 最高敬語 | 2,584 | 25.8% |
| 2 – 敬語 | 2,044 | 20.4% |
| 3 – 丁寧語 | 2,692 | 26.9% |
| 4 – 普通語 | 2,682 | 26.8% |
The four classes are close to balanced (20.4–26.9% each), so no oversampling or class weighting was applied.
### Training Hyperparameters
| Parameter | Value |
|-----------|-------|
| Epochs | 5 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 10% |
| Max sequence length | 128 |
| Optimizer | AdamW |
| Scheduler | Linear warmup + decay |
| Gradient clipping | 1.0 |
| Loss | Cross-entropy |
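In PyTorch terms, the optimizer and schedule above amount to AdamW with linear warmup over the first 10% of steps followed by linear decay to zero, plus gradient clipping at 1.0. A sketch under those assumptions; the step count and stand-in model are illustrative, not taken from the actual training run:

```python
import torch

model = torch.nn.Linear(768, 4)          # stand-in for the BERT classification head
total_steps = 1000                       # illustrative; not the real step count
warmup_steps = int(0.10 * total_steps)   # 10% warmup ratio

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

def lr_lambda(step: int) -> float:
    # Linear warmup to the peak LR, then linear decay to zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Inside the training loop, gradients are clipped before each optimizer step:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```

`transformers.get_linear_schedule_with_warmup` implements the same schedule if you prefer the library helper.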
### Training Infrastructure
- **Hardware**: NVIDIA T4 GPU (Google Colab)
- **Framework**: PyTorch + Hugging Face Transformers
- **Train / Val split**: 85% / 15% stratified by label
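The 85% / 15% stratified split can be reproduced with a small helper that keeps the per-class proportions identical in both partitions. A stdlib sketch; the function name and seed are our own, and the original split may well have used `sklearn.model_selection.train_test_split` with `stratify` instead:

```python
import random
from collections import defaultdict

def stratified_split(labels, val_frac=0.15, seed=42):
    # Group sample indices by label, then take val_frac of each group,
    # so train and validation preserve the per-class proportions.
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    train_idx, val_idx = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        n_val = round(len(idxs) * val_frac)
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return train_idx, val_idx
```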
---
## Evaluation Results
Sample inference results on held-out test sentences:
| Input | Predicted Level | Confidence | Verdict |
|-------|----------------|------------|---------|
| 本日はお早いのですね。お散歩ですか？ | 2 – 敬語 | 0.598 | ✅ Pass |
| ご多用中にもかかわらず、お出でくださいました。 | 2 – 敬語 | 0.557 | ✅ Pass |
| お問い合わせいただいた商品が、本日入荷しました。 | 3 – 丁寧語 | 0.740 | ❌ Fail |
| 今日はトラブルかよ。 | 4 – 普通語 | 0.993 | ❌ Fail |
| 忙しいのに、よく来るね。 | 4 – 普通語 | 0.996 | ❌ Fail |
Casual speech (Level 4) is detected with near-perfect confidence. Borderline honorific sentences show appropriately lower confidence scores.
---
## Limitations
- The model evaluates **transcribed text**, not raw audio. Whisper transcription quality directly affects evaluation accuracy; `whisper medium` or `whisper large` is recommended for Japanese.
- Confidence scores below **0.60** on a passing result indicate borderline speech; consider flagging such cases for human review.
- The model classifies overall politeness level and does **not** identify specific keigo errors (e.g. incorrect verb conjugation).
- Accuracy may be lower for highly domain-specific speech such as medical or legal Japanese.
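The borderline-confidence caveat above can be enforced mechanically on the dict returned by `evaluate_keigo`. A minimal sketch; the helper name is ours, and the 0.60 threshold is the one suggested in the limitations:

```python
def needs_review(result: dict, threshold: float = 0.60) -> bool:
    # Flag passing verdicts whose confidence falls below the threshold,
    # so borderline honorific speech gets a human second opinion.
    return result['passed'] and result['confidence'] < threshold
```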
---
## Citation
If you use this model, please cite the KeiCO corpus and the base model:
```
Base model: Tohoku NLP Lab, BERT-base Japanese v3 (tohoku-nlp/bert-base-japanese-v3)
Dataset: KeiCO Corpus – Japanese Keigo Classification Corpus
Fine-tuned by: Ishraq (B-JET Ideathon 2026 – Smart Service Evaluator)
```