---
language:
- ja
license: mit
base_model: tohoku-nlp/bert-base-japanese-v3
tags:
- japanese
- keigo
- text-classification
- omotenashi
- hospitality
- bert
pipeline_tag: text-classification
---

# Keigo Evaluator – 敬語レベル分類モデル

A fine-tuned Japanese BERT model that classifies the politeness level (敬語レベル) of Japanese speech into four levels. Designed to evaluate whether an employee is speaking with appropriate **Keigo (敬語)** and **Omotenashi (おもてなし)** standards in a hospitality or service context.

---

## Intended Use

This model is the NLP component of an AI-powered service quality evaluation pipeline:

```
Voice Recording → Whisper ASR → Transcribed Text → This Model → Keigo Verdict
```

It is intended for:
- Evaluating employee speech quality in hospitality and customer service settings
- Automated Keigo compliance checking in call centres or hotel/restaurant environments
- Quality assurance systems for Japanese service staff training

---

## Labels

The model predicts one of four classes:

| Label | Level | Name | Description | Service Verdict |
|-------|-------|------|-------------|-----------------|
| LABEL_0 | 1 | 最高敬語 | Highest honorific – sonkeigo dominant | ✅ Pass |
| LABEL_1 | 2 | 敬語 | Standard honorific – appropriate for most service contexts | ✅ Pass |
| LABEL_2 | 3 | 丁寧語 | Polite but not honorific – insufficient for hospitality | ❌ Fail |
| LABEL_3 | 4 | 普通語 | Casual / plain speech – inappropriate in service contexts | ❌ Fail |

---

## How to Use

### Installation

```bash
pip install transformers torch fugashi unidic-lite
```

> **Note**: `unidic-lite` is required (not `ipadic`): this model uses the UniDic dictionary for MeCab tokenization.

### Basic Usage

```python
from transformers import pipeline
import torch

classifier = pipeline(
    'text-classification',
    model='ishraq/keigo-evaluator',
    device=0 if torch.cuda.is_available() else -1
)

LEVEL_MAP = {
    'LABEL_0': {'level': 1, 'name': '最高敬語', 'passed': True},
    'LABEL_1': {'level': 2, 'name': '敬語',     'passed': True},
    'LABEL_2': {'level': 3, 'name': '丁寧語',   'passed': False},
    'LABEL_3': {'level': 4, 'name': '普通語',   'passed': False},
}

def evaluate_keigo(text: str) -> dict:
    result = classifier(text)[0]
    info   = LEVEL_MAP[result['label']]
    return {
        'text':       text,
        'level':      info['level'],
        'level_name': info['name'],
        'confidence': round(result['score'], 3),
        'passed':     info['passed'],
        'verdict':    '✅ 適切な敬語です' if info['passed'] else '❌ 敬語レベルが不足しています'
    }

print(evaluate_keigo('いらっしゃいませ。本日はどのようなご用件でございましょうか？'))
# {'level': 1, 'level_name': '最高敬語', 'confidence': 0.91, 'passed': True, 'verdict': '✅ 適切な敬語です'}

print(evaluate_keigo('ちょっと待って。'))
# {'level': 4, 'level_name': '普通語', 'confidence': 0.99, 'passed': False, 'verdict': '❌ 敬語レベルが不足しています'}
```

### Full Voice Pipeline (Whisper + Keigo Evaluator)

```python
import whisper
from transformers import pipeline
import torch

asr        = whisper.load_model('medium')
classifier = pipeline(
    'text-classification',
    model='ishraq/keigo-evaluator',
    device=0 if torch.cuda.is_available() else -1
)

# Reuses LEVEL_MAP from the Basic Usage example above.
def evaluate_recording(audio_path: str) -> dict:
    transcript = asr.transcribe(audio_path, language='ja')['text']
    result     = classifier(transcript)[0]
    info       = LEVEL_MAP[result['label']]
    return {
        'transcript': transcript,
        'level':      info['level'],
        'level_name': info['name'],
        'confidence': round(result['score'], 3),
        'passed':     info['passed'],
        'verdict':    '✅ 適切な敬語です' if info['passed'] else '❌ 敬語レベルが不足しています'
    }

result = evaluate_recording('employee_call.mp3')
print(result)
```
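In a QA setting, many recordings are typically evaluated in one pass and summarized per shift or per employee. A minimal aggregation sketch, assuming a list of dicts shaped like `evaluate_recording`'s return value (the summary format here is an illustration, not part of the model):

```python
from collections import Counter

def summarize(results: list[dict]) -> dict:
    """Aggregate per-recording keigo verdicts into a QA summary."""
    total = len(results)
    passed = sum(1 for r in results if r['passed'])
    level_counts = Counter(r['level'] for r in results)
    return {
        'total': total,
        'passed': passed,
        'pass_rate': round(passed / total, 3) if total else 0.0,
        'level_counts': dict(level_counts),
    }

# Example with illustrative verdicts (not actual model outputs).
batch = [
    {'level': 1, 'passed': True},
    {'level': 2, 'passed': True},
    {'level': 4, 'passed': False},
]
summary = summarize(batch)
# {'total': 3, 'passed': 2, 'pass_rate': 0.667, 'level_counts': {1: 1, 2: 1, 4: 1}}
```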

---

## Training Details

### Dataset

**KeiCO Corpus** – a Japanese keigo classification corpus of 10,002 sentences labelled by politeness level and keigo type (sonkeigo / kenjōgo / teineigo) across a wide range of service situations including greetings (挨拶), apologies (謝る), meetings (会う), and seasonal expressions (季節).

| Level | Count | % |
|-------|-------|---|
| 1 – 最高敬語 | 2,584 | 25.8% |
| 2 – 敬語     | 2,044 | 20.4% |
| 3 – 丁寧語   | 2,692 | 26.9% |
| 4 – 普通語   | 2,682 | 26.8% |

The classes are reasonably balanced (Level 2 is slightly under-represented at 20.4%), so no oversampling or class weighting was applied.

### Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 5 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 10% |
| Max sequence length | 128 |
| Optimizer | AdamW |
| Scheduler | Linear warmup + decay |
| Gradient clipping | 1.0 |
| Loss | Cross-entropy |
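The scheduler row corresponds to a linear ramp from 0 up to the peak learning rate over the first 10% of steps, then a linear decay back to 0. A minimal pure-Python sketch of that schedule, using the peak LR and warmup ratio from the table (the step counts are illustrative):

```python
def linear_warmup_decay_lr(step, total_steps, peak_lr=2e-5, warmup_ratio=0.10):
    """Learning rate at `step` for a linear warmup + linear decay schedule.

    Rises linearly from 0 to `peak_lr` over the first `warmup_ratio`
    fraction of training, then decays linearly back to 0 at `total_steps`.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    # Linear decay from peak_lr (end of warmup) down to 0 (end of training).
    remaining = total_steps - step
    decay_steps = total_steps - warmup_steps
    return peak_lr * max(0.0, remaining / max(1, decay_steps))

# Example: 1,000 optimizer steps -> 100 warmup steps, peak at step 100.
schedule = [linear_warmup_decay_lr(s, 1000) for s in range(1001)]
```

In practice this is what `get_linear_schedule_with_warmup` from Transformers computes; the sketch just makes the shape explicit.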

### Training Infrastructure

- **Hardware**: NVIDIA T4 GPU (Google Colab)
- **Framework**: PyTorch + Hugging Face Transformers
- **Train / Val split**: 85% / 15% stratified by label
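The 85% / 15% stratified split above can be sketched in pure Python; in practice one would more likely use scikit-learn's `train_test_split(..., stratify=labels)`, and the label counts below are illustrative, not the corpus:

```python
import random
from collections import defaultdict

def stratified_split(labels, val_frac=0.15, seed=42):
    """Split indices into train/val so each label keeps ~val_frac in val."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    rng = random.Random(seed)
    train, val = [], []
    for label, indices in by_label.items():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_frac)
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)

# Example with the four keigo levels, 200 sentences each (illustrative).
labels = [0] * 200 + [1] * 200 + [2] * 200 + [3] * 200
train_idx, val_idx = stratified_split(labels)
```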

---

## Evaluation Results

Sample inference results on held-out test sentences:

| Input | Predicted Level | Confidence | Verdict |
|-------|----------------|------------|---------|
| 本日はお早いのですね、お散歩ですか？ | 2 – 敬語 | 0.598 | ✅ Pass |
| ご多用中にもかかわらず、よくお出くださいました。 | 2 – 敬語 | 0.557 | ✅ Pass |
| お問い合わせをいただいた商品が、本日入荷しました。 | 3 – 丁寧語 | 0.740 | ❌ Fail |
| 今日はうどんにする。 | 4 – 普通語 | 0.993 | ❌ Fail |
| 忙しいのに、よく来たね。 | 4 – 普通語 | 0.996 | ❌ Fail |

Casual speech (Level 4) is detected with near-perfect confidence. Borderline honorific sentences show appropriately lower confidence scores.

---

## Limitations

- The model evaluates **transcribed text**, not raw audio. Whisper transcription quality directly affects evaluation accuracy; `whisper medium` or `whisper large` is recommended for Japanese.
- Confidence scores below **0.60** on a passing result indicate borderline speech; consider flagging such results for human review.
- The model classifies overall politeness level and does **not** identify specific keigo errors (e.g. incorrect verb conjugation).
- Accuracy may be lower for highly domain-specific speech such as medical or legal Japanese.
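The 0.60 borderline threshold above is straightforward to wire into the evaluation output. A minimal sketch over result dicts shaped like `evaluate_keigo`'s return value (the threshold is this card's suggestion, not a property of the model):

```python
REVIEW_THRESHOLD = 0.60  # suggested cutoff from the limitations above

def needs_human_review(result: dict, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Flag passing results whose confidence falls below the threshold.

    Failing results are not flagged: a low-confidence fail is still a fail,
    while a low-confidence pass may be borderline honorific speech.
    """
    return result['passed'] and result['confidence'] < threshold

# Example results (illustrative, not actual model outputs).
confident_pass  = {'passed': True,  'confidence': 0.91}
borderline_pass = {'passed': True,  'confidence': 0.56}
clear_fail      = {'passed': False, 'confidence': 0.99}
```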

---

## Citation

If you use this model, please cite the KeiCO corpus and the base model:

```
Base model: Tohoku NLP Lab, BERT-base Japanese v3
Dataset: KeiCO Corpus – Japanese Keigo Classification Corpus
Fine-tuned by: Ishraq (B-JET Ideathon 2026 – Smart Service Evaluator)
```