	---
	language:
	- ja
	license: mit
	base_model: tohoku-nlp/bert-base-japanese-v3
	tags:
	- japanese
	- keigo
	- text-classification
	- omotenashi
	- hospitality
	- bert
	pipeline_tag: text-classification
	---

	# Keigo Evaluator — 敬語レベル分類モデル

A fine-tuned Japanese BERT model that classifies Japanese text into one of four politeness levels (敬語レベル). It is designed to evaluate whether an employee's speech meets the keigo (敬語, honorific language) and omotenashi (おもてなし, hospitality) standards expected in a hospitality or service context.

	---

	## Intended Use

	This model is the NLP component of an AI-powered service quality evaluation pipeline:

	```
	Voice Recording → Whisper ASR → Transcribed Text → This Model → Keigo Verdict
	```

	It is intended for:
	- Evaluating employee speech quality in hospitality and customer service settings
	- Automated Keigo compliance checking in call centres or hotel/restaurant environments
	- Quality assurance systems for Japanese service staff training

	---

	## Labels

	The model predicts one of four classes:

| Label | Level | Name | Description | Service Verdict |
|-------|-------|------|-------------|-----------------|
| LABEL_0 | 1 | 最高敬語 | Highest honorific (sonkeigo dominant) | ✅ Pass |
| LABEL_1 | 2 | 敬語 | Standard honorific; appropriate for most service contexts | ✅ Pass |
| LABEL_2 | 3 | 丁寧語 | Polite but not honorific; insufficient for hospitality | ❌ Fail |
| LABEL_3 | 4 | 普通語 | Casual / plain speech; inappropriate in service contexts | ❌ Fail |

	---

	## How to Use

	### Installation

	```bash
	pip install transformers torch fugashi unidic-lite
	```

	> Note: `unidic-lite` is required (not `ipadic`) — this model uses the UniDic dictionary for MeCab tokenization.

	### Basic Usage

```python
from transformers import pipeline
import torch

# Load the classifier; use the first GPU if one is available, otherwise the CPU.
classifier = pipeline(
    'text-classification',
    model='ishraq/keigo-evaluator',
    device=0 if torch.cuda.is_available() else -1
)

# Map the raw model labels to keigo levels and the service verdict.
LEVEL_MAP = {
    'LABEL_0': {'level': 1, 'name': '最高敬語', 'passed': True},
    'LABEL_1': {'level': 2, 'name': '敬語', 'passed': True},
    'LABEL_2': {'level': 3, 'name': '丁寧語', 'passed': False},
    'LABEL_3': {'level': 4, 'name': '普通語', 'passed': False},
}

def evaluate_keigo(text: str) -> dict:
    """Classify one utterance and return its keigo level and pass/fail verdict."""
    result = classifier(text)[0]  # top prediction: {'label': ..., 'score': ...}
    info = LEVEL_MAP[result['label']]
    return {
        'text': text,
        'level': info['level'],
        'level_name': info['name'],
        'confidence': round(result['score'], 3),
        'passed': info['passed'],
        # Verdict strings: "Appropriate keigo" / "Keigo level is insufficient"
        'verdict': '✅ 適切な敬語です' if info['passed'] else '❌ 敬語レベルが不足しています'
    }

print(evaluate_keigo('いらっしゃいませ。本日はどのようなご用件でございましょうか？'))
# Example output ('text' key omitted for brevity):
# {'level': 1, 'level_name': '最高敬語', 'confidence': 0.91, 'passed': True, 'verdict': '✅ 適切な敬語です'}

print(evaluate_keigo('ちょっと待って。'))
# Example output ('text' key omitted for brevity):
# {'level': 4, 'level_name': '普通語', 'confidence': 0.99, 'passed': False, 'verdict': '❌ 敬語レベルが不足しています'}
```

	### Full Voice Pipeline (Whisper + Keigo Evaluator)

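```python
# The `whisper` module comes from the openai-whisper package
# (pip install openai-whisper), which also needs ffmpeg on the system.
import whisper
from transformers import pipeline
import torch

asr = whisper.load_model('medium')
classifier = pipeline(
    'text-classification',
    model='ishraq/keigo-evaluator',
    device=0 if torch.cuda.is_available() else -1
)

# Same label mapping as in Basic Usage, repeated so this block runs on its own.
LEVEL_MAP = {
    'LABEL_0': {'level': 1, 'name': '最高敬語', 'passed': True},
    'LABEL_1': {'level': 2, 'name': '敬語', 'passed': True},
    'LABEL_2': {'level': 3, 'name': '丁寧語', 'passed': False},
    'LABEL_3': {'level': 4, 'name': '普通語', 'passed': False},
}

def evaluate_recording(audio_path: str) -> dict:
    """Transcribe a recording with Whisper, then classify the transcript."""
    transcript = asr.transcribe(audio_path, language='ja')['text'].strip()
    # Truncate long transcripts to the 128-token length used during training.
    result = classifier(transcript, truncation=True, max_length=128)[0]
    info = LEVEL_MAP[result['label']]
    return {
        'transcript': transcript,
        'level': info['level'],
        'level_name': info['name'],
        'confidence': round(result['score'], 3),
        'passed': info['passed'],
        'verdict': '✅ 適切な敬語です' if info['passed'] else '❌ 敬語レベルが不足しています'
    }

result = evaluate_recording('employee_call.mp3')
print(result)
```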
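A recording usually contains several sentences, while the training corpus consists of single sentences capped at 128 tokens. One possible way to handle this is to split the transcript into sentences and classify each one separately. The sketch below is an assumption, not part of the released pipeline: the helper names `split_sentences` and `evaluate_transcript` are hypothetical, and so is the aggregation rule (the recording fails if any sentence fails).

```python
import re
from typing import Callable, List

def split_sentences(transcript: str) -> List[str]:
    """Split after sentence-final marks (。！？ and ASCII !?); drop empty pieces."""
    pieces = re.findall(r'[^。！？!?]+[。！？!?]*', transcript)
    return [p.strip() for p in pieces if p.strip()]

def evaluate_transcript(transcript: str, classify: Callable[[str], dict]) -> dict:
    """Classify each sentence and aggregate.

    `classify` is any callable that returns a dict with 'level' and 'passed'
    keys for one sentence, for example `evaluate_keigo` from Basic Usage.
    Assumed rule: the transcript passes only if every sentence passes.
    """
    per_sentence = [classify(s) for s in split_sentences(transcript)]
    return {
        'sentences': per_sentence,
        # Level 4 is the least polite, so the maximum is the worst level.
        'worst_level': max((r['level'] for r in per_sentence), default=None),
        'passed': bool(per_sentence) and all(r['passed'] for r in per_sentence),
    }
```

With the classifier loaded as in Basic Usage, this could be called as `evaluate_transcript(transcript, evaluate_keigo)`. A stricter or more lenient aggregation rule (for example, a majority vote) may suit some deployments better.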

	---

	## Training Details

	### Dataset

The model was fine-tuned on the KeiCO Corpus, a Japanese keigo classification corpus of 10,002 sentences labelled by politeness level and keigo type (sonkeigo / kenjōgo / teineigo). It covers a wide range of service situations, including greetings (挨拶), apologies (謝る), meetings (会う), and seasonal expressions (季節).

| Level | Count | Share |
|-------|-------|-------|
| 1 (最高敬語) | 2,584 | 25.8% |
| 2 (敬語) | 2,044 | 20.4% |
| 3 (丁寧語) | 2,692 | 26.9% |
| 4 (普通語) | 2,682 | 26.8% |

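The classes are roughly balanced: the smallest (Level 2, 20.4%) and the largest (Level 3, 26.9%) differ by about 6.5 percentage points. No oversampling or class weighting was applied.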
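The class shares can be re-derived from the counts in the table. The short check below uses only those counts; the variable names are illustrative.

```python
# Sentence counts per keigo level, copied from the table above.
counts = {1: 2584, 2: 2044, 3: 2692, 4: 2682}

total = sum(counts.values())

# Share of each level in percent, rounded to one decimal place.
shares = {level: round(100 * n / total, 1) for level, n in counts.items()}

# Ratio of the largest class to the smallest, a simple imbalance measure.
imbalance = max(counts.values()) / min(counts.values())

print(total)
print(shares)
print(round(imbalance, 2))
```

The largest class is about 1.3 times the size of the smallest, which is consistent with training without class weights.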

	### Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 5 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 10% |
| Max sequence length | 128 |
| Optimizer | AdamW |
| Scheduler | Linear warmup + decay |
| Gradient clipping | 1.0 |
| Loss | Cross-entropy |

	### Training Infrastructure

	- Hardware: NVIDIA T4 GPU (Google Colab)
	- Framework: PyTorch + Hugging Face Transformers
	- Train / Val split: 85% / 15% stratified by label
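To make the schedule concrete, the sketch below works out the approximate number of optimizer steps implied by the hyperparameters and split above, and evaluates a linear warmup + decay learning rate at a given step. The step counts are estimates, not figures from the training run: they assume the validation set is rounded up to a whole sentence, the last partial batch is kept, and there is no gradient accumulation. The function name `lr_at` is illustrative.

```python
import math

N_SENTENCES = 10_002   # corpus size
VAL_PERCENT = 15       # validation share of the 85% / 15% split
BATCH_SIZE = 32
EPOCHS = 5
BASE_LR = 2e-5
WARMUP_PERCENT = 10    # warmup ratio of 10%

# Assumption: the validation size is rounded up, the rest is used for training.
n_val = math.ceil(N_SENTENCES * VAL_PERCENT / 100)
n_train = N_SENTENCES - n_val

# Assumption: the last partial batch is kept, no gradient accumulation.
steps_per_epoch = math.ceil(n_train / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = total_steps * WARMUP_PERCENT // 100

def lr_at(step: int) -> float:
    """Learning rate at a step: linear ramp up to BASE_LR, then linear decay to 0."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / (total_steps - warmup_steps)

print(n_train, steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the learning rate peaks at 2e-5 at the end of warmup and reaches zero at the final step.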

	---

	## Evaluation Results

	Sample inference results on held-out test sentences:

| Input | Predicted Level | Confidence | Verdict |
|-------|-----------------|------------|---------|
| 本日はお早いのですね、お散歩ですか？ | 2 (敬語) | 0.598 | ✅ Pass |
| ご多用中にもかかわらず、よくお出くださいました。 | 2 (敬語) | 0.557 | ✅ Pass |
| お問い合わせをいただいた商品が、本日入荷しました。 | 3 (丁寧語) | 0.740 | ❌ Fail |
| 今日はうどんにする。 | 4 (普通語) | 0.993 | ❌ Fail |
| 忙しいのに、よく来たね。 | 4 (普通語) | 0.996 | ❌ Fail |

In these samples, casual speech (Level 4) is predicted with confidence above 0.99. The two sentences predicted as Level 2 score much lower (0.557 and 0.598), which marks them as borderline cases.

	---

	## Limitations

- The model evaluates transcribed text, not raw audio, so Whisper transcription quality directly affects evaluation accuracy. The Whisper `medium` or `large` model is recommended for Japanese.
- A confidence score below 0.60 on a passing result indicates borderline speech; consider flagging such results for human review.
	- The model classifies overall politeness level and does not identify specific keigo errors (e.g. incorrect verb conjugation).
	- Accuracy may be lower for highly domain-specific speech such as medical or legal Japanese.
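The review rule for low-confidence passes can be sketched as a small helper. It assumes result dictionaries shaped like the output of `evaluate_keigo` in Basic Usage (with `passed` and `confidence` keys); the function name `needs_human_review` is hypothetical.

```python
REVIEW_THRESHOLD = 0.60  # cut-off suggested in the limitations above

def needs_human_review(result: dict, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Return True when a passing verdict has confidence below the threshold."""
    return bool(result['passed']) and result['confidence'] < threshold

# A passing result with confidence just under the cut-off is flagged.
print(needs_human_review({'passed': True, 'confidence': 0.598}))
```

Failing results are not flagged by this helper; whether low-confidence failures should also be reviewed is a policy choice left to the deployment.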

	---

	## Citation

	If you use this model, please cite the KeiCO corpus and the base model:

	```
	Base model: Tohoku NLP Lab, BERT-base Japanese v3
	Dataset: KeiCO Corpus — Japanese Keigo Classification Corpus
	Fine-tuned by: Ishraq (B-JET Ideathon 2026 — Smart Service Evaluator)
	```