NEC-119 ModernBERT Phase & Boundary Detector

Model Description

This model is fine-tuned from sbintuitions/modernbert-ja-310m for Japanese emergency call (119) transcript analysis. It performs two tasks simultaneously:

Phase Classification: Assigns each utterance to a conversation phase (INIT/LOC/INC/SUP)
Boundary Detection: Detects where the conversation moves from one phase to the next
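
As a rough illustration of how the two label types might relate, the sketch below derives boundary flags from a sequence of phase labels. It assumes a boundary is any utterance whose phase differs from that of the previous utterance. This card does not state the exact definition used for training, and `boundaries_from_phases` is a hypothetical helper, not part of the model.

```python
def boundaries_from_phases(phases):
    """Return 1 where the phase differs from the previous utterance, else 0.

    Assumption: the first utterance is never counted as a boundary.
    """
    flags = []
    previous = None
    for phase in phases:
        is_boundary = previous is not None and phase != previous
        flags.append(1 if is_boundary else 0)
        previous = phase
    return flags


# Made-up phase sequence for a single call
phases = ["INIT", "INIT", "LOC", "LOC", "INC", "SUP", "SUP"]
boundary_flags = boundaries_from_phases(phases)
print(boundary_flags)  # [0, 0, 1, 0, 1, 1, 0]
```

Under this definition, boundaries are much rarer than non-boundaries in a typical call, which matters when reading the boundary metrics.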

Training Details

Base Model: sbintuitions/modernbert-ja-310m
Training Data: 45,483 instances from Japanese emergency call transcripts
Validation Data: 4,984 instances
Test Data: 9,605 instances
Training Configuration:
- Epochs: 5
- Batch Size: 16 (effective 32 with gradient accumulation)
- Learning Rate: 1e-5
- Max Sequence Length: 1024 tokens
- Optimizer: AdamW
- Scheduler: Cosine
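
To make the configuration above concrete, the following sketch works out the number of optimizer steps implied by the dataset size and batch settings, and evaluates a plain cosine decay at a few points. It assumes no warmup, a final learning rate of 0, and that a partial last accumulation group still triggers an optimizer step; none of these details are stated in this card, and `cosine_lr` is an illustrative function rather than the training code.

```python
import math

TRAIN_INSTANCES = 45_483
MICRO_BATCH_SIZE = 16
ACCUMULATION_STEPS = 2   # 16 * 2 = 32 effective batch size
EPOCHS = 5
PEAK_LR = 1e-5

# Forward/backward passes per epoch, then optimizer updates per epoch
micro_batches_per_epoch = math.ceil(TRAIN_INSTANCES / MICRO_BATCH_SIZE)
steps_per_epoch = math.ceil(micro_batches_per_epoch / ACCUMULATION_STEPS)
total_steps = steps_per_epoch * EPOCHS


def cosine_lr(step, total=total_steps, peak=PEAK_LR):
    """Cosine decay from `peak` to 0 (assumed: no warmup, floor of 0)."""
    progress = min(step, total) / total
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps)  # 1422 7110
for step in (0, total_steps // 2, total_steps):
    print(step, cosine_lr(step))
```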

Performance

Test Set Results (After 1 epoch)

Phase Classification Accuracy: 84.9%
Boundary Detection Accuracy: 94.6%
Phase F1-Macro: 0.813
Boundary F1: 0.626
Both Correct Accuracy: 81.8%
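
The gap between boundary accuracy (94.6%) and boundary F1 (0.626) is consistent with boundaries being a rare class, although this card does not report class frequencies. The sketch below shows how the five metrics can be computed from per-utterance predictions. The label lists are made-up toy data, and reading "Both Correct Accuracy" as the share of instances where both predictions are right is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where prediction equals the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def joint_accuracy(phase_true, phase_pred, bnd_true, bnd_pred):
    """Fraction of instances where phase AND boundary are both correct."""
    rows = zip(phase_true, phase_pred, bnd_true, bnd_pred)
    hits = sum(pt == pp and bt == bp for pt, pp, bt, bp in rows)
    return hits / len(phase_true)


# Toy data: 10 utterances, phase IDs 0-3, boundary flags 0/1
phase_true = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
phase_pred = [0, 0, 1, 1, 2, 2, 2, 2, 3, 3]
bnd_true = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
bnd_pred = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0]

phase_acc = accuracy(phase_true, phase_pred)
phase_macro_f1 = macro_f1(phase_true, phase_pred, classes=[0, 1, 2, 3])
bnd_acc = accuracy(bnd_true, bnd_pred)
bnd_f1 = f1_for_class(bnd_true, bnd_pred, cls=1)
both_acc = joint_accuracy(phase_true, phase_pred, bnd_true, bnd_pred)
print(phase_acc, phase_macro_f1, bnd_acc, bnd_f1, both_acc)
```

Even in this small example, boundary accuracy is higher than boundary F1, because the many correctly predicted non-boundaries count toward accuracy but not toward F1 for the boundary class.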

Usage

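from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer (replace the placeholder repository ID with the real one)
model_id = "your-username/nec119-modernbert-phase-boundary"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()  # disable dropout for inference

# Prepare input: the preceding conversation and the utterance to classify, encoded as a pair
context = "previous conversation text"
current_utterance = "current line to classify"
inputs = tokenizer(context, current_utterance, return_tensors="pt", max_length=1024, truncation=True)

# Run the model without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# With a plain AutoModel load, outputs hold the encoder hidden states;
# phase and boundary predictions require the checkpoint's task-specific heads.
hidden_states = outputs.last_hidden_state  # shape: (batch, sequence_length, hidden_size)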
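
The usage example passes a context string and the current utterance as a text pair. How the context was assembled during training is not described in this card. The sketch below assumes it is the concatenation of the last few utterances; `build_pairs`, the `window` size, and the newline separator are hypothetical choices, and the transcript lines are invented for illustration.

```python
def build_pairs(utterances, window=3, separator="\n"):
    """Pair each utterance with up to `window` preceding utterances as context."""
    pairs = []
    for i, current in enumerate(utterances):
        start = max(0, i - window)
        context = separator.join(utterances[start:i])
        pairs.append((context, current))
    return pairs


# Invented example lines (not taken from the training data)
transcript = [
    "119番です。火事ですか、救急ですか。",  # "This is 119. Fire or ambulance?"
    "救急です。",                          # "Ambulance."
    "住所を教えてください。",              # "Please tell me the address."
    "自宅です。",                          # "It's my home."
]

pairs = build_pairs(transcript, window=2)
for context, current in pairs:
    print(repr(context), "->", repr(current))
```

Each `(context, current)` tuple can then be passed to the tokenizer as a text pair, as in the usage example above. The first utterance gets an empty context.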

Phase Labels

INIT (0): Initial phase
LOC (1): Location identification phase
INC (2): Incident details phase
SUP (3): Support/supplementary phase
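
A minimal sketch of turning raw scores into these labels follows. It assumes the phase head yields four logits ordered by the IDs above and the boundary head yields two logits with index 1 meaning "boundary". The head output format is not documented in this card, so `decode` and the example logits are hypothetical.

```python
ID2PHASE = {0: "INIT", 1: "LOC", 2: "INC", 3: "SUP"}


def argmax(values):
    """Index of the largest value in a sequence."""
    return max(range(len(values)), key=lambda i: values[i])


def decode(phase_logits, boundary_logits):
    """Map logits to a phase name and a boolean boundary flag.

    Assumption: boundary_logits has two entries, index 1 = boundary.
    """
    phase = ID2PHASE[argmax(phase_logits)]
    is_boundary = argmax(boundary_logits) == 1
    return phase, is_boundary


decoded = decode([0.1, 2.3, -0.5, 0.0], [1.2, -0.7])
print(decoded)  # ('LOC', False)
```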

Limitations

This model is specifically trained for Japanese emergency call transcripts and may not generalize well to other domains or conversation types.

License

Apache 2.0

Safetensors

Model Size: 0.3B params
Tensor Type: F32
