NEC-119 ModernBERT Phase & Boundary Detector
Model Description
This model is fine-tuned from sbintuitions/modernbert-ja-310m to analyze transcripts of Japanese emergency (119) calls.
It performs two tasks simultaneously:
- Phase Classification: Classifies conversation phases (INIT/LOC/INC/SUP)
- Boundary Detection: Detects phase boundaries in conversation
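The two tasks are related. The sketch below shows one way they could connect. This card does not define what counts as a boundary. The sketch assumes, hypothetically, that an utterance is a boundary when its phase differs from the previous utterance's phase, and that the first utterance is never a boundary. `boundaries_from_phases` is an illustrative helper, not part of the model.

```python
from typing import List


def boundaries_from_phases(phases: List[str]) -> List[int]:
    """Derive 0/1 boundary labels from a per-utterance phase sequence.

    ASSUMPTION (not documented in the model card): a boundary is an utterance
    whose phase differs from the preceding utterance's phase; the first
    utterance is treated as a non-boundary.
    """
    if not phases:
        return []
    return [0] + [int(cur != prev) for prev, cur in zip(phases, phases[1:])]


print(boundaries_from_phases(["INIT", "INIT", "LOC", "LOC", "INC", "SUP"]))
# prints [0, 0, 1, 0, 1, 1]
```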
Training Details
- Base Model: sbintuitions/modernbert-ja-310m
- Training Data: 45,483 instances from Japanese emergency call transcripts
- Validation Data: 4,984 instances
- Test Data: 9,605 instances
- Training Configuration:
  - Epochs: 5
  - Batch Size: 16 (effective 32 with gradient accumulation)
  - Learning Rate: 1e-5
  - Max Sequence Length: 1024 tokens
  - Optimizer: AdamW
  - Scheduler: Cosine
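As a rough sanity check, the listed configuration implies a schedule length. The calculation below rests on several assumptions not stated in the card:

- Training ran on a single device, so the accumulation factor is 32 / 16 = 2.
- A final partial batch in each epoch still triggers an optimizer update.
- The cosine schedule decays to 0 with no warmup.

Under those assumptions, the exact step count may differ from the real run by an update or so per epoch.

```python
import math

# Values taken from the Training Details above
NUM_TRAIN = 45_483
PER_STEP_BATCH = 16
EFFECTIVE_BATCH = 32
EPOCHS = 5
PEAK_LR = 1e-5

# ASSUMPTION: single device, so accumulation steps = effective / per-step batch
ACCUM_STEPS = EFFECTIVE_BATCH // PER_STEP_BATCH

# ASSUMPTION: the last partial batch of each epoch still produces an update
STEPS_PER_EPOCH = math.ceil(NUM_TRAIN / EFFECTIVE_BATCH)
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr to 0; ASSUMPTION: no warmup (warmup is not documented)."""
    progress = min(step, total_steps) / total_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(ACCUM_STEPS, STEPS_PER_EPOCH, TOTAL_STEPS)
# prints 2 1422 7110
```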
Performance
Test Set Results (after 1 training epoch)
- Phase Classification Accuracy: 84.9%
- Boundary Detection Accuracy: 94.6%
- Phase F1-Macro: 0.813
- Boundary F1: 0.626
- Both-Correct Accuracy (phase and boundary both correct): 81.8%
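The card does not specify how these metrics were computed. The sketch below shows one plausible reading, using plain Python. It assumes the following:

- Phase F1-Macro is the unweighted mean of per-class F1 over the four phase ids.
- Boundary F1 is the F1 of the positive (boundary = 1) class.
- Both-Correct Accuracy is the fraction of instances where the phase and boundary predictions are both right.

scikit-learn's `f1_score` with `average="macro"` or `average="binary"` gives the same values for these definitions.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """One-vs-rest F1 for a single class; returns 0.0 when there are no true positives."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], classes: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 (ASSUMED definition of Phase F1-Macro)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def both_correct(phase_true, phase_pred, bnd_true, bnd_pred) -> float:
    """ASSUMED definition: share of instances with phase AND boundary predicted correctly."""
    hits = sum(
        pt == pp and bt == bp
        for pt, pp, bt, bp in zip(phase_true, phase_pred, bnd_true, bnd_pred)
    )
    return hits / len(phase_true)


# Toy example (made-up labels, not model outputs)
phase_true, phase_pred = [0, 1, 1, 2, 3], [0, 1, 2, 2, 3]
bnd_true, bnd_pred = [0, 1, 0, 1, 1], [0, 1, 1, 1, 0]
print(accuracy(phase_true, phase_pred))                        # prints 0.8
print(round(macro_f1(phase_true, phase_pred, [0, 1, 2, 3]), 4))  # prints 0.8333
print(round(f1_for_class(bnd_true, bnd_pred, 1), 4))             # prints 0.6667
print(both_correct(phase_true, phase_pred, bnd_true, bnd_pred))  # prints 0.6
```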
Usage
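from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer ("your-username" is a placeholder for the repository owner)
tokenizer = AutoTokenizer.from_pretrained("your-username/nec119-modernbert-phase-boundary")
model = AutoModel.from_pretrained("your-username/nec119-modernbert-phase-boundary")
model.eval()

# Prepare input: the preceding conversation and the utterance to classify form a text pair
context = "previous conversation text"
current_utterance = "current line to classify"
inputs = tokenizer(context, current_utterance, return_tensors="pt", max_length=1024, truncation=True)

# Get predictions (pass only the inputs the encoder accepts)
with torch.no_grad():
    outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])

# AutoModel returns encoder outputs (outputs.last_hidden_state); this card does not
# document how the phase and boundary heads turn these into predictions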
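The usage example encodes a (context, current utterance) pair. The card does not say how the context was built during training. The helper below is a hypothetical sketch that assumes the context is the preceding utterances joined with newlines, optionally limited to the last `window` utterances. The first utterance gets an empty context. `build_pairs` and its parameters are illustrative names, not part of the model or of Transformers.

```python
from typing import List, Tuple


def build_pairs(utterances: List[str], window: int = 0, sep: str = "\n") -> List[Tuple[str, str]]:
    """Return (context, current_utterance) pairs for every utterance in a call.

    ASSUMPTIONS: context = previous utterances joined by `sep`; window <= 0 means
    use the full history, otherwise keep only the last `window` utterances.
    """
    pairs = []
    for i, current in enumerate(utterances):
        start = 0 if window <= 0 else max(0, i - window)
        pairs.append((sep.join(utterances[start:i]), current))
    return pairs


print(build_pairs(["a", "b", "c"], window=1))
# prints [('', 'a'), ('a', 'b'), ('b', 'c')]
```

Each pair can then be passed to the tokenizer call above as `tokenizer(context, current_utterance, ...)`. Long histories are cut at 1024 tokens by `truncation=True`.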
Phase Labels
- INIT (0): Initial phase
- LOC (1): Location identification phase
- INC (2): Incident details phase
- SUP (3): Support/supplementary phase
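The ids above can be mapped back to phase names. The sketch below assumes the phase head emits four scores in id order (0 = INIT to 3 = SUP). The card does not document this, so it is an assumption. `decode_phase` is a hypothetical helper that works on a plain list of scores, for example the result of `.tolist()` on a logits tensor.

```python
from typing import Dict, Sequence

# Mapping taken from the Phase Labels list above
PHASE_ID2LABEL: Dict[int, str] = {0: "INIT", 1: "LOC", 2: "INC", 3: "SUP"}
PHASE_LABEL2ID: Dict[str, int] = {label: i for i, label in PHASE_ID2LABEL.items()}


def decode_phase(scores: Sequence[float]) -> str:
    """Return the phase name with the highest score.

    ASSUMPTION: scores has one entry per phase id, in id order.
    """
    if len(scores) != len(PHASE_ID2LABEL):
        raise ValueError(f"expected {len(PHASE_ID2LABEL)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return PHASE_ID2LABEL[best]


print(decode_phase([0.1, 2.0, -1.0, 0.5]))
# prints LOC
```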
Limitations
This model is specifically trained for Japanese emergency call transcripts and may not generalize well to other domains or conversation types.
License
Apache 2.0