---
language: ja
tags:
- modernbert
- japanese
- emergency-call
- phase-detection
- boundary-detection
license: apache-2.0
datasets:
- custom
metrics:
- accuracy
- f1
---

# NEC-119 ModernBERT Phase & Boundary Detector

## Model Description

This model is fine-tuned from `sbintuitions/modernbert-ja-310m` for Japanese emergency call (119) transcript analysis. It performs two tasks simultaneously:

1. **Phase Classification**: Classifies conversation phases (INIT/LOC/INC/SUP)
2. **Boundary Detection**: Detects phase boundaries in conversation

## Training Details

- **Base Model**: sbintuitions/modernbert-ja-310m
- **Training Data**: 45,483 instances from Japanese emergency call transcripts
- **Validation Data**: 4,984 instances
- **Test Data**: 9,605 instances
- **Training Configuration**:
  - Epochs: 5
  - Batch Size: 16 (effective 32 with gradient accumulation)
  - Learning Rate: 1e-5
  - Max Sequence Length: 1024 tokens
  - Optimizer: AdamW
  - Scheduler: Cosine

## Performance

### Test Set Results (After 1 epoch)

- **Phase Classification Accuracy**: 84.9%
- **Boundary Detection Accuracy**: 94.6%
- **Phase F1-Macro**: 0.813
- **Boundary F1**: 0.626
- **Both Correct Accuracy**: 81.8%

## Usage

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/nec119-modernbert-phase-boundary")
model = AutoModel.from_pretrained("your-username/nec119-modernbert-phase-boundary")

# Prepare input
context = "previous conversation text"
current_utterance = "current line to classify"
inputs = tokenizer(context, current_utterance, return_tensors="pt", max_length=1024, truncation=True)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)

# Extract predictions from outputs
```

## Phase Labels

- **INIT (0)**: Initial phase
- **LOC (1)**: Location identification phase
- **INC (2)**: Incident details phase
- **SUP (3)**: Support/supplementary phase

## Limitations

This model is specifically trained for Japanese emergency call transcripts and may not generalize well to other domains or conversation types.

## License

Apache 2.0
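
## Example: Preparing Context/Utterance Pairs

The Usage section passes a `context` string and a `current_utterance` string to the tokenizer. The sketch below shows one way such pairs might be built from a transcript, assuming the context is a sliding window of preceding utterances joined by newlines. The helper `build_pairs`, the `max_context` window size, and the separator are hypothetical: this card does not state how context was constructed during training. The `PHASE_LABELS` mapping mirrors the Phase Labels section.

```python
from typing import List, Tuple

# Label ids as listed in the Phase Labels section of this card.
PHASE_LABELS = {0: "INIT", 1: "LOC", 2: "INC", 3: "SUP"}


def build_pairs(
    utterances: List[str],
    max_context: int = 5,   # assumption: window size is not given in the card
    sep: str = "\n",        # assumption: separator is not given in the card
) -> List[Tuple[str, str]]:
    """Pair each utterance with up to `max_context` preceding utterances.

    Returns a list of (context, current_utterance) tuples. The first
    utterance has an empty context because nothing precedes it.
    """
    pairs = []
    for i, current in enumerate(utterances):
        start = max(0, i - max_context)
        context = sep.join(utterances[start:i])
        pairs.append((context, current))
    return pairs


# Illustrative transcript lines (made up for this example, not from the dataset).
transcript = [
    "119番です。火事ですか、救急ですか。",
    "救急です。",
    "住所を教えてください。",
]
example_pairs = build_pairs(transcript, max_context=2)
```

Each resulting pair can be passed to the tokenizer as two text arguments, in the same way as `context` and `current_utterance` in the Usage section.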