NEC-119 ModernBERT Phase & Boundary Detector

Model Description

This model is fine-tuned from sbintuitions/modernbert-ja-310m for Japanese emergency call (119) transcript analysis. It performs two tasks simultaneously:

  1. Phase Classification: Classifies the conversation phase of an utterance (INIT/LOC/INC/SUP)
  2. Boundary Detection: Detects phase boundaries in the conversation
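
To illustrate how the two tasks relate, here is a minimal sketch that derives boundary flags from a sequence of phase labels. It assumes that a boundary is the first utterance of a new phase and that the very first utterance is not a boundary. This card does not state how boundaries were annotated, so treat that definition, and the helper name `boundaries_from_phases`, as hypothetical.

```python
from typing import List

PHASES = ("INIT", "LOC", "INC", "SUP")


def boundaries_from_phases(phases: List[str]) -> List[int]:
    """Return 1 where an utterance's phase differs from the previous one, else 0.

    Assumption (not from the model card): the first utterance is never
    flagged as a boundary.
    """
    flags = []
    for i, phase in enumerate(phases):
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        flags.append(1 if i > 0 and phase != phases[i - 1] else 0)
    return flags


# Invented phase sequence, for illustration only.
example = ["INIT", "INIT", "LOC", "LOC", "INC", "SUP"]
print(boundaries_from_phases(example))  # prints [0, 0, 1, 0, 1, 1]
```

Under this definition boundaries are much rarer than non-boundaries, which is worth keeping in mind when reading the boundary metrics.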

Training Details

  • Base Model: sbintuitions/modernbert-ja-310m
  • Training Data: 45,483 instances from Japanese emergency call transcripts
  • Validation Data: 4,984 instances
  • Test Data: 9,605 instances
  • Training Configuration:
    • Epochs: 5
    • Batch Size: 16 (effective 32 with gradient accumulation)
    • Learning Rate: 1e-5
    • Max Sequence Length: 1024 tokens
    • Optimizer: AdamW
    • Scheduler: Cosine
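
The configuration above implies a rough optimizer-step budget and a learning-rate curve, sketched below. The sketch assumes a single device (so gradient accumulation is 2), that the last partial batch of each epoch is kept, and a cosine decay to zero with no warmup. None of these are stated in this card, and the exact step count depends on the training framework.

```python
import math

TRAIN_INSTANCES = 45_483
PER_STEP_BATCH = 16
EFFECTIVE_BATCH = 32
EPOCHS = 5
PEAK_LR = 1e-5

# Assumption: one device, so 32 / 16 = 2 accumulation steps per update.
accumulation_steps = EFFECTIVE_BATCH // PER_STEP_BATCH

# Assumption: the last partial batch of each epoch still triggers an update.
steps_per_epoch = math.ceil(TRAIN_INSTANCES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS


def cosine_lr(step: int, total: int = total_steps, peak: float = PEAK_LR) -> float:
    """Cosine decay from `peak` to 0 over `total` steps (no warmup assumed)."""
    progress = min(max(step / total, 0.0), 1.0)
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


print(accumulation_steps, steps_per_epoch, total_steps)  # prints 2 1422 7110
print(cosine_lr(0), cosine_lr(total_steps // 2), cosine_lr(total_steps))
```

If the actual run used warmup or a non-zero final learning rate, the curve would differ from this sketch.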

Performance

Test Set Results (After 1 epoch)

  • Phase Classification Accuracy: 84.9%
  • Boundary Detection Accuracy: 94.6%
  • Phase F1-Macro: 0.813
  • Boundary F1: 0.626
  • Both Correct Accuracy: 81.8%
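
The metrics above can be computed as in the following sketch. It assumes that Boundary F1 is the F1 score of the positive (boundary) class and that Both Correct Accuracy counts an instance only when the phase and the boundary predictions are both right. The card does not spell out either definition, and the labels below are invented toy data, not results from this model.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of positions where prediction equals the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 of the positive class; returns 0.0 when there are no true positives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def both_correct_accuracy(phase_true, phase_pred, bnd_true, bnd_pred) -> float:
    """Fraction of instances where phase AND boundary are both predicted correctly."""
    hits = sum(
        pt == pp and bt == bp
        for pt, pp, bt, bp in zip(phase_true, phase_pred, bnd_true, bnd_pred)
    )
    return hits / len(phase_true)


# Toy labels for illustration only.
phase_true = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
phase_pred = [0, 0, 1, 1, 2, 2, 2, 2, 3, 3]
bnd_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
bnd_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

print(accuracy(bnd_true, bnd_pred))   # 0.9
print(binary_f1(bnd_true, bnd_pred))  # about 0.667
print(both_correct_accuracy(phase_true, phase_pred, bnd_true, bnd_pred))  # 0.8
```

The toy data shows why boundary accuracy can sit well above boundary F1: when boundaries are rare, a single missed boundary barely moves accuracy but halves recall.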

Usage

import torch
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer.
# "your-username/..." is a placeholder; replace it with this model's actual repository ID.
model_id = "your-username/nec119-modernbert-phase-boundary"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode the preceding context and the utterance to classify as a text pair.
context = "previous conversation text"
current_utterance = "current line to classify"
inputs = tokenizer(context, current_utterance, return_tensors="pt", max_length=1024, truncation=True)

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)
    # Extract the phase and boundary predictions from outputs.
    # How they are exposed depends on how the task heads were saved with the checkpoint.
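
To classify a whole transcript, each utterance needs its own (context, current_utterance) pair. The sketch below builds such pairs with a sliding window. The window size `max_context`, the newline separator, and the helper name `context_pairs` are assumptions for illustration; this card does not say how context was assembled during training, and the sample utterances are invented.

```python
from typing import Iterator, List, Tuple


def context_pairs(
    utterances: List[str], max_context: int = 5, sep: str = "\n"
) -> Iterator[Tuple[str, str]]:
    """Yield (context, current_utterance) pairs for each utterance in order.

    The context is the previous `max_context` utterances joined by `sep`;
    it is an empty string for the first utterance.
    """
    for i, current in enumerate(utterances):
        start = max(0, i - max_context)
        yield sep.join(utterances[start:i]), current


# Invented utterances, for illustration only.
transcript = [
    "119, fire or ambulance?",
    "Ambulance, please.",
    "What is the address?",
]
for context, current in context_pairs(transcript, max_context=2):
    print(repr(context), "->", repr(current))
```

Each yielded pair can be passed to the tokenizer as `tokenizer(context, current, ...)`, with truncation handling contexts that exceed the 1024-token limit.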

Phase Labels

  • INIT (0): Initial phase
  • LOC (1): Location identification phase
  • INC (2): Incident details phase
  • SUP (3): Support/supplementary phase
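
The label IDs above can be mapped back to phase names as in this sketch. It assumes the phase head produces one logit per phase in the ID order listed, which this card does not confirm. The logits are made-up numbers, and `decode_phase` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Mapping taken from the Phase Labels list.
ID2PHASE: Dict[int, str] = {0: "INIT", 1: "LOC", 2: "INC", 3: "SUP"}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_phase(phase_logits: Sequence[float]) -> Tuple[str, float]:
    """Return the most probable phase name and its probability."""
    probs = softmax(phase_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2PHASE[best], probs[best]


# Made-up logits, assumed to be ordered INIT, LOC, INC, SUP.
label, prob = decode_phase([0.1, 2.0, -1.0, 0.3])
print(label, round(prob, 3))
```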

Limitations

This model is specifically trained for Japanese emergency call transcripts and may not generalize well to other domains or conversation types.

License

Apache 2.0
