# CamemBERT Model - NEEDS RETRAINING

## Current Status: OUTDATED

This model was trained with CamembertTokenizer (old), but the code now uses CamembertTokenizerFast (required for NER).

### Symptoms:

- Wrong entity predictions (Paris -> PER instead of DEP)
- "Departure: Not found" even when cities are mentioned
- Mixed-up entity types (VIA, MISC instead of DEP, ARR)

### Fix: Retrain the Model

**Option 1: Using Notebook (Recommended)**

```
Open: notebooks/Complete_NER_Training_and_Analysis.ipynb
Run: Cell 16 (Part 3.2 - Train CamemBERT)
Time: ~10-15 min (GPU) or ~45 min (CPU)
```

**Option 2: Python Script**

```python
from src.train_camembert import CamemBERTNERTrainer

trainer = CamemBERTNERTrainer("camembert-base", "models/camembert", 128)
trainer.train("data/train.csv", "data/val.csv", epochs=3, batch_size=16)
```

### After Retraining:

```bash
python app.py --model models/camembert --port 5000
# Should now correctly predict: "Paris Lyon" -> DEP: Paris, ARR: Lyon
```

---

Last Updated: 2026-02-02
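
### Optional: Check the Saved Tokenizer Files

A quick way to see whether a model directory was probably saved with the old tokenizer is to look at the files it contains. The sketch below is a heuristic, not part of this project's code. It assumes the trainer saves the tokenizer next to the model weights in `models/camembert`, and it relies on the usual Hugging Face file names: a fast tokenizer typically writes `tokenizer.json`, while the slow `CamembertTokenizer` writes `sentencepiece.bpe.model`. The function name `tokenizer_files_report` is hypothetical.

```python
from pathlib import Path


def tokenizer_files_report(model_dir):
    """Report which tokenizer files exist in a saved model directory.

    Heuristic only: a missing tokenizer.json suggests the directory was
    saved with the slow tokenizer, so the model may need retraining.
    """
    root = Path(model_dir)
    has_fast_file = (root / "tokenizer.json").is_file()
    has_slow_file = (root / "sentencepiece.bpe.model").is_file()
    return {
        "fast_file": has_fast_file,
        "slow_file": has_slow_file,
        "likely_needs_retraining": not has_fast_file,
    }


if __name__ == "__main__":
    # "models/camembert" is the output directory used in the retraining steps.
    print(tokenizer_files_report("models/camembert"))
```

If `likely_needs_retraining` is `True`, follow one of the retraining options in this document. A `False` result does not prove the model is healthy; it only shows that a fast tokenizer file is present.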