# CamemBERT Model - NEEDS RETRAINING

## Current Status: OUTDATED

This model was trained with the slow `CamembertTokenizer`, but the code now loads `CamembertTokenizerFast`, which is required for NER. Until the model is retrained with the fast tokenizer, its predictions are unreliable.
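
One likely reason the fast tokenizer matters for NER: in Hugging Face Transformers, only fast tokenizers expose `word_ids()` on their encodings, which maps each subword token back to the word it came from so that word-level labels can be aligned with subword tokens. The sketch below illustrates that alignment in plain Python. The `align_labels` helper, the label ids, and the sample `word_ids` list are illustrative assumptions, not code from this repository.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # label id conventionally ignored by PyTorch's cross-entropy loss


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Spread word-level labels over subword tokens.

    `word_ids` has the shape returned by `BatchEncoding.word_ids()`:
    None for special tokens, otherwise the index of the source word.
    Only the first subword of each word keeps its label.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(word_labels[word_id])
        previous = word_id
    return aligned


# Hypothetical example: "Paris Lyon" with label ids DEP=1, ARR=2, where the
# tokenizer is assumed to split "Lyon" into two subwords.
word_ids = [None, 0, 1, 1, None]
print(align_labels(word_ids, [1, 2]))  # [-100, 1, 2, -100, -100]
```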

### Symptoms:
- Wrong entity predictions (e.g. Paris tagged PER instead of DEP)
- "Departure: Not found" even when cities are mentioned
- Mixed-up entity types (VIA, MISC instead of DEP, ARR)
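
To see how a mislabeled entity can produce the "Not found" symptom, here is a minimal sketch of a step that turns predicted entities into a route. The `extract_route` helper and the `(text, label)` pair format are hypothetical; the application's actual post-processing may differ.

```python
from typing import Dict, List, Tuple


def extract_route(entities: List[Tuple[str, str]]) -> Dict[str, str]:
    """Pick the first DEP and ARR entity; report "Not found" otherwise."""
    route = {"Departure": "Not found", "Arrival": "Not found"}
    for text, label in entities:
        if label == "DEP" and route["Departure"] == "Not found":
            route["Departure"] = text
        elif label == "ARR" and route["Arrival"] == "Not found":
            route["Arrival"] = text
    return route


# Outdated model: Paris is tagged PER, so no departure is found.
print(extract_route([("Paris", "PER"), ("Lyon", "ARR")]))
# {'Departure': 'Not found', 'Arrival': 'Lyon'}

# Expected after retraining.
print(extract_route([("Paris", "DEP"), ("Lyon", "ARR")]))
# {'Departure': 'Paris', 'Arrival': 'Lyon'}
```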

### Fix: Retrain the Model

**Option 1: Using Notebook (Recommended)**
```
Open: notebooks/Complete_NER_Training_and_Analysis.ipynb
Run: Cell 16 (Part 3.2 - Train CamemBERT)
Time: ~10-15 min (GPU) or ~45 min (CPU)
```

**Option 2: Python Script**
```python
from src.train_camembert import CamemBERTNERTrainer

# Base checkpoint, output directory, and 128 (presumably the max sequence length).
trainer = CamemBERTNERTrainer("camembert-base", "models/camembert", 128)
trainer.train("data/train.csv", "data/val.csv", epochs=3, batch_size=16)
```

### After Retraining:
```bash
python app.py --model models/camembert --port 5000
# Should now correctly predict: "Paris Lyon" -> DEP: Paris, ARR: Lyon
```
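
As an additional sanity check, you can confirm that the retrained model directory contains fast-tokenizer artifacts. The sketch below assumes the trainer saves the tokenizer with Hugging Face's `save_pretrained`, in which case a fast tokenizer writes a `tokenizer.json` file. `has_fast_tokenizer` is a hypothetical helper, and the check is a heuristic rather than a guarantee.

```python
from pathlib import Path


def has_fast_tokenizer(model_dir: str) -> bool:
    """Return True if `model_dir` contains a `tokenizer.json` file."""
    return (Path(model_dir) / "tokenizer.json").is_file()


if __name__ == "__main__":
    model_dir = "models/camembert"  # output directory from the retraining step
    if has_fast_tokenizer(model_dir):
        print(f"{model_dir}: tokenizer.json found")
    else:
        print(f"{model_dir}: tokenizer.json missing, the model may still be outdated")
```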

---
Last Updated: 2026-02-02