# CamemBERT Model - NEEDS RETRAINING
## Current Status: OUTDATED
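This model was trained with the slow `CamembertTokenizer`, but the code now loads `CamembertTokenizerFast`, which the NER pipeline requires. Because the tokenizer used at inference no longer matches the one used in training, predictions are unreliable until the model is retrained.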
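One reason NER code typically depends on the fast tokenizer is that its encodings support `word_ids()`, which maps each subword token back to the word it came from; calling it on a slow-tokenizer encoding raises an error. The sketch below shows the usual label-alignment step built on that mapping. It is an illustration only: the helper name `align_labels`, the sample inputs, and the choice to mask continuation subwords with -100 are assumptions, not taken from this repository's training code.

```python
from typing import List, Optional

# -100 is the label value that PyTorch's cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Expand word-level labels to token-level labels (hypothetical helper).

    word_ids has one entry per token: the index of the source word,
    or None for special tokens such as <s> and </s>.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            # Special token: exclude from the loss.
            aligned.append(IGNORE_INDEX)
        elif wid != previous:
            # First subword of a word carries the word's label.
            aligned.append(word_labels[wid])
        else:
            # Continuation subword: masked out (one common convention).
            aligned.append(IGNORE_INDEX)
        previous = wid
    return aligned


# Hand-written stand-in for encoding.word_ids(): <s>, word 0, word 1 (two subwords), </s>
print(align_labels([None, 0, 1, 1, None], [1, 2]))  # [-100, 1, 2, -100, -100]
```

If training and inference disagree on how words are split and mapped back, labels land on the wrong tokens, which is consistent with the symptoms listed below.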
### Symptoms:
- Wrong entity predictions (for example, Paris tagged as PER instead of DEP)
- "Departure: Not found" even when cities are mentioned
- Mixed-up entity types (VIA or MISC predicted instead of DEP or ARR)
### Fix: Retrain the Model
**Option 1: Using Notebook (Recommended)**
```
Open: notebooks/Complete_NER_Training_and_Analysis.ipynb
Run: Cell 16 (Part 3.2 - Train CamemBERT)
Time: ~10-15 min (GPU) or ~45 min (CPU)
```
**Option 2: Python Script**
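```python
from src.train_camembert import CamemBERTNERTrainer

# Arguments: base checkpoint, output directory, and 128 (likely the maximum sequence length).
trainer = CamemBERTNERTrainer("camembert-base", "models/camembert", 128)

# Train on the training split and evaluate on the validation split.
trainer.train("data/train.csv", "data/val.csv", epochs=3, batch_size=16)
```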
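After retraining, it can be worth confirming that the output directory actually contains a fast tokenizer. Fast tokenizers saved with `save_pretrained` write a `tokenizer.json` file, while slow tokenizers do not. The sketch below checks for that file; it assumes the trainer saves the tokenizer next to the model in `models/camembert`, which this README does not confirm, and the helper name `has_fast_tokenizer_files` is hypothetical.

```python
from pathlib import Path


def has_fast_tokenizer_files(model_dir: str) -> bool:
    """Return True if model_dir contains tokenizer.json (hypothetical helper).

    tokenizer.json is the serialized form of a fast tokenizer; its absence
    suggests only a slow tokenizer was saved.
    """
    return (Path(model_dir) / "tokenizer.json").is_file()


if __name__ == "__main__":
    # Assumed output directory from the training command above.
    print(has_fast_tokenizer_files("models/camembert"))
```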
### After Retraining:
```bash
python app.py --model models/camembert --port 5000
# Should now correctly predict: "Paris Lyon" -> DEP: Paris, ARR: Lyon
```
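A quick way to tell a healthy model from an outdated one is to reduce its per-word tags to a departure and an arrival and compare against the expected result. The sketch below does that on hand-written tag lists. The helper name `extract_route`, the input format (a list of word/tag pairs), and the handling of optional `B-`/`I-` prefixes are assumptions for illustration; the real output format of `app.py` may differ.

```python
from typing import Dict, List, Tuple


def extract_route(tagged: List[Tuple[str, str]]) -> Dict[str, str]:
    """Collect words tagged DEP or ARR (hypothetical helper).

    Tags may be plain ("DEP") or BIO-style ("B-DEP", "I-DEP").
    Slots with no matching word are reported as "Not found".
    """
    words: Dict[str, List[str]] = {"DEP": [], "ARR": []}
    for word, tag in tagged:
        label = tag.split("-", 1)[-1]  # drop a B-/I- prefix if present
        if label in words:
            words[label].append(word)
    return {slot: " ".join(ws) if ws else "Not found" for slot, ws in words.items()}


# Expected behaviour after retraining.
print(extract_route([("Paris", "DEP"), ("Lyon", "ARR")]))
# Behaviour matching the outdated model's symptoms (wrong entity types).
print(extract_route([("Paris", "PER"), ("Lyon", "MISC")]))
```

With the outdated model, cities tagged as PER, VIA, or MISC never reach the DEP or ARR slot, which would explain a "Departure: Not found" message even when a city is present.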
---
Last Updated: 2026-02-02