--- license: mit language: - en base_model: - google-bert/bert-base-uncased pipeline_tag: text-classification tags: - multilabel-classification - food-safety - product-category - hazard-category - bert - data-augmentation - optuna - interpretability - low-resource - imbalance-handling model_type: bert task: name: SemEval 2025 Task 9: The Food Hazard Detection Challenge - Multilabel Text Classification type: text-classification link: https://food-hazard-detection-semeval-2025.github.io/ dataset: - custom training: input_features: ["title", "text"] label_names: ["product-category", "hazard-category", "product", "hazard"] augmentation: methods: - lexical: [synonym-replacement, random-swap, word-deletion] - embedding: [contextual-substitution, insertion] - llm: [gpt-4-paraphrasing] strategy: "quantile-based underrepresented class boosting (q=0.99)" optimizer: AdamW scheduler: cosine_with_restarts hyperparameter_search: optuna evaluation: metrics: [f1-score] limitations: - Augmentation focused on titles only; text augmentation could further help.