---
language: en
license: mit
tags:
- medical
- classification
- biobert
- pubmedqa
- healthcare-rag
datasets:
- qiaojin/PubMedQA
metrics:
- f1
pipeline_tag: text-classification
---

# BioBERT Medical Query Classifier

A fine-tuned `dmis-lab/biobert-v1.1` model that classifies medical questions into six categories.

## Categories

| ID | Category |
|----|----------|
| 0 | Diagnosis |
| 1 | General |
| 2 | Medication |
| 3 | Prevention |
| 4 | Symptoms |
| 5 | Treatment |

## Results

| Metric | Score |
|--------|-------|
| Macro F1 | 0.9066 |
| Weighted F1 | 0.9094 |
| Accuracy | 0.9088 |

## Training Config

| Item | Value |
|------|-------|
| Base model | dmis-lab/biobert-v1.1 |
| Dataset | qiaojin/PubMedQA (211,186 rows) |
| Split | 80/10/10 |
| Epochs | 3 |
| Learning rate | 2e-5 |
| Batch size | 16 |
| Class weights | Balanced (custom WeightedTrainer) |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hub.
tokenizer = AutoTokenizer.from_pretrained("AbdoMatrix/biobert-medical-classifier")
model = AutoModelForSequenceClassification.from_pretrained("AbdoMatrix/biobert-medical-classifier")

text = "What are the symptoms of diabetes?"

# Tokenize the question, truncating to at most 256 tokens.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)

# Map the highest-scoring class ID to its category name.
predicted = model.config.id2label[torch.argmax(outputs.logits, dim=1).item()]
print(predicted)  # → Symptoms
```

## Project

Healthcare RAG-Powered Medical Q&A Assistant

eyouth x DEPI | Microsoft Machine Learning Track | 2026

GitHub: https://github.com/AbdooMatrix/Healthcare-RAG-Powered-Medical-QA-Assistant
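
## Class Weights Sketch

The training config lists class weights as "Balanced" without giving the formula. The sketch below shows one common reading of that term, the inverse-frequency heuristic `n_samples / (num_classes * class_count)`. This is an assumption, not the confirmed method behind the custom `WeightedTrainer`. The helper name `balanced_class_weights` and the toy labels are hypothetical and do not reflect the real label distribution.

```python
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Return one weight per class ID: n_samples / (num_classes * class_count).

    Assumption: "Balanced" in the training config means this inverse-frequency
    heuristic; the actual WeightedTrainer may compute its weights differently.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    weights = []
    for class_id in range(num_classes):
        count = counts.get(class_id, 0)
        # A class absent from the data gets weight 0.0 to avoid dividing by zero.
        weights.append(n_samples / (num_classes * count) if count else 0.0)
    return weights


# Hypothetical toy labels, NOT the real label distribution of the training set.
toy_labels = [0, 0, 0, 0, 1, 1, 2, 3, 4, 5, 5, 5]
weights = balanced_class_weights(toy_labels, num_classes=6)

# Rare classes receive larger weights than frequent ones.
for class_id, weight in enumerate(weights):
    print(class_id, round(weight, 3))
```

In a setup like the one described above, weights of this kind would typically be converted to a tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss` inside a `Trainer` subclass that overrides `compute_loss`. Whether this model's training code does exactly that is not stated here.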