---
language: en
license: mit
tags:
- medical
- classification
- biobert
- pubmedqa
- healthcare-rag
datasets:
- qiaojin/PubMedQA
metrics:
- f1
pipeline_tag: text-classification
---

# BioBERT Medical Query Classifier

A fine-tuned `dmis-lab/biobert-v1.1` model that classifies medical questions into 6 categories.

## Categories
| ID | Category |
|----|----------|
| 0 | Diagnosis |
| 1 | General |
| 2 | Medication |
| 3 | Prevention |
| 4 | Symptoms |
| 5 | Treatment |
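
The ID-to-category table can be mirrored in code, for example to decode raw class IDs in a script that does not load the model config. This is a minimal sketch; the names `ID2LABEL`, `LABEL2ID`, and `decode` are illustrative and not part of the released model.

```python
# Category mapping copied from the Categories table.
ID2LABEL = {
    0: "Diagnosis",
    1: "General",
    2: "Medication",
    3: "Prevention",
    4: "Symptoms",
    5: "Treatment",
}

# Reverse lookup, useful when encoding string labels as class IDs.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode(class_id: int) -> str:
    """Return the category name for a predicted class ID."""
    if class_id not in ID2LABEL:
        raise ValueError(f"Unknown class ID: {class_id}")
    return ID2LABEL[class_id]
```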

## Results
| Metric | Score |
|--------|-------|
| Macro F1 | 0.9066 |
| Weighted F1 | 0.9094 |
| Accuracy | 0.9088 |
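
Macro F1 averages the per-class F1 scores equally, while weighted F1 weights each class by its number of true examples. The sketch below shows the difference on toy labels; it is not the evaluation code used for this model, and the data are made up for illustration.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_and_weighted_f1(y_true, y_pred):
    """Return (macro F1, weighted F1) over the classes present in y_true."""
    support = Counter(y_true)
    labels = sorted(support)
    scores = {label: per_class_f1(y_true, y_pred, label) for label in labels}
    macro = sum(scores.values()) / len(labels)
    weighted = sum(scores[label] * support[label] for label in labels) / len(y_true)
    return macro, weighted


# Toy labels, not the model's actual predictions.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1]
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
```

When classes are imbalanced, the two averages diverge, which is why both are reported alongside accuracy.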

## Training Config
| Item | Value |
|------|-------|
| Base model | dmis-lab/biobert-v1.1 |
| Dataset | qiaojin/PubMedQA (211,186 rows) |
| Split | 80/10/10 |
| Epochs | 3 |
| Learning rate | 2e-5 |
| Batch size | 16 |
| Class weights | Balanced (custom WeightedTrainer) |
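
The card does not say how the balanced class weights were computed. A common choice is the "balanced" heuristic, `n_samples / (n_classes * class_count)`, which the sketch below assumes. The function name `balanced_class_weights` and the toy labels are hypothetical and not taken from the training code.

```python
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Weight each class by n_samples / (num_classes * class_count).

    Assumption: this is the usual "balanced" heuristic; the actual
    WeightedTrainer may compute its weights differently.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    return [
        n_samples / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


# Toy, imbalanced label distribution (not the real dataset).
toy_labels = [0] * 6 + [1] * 3 + [2] * 1
weights = balanced_class_weights(toy_labels, num_classes=3)
```

Weights like these are typically converted to a tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, so that rarer classes contribute more to the loss. Whether the custom `WeightedTrainer` does exactly this is an assumption.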

## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("AbdoMatrix/biobert-medical-classifier")
model = AutoModelForSequenceClassification.from_pretrained("AbdoMatrix/biobert-medical-classifier")

text = "What are the symptoms of diabetes?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Map the highest-scoring logit to its category name
predicted = model.config.id2label[torch.argmax(outputs.logits, dim=1).item()]
print(predicted)  # → Symptoms
```
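
The usage snippet returns only the top category. To inspect the model's confidence across categories, the logits can be converted to probabilities with a softmax. This is a minimal pure-Python sketch: `rank_categories` is a hypothetical helper, and the logits are made-up values standing in for `outputs.logits[0].tolist()`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_categories(logits, id2label, k=3):
    """Return the k most probable (category, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(id2label[idx], prob) for idx, prob in ranked[:k]]


# Mapping from the Categories table of this card.
id2label = {0: "Diagnosis", 1: "General", 2: "Medication",
            3: "Prevention", 4: "Symptoms", 5: "Treatment"}

# Made-up logits for illustration, not real model output.
example_logits = [0.1, -0.3, 0.2, -1.0, 3.2, 0.5]
top = rank_categories(example_logits, id2label)
```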

## Project
Healthcare RAG-Powered Medical Q&A Assistant

eyouth x DEPI | Microsoft Machine Learning Track | 2026

GitHub: https://github.com/AbdooMatrix/Healthcare-RAG-Powered-Medical-QA-Assistant