---
library_name: transformers
tags:
- text-classification
- multilabel
- Vietnamese
- BERT
language:
- vi
metrics:
- f1
base_model:
- vinai/phobert-base
---

# PhoBERT-Specialty-ClassificationBC

## Model Details

### Model Description

PhoBERT-Specialty-ClassificationBC is a multilabel classification model based on PhoBERT, fine-tuned to predict medical specialties from Vietnamese text.

- Developed by: Lê Thế Cường
- Model type: BERT-based transformer
- Language(s) (NLP): Vietnamese
- Finetuned from model: `vinai/phobert-base`


## Uses

### Direct Use
The model can be used directly to classify input text into medical specialties.

### Downstream Use
The model can be fine-tuned further or integrated into patient triage support systems and medical advice chatbots.

### Out-of-Scope Use
The model is not designed for medical diagnosis or as a replacement for medical professionals.

## Bias, Risks, and Limitations

### Recommendations
Users should understand the model's limitations and avoid relying on it for important medical decisions without confirmation from a medical professional.

## How to Get Started with the Model
The following example shows how to use the model:
```python
import pickle

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("thecuong/PhoBERT-Specialty-ClassificationBC")
tokenizer = AutoTokenizer.from_pretrained("thecuong/PhoBERT-Specialty-ClassificationBC")
mlb_path = hf_hub_download(repo_id="thecuong/PhoBERT-Specialty-ClassificationBC", filename="mlb.pkl")

# Load mlb.pkl, the fitted label binarizer used by inverse_transform below.
# Only unpickle files from sources you trust.
with open(mlb_path, "rb") as f:
    mlb = pickle.load(f)

text = "u tuyến tiền liệt"
tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**tokens)

print(outputs.logits)

# Multilabel output: apply an independent sigmoid to each logit
probs = torch.sigmoid(outputs.logits).numpy()

# Per-label decision thresholds, compared element-wise with probs
optimal_thresholds = [0.2985745668411255,
                      0.2442353218793869,
                      0.460119366645813,
                      0.25225114822387695,
                      0.4248329699039459,
                      0.4668178856372833,
                      0.2842218279838562,
                      0.43041494488716125,
                      0.32972779870033264,
                      0.36232128739356995,
                      0.26634153723716736,
                      0.24046610295772552,
                      0.34447458386421204,
                      0.2602587640285492,
                      0.30732235312461853,
                      0.34195464849472046,
                      0.43998637795448303,
                      0.1687939465045929,
                      0.39311549067497253]
binary_preds_thresh = (probs > optimal_thresholds).astype(int)
predicted_labels = mlb.inverse_transform(binary_preds_thresh)
print(predicted_labels)
```
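
The final step of the snippet above turns raw logits into label decisions. As a minimal sketch of that decision rule, assuming the usual multilabel setup in which each logit goes through an independent sigmoid and is compared with its own threshold, the standalone example below uses made-up logits and the first three thresholds from the list. `apply_thresholds` is a hypothetical helper for illustration, not part of the model repository.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def apply_thresholds(logits, thresholds):
    """Return one 0/1 decision per label: 1 where sigmoid(logit) > threshold."""
    if len(logits) != len(thresholds):
        raise ValueError("need exactly one threshold per label")
    return [int(sigmoid(z) > t) for z, t in zip(logits, thresholds)]


# Made-up logits for three labels (illustration only), paired with the
# first three thresholds from the snippet above.
toy_logits = [1.2, -2.0, 0.0]
toy_thresholds = [0.2985745668411255, 0.2442353218793869, 0.460119366645813]

decisions = apply_thresholds(toy_logits, toy_thresholds)
print(decisions)
```

Because every label is decided independently, a single input can receive several specialties or none at all, which is the difference from a softmax-based single-label classifier.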


```
{
  "0": "Chưa rõ ràng triệu chứng cần hỏi lại",
  "1": "Cơ Xương Khớp",
  "3": "Tim mạch",
  "4": "Tai Mũi Họng",
  "5": "Nhi khoa",
  "11": "Da liễu",
  "15": "Ung bướu",
  "17": "Nội khoa",
  "18": "Thần kinh",
  "19": "Sản Phụ khoa",
  "21": "Tiểu đường - Nội tiết",
  "22": "Tiêu hóa",
  "24": "Cột sống",
  "26": "Nam học",
  "27": "Sức khỏe tâm thần",
  "28": "Bệnh Viêm gan",
  "29": "Chuyên khoa Mắt",
  "31": "Khám tổng quát",
  "32": "Thận - Tiết niệu",
  "33": "Nha khoa",
  "43": "Hô hấp - Phổi",
  "67": "Vô sinh - Hiếm muộn"
}
```
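
The card does not say whether `mlb.inverse_transform` returns these numeric IDs or the specialty names directly. If it returns IDs, a lookup along the following lines could translate them into names. This is a sketch under that assumption: `id_to_specialty` is an excerpt of the table above, and `ids_to_names` and `example_prediction` are hypothetical names introduced for illustration.

```python
# Excerpt of the ID -> specialty table above (keys are strings, as in the JSON).
id_to_specialty = {
    "0": "Chưa rõ ràng triệu chứng cần hỏi lại",
    "15": "Ung bướu",
    "26": "Nam học",
    "32": "Thận - Tiết niệu",
}


def ids_to_names(predicted_ids, table):
    """Map each predicted ID to its specialty name.

    IDs missing from the table are reported as 'unknown(<id>)' instead of
    raising, so an incomplete table does not hide the prediction.
    """
    return [table.get(str(i), f"unknown({i})") for i in predicted_ids]


# Hypothetical result of mlb.inverse_transform for a single input text.
example_prediction = (15, 26)

names = ids_to_names(example_prediction, id_to_specialty)
print(names)
```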


## Evaluation Results

![Precision-Recall Curve](https://huggingface.co/thecuong/PhoBERT-Specialty-ClassificationBC/resolve/main/precision_recall_curve_bert.png)
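
The card does not describe how the per-label values in `optimal_thresholds` were obtained. One common approach, given that F1 is the reported metric, is to sweep candidate thresholds for each label on validation data and keep the one with the highest F1. The sketch below illustrates that idea for a single label on made-up data. It is not the author's procedure, and `f1_at_threshold` and `best_f1_threshold` are hypothetical helpers.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score for one label when predicting 1 wherever prob > threshold."""
    pred = [int(p > threshold) for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, y_prob):
    """Try midpoints between sorted distinct probabilities.

    Returns (threshold, f1) for the candidate with the highest F1;
    on ties the lowest such threshold is kept.
    """
    values = sorted(set(y_prob))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or [0.5]
    scored = [(t, f1_at_threshold(y_true, y_prob, t)) for t in candidates]
    return max(scored, key=lambda pair: pair[1])


# Made-up validation labels and predicted probabilities for one specialty.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.3, 0.35, 0.8, 0.2, 0.6]

threshold, f1 = best_f1_threshold(y_true, y_prob)
print(threshold, f1)
```

Repeating this for every label would yield one threshold per label, which matches the shape of the `optimal_thresholds` list used at inference time.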