---
title: README
emoji: 🏃
colorFrom: gray
colorTo: purple
sdk: static
pinned: false
license: mit
---

# Model Description

TinyClinicalBERT is a distilled version of [BioClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT), obtained by distilling the teacher for 3 epochs with a total batch size of 192 on the MIMIC-III clinical notes dataset.
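
The checkpoint can be used like any other 🤗 Transformers masked language model. A minimal sketch, assuming the model is published as `nlpie/tiny-clinicalbert` (substitute the actual repository id if it differs):

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "nlpie/tiny-clinicalbert"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Predict a masked token in a clinical-style sentence.
text = "The patient was treated with [MASK] for the infection."
inputs = tokenizer(text, return_tensors="pt")
logits = model(**inputs).logits

mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
predicted_id = logits[0, mask_pos].argmax().item()
print(tokenizer.decode([predicted_id]))
```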

# Distillation Procedure

This model is trained with a method called transformer-layer distillation, which is applied at each layer of the student to align its attention maps and hidden states with those of the teacher; a simplified sketch of this objective is given below.
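
For illustration, here is a sketch of such a layer-wise objective in the spirit of TinyBERT-style transformer-layer distillation. The uniform layer mapping, equal loss weights, and projection details below are assumptions, not the exact recipe from the paper:

```python
import torch.nn as nn

def layer_distillation_loss(student_out, teacher_out, proj):
    """Align the student's attention maps and hidden states with the teacher's.

    Both outputs must be produced with output_attentions=True and
    output_hidden_states=True. `proj` (e.g. nn.Linear(312, 768)) maps the
    student's hidden size up to the teacher's.
    """
    mse = nn.MSELoss()
    n_student = len(student_out.attentions)  # e.g. 4 layers
    n_teacher = len(teacher_out.attentions)  # e.g. 12 layers
    stride = n_teacher // n_student          # uniform layer mapping (assumption)

    loss = 0.0
    for s in range(n_student):
        t = (s + 1) * stride - 1             # teacher layer paired with student layer s
        # Attention-map alignment.
        loss = loss + mse(student_out.attentions[s], teacher_out.attentions[t])
        # Hidden-state alignment (hidden_states[0] is the embedding output).
        loss = loss + mse(proj(student_out.hidden_states[s + 1]),
                          teacher_out.hidden_states[t + 1])
    return loss
```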

# Architecture and Initialisation

This model uses 4 hidden layers with a hidden dimension and embedding size of 312, resulting in a total of roughly 15M parameters. Because this hidden dimension is smaller than the teacher's, the student cannot reuse the teacher's weights and is instead randomly initialised.
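
A hypothetical 🤗 Transformers configuration matching this description (the vocabulary size, intermediate size, and head count below follow the TinyBERT convention and are assumptions, not values confirmed by this model card):

```python
from transformers import BertConfig, BertForMaskedLM

config = BertConfig(
    num_hidden_layers=4,     # as stated above
    hidden_size=312,         # hidden and embedding dimension, as stated above
    intermediate_size=1200,  # assumption (TinyBERT convention)
    num_attention_heads=12,  # assumption
    vocab_size=28996,        # assumption: the teacher's cased WordPiece vocabulary
)

# Randomly initialised, as described above.
model = BertForMaskedLM(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")  # roughly 15M
```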

# Citation

If you use this model, please consider citing the following paper:

```bibtex
@misc{rohanian2023lightweight,
  doi = {10.48550/ARXIV.2302.04725},
  url = {https://arxiv.org/abs/2302.04725},
  author = {Rohanian, Omid and Nouriborji, Mohammadmahdi and Jauncey, Hannah and Kouchaki, Samaneh and {ISARIC Clinical Characterisation Group} and Clifton, Lei and Merson, Laura and Clifton, David A.},
  keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), Machine Learning (cs.LG), FOS: Computer and information sciences, I.2.7, 68T50},
  title = {Lightweight Transformers for Clinical Natural Language Processing},
  publisher = {arXiv},
  year = {2023},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```