---
license: mit
---

# Model Details

##### Model Name: NumericBERT
##### Model Type: Transformer
##### Architecture: BERT
##### Training Method: Masked Language Modeling (MLM)
##### Training Data: MIMIC-IV lab values
##### Training Hyperparameters:
- Optimizer: AdamW
- Learning Rate: 5e-5
- Masking Rate: 20%
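The original training script is not reproduced in this card. The following is a minimal sketch of how one MLM step with these hyperparameters could look using Hugging Face `transformers`; the vocabulary size, mask-token id, and the simplified masking scheme (standard BERT also substitutes random/unchanged tokens for some masked positions) are illustrative assumptions, not details from the original run.

```python
# Minimal MLM training sketch. Assumptions: Hugging Face transformers,
# a custom vocabulary of encoded lab tokens; VOCAB_SIZE and MASK_ID are
# illustrative placeholders, not values from the original run.
import torch
from torch.optim import AdamW
from transformers import BertConfig, BertForMaskedLM

VOCAB_SIZE = 5000   # assumption: size of the encoded-lab-token vocabulary
MASK_ID = 4         # assumption: id of the [MASK] token in that vocabulary

model = BertForMaskedLM(BertConfig(vocab_size=VOCAB_SIZE))
optimizer = AdamW(model.parameters(), lr=5e-5)   # optimizer and LR from this card

def mlm_step(input_ids: torch.Tensor) -> float:
    """One MLM update: mask 20% of the tokens and predict them."""
    labels = input_ids.clone()
    mask = torch.rand(input_ids.shape) < 0.20    # 20% masking rate from this card
    labels[~mask] = -100                         # loss only on masked positions
    masked = input_ids.clone()
    masked[mask] = MASK_ID
    loss = model(input_ids=masked, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```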
### Tokenization
Tokenizer: Custom numeric-to-text mapping using the TextEncoder class.
### Text Encoding Process
The encoder converts non-negative integers into uppercase letter-based representations, allowing numerical values to be expressed as sequences of letters.
Each lab value is first scaled, then converted into its corresponding letters using this predefined mapping.
Finally, the encoded value is tagged with the ID of its lab column ('Bic', 'Crt', 'Pot', 'Sod', 'Ure', 'Hgb', 'Plt', 'Wbc').
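The TextEncoder class itself is not included in this card; the sketch below only illustrates the scale-then-map style of encoding described above. The digit-to-letter mapping, the scaling factor, and the `ID_LETTERS` output format are all assumptions.

```python
# Illustrative sketch of the numeric-to-text encoding described above.
# The real TextEncoder is not reproduced here; the digit-to-letter map,
# scaling factor, and output format are assumptions for illustration.
DIGIT_TO_LETTER = {str(d): chr(ord("A") + d) for d in range(10)}  # '0'->'A' ... '9'->'J'

def encode_value(value: float, lab_id: str, scale: int = 10) -> str:
    """Scale a lab value to a non-negative integer, map each digit to an
    uppercase letter, and tag the result with its lab ID."""
    scaled = round(value * scale)            # assumption: fixed-point scaling
    letters = "".join(DIGIT_TO_LETTER[d] for d in str(scaled))
    return f"{lab_id}_{letters}"             # assumption: 'ID_LETTERS' token format

# Example: a sodium value of 140.0 scales to 1400 and encodes as 'Sod_BEAA'.
print(encode_value(140.0, "Sod"))
```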
### Training Data Preprocessing
- Column Selection: Numeric values are taken from the following lab columns: 'Bic', 'Crt', 'Pot', 'Sod', 'Ure', 'Hgb', 'Plt', 'Wbc'.
- Text Encoding: The numeric values are encoded into text as described above (see the sketch after this list).
- Masking: 20% of the tokens are randomly masked during training.
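As a rough illustration of the column-selection and encoding steps (not the original pipeline), assuming a pandas DataFrame of lab values and the hypothetical `encode_value` helper from the sketch above:

```python
# Sketch of the preprocessing step. Assumptions: a pandas DataFrame of
# lab values and the hypothetical encode_value helper defined earlier;
# tokens are space-joined into one sequence per row.
import pandas as pd

LAB_COLUMNS = ["Bic", "Crt", "Pot", "Sod", "Ure", "Hgb", "Plt", "Wbc"]

def encode_row(row: pd.Series) -> str:
    """Turn one row of lab values into a sequence of encoded tokens."""
    return " ".join(encode_value(row[col], col) for col in LAB_COLUMNS)

labs = pd.DataFrame([{"Bic": 24.0, "Crt": 1.1, "Pot": 4.2, "Sod": 140.0,
                      "Ure": 15.0, "Hgb": 13.5, "Plt": 250.0, "Wbc": 7.8}])
corpus = labs.apply(encode_row, axis=1).tolist()
# Masking is applied afterwards, as in the training sketch further up.
```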
### Model Output
During training, the model outputs predictions for the masked positions.
These predictions are produced in the same encoded text representation.
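As a rough sketch of reading those predictions (again assuming the `model` and `MASK_ID` placeholders from the training sketch above), the most likely token at each masked position can be taken from the MLM head's logits; decoding back to numeric values would then invert the letter mapping:

```python
# Illustrative inference sketch; model and MASK_ID are the placeholders
# from the training sketch above, not names from the original code.
import torch

def predict_masked(model, input_ids: torch.Tensor) -> torch.Tensor:
    """Return the most likely vocabulary id at every masked position."""
    model.eval()
    with torch.no_grad():
        logits = model(input_ids=input_ids).logits   # (batch, seq_len, vocab)
    return logits.argmax(dim=-1)[input_ids == MASK_ID]
```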
### Limitations and Considerations
- Numeric Data Representation: The model relies on a custom text representation of numeric data, which may not capture all of the complex patterns present in the original numeric values.
- Training Data Source: The model is trained on MIMIC-IV numeric data, and its performance may be influenced by the characteristics and biases of that dataset.
### Contact Information
For inquiries or additional information, please contact:

David Restrepo
davidres@mit.edu
MIT Critical Data