| ### Model Description | |
| This repository hosts three pre-trained models designed to standardize metadata attributes for genomic regions: `ENCODE`, `FAIRTRACKS`, and `BEDBASE`. These models, along with their associated files and schema designs, are used for standardization by `BEDMS` (BED Metadata Standardizer). To learn more about BEDMS, visit: https://github.com/databio/bedms | |
| ### Directory structure | |
| ``` | |
| /attribute-standardizer-model6 | |
| /bedbase | |
| - bedbase_schema_design.yaml # BEDBASE schema | |
| - label_encoder_bedbase.pkl # Unique label values derived from the training data; the model classifies its output into these labels for the BEDBASE schema | |
| - model_bedbase.pth # BEDBASE schema trained model | |
| - vectorizer_bedbase.pkl # CountVectorizer instance from the `scikit-learn` library for Bag of Words encoding used as input to the model | |
| - config_bedbase.yaml # Config file with model parameters | |
| /encode | |
| - encode_schema_design.yaml # ENCODE schema | |
| - label_encoder_encode.pkl # Unique label values derived from the training data; the model classifies its output into these labels for the ENCODE schema | |
| - model_encode.pth # ENCODE schema trained model | |
| - vectorizer_encode.pkl # CountVectorizer instance from the `scikit-learn` library for Bag of Words encoding used as input to the model | |
| - config_encode.yaml # Config file with model parameters | |
| /fairtracks | |
| - fairtracks_schema_design.yaml # FAIRTRACKS schema | |
| - label_encoder_fairtracks.pkl # Unique label values derived from the training data; the model classifies its output into these labels for the FAIRTRACKS schema | |
| - model_fairtracks.pth # FAIRTRACKS schema trained model | |
| - vectorizer_fairtracks.pkl # CountVectorizer instance from the `scikit-learn` library for Bag of Words encoding used as input to the model | |
| - config_fairtracks.yaml # Config file with model parameters | |
| ``` | |
| ### Usage | |
| To use these models, refer to the `bedms` GitHub repository: | |
| [BEDMS](https://github.com/databio/bedms) | |
| ### Contribution | |
| To add a schema model: | |
| 1. Train the new model using [BEDMS](https://github.com/databio/bedms). | |
| 2. Create a new directory within this repository, named after the new schema (for example, "new_schema"). | |
| 3. Follow this directory structure: | |
| ``` | |
| /attribute-standardizer-model6 | |
| /new_schema | |
| - new_schema_design.yaml | |
| - label_encoder_new_schema.pkl | |
| - model_new_schema.pth | |
| - vectorizer_new_schema.pkl | |
| - config_new_schema.yaml | |
| ``` | |
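
Before opening a contribution, it can help to check that the new directory holds all five files. The following is a minimal sketch, not part of BEDMS: the helper names `expected_files` and `missing_files` are hypothetical, and the filename patterns are assumed from the existing `bedbase`, `encode`, and `fairtracks` directories.

```python
from pathlib import Path

# Filename patterns inferred from the existing bedbase/encode/fairtracks
# directories. This is an assumption for illustration, not a BEDMS API.
FILE_PATTERNS = (
    "{name}_schema_design.yaml",
    "label_encoder_{name}.pkl",
    "model_{name}.pth",
    "vectorizer_{name}.pkl",
    "config_{name}.yaml",
)


def expected_files(name: str) -> list[str]:
    """Return the file names a schema directory is expected to contain."""
    return [pattern.format(name=name) for pattern in FILE_PATTERNS]


def missing_files(repo_root: str, name: str) -> list[str]:
    """List the expected files that are absent from <repo_root>/<name>/."""
    schema_dir = Path(repo_root) / name
    return [f for f in expected_files(name) if not (schema_dir / f).is_file()]
```

For example, running `missing_files(".", "encode")` from the repository root should return an empty list if the `encode` directory is complete.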