---
license: afl-3.0
datasets:
- WillHeld/hinglish_top
language:
- en
- hi
metrics:
- accuracy
library_name: transformers
pipeline_tag: fill-mask
---

### SRDberta

SRDberta is a BERT model trained with masked language modeling (MLM) on Hinglish data. Hinglish is the hybrid of Hindi and English spoken in India. It is common in informal conversation and in media such as Bollywood films.

### Dataset

The model was trained on the [Hinglish-TOP dataset](https://huggingface.co/datasets/WillHeld/hinglish_top), which has the following columns:

- en_query
- cs_query
- en_parse
- cs_parse
- domain

### Training

Loss per epoch over 10 epochs of training:

| Epoch | Loss |
|:--:|:--:|
| 1 | 0.0485 |
| 2 | 0.00837 |
| 3 | 0.00812 |
| 4 | 0.0029 |
| 5 | 0.014 |
| 6 | 0.00748 |
| 7 | 0.0041 |
| 8 | 0.00543 |
| 9 | 0.00304 |
| 10 | 0.000574 |

### Inference

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("SRDdev/SRDBerta")
model = AutoModelForMaskedLM.from_pretrained("SRDdev/SRDBerta")

# Pass the loaded objects so the pipeline does not look for a local "SRDberta" folder
fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
```

```python
# Use the tokenizer's own mask token so the prompt matches its vocabulary
mask_token = fill.tokenizer.mask_token
predictions = fill(f"Aap {mask_token} ho?")

# Each prediction is a dict with "sequence", "score", "token" and "token_str"
for p in predictions:
    print(p["token_str"], round(p["score"], 4))
```

### Citation

Author: @[SRDdev](https://huggingface.co/SRDdev)

```
Name      : Shreyas Dixit
Framework : PyTorch
Date      : Jan 2023
Pipeline  : fill-mask
GitHub    : https://github.com/SRDdev
LinkedIn  : https://www.linkedin.com/in/srddev/
```
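The card says SRDberta was trained with masked language modeling but does not describe how training inputs were masked. The sketch below shows the standard BERT-style recipe, assuming SRDberta used something similar: each non-special token is selected with probability 0.15, and of the selected tokens 80% become the mask token, 10% become a random token and 10% are left unchanged. The helper `mask_tokens`, its parameters and the token ids in the usage lines are hypothetical and only illustrate the technique; they are not the author's training code. In practice, Hugging Face's `DataCollatorForLanguageModeling` (with `mlm_probability=0.15`) applies this recipe to batches for you.

```python
import random
from typing import List, Optional, Sequence, Tuple

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def mask_tokens(
    input_ids: Sequence[int],
    mask_token_id: int,
    vocab_size: int,
    special_ids: Sequence[int] = (),
    mlm_probability: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical BERT-style masking helper (not SRDberta's actual training code).

    Returns (masked_ids, labels): labels hold the original id at every
    selected position and IGNORE_INDEX everywhere else.
    """
    rng = rng or random.Random()
    special = set(special_ids)
    masked = list(input_ids)
    labels = [IGNORE_INDEX] * len(masked)
    for i, tok in enumerate(input_ids):
        # Never select special tokens (e.g. start/end or padding tokens)
        if tok in special or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model is trained to recover the original token here
        roll = rng.random()
        if roll < 0.8:
            masked[i] = mask_token_id  # 80%: replace with the mask token
        elif roll < 0.9:
            masked[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return masked, labels


# Usage with hypothetical ids: 0 and 2 stand in for start/end tokens, 4 for the mask token
rng = random.Random(0)
ids = [0] + list(range(10, 60)) + [2]
masked, labels = mask_tokens(ids, mask_token_id=4, vocab_size=1000, special_ids=[0, 2], rng=rng)
print(masked)
print(labels)
```

Only positions whose label is not `IGNORE_INDEX` contribute to the MLM loss. Leaving some selected tokens unchanged or randomized keeps the model from assuming that every non-mask token is already correct.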