# Custom BERT Model for Text Classification
## Model Description
This is a custom BERT model fine-tuned for text classification. It was trained on a subset of a publicly available dataset and classifies input text into one of 3 classes.
## Training Details
- **Architecture**: BERT Base Multilingual Cased
- **Training data**: Custom dataset
- **Preprocessing**: Tokenized with BERT's tokenizer, using a maximum sequence length of 80.
- **Fine-tuning**: Trained for 1 epoch with a learning rate of 2e-5, using the AdamW optimizer and cross-entropy loss (a minimal sketch is shown after this list).
- **Evaluation Metrics**: Accuracy on a held-out validation set.
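The exact training pipeline is not published. A minimal sketch of the setup described above, assuming hypothetical `train_texts`/`train_labels` and `val_texts`/`val_labels` lists in place of the actual dataset:
```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical placeholder data; the actual dataset is not published.
train_texts = ["example sentence one", "example sentence two"]
train_labels = [0, 2]
val_texts = ["held-out example"]
val_labels = [1]

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3
)

# Tokenize as described above: BERT tokenizer, max sequence length 80.
enc = tokenizer(train_texts, padding=True, truncation=True, max_length=80,
                return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"],
                        torch.tensor(train_labels))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

optimizer = AdamW(model.parameters(), lr=2e-5)

# One epoch; the model computes cross-entropy loss internally when labels are passed.
model.train()
for input_ids, attention_mask, labels in loader:
    optimizer.zero_grad()
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    outputs.loss.backward()
    optimizer.step()

# Accuracy on the held-out validation set.
model.eval()
val_enc = tokenizer(val_texts, padding=True, truncation=True, max_length=80,
                    return_tensors="pt")
with torch.no_grad():
    preds = model(**val_enc).logits.argmax(dim=-1)
accuracy = (preds == torch.tensor(val_labels)).float().mean().item()
print(f"validation accuracy: {accuracy:.3f}")
```
The batch size shown is an assumption; the card does not specify one.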
## How to Use
### Dependencies
- Transformers 4.x
- Torch 1.x

Install them with `pip install transformers torch`.
### Code Snippet
For classification:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("billfass/my_bert_model")
model = AutoModelForSequenceClassification.from_pretrained("billfass/my_bert_model")

text = "Your example text here."
inputs = tokenizer(text, padding=True, truncation=True, max_length=80, return_tensors="pt")

# Optional labels of shape (batch_size,); when provided, the model also returns a loss.
labels = torch.tensor([1])  # batch size 1

outputs = model(**inputs, labels=labels)
loss = outputs.loss
logits = outputs.logits

# Convert logits to class probabilities.
probs = torch.softmax(logits, dim=-1)
```
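For pure inference the labels are unnecessary; a short sketch that reuses the tokenizer and model loaded above and maps logits to a predicted class:
```python
# Inference only: switch off dropout and gradient tracking.
model.eval()
with torch.no_grad():
    inputs = tokenizer("Your example text here.", padding=True, truncation=True,
                       max_length=80, return_tensors="pt")
    logits = model(**inputs).logits

predicted_class = logits.argmax(dim=-1).item()  # index of the highest-scoring class
print(predicted_class)
```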
## Limitations and Bias
- Trained on a specific dataset, so it may not generalize well to other kinds of text.
- Uses multilingual cased BERT, so it's not optimized for any specific language.
## Authors
- **Fassinou Bile**
- **billfass2010@gmail.com**
## Acknowledgments
Special thanks to Hugging Face for providing the Transformers library that made this project possible.
---