This model is a fine-tuned version of dbmdz/bert-base-turkish-uncased for the binary classification task of identifying organizational accounts on Turkish Twitter. It leverages the pre-trained BERT model's understanding of Turkish language and context to distinguish organizational from non-organizational user accounts.

### Model Training and Optimization

Base Model: dbmdz/bert-base-turkish-uncased

Training Data: The model was trained and validated on a dataset of Twitter accounts (descriptions, names, and screen names) annotated with labels indicating whether each account belongs to an organization.
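The field combination and 80/20 split described in this card can be sketched in plain Python. The field names (`description`, `name`, `screen_name`) and the split ratio come from the card; the record layout and the shuffling seed are illustrative assumptions:

```python
import random

def build_text(record):
    # Combine the three profile fields into the single input string
    # used for classification (exact field names are an assumption).
    return " ".join(
        record.get(key, "") for key in ("description", "name", "screen_name")
    ).strip()

def train_val_split(records, train_frac=0.8, seed=42):
    # Shuffle deterministically, then take 80% for training, 20% for validation.
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Toy records standing in for the annotated dataset (label 1 = organizational).
accounts = [
    {"description": "Resmi hesap", "name": "Kurum A", "screen_name": "kurum_a", "label": 1},
    {"description": "Kedi sever", "name": "Ayse", "screen_name": "ayse42", "label": 0},
    {"description": "", "name": "Sirket B", "screen_name": "sirket_b", "label": 1},
    {"description": "Gezgin", "name": "Mehmet", "screen_name": "mehmet_g", "label": 0},
    {"description": "Belediye duyurulari", "name": "Belediye", "screen_name": "belediye", "label": 1},
]
texts = [build_text(a) for a in accounts]
train, val = train_val_split(accounts)
```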

### Fine-Tuning Process

- Data Preprocessing: Combined user descriptions, names, and screen names into a single text field for input.
- Data Splitting: Split the dataset into 80% for training and 20% for validation.
- Tokenization: Used the Hugging Face AutoTokenizer to prepare text inputs for the BERT model.
- Hyperparameter Optimization: Used Optuna to search over learning rate, batch size, and number of training epochs, selecting the combination that minimized validation loss.

Optimal Hyperparameters:

- Learning Rate: 1.23e-5
- Batch Size: 32
- Epochs: 2
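Once the checkpoint is published, it can be used for inference in the standard Hugging Face way. The repository id below is a placeholder, and the label mapping is an assumption that should be checked against the model's `config.json`:

```python
# Assumed label mapping for the binary task (verify against config.json).
ID2LABEL = {0: "non-organizational", 1: "organizational"}

def combine_fields(description, name, screen_name):
    # Same single-text input format used during fine-tuning.
    return " ".join(part for part in (description, name, screen_name) if part)

if __name__ == "__main__":
    from transformers import pipeline

    # "your-namespace/model-name" is a placeholder repository id.
    classifier = pipeline("text-classification", model="your-namespace/model-name")
    text = combine_fields("Resmi kurumsal hesap", "Kurum A", "kurum_a")
    print(classifier(text))
```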