SuccubusBot
/

distilbert-multilingual-incoherence-classifier

Text Classification

text-embeddings-inference

Model card Files Files and versions

CortexPE commited on Feb 27, 2025

Commit

47405d2

·

verified ·

1 Parent(s): cd6a6e6

Upload README.md with huggingface_hub

Files changed (1) hide show

README.md +67 -0

README.md ADDED Viewed

	@@ -0,0 +1,67 @@

+# DistilBERT Incoherence Classifier
+This is a fine-tuned DistilBERT model for classifying text based on its coherence. It can identify various types of incoherence.
+## Model Details
+-   **Model:** DistilBERT (distilbert-base-multilingual-cased)
+-   **Task:** Text Classification (Coherence Detection)
+-   **Fine-tuning:** The model was fine-tuned using a custom-generated dataset that features various types of incoherence.
+- **Training Dataset** The model was trained on the [incoherent-text-dataset](https://huggingface.co/datasets/your_huggingface_username/incoherent-text-dataset) dataset, located on Huggingface.
+## Training Metrics
+| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1       |
+| :---- | :------------ | :-------------- | :------- | :-------- | :----- | :------- |
+| 1     | 0.037500      | 0.071958        | 0.984995 | 0.985002  | 0.984995 | 0.984564 |
+| 2     | 0.008900      | 0.068670        | 0.985995 | 0.985973  | 0.985995 | 0.985603 |
+| 3     | 0.008500      | 0.058111        | 0.990330 | 0.990260  | 0.990330 | 0.990262 |
+## Evaluation Metrics
+The following metrics were measured on the test set:
+| Metric      | Value    |
+| :---------- | :------- |
+| Loss        | 0.049511 |
+| Accuracy    | 0.991    |
+| Precision   | 0.990958 |
+| Recall      | 0.991    |
+| F1-Score    | 0.990962 |
+## Classification Report:
+```
+                    precision    recall  f1-score   support
+          coherent       0.99      0.99      0.99      1500
+grammatical_errors       0.96      0.94      0.95       250
+      random_bytes       1.00      1.00      1.00       250
+     random_tokens       1.00      1.00      1.00       250
+      random_words       1.00      1.00      1.00       250
+            run_on       1.00      0.99      1.00       250
+         word_soup       1.00      1.00      1.00       250
+          accuracy                           0.99      3000
+         macro avg       0.99      0.99      0.99      3000
+      weighted avg       0.99      0.99      0.99      3000
+```
+## Confusion Matrix
+![Confusion Matrix](confusion_matrix.png)
+The confusion matrix above shows the performance of the model on each class.
+## Usage
+This model can be used for text classification tasks, specifically for detecting and categorizing different types of text incoherence. You can use the `inference_example` function provided in the notebook to test your own text.
+## Limitations
+The model has been trained on a generated dataset, so care must be taken in evaluating it in the real world. More data may need to be collected before evaluating this model in a real-world setting.
+## License
+CC-BY-SA 4.0