Upload folder using huggingface_hub
learn_with_history_visualisation.ipynb
CHANGED
|
@@ -367,41 +367,22 @@
|
|
| 367 |
"id": "fe8ce873",
|
| 368 |
"metadata": {},
|
| 369 |
"source": [
|
| 370 |
"\n",
|
| 371 |
-
"
|
| 372 |
"\n",
|
| 373 |
-
"
|
| 374 |
-
"Best Performing Categories: Classes like contract (12 correct), educationdocument (8 correct), taxdocument (8 correct), and invoice (7 correct) show high accuracy with very few misclassifications.\n",
|
| 375 |
"\n",
|
| 376 |
-
"
|
| 377 |
"\n",
|
| 378 |
-
"
|
| 379 |
"\n",
|
| 380 |
-
"
|
| 381 |
"\n",
|
| 382 |
-
"medicaldocument
|
| 383 |
"\n",
|
| 384 |
-
"Data Sparsity:
|
| 385 |
-
]
|
| 386 |
-
},
|
| 387 |
-
{
|
| 388 |
-
"cell_type": "code",
|
| 389 |
-
"execution_count": null,
|
| 390 |
-
"id": "0d159f73",
|
| 391 |
-
"metadata": {},
|
| 392 |
-
"outputs": [],
|
| 393 |
-
"source": [
|
| 394 |
-
"from huggingface_hub import HfApi\n",
|
| 395 |
-
"\n",
|
| 396 |
-
"api = HfApi()\n",
|
| 397 |
-
"\n",
|
| 398 |
-
"# Wysyłanie całego folderu z nową wersją\n",
|
| 399 |
-
"api.upload_folder(\n",
|
| 400 |
-
" folder_path=\"../\",\n",
|
| 401 |
-
" repo_id=\"twoja_nazwa/nazwa-modelu\",\n",
|
| 402 |
-
" repo_type=\"model\", # lub \"dataset\"\n",
|
| 403 |
-
" commit_message=\"Aktualizacja modelu v2\"\n",
|
| 404 |
-
")"
|
| 405 |
]
|
| 406 |
}
|
| 407 |
],
|
|
|
|
| 367 |
"id": "fe8ce873",
|
| 368 |
"metadata": {},
|
| 369 |
"source": [
|
| 370 |
+
"Confusion Matrix Analysis\n",
|
| 371 |
+
"An examination of the confusion matrix demonstrates that the proposed classifier exhibits robust performance, particularly considering the inherent complexity of a 31-class document categorization task. The prominent diagonal line signifies a high degree of correlation between the ground truth and the model's predictions across the majority of categories.\n",
|
| 372 |
"\n",
|
| 373 |
+
"Key Findings:\n",
|
| 374 |
"\n",
|
| 375 |
+
"High-Performing Categories: The model demonstrates superior discriminative capabilities for classes such as contract (12 correct), educationdocument (8), taxdocument (8), and invoice (7). These categories show high classification accuracy with negligible misclassification rates, suggesting the model has successfully captured their distinct structural or linguistic features.\n",
|
|
|
|
| 376 |
"\n",
|
| 377 |
+
"Inter-class Ambiguities:\n",
|
| 378 |
"\n",
|
| 379 |
+
"A minor degree of confusion was observed between courtdocument and contract. This overlap is conceptually justified, as both categories frequently employ specialized legal terminology and formal syntactical structures, leading to high lexical similarity.\n",
|
| 380 |
"\n",
|
| 381 |
+
"The misclassification of an idcard as a cv suggests that the model may be responding to shared attributes, specifically the presence of personal identifiers and profile-oriented information layouts.\n",
|
| 382 |
"\n",
|
| 383 |
+
"The medicaldocument class acts as a slight focal point for errors regarding related sub-categories (e.g., vaccinationcard and referral). This indicates a high degree of semantic overlap in medical terminology, which presents a challenge for fine-grained classification.\n",
|
| 384 |
"\n",
|
| 385 |
+
"Data Sparsity and Generalization: Several categories, including bankstatement and birthcertificate, are represented by a limited number of samples within the validation set. While the model correctly identified these instances, the statistical significance of these results remains constrained. Further validation utilizing a more balanced and extensive dataset is required to confirm the model’s stability and generalizability across these sparsely represented classes."
|
| 386 |
]
|
| 387 |
}
|
| 388 |
],
|
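The findings in the rewritten cell (per-class correct counts, the most frequent confusions, and the small sample counts behind some classes) can be read off a confusion matrix programmatically. The following is a minimal sketch, not the notebook's own code: the helper name `summarize_confusion`, the four-class label list, and the matrix values are all made up for illustration, and it assumes rows are true classes and columns are predicted classes.

```python
import numpy as np


def summarize_confusion(cm, labels, top_k=3):
    """Return per-class correct/support counts and the largest confusions.

    Assumes cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm)
    support = cm.sum(axis=1)   # number of true samples per class (row sums)
    correct = np.diag(cm)      # correctly classified samples per class
    per_class = {
        label: {"correct": int(c), "support": int(s)}
        for label, c, s in zip(labels, correct, support)
    }

    # Zero the diagonal so that only misclassifications remain.
    off = cm.copy()
    np.fill_diagonal(off, 0)

    # Flat indices of the largest off-diagonal cells, largest first.
    flat = np.argsort(off, axis=None)[::-1][:top_k]
    confusions = []
    for idx in flat:
        i, j = np.unravel_index(idx, off.shape)
        if off[i, j] > 0:      # skip empty cells
            confusions.append((labels[i], labels[j], int(off[i, j])))
    return per_class, confusions


# Hypothetical toy data, NOT the notebook's real 31-class matrix.
labels = ["contract", "courtdocument", "cv", "idcard"]
cm = [
    [12, 0, 0, 0],
    [2, 5, 0, 0],
    [0, 0, 4, 0],
    [0, 0, 1, 3],
]

per_class, confusions = summarize_confusion(cm, labels)
for true_label, predicted_label, count in confusions:
    print(f"{true_label} -> {predicted_label}: {count}")
```

Reporting `support` next to `correct` makes it easier to see which classes rest on only a handful of validation samples before drawing conclusions from them.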