institutional/institutional-books-topic-classifier-bert

Text Classification

Trained with AutoTrain

MatteoCargnelutti committed on May 30, 2025

Commit ac86603 · verified · 1 Parent(s): 4995454

Update README.md

Files changed (1)

README.md +4 -5

@@ -13,9 +13,9 @@ license: apache-2.0
 This model was trained as part of the analysis and experiments performed in preparation of the release of the [Institutional Books 1.0 dataset](https://huggingface.co/collections/instdin/institutional-books-68366258bfb38364238477cf).
-It is a text classifier, that we used to assign 1 of 20 topics, derived from the first level of the [Library of Congress' Classification Outline](https://www.loc.gov/catdir/cpso/lcco/), to individual volumes.
-Complete experimental setup and results are available in our [technical report]() (Section 4.5).
+We used this text classifier to assign 1 of 20 topics, derived from the first level of the [Library of Congress' Classification Outline](https://www.loc.gov/catdir/cpso/lcco/), to individual volumes.
+Complete experimental setup and results are available in our [technical report](TBD) (Section 4.5).
 ## Base model
 [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased)
@@ -57,8 +57,7 @@ All of the fields listed in this example are optional.
 ## Training data
 - Train split: 80,830 samples
 - Test split: 5,000 samples
-An additional set of 1,000 samples was set aside for benchmarking purposes.
+- An additional set of 1,000 samples was set aside for benchmarking purposes
 ## Validation Metrics
 | Metric | Value |
@@ -75,7 +74,7 @@ An additional set of 1,000 samples was set aside for benchmarking purposes.
 | recall_weighted | 0.9694 |
 | accuracy | 0.9694 |
-**Benchmark accuracy:** 97.2% (920)
+**Post-training benchmark accuracy:** 97.2% (920)
 ## Cite
 ```