Update README.md
README.md
CHANGED
````diff
@@ -13,9 +13,9 @@ license: apache-2.0
 
 This model was trained as part of the analysis and experiments performed in preparation of the release of the [Institutional Books 1.0 dataset](https://huggingface.co/collections/instdin/institutional-books-68366258bfb38364238477cf).
 
-
+We used this text classifier to assign 1 of 20 topics, derived from the first level of the [Library of Congress' Classification Outline](https://www.loc.gov/catdir/cpso/lcco/), to individual volumes.
 
-Complete experimental setup and results are available in our [technical report]() (Section 4.5).
+Complete experimental setup and results are available in our [technical report](TBD) (Section 4.5).
 
 ## Base model
 [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased)
@@ -57,8 +57,7 @@ All of the fields listed in this example are optional.
 ## Training data
 - Train split: 80,830 samples
 - Test split: 5,000 samples
-
-An additional set of 1,000 samples was set aside for benchmarking purposes.
+- An additional set of 1,000 samples was set aside for benchmarking purposes
 
 ## Validation Metrics
 | Metric | Value |
@@ -75,7 +74,7 @@ An additional set of 1,000 samples was set aside for benchmarking purposes.
 | recall_weighted | 0.9694 |
 | accuracy | 0.9694 |
 
-**
+**Post-training benchmark accuracy:** 97.2% (920)
 
 ## Cite
 ```
````
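The updated description says the classifier assigns 1 of 20 topics derived from the first level of the Library of Congress Classification Outline. That first level has 21 single-letter classes, and two of them (E and F) share the heading "History of the Americas", which is one plausible way to arrive at 20 distinct topics. The sketch below assumes exactly that merge. The dictionary `LCC_FIRST_LEVEL`, the list `TOPICS`, and the helper `topic_for_class` are hypothetical illustrations of the label space, not the model's actual `id2label` configuration.

```python
# Hypothetical label map from first-level LCC classes to topic names.
# ASSUMPTION: E and F are merged because they share one heading;
# the model's real label strings and ordering may differ.
LCC_FIRST_LEVEL = {
    "A": "General Works",
    "B": "Philosophy, Psychology, Religion",
    "C": "Auxiliary Sciences of History",
    "D": "World History and History of Europe, Asia, Africa, Australia, New Zealand, etc.",
    "E": "History of the Americas",
    "F": "History of the Americas",
    "G": "Geography, Anthropology, Recreation",
    "H": "Social Sciences",
    "J": "Political Science",
    "K": "Law",
    "L": "Education",
    "M": "Music and Books on Music",
    "N": "Fine Arts",
    "P": "Language and Literature",
    "Q": "Science",
    "R": "Medicine",
    "S": "Agriculture",
    "T": "Technology",
    "U": "Military Science",
    "V": "Naval Science",
    "Z": "Bibliography, Library Science, Information Resources (General)",
}

# Distinct topic names, in class-letter order (duplicates collapse).
TOPICS = list(dict.fromkeys(LCC_FIRST_LEVEL.values()))


def topic_for_class(call_number: str) -> str:
    """Map an LCC call number (e.g. 'QA76.73') to its first-level topic.

    Hypothetical helper: only the leading class letter is inspected.
    """
    letter = call_number.strip().upper()[:1]
    if letter not in LCC_FIRST_LEVEL:
        raise ValueError(f"not a first-level LCC class: {call_number!r}")
    return LCC_FIRST_LEVEL[letter]


if __name__ == "__main__":
    print(len(LCC_FIRST_LEVEL), "classes ->", len(TOPICS), "topics")
    print(topic_for_class("QA76.73"))
    print(topic_for_class("F1219"))
```

Running it prints `21 classes -> 20 topics`, then `Science` and `History of the Americas`. The classifier itself predicts topics from text; this map only illustrates how 21 first-level classes can yield 20 labels.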
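In the validation table, `recall_weighted` and `accuracy` are both 0.9694. For a single-label multiclass classifier this is expected rather than coincidental, assuming `recall_weighted` follows the usual support-weighted definition (as in scikit-learn's `average="weighted"`). Each class c with n_c true samples and TP_c correct predictions contributes (n_c / N) * (TP_c / n_c) = TP_c / N, and summing over classes gives the fraction of correct predictions. The sketch below checks the identity on made-up labels. The `y_true`/`y_pred` lists are illustrative only and are not this model's evaluation data.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_weighted(y_true, y_pred):
    """Per-class recall, averaged with weights proportional to class support."""
    support = Counter(y_true)  # true samples per class (n_c)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # TP_c per class
    n = len(y_true)
    return sum((support[c] / n) * (hits[c] / support[c]) for c in support)


# Illustrative labels only; not drawn from the model's evaluation data.
y_true = ["Science", "Law", "Law", "Medicine", "Science", "Science"]
y_pred = ["Science", "Law", "Science", "Medicine", "Science", "Law"]

acc = accuracy(y_true, y_pred)
rec = recall_weighted(y_true, y_pred)
print(round(acc, 4), round(rec, 4))
print(math.isclose(acc, rec))
```

Both values come out to 4/6 here. Macro-averaged recall, which weights every class equally, would generally differ from accuracy.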