# Pleias-Topic-Detection

**Pleias-Topic-Detection** is an encoder-decoder model specialized for topic detection. Given a document, Pleias-Topic-Detection returns a main topic that can be used for further downstream tasks (annotation, embedding indexing).

Pleias-Topic-Detection is a finetuned version of t5-small, trained on a set of 70,000 documents and associated topics from Common Corpus. While t5-small has reportedly been trained only on English data, the model shows unexpected capacities for multilingual annotation: the finetuning corpus includes a significant amount of text in French, Spanish, Italian, Dutch, and German, and the model has been shown to work reasonably well in all of these languages.
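
A minimal inference sketch with `transformers` follows; the repository id and the plain text-to-text input format (no task prefix) are assumptions, not confirmed by this card:

```python
# Minimal inference sketch. The checkpoint id below and the plain text-to-text
# input format (no task prefix) are assumptions, not confirmed by this card.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Pleias/Pleias-Topic-Detection"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

document = "The Treaty of Versailles, signed in 1919, formally ended the war..."
inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=512)

# Generated topics are short (mean length ~6 tokens on the evaluation set),
# so a small generation budget is enough.
output_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```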
The model achieves the following results on the evaluation set:
- Loss: 2.6792
- Rouge1: 23.9657
- Rouge2: 7.6026
- Rougel: 22.7062
- Rougelsum: 22.7061
- Gen Len: 6.0459
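
The card does not give the evaluation script, but a minimal sketch of recomputing such ROUGE figures with the `evaluate` library (an assumed setup, with hypothetical example data) could look like:

```python
# Sketch of recomputing the ROUGE metrics above with the `evaluate` library.
# The actual evaluation script is not given in this card, so treat this as an
# assumed setup; predictions and references below are hypothetical examples.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["medieval history"]            # hypothetical generated topics
references = ["history of the Middle Ages"]   # hypothetical gold topics

scores = rouge.compute(predictions=predictions, references=references)
# `evaluate` returns values in [0, 1]; the card reports them on a 0-100 scale.
print({k: round(v * 100, 4) for k, v in scores.items()})
```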
## Training procedure
### Training hyperparameters

The following hyperparameters were used during training:
- lr_scheduler_type: linear
- num_epochs: 1
- mixed_precision_training: Native AMP
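
For reference, a sketch of how these settings might be expressed as `Seq2SeqTrainingArguments`; every value not listed above is a placeholder, not the configuration actually used:

```python
# Illustrative mapping of the hyperparameters above onto Seq2SeqTrainingArguments.
# Everything not listed in this card (output_dir, learning rate, batch sizes,
# and the remaining settings) is a placeholder, not the actual configuration.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="pleias-topic-detection",  # placeholder
    lr_scheduler_type="linear",           # lr_scheduler_type: linear
    num_train_epochs=1,                   # num_epochs: 1
    fp16=True,                            # mixed_precision_training: Native AMP
    predict_with_generate=True,           # required to compute ROUGE at eval time
)
```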
### Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2 | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:------:|:-------:|:---------:|:-------:|
| 2.9647        | 1.0   | 24707 | 2.6792          | 23.9657 | 7.6026 | 22.7062 | 22.7061   | 6.0459  |

### Framework versions

- Transformers 4.41.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1