Zual/THIVLVC

@@ -15,11 +15,11 @@ metrics:
 # THIVLVC: Latin ByT5 Lemmatizer
-**THIVLVC** is a state-of-the-art Latin lemmatizer based on the ByT5 (base) architecture. It was developed at **LISN (CNRS)** to provide a high-performance, unified model for diverse Latin corpora.
 ## Performance Analysis
-The following table compares **THIVLVC** against industry standards on the five Universal Dependencies (UD) Latin benchmarks.
 | Benchmark | **THIVLVC** | UDPipe 2.0 | Trankit (XLM-R) | Stanza (v1.5) | GreTa (T5) |
 | :--- | :---: | :---: | :---: | :---: | :---: |
@@ -43,7 +43,7 @@ Basic usage in Python:
 ```python
 from transformers import AutoTokenizer, T5ForConditionalGeneration
-model_name = "Zual/latin-byt5-lemmatizer-sota"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = T5ForConditionalGeneration.from_pretrained(model_name)
@@ -56,4 +56,14 @@ def lemmatize(text):
 print(lemmatize("Amorem canat"))
 ```
 This model was produced by **Luc Pommeret** at LISN (CNRS, Université Paris-Saclay).

 # THIVLVC: Latin ByT5 Lemmatizer
+**THIVLVC** is a state-of-the-art Latin lemmatizer based on the ByT5 (base) architecture. It was developed by **Luc Pommeret** at **LISN (CNRS)** to provide a high-performance, unified model for diverse Latin corpora.
 ## Performance Analysis
+The following table compares **THIVLVC** against major industry standards across the five Universal Dependencies (UD) Latin benchmarks.
 | Benchmark | **THIVLVC** | UDPipe 2.0 | Trankit (XLM-R) | Stanza (v1.5) | GreTa (T5) |
 | :--- | :---: | :---: | :---: | :---: | :---: |
 ```python
 from transformers import AutoTokenizer, T5ForConditionalGeneration
+model_name = "Zual/THIVLVC"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = T5ForConditionalGeneration.from_pretrained(model_name)
 print(lemmatize("Amorem canat"))
 ```
+## Dataset and Training
+- **Model Architecture**: ByT5-base
+- **Author**: Luc Pommeret
+- **Institution**: LISN (CNRS, Université Paris-Saclay)
+- **Training Data**: Unified corpus including Universal Dependencies gold standard, massive silver data from the Latin Library, and targeted distillation from Gemini.
+- **Scope**: Unified lemmatization across multiple historical periods and genres of Latin.
+## Acknowledgments
 This model was produced by **Luc Pommeret** at LISN (CNRS, Université Paris-Saclay).
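
The usage snippet in this diff calls `lemmatize("Amorem canat")`, but the hunks shown here do not include the function body. The following is a minimal sketch of what such a helper might look like, not the author's implementation. It assumes the raw text is passed to the model with no task prefix and that the default generation settings are adequate; the `load_thivlvc` name, the three-argument `lemmatize` signature, and the `max_new_tokens` value are hypothetical choices made for illustration.

```python
def load_thivlvc(model_name="Zual/THIVLVC"):
    """Load the tokenizer and model from the Hugging Face Hub (needs network access)."""
    # Imported lazily so that lemmatize() below can be used with any
    # tokenizer/model pair exposing the same interface.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    return tokenizer, model


def lemmatize(text, tokenizer, model, max_new_tokens=128):
    """Return the decoded model output for `text`."""
    # Assumption: the input is fed unchanged, with no task prefix or per-word splitting.
    inputs = tokenizer(text, return_tensors="pt")
    # max_new_tokens is an illustrative cap, not a value taken from the model card.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # generate() returns a batch; decode the first (and only) sequence.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage (downloads the model on first call):
# tokenizer, model = load_thivlvc()
# print(lemmatize("Amorem canat", tokenizer, model))
```

Passing the tokenizer and model as arguments, rather than relying on module-level globals as the original snippet appears to, makes it easier to reuse one loaded model across many calls or to substitute a different checkpoint.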