Zual/THIVLVC

@@ -13,15 +13,15 @@ metrics:
 - accuracy
 ---
-# Latin ByT5 Lemmatizer
-This is a state-of-the-art Latin lemmatizer based on the ByT5 (base) architecture. It was developed at LISN (CNRS) to provide a high-performance, unified model for diverse Latin corpora.
 ## Performance Analysis
-The following table compares this model against major industry standards across the five Universal Dependencies (UD) Latin benchmarks.
-| Benchmark | Our ByT5 | UDPipe 2.0 | Trankit (XLM-R) | Stanza (v1.5) | GreTa (T5) |
 | :--- | :---: | :---: | :---: | :---: | :---: |
 | Perseus (Poetry) | **93.48%** | 91.04% | 70.34% | 91.44% | 91.14% |
 | UDante (Medieval) | **85.85%** | 84.80% | - | 78.08% | - |
@@ -29,7 +29,7 @@ The following table compares this model against major industry standards across
 | ITTB (Scholastic) | 98.64% | 99.03% | **99.13%** | 96.50% | - |
 | LLCT (Late Latin) | 88.92% | **97.40%** | 96.2% | 97.10% | - |
-The model achieves state-of-the-art results on three major benchmarks: Perseus, UDante, and PROIEL. It is particularly effective for complex literary and medieval texts.
 ## Usage
@@ -59,9 +59,11 @@ print(lemmatize("Amorem canat"))
 ## Dataset and Training
 - **Model Architecture**: ByT5-base
-- **Training Data**: Unified corpus including Universal Dependencies gold standard, massive silver data from the Latin Library, and targeted distillation.
 - **Scope**: Unified lemmatization across multiple historical periods and genres of Latin.
 ## Acknowledgments
-This model was produced by Zual at LISN (CNRS, Université Paris-Saclay).

 - accuracy
 ---
+# THIVLVC: Latin ByT5 Lemmatizer
+**THIVLVC** is a state-of-the-art Latin lemmatizer based on the ByT5 (base) architecture. It was developed by **Luc Pommeret** at **LISN (CNRS)** to provide a high-performance, unified model for diverse Latin corpora.
 ## Performance Analysis
+The following table compares **THIVLVC** against major industry standards across the five Universal Dependencies (UD) Latin benchmarks.
+| Benchmark | **THIVLVC** | UDPipe 2.0 | Trankit (XLM-R) | Stanza (v1.5) | GreTa (T5) |
 | :--- | :---: | :---: | :---: | :---: | :---: |
 | Perseus (Poetry) | **93.48%** | 91.04% | 70.34% | 91.44% | 91.14% |
 | UDante (Medieval) | **85.85%** | 84.80% | - | 78.08% | - |
 | ITTB (Scholastic) | 98.64% | 99.03% | **99.13%** | 96.50% | - |
 | LLCT (Late Latin) | 88.92% | **97.40%** | 96.2% | 97.10% | - |
+**THIVLVC** achieves state-of-the-art results on three major benchmarks: Perseus (Classical Poetry), UDante (Medieval Prose), and PROIEL (Biblical/Classical). It is particularly effective for complex literary and medieval texts.
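The per-benchmark comparison can be checked with a few lines of Python. The sketch below is illustrative only: it copies the four rows visible in this diff (the PROIEL row falls outside the shown hunks and is therefore left out), treats `-` cells as missing, and uses the hypothetical helper names `best_system` and `margin_to_best`, which are not part of the model card.

```python
# Scores copied from the table rows visible in this diff; None marks a "-" cell.
SYSTEMS = ["THIVLVC", "UDPipe 2.0", "Trankit (XLM-R)", "Stanza (v1.5)", "GreTa (T5)"]
SCORES = {
    "Perseus (Poetry)": [93.48, 91.04, 70.34, 91.44, 91.14],
    "UDante (Medieval)": [85.85, 84.80, None, 78.08, None],
    "ITTB (Scholastic)": [98.64, 99.03, 99.13, 96.50, None],
    "LLCT (Late Latin)": [88.92, 97.40, 96.2, 97.10, None],
}


def best_system(row):
    """Return (system name, score) for the highest reported score in a row."""
    reported = [(score, name) for name, score in zip(SYSTEMS, row) if score is not None]
    score, name = max(reported)
    return name, score


def margin_to_best(row):
    """Return THIVLVC's score minus the best competing score, in percentage points."""
    ours = row[0]
    rivals = [score for score in row[1:] if score is not None]
    return round(ours - max(rivals), 2)


for benchmark, row in SCORES.items():
    name, score = best_system(row)
    print(f"{benchmark}: best = {name} ({score:.2f}%), "
          f"THIVLVC margin = {margin_to_best(row):+.2f} pts")
```

On these four rows, THIVLVC leads Perseus by 2.04 points and UDante by 1.05 points, and trails the best system by 0.49 points on ITTB and 8.48 points on LLCT.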
 ## Usage
 ## Dataset and Training
 - **Model Architecture**: ByT5-base
+- **Author**: Luc Pommeret
+- **Institution**: LISN (CNRS, Université Paris-Saclay)
+- **Training Data**: Unified corpus including Universal Dependencies gold standard, massive silver data from the Latin Library, and targeted distillation from Gemini.
 - **Scope**: Unified lemmatization across multiple historical periods and genres of Latin.
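Because ByT5 is a byte-level architecture, the model reads raw UTF-8 bytes rather than subword tokens, which is one reason it can handle spelling variation across periods without a Latin-specific vocabulary. The sketch below illustrates that input encoding in plain Python. It assumes the usual ByT5 convention of three reserved ids (pad 0, end-of-sequence 1, unknown 2), so each byte is shifted by 3; the function names are hypothetical and this is not the tokenizer shipped with the model.

```python
from typing import List

OFFSET = 3   # assumption: ids 0, 1, 2 are reserved for pad, eos, unk
EOS_ID = 1   # assumption: end-of-sequence id, appended to every input


def byte_level_ids(text: str) -> List[int]:
    """Encode text as shifted UTF-8 byte ids followed by an end-of-sequence id."""
    return [b + OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def ids_to_text(ids: List[int]) -> str:
    """Invert byte_level_ids, skipping the reserved special ids."""
    raw = bytes(i - OFFSET for i in ids if i >= OFFSET)
    return raw.decode("utf-8", errors="ignore")


ids = byte_level_ids("Amorem canat")
print(len(ids), ids[:3])
print(ids_to_text(ids))
```

A 12-character ASCII input yields 13 ids (12 bytes plus the end-of-sequence id), while characters outside ASCII, such as the ligature `æ`, occupy two ids each.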
 ## Acknowledgments
+This model was produced by **Luc Pommeret** at LISN (CNRS, Université Paris-Saclay).