latincy/grc_dep_web_md

@@ -15,21 +15,24 @@ model-index:
     metrics:
     - name: POS Accuracy
       type: accuracy
-      value: 0.9150
   - task:
       name: Lemmatization
       type: token-classification
     metrics:
     - name: Lemma Accuracy
       type: accuracy
-      value: 0.9357
   - task:
       name: Dependency Parsing
       type: token-classification
     metrics:
     - name: Labeled Attachment Score
       type: f_score
-      value: 0.6666
 ---
 # grc_dep_web_md
@@ -43,7 +46,7 @@ Medium model with 50,000-key floret vectors (300 dimensions). Trained on Univers
 | Feature | Description |
 | --- | --- |
 | **Name** | `grc_dep_web_md` |
-| **Version** | `3.8.0` |
 | **spaCy** | `>=3.8.11,<3.9.0` |
 | **Default Pipeline** | `senter`, `tok2vec`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `lookup_lemmatizer`, `parser` |
 | **Components** | `senter`, `tok2vec`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `lookup_lemmatizer`, `parser` |
@@ -54,7 +57,7 @@ Medium model with 50,000-key floret vectors (300 dimensions). Trained on Univers
 ## Install
 ```bash
-pip install https://huggingface.co/latincy/grc_dep_web_md/resolve/main/grc_dep_web_md-3.8.0-py3-none-any.whl
 ```
 ## Usage
@@ -63,15 +66,10 @@ pip install https://huggingface.co/latincy/grc_dep_web_md/resolve/main/grc_dep_w
 import spacy
 nlp = spacy.load("grc_dep_web_md")
-doc = nlp("μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος")
 for token in doc:
     print(token.text, token.pos_, token.lemma_, token.dep_)
-# μῆνιν   NOUN  μῆνις    obj
-# ἄειδε   VERB  ἀείδω    ROOT
-# θεὰ     NOUN  θεά      nsubj
-# Πηληϊάδεω NOUN Πηλείδης nmod
-# Ἀχιλῆος NOUN  Ἀχιλῆος  nmod
 ```
 ## Evaluation
@@ -80,16 +78,14 @@ Scores on held-out UD test data (combined PTNK + PROIEL + Perseus).
 | Metric | Score |
 | --- | --- |
-| **POS (UPOS) Accuracy** | 91.50 |
-| **TAG (XPOS) Accuracy** | 0.00* |
-| **Morph (UFeats) Accuracy** | 82.46 |
-| **Lemma Accuracy** | 93.57 |
-| **Unlabeled Attachment Score (UAS)** | 74.93 |
-| **Labeled Attachment Score (LAS)** | 66.66 |
 | **Sentences F-Score** | 88.18 |
-*\*TAG (XPOS) reads 0.00 due to pre-harmonization tagger tagset mismatch with UD evaluation gold data. The tagger produces valid fine-grained POS tags but they do not align with the expected XPOS column. This will be fixed in a future release with a harmonized tagset.*
 ## Training Data
 | Source | Description |
@@ -101,7 +97,7 @@ Scores on held-out UD test data (combined PTNK + PROIEL + Perseus).
 ## Components
 - **tok2vec** -- Shared token-to-vector encoder (CNN, width 96)
-- **tagger** -- Fine-grained POS tagger (XPOS)
 - **morphologizer** -- Morphological feature assignment (UPOS + UFeats)
 - **trainable_lemmatizer** -- Edit-tree lemmatizer
 - **lookup_lemmatizer** -- 1.2M-entry dictionary lemmatizer overlay (CLTK Morpheus + UD + Wiktionary); normalizes grave accents to acute at query time
@@ -112,9 +108,9 @@ Scores on held-out UD test data (combined PTNK + PROIEL + Perseus).
 <details>
-<summary>View label scheme (2630 labels for 3 components)</summary>
-**`tagger`**: 850 fine-grained POS tags (pre-harmonization; mixed XPOS tagset from PTNK + PROIEL + Perseus)
 **`morphologizer`**: 1749 morphological feature combinations

     metrics:
     - name: POS Accuracy
       type: accuracy
+      value: 0.9175
+    - name: TAG (XPOS) Accuracy
+      type: accuracy
+      value: 0.9154
   - task:
       name: Lemmatization
       type: token-classification
     metrics:
     - name: Lemma Accuracy
       type: accuracy
+      value: 0.9359
   - task:
       name: Dependency Parsing
       type: token-classification
     metrics:
     - name: Labeled Attachment Score
       type: f_score
+      value: 0.6731
 ---
 # grc_dep_web_md
 | Feature | Description |
 | --- | --- |
 | **Name** | `grc_dep_web_md` |
+| **Version** | `3.8.1` |
 | **spaCy** | `>=3.8.11,<3.9.0` |
 | **Default Pipeline** | `senter`, `tok2vec`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `lookup_lemmatizer`, `parser` |
 | **Components** | `senter`, `tok2vec`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `lookup_lemmatizer`, `parser` |
 ## Install
 ```bash
+pip install https://huggingface.co/latincy/grc_dep_web_md/resolve/main/grc_dep_web_md-3.8.1-py3-none-any.whl
 ```
 ## Usage
 ```python
 import spacy
 nlp = spacy.load("grc_dep_web_md")
+doc = nlp("μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος")
 for token in doc:
     print(token.text, token.pos_, token.lemma_, token.dep_)
 ```
 ## Evaluation
 | Metric | Score |
 | --- | --- |
+| **POS (UPOS) Accuracy** | 91.75 |
+| **TAG (XPOS) Accuracy** | 91.54 |
+| **Morph (UFeats) Accuracy** | 81.32 |
+| **Lemma Accuracy** | 93.59 |
+| **Unlabeled Attachment Score (UAS)** | 75.71 |
+| **Labeled Attachment Score (LAS)** | 67.31 |
 | **Sentences F-Score** | 88.18 |
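
To make the change between releases easier to read, the sketch below computes per-metric deltas from the two evaluation tables (3.8.0 and 3.8.1). The scores are copied from those tables; the names `OLD`, `NEW`, and `score_deltas` are illustrative only and are not part of the package.

```python
# Scores in percent, copied from the 3.8.0 and 3.8.1 evaluation tables.
# The 3.8.0 XPOS value of 0.00 reflects the tagset mismatch described in
# that release's footnote, so its delta is not a real accuracy gain.
OLD = {"UPOS": 91.50, "XPOS": 0.00, "UFeats": 82.46, "Lemma": 93.57,
       "UAS": 74.93, "LAS": 66.66, "Sents F": 88.18}
NEW = {"UPOS": 91.75, "XPOS": 91.54, "UFeats": 81.32, "Lemma": 93.59,
       "UAS": 75.71, "LAS": 67.31, "Sents F": 88.18}


def score_deltas(old, new):
    """Return new minus old for every metric, rounded to two decimals."""
    return {name: round(new[name] - old[name], 2) for name in old}


if __name__ == "__main__":
    for name, delta in score_deltas(OLD, NEW).items():
        print(f"{name:8s} {OLD[name]:6.2f} -> {NEW[name]:6.2f} ({delta:+.2f})")
```

Apart from XPOS, UFeats is the only metric that moves by more than a point, and it is the only one that goes down.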
 ## Training Data
 | Source | Description |
 ## Components
 - **tok2vec** -- Shared token-to-vector encoder (CNN, width 96)
+- **tagger** -- Fine-grained POS tagger (XPOS, harmonized 16-tag tagset)
 - **morphologizer** -- Morphological feature assignment (UPOS + UFeats)
 - **trainable_lemmatizer** -- Edit-tree lemmatizer
 - **lookup_lemmatizer** -- 1.2M-entry dictionary lemmatizer overlay (CLTK Morpheus + UD + Wiktionary); normalizes grave accents to acute at query time
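
The grave-to-acute normalization mentioned for `lookup_lemmatizer` can be illustrated with the standard library alone. This is a minimal sketch of the general technique, not the component's actual implementation: `grave_to_acute`, `LEMMA_TABLE`, and `lookup` are hypothetical names, and the one-entry table stands in for the real dictionary.

```python
import unicodedata

GRAVE = "\u0300"  # combining grave accent
ACUTE = "\u0301"  # combining acute accent


def grave_to_acute(text: str) -> str:
    """Replace every grave accent with an acute, keeping other diacritics."""
    # NFD splits precomposed Greek letters into base letter + combining marks,
    # so the grave can be swapped without touching breathings or iota subscript.
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(GRAVE, ACUTE))


# Hypothetical one-entry table; keys are stored in normalized form.
LEMMA_TABLE = {grave_to_acute("θεά"): "θεά"}


def lookup(form: str):
    """Normalize the query at lookup time, then consult the table."""
    return LEMMA_TABLE.get(grave_to_acute(form))


if __name__ == "__main__":
    # The form in the usage example carries a grave because another word follows.
    print(lookup("θεὰ"))
```

Normalizing at query time means the table only needs the acute-accented form of each entry, while running text can still be matched when a final acute has turned into a grave.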
 <details>
+<summary>View label scheme (1796 labels for 3 components)</summary>
+**`tagger`**: `adjective`, `adverb`, `conjunction`, `conjunction_adverb`, `conjunction_pronoun`, `determiner`, `interjection`, `noun`, `number`, `particle`, `preposition`, `pronoun`, `proper_noun`, `punc`, `unknown`, `verb`
 **`morphologizer`**: 1749 morphological feature combinations