Update model card for v3.8.1
README.md
CHANGED
@@ -15,21 +15,24 @@ model-index:
     metrics:
     - name: POS Accuracy
       type: accuracy
-      value: 0.
   - task:
       name: Lemmatization
       type: token-classification
     metrics:
     - name: Lemma Accuracy
       type: accuracy
-      value: 0.
   - task:
       name: Dependency Parsing
       type: token-classification
     metrics:
     - name: Labeled Attachment Score
       type: f_score
-      value: 0.
 ---

 # grc_dep_web_md
@@ -43,7 +46,7 @@ Medium model with 50,000-key floret vectors (300 dimensions). Trained on Univers
 | Feature | Description |
 | --- | --- |
 | **Name** | `grc_dep_web_md` |
-| **Version** | `3.8.
 | **spaCy** | `>=3.8.11,<3.9.0` |
 | **Default Pipeline** | `senter`, `tok2vec`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `lookup_lemmatizer`, `parser` |
 | **Components** | `senter`, `tok2vec`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `lookup_lemmatizer`, `parser` |
@@ -54,7 +57,7 @@ Medium model with 50,000-key floret vectors (300 dimensions). Trained on Univers
 ## Install

 ```bash
-pip install https://huggingface.co/latincy/grc_dep_web_md/resolve/main/grc_dep_web_md-3.8.
 ```

 ## Usage
@@ -63,15 +66,10 @@ pip install https://huggingface.co/latincy/grc_dep_web_md/resolve/main/grc_dep_w
 import spacy

 nlp = spacy.load("grc_dep_web_md")
-doc = nlp("

 for token in doc:
     print(token.text, token.pos_, token.lemma_, token.dep_)
-# μῆνιν NOUN μῆνις obj
-# ἄειδε VERB ἀείδω ROOT
-# θεὰ NOUN θεά nsubj
-# Πηληϊάδεω NOUN Πηλείδης nmod
-# Ἀχιλῆος NOUN Ἀχιλῆος nmod
 ```

 ## Evaluation
@@ -80,16 +78,14 @@ Scores on held-out UD test data (combined PTNK + PROIEL + Perseus).

 | Metric | Score |
 | --- | --- |
-| **POS (UPOS) Accuracy** | 91.
-| **TAG (XPOS) Accuracy** |
-| **Morph (UFeats) Accuracy** |
-| **Lemma Accuracy** | 93.
-| **Unlabeled Attachment Score (UAS)** |
-| **Labeled Attachment Score (LAS)** |
 | **Sentences F-Score** | 88.18 |

-*\*TAG (XPOS) reads 0.00 due to pre-harmonization tagger tagset mismatch with UD evaluation gold data. The tagger produces valid fine-grained POS tags but they do not align with the expected XPOS column. This will be fixed in a future release with a harmonized tagset.*
-
 ## Training Data

 | Source | Description |
@@ -101,7 +97,7 @@ Scores on held-out UD test data (combined PTNK + PROIEL + Perseus).
 ## Components

 - **tok2vec** -- Shared token-to-vector encoder (CNN, width 96)
-- **tagger** -- Fine-grained POS tagger (XPOS)
 - **morphologizer** -- Morphological feature assignment (UPOS + UFeats)
 - **trainable_lemmatizer** -- Edit-tree lemmatizer
 - **lookup_lemmatizer** -- 1.2M-entry dictionary lemmatizer overlay (CLTK Morpheus + UD + Wiktionary); normalizes grave accents to acute at query time
@@ -112,9 +108,9 @@ Scores on held-out UD test data (combined PTNK + PROIEL + Perseus).

 <details>

-<summary>View label scheme (

-**`tagger`**:

 **`morphologizer`**: 1749 morphological feature combinations

     metrics:
     - name: POS Accuracy
       type: accuracy
+      value: 0.9175
+    - name: TAG (XPOS) Accuracy
+      type: accuracy
+      value: 0.9154
   - task:
       name: Lemmatization
       type: token-classification
     metrics:
     - name: Lemma Accuracy
       type: accuracy
+      value: 0.9359
   - task:
       name: Dependency Parsing
       type: token-classification
     metrics:
     - name: Labeled Attachment Score
       type: f_score
+      value: 0.6731
 ---

 # grc_dep_web_md
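The `value` fields added to the `model-index` block are fractions, while the card's Evaluation table reports the same scores as percentages. A minimal sketch of a consistency check between the two follows. The numbers are hand-copied from this diff. The dictionary names, the `find_mismatches` helper, and the mapping of YAML metric names onto table row names are illustrative assumptions, not part of the model card or of any tooling used to generate it.

```python
# Fractions from the model-index YAML, keyed by the matching Evaluation table row.
# (Hand-copied from this diff; the key mapping is an assumption for illustration.)
MODEL_INDEX = {
    "POS (UPOS) Accuracy": 0.9175,
    "TAG (XPOS) Accuracy": 0.9154,
    "Lemma Accuracy": 0.9359,
    "Labeled Attachment Score (LAS)": 0.6731,
}

# Percentages from the Evaluation table in the same commit.
EVALUATION_TABLE = {
    "POS (UPOS) Accuracy": 91.75,
    "TAG (XPOS) Accuracy": 91.54,
    "Morph (UFeats) Accuracy": 81.32,
    "Lemma Accuracy": 93.59,
    "Unlabeled Attachment Score (UAS)": 75.71,
    "Labeled Attachment Score (LAS)": 67.31,
    "Sentences F-Score": 88.18,
}


def find_mismatches(index, table, tolerance=0.005):
    """Return metric names whose fraction * 100 disagrees with the table percentage."""
    mismatches = []
    for name, fraction in index.items():
        percent = table.get(name)
        # A missing row counts as a mismatch; the tolerance absorbs float rounding.
        if percent is None or abs(fraction * 100 - percent) > tolerance:
            mismatches.append(name)
    return mismatches


print(find_mismatches(MODEL_INDEX, EVALUATION_TABLE))
```

An empty list means every metric published in the YAML header agrees with its table row to two decimal places.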
 | Feature | Description |
 | --- | --- |
 | **Name** | `grc_dep_web_md` |
+| **Version** | `3.8.1` |
 | **spaCy** | `>=3.8.11,<3.9.0` |
 | **Default Pipeline** | `senter`, `tok2vec`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `lookup_lemmatizer`, `parser` |
 | **Components** | `senter`, `tok2vec`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `lookup_lemmatizer`, `parser` |
 ## Install

 ```bash
+pip install https://huggingface.co/latincy/grc_dep_web_md/resolve/main/grc_dep_web_md-3.8.1-py3-none-any.whl
 ```

 ## Usage
 import spacy

 nlp = spacy.load("grc_dep_web_md")
+doc = nlp("\u03bc\u1fc6\u03bd\u03b9\u03bd \u1f04\u03b5\u03b9\u03b4\u03b5 \u03b8\u03b5\u1f70 \u03a0\u03b7\u03bb\u03b7\u03ca\u03ac\u03b4\u03b5\u03c9 \u1f08\u03c7\u03b9\u03bb\u1fc6\u03bf\u03c2")

 for token in doc:
     print(token.text, token.pos_, token.lemma_, token.dep_)
 ```
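The usage snippet prints the UPOS tag, lemma, and dependency label, but not the fine-grained tag that this release starts reporting a score for. The sketch below shows one way to collect `token.tag_` alongside the other attributes. `token_rows` and the stand-in tokens are hypothetical names introduced here so the helper can be tried without the model installed, and the placeholder values are not real pipeline output.

```python
from types import SimpleNamespace


def token_rows(doc):
    """Collect (text, UPOS, XPOS, lemma, dependency label) for each token-like object."""
    return [(t.text, t.pos_, t.tag_, t.lemma_, t.dep_) for t in doc]


# Stand-in tokens with the same attribute names spaCy tokens expose.
# The values are placeholders chosen for illustration only.
fake_doc = [
    SimpleNamespace(text="a", pos_="X", tag_="unknown", lemma_="a", dep_="dep"),
    SimpleNamespace(text=".", pos_="PUNCT", tag_="punc", lemma_=".", dep_="punct"),
]

for row in token_rows(fake_doc):
    print(*row)
```

With the pipeline installed, passing `nlp(text)` instead of `fake_doc` should work the same way, since spaCy tokens provide `text`, `pos_`, `tag_`, `lemma_`, and `dep_`.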

 ## Evaluation

 | Metric | Score |
 | --- | --- |
+| **POS (UPOS) Accuracy** | 91.75 |
+| **TAG (XPOS) Accuracy** | 91.54 |
+| **Morph (UFeats) Accuracy** | 81.32 |
+| **Lemma Accuracy** | 93.59 |
+| **Unlabeled Attachment Score (UAS)** | 75.71 |
+| **Labeled Attachment Score (LAS)** | 67.31 |
 | **Sentences F-Score** | 88.18 |

 ## Training Data

 | Source | Description |
 ## Components

 - **tok2vec** -- Shared token-to-vector encoder (CNN, width 96)
+- **tagger** -- Fine-grained POS tagger (XPOS, harmonized 16-tag tagset)
 - **morphologizer** -- Morphological feature assignment (UPOS + UFeats)
 - **trainable_lemmatizer** -- Edit-tree lemmatizer
 - **lookup_lemmatizer** -- 1.2M-entry dictionary lemmatizer overlay (CLTK Morpheus + UD + Wiktionary); normalizes grave accents to acute at query time
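The `lookup_lemmatizer` entry says grave accents are normalized to acute at query time. The card does not show how the component does this, so the following is only a sketch of one common approach using Unicode decomposition from the standard library. The function name `grave_to_acute` and the constants are assumptions made for illustration.

```python
import unicodedata

COMBINING_GRAVE = "\u0300"
COMBINING_ACUTE = "\u0301"


def grave_to_acute(text):
    """Replace grave accents with acute ones, leaving other diacritics untouched."""
    # NFD splits each precomposed letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    swapped = decomposed.replace(COMBINING_GRAVE, COMBINING_ACUTE)
    # NFC recomposes the letters so the result can be used as a dictionary key.
    return unicodedata.normalize("NFC", swapped)


print(grave_to_acute("θεὰ"))
```

Normalizing the query this way lets a form that carries a grave accent only because of its position in running text match a dictionary entry stored with the acute. Breathings, circumflexes, and iota subscripts pass through unchanged.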

 <details>

+<summary>View label scheme (1796 labels for 3 components)</summary>

+**`tagger`**: `adjective`, `adverb`, `conjunction`, `conjunction_adverb`, `conjunction_pronoun`, `determiner`, `interjection`, `noun`, `number`, `particle`, `preposition`, `pronoun`, `proper_noun`, `punc`, `unknown`, `verb`

 **`morphologizer`**: 1749 morphological feature combinations

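The summary line gives 1796 labels for 3 components, but this hunk shows only two of them. A short sketch of the arithmetic follows, using the tagger list and the morphologizer count exactly as they appear above. The variable names are illustrative, and the identity of the third component is not visible in this diff, so the remainder is left unattributed.

```python
# Tagger labels copied from the label scheme in this diff.
TAGGER_LABELS = [
    "adjective", "adverb", "conjunction", "conjunction_adverb",
    "conjunction_pronoun", "determiner", "interjection", "noun",
    "number", "particle", "preposition", "pronoun", "proper_noun",
    "punc", "unknown", "verb",
]

TOTAL_LABELS = 1796          # from the <summary> line
MORPHOLOGIZER_LABELS = 1749  # from the morphologizer line

# Labels left over for the third component, which this hunk does not show.
remaining_labels = TOTAL_LABELS - len(TAGGER_LABELS) - MORPHOLOGIZER_LABELS

print(len(TAGGER_LABELS), remaining_labels)
```

The tagger count agrees with the "harmonized 16-tag tagset" wording in the Components list.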