diyclassics committed on
Commit d033fbe · verified · 1 Parent(s): 4df8067

Update model card for v3.8.1

Files changed (1)
  1. README.md +18 -22
README.md CHANGED
@@ -15,21 +15,24 @@ model-index:
15
  metrics:
16
  - name: POS Accuracy
17
  type: accuracy
18
- value: 0.9150
 
 
 
19
  - task:
20
  name: Lemmatization
21
  type: token-classification
22
  metrics:
23
  - name: Lemma Accuracy
24
  type: accuracy
25
- value: 0.9357
26
  - task:
27
  name: Dependency Parsing
28
  type: token-classification
29
  metrics:
30
  - name: Labeled Attachment Score
31
  type: f_score
32
- value: 0.6666
33
  ---
34
 
35
  # grc_dep_web_md
@@ -43,7 +46,7 @@ Medium model with 50,000-key floret vectors (300 dimensions). Trained on Univers
43
  | Feature | Description |
44
  | --- | --- |
45
  | **Name** | `grc_dep_web_md` |
46
- | **Version** | `3.8.0` |
47
  | **spaCy** | `>=3.8.11,<3.9.0` |
48
  | **Default Pipeline** | `senter`, `tok2vec`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `lookup_lemmatizer`, `parser` |
49
  | **Components** | `senter`, `tok2vec`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `lookup_lemmatizer`, `parser` |
@@ -54,7 +57,7 @@ Medium model with 50,000-key floret vectors (300 dimensions). Trained on Univers
54
  ## Install
55
 
56
  ```bash
57
- pip install https://huggingface.co/latincy/grc_dep_web_md/resolve/main/grc_dep_web_md-3.8.0-py3-none-any.whl
58
  ```
59
 
60
  ## Usage
@@ -63,15 +66,10 @@ pip install https://huggingface.co/latincy/grc_dep_web_md/resolve/main/grc_dep_w
63
  import spacy
64
 
65
  nlp = spacy.load("grc_dep_web_md")
66
- doc = nlp("μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος")
67
 
68
  for token in doc:
69
  print(token.text, token.pos_, token.lemma_, token.dep_)
70
- # μῆνιν NOUN μῆνις obj
71
- # ἄειδε VERB ἀείδω ROOT
72
- # θεὰ NOUN θεά nsubj
73
- # Πηληϊάδεω NOUN Πηλείδης nmod
74
- # Ἀχιλῆος NOUN Ἀχιλῆος nmod
75
  ```
76
 
77
  ## Evaluation
@@ -80,16 +78,14 @@ Scores on held-out UD test data (combined PTNK + PROIEL + Perseus).
80
 
81
  | Metric | Score |
82
  | --- | --- |
83
- | **POS (UPOS) Accuracy** | 91.50 |
84
- | **TAG (XPOS) Accuracy** | 0.00* |
85
- | **Morph (UFeats) Accuracy** | 82.46 |
86
- | **Lemma Accuracy** | 93.57 |
87
- | **Unlabeled Attachment Score (UAS)** | 74.93 |
88
- | **Labeled Attachment Score (LAS)** | 66.66 |
89
  | **Sentences F-Score** | 88.18 |
90
 
91
- *\*TAG (XPOS) reads 0.00 due to pre-harmonization tagger tagset mismatch with UD evaluation gold data. The tagger produces valid fine-grained POS tags but they do not align with the expected XPOS column. This will be fixed in a future release with a harmonized tagset.*
92
-
93
  ## Training Data
94
 
95
  | Source | Description |
@@ -101,7 +97,7 @@ Scores on held-out UD test data (combined PTNK + PROIEL + Perseus).
101
  ## Components
102
 
103
  - **tok2vec** -- Shared token-to-vector encoder (CNN, width 96)
104
- - **tagger** -- Fine-grained POS tagger (XPOS)
105
  - **morphologizer** -- Morphological feature assignment (UPOS + UFeats)
106
  - **trainable_lemmatizer** -- Edit-tree lemmatizer
107
  - **lookup_lemmatizer** -- 1.2M-entry dictionary lemmatizer overlay (CLTK Morpheus + UD + Wiktionary); normalizes grave accents to acute at query time
@@ -112,9 +108,9 @@ Scores on held-out UD test data (combined PTNK + PROIEL + Perseus).
112
 
113
  <details>
114
 
115
- <summary>View label scheme (2630 labels for 3 components)</summary>
116
 
117
- **`tagger`**: 850 fine-grained POS tags (pre-harmonization; mixed XPOS tagset from PTNK + PROIEL + Perseus)
118
 
119
  **`morphologizer`**: 1749 morphological feature combinations
120
 
 
15
  metrics:
16
  - name: POS Accuracy
17
  type: accuracy
18
+ value: 0.9175
19
+ - name: TAG (XPOS) Accuracy
20
+ type: accuracy
21
+ value: 0.9154
22
  - task:
23
  name: Lemmatization
24
  type: token-classification
25
  metrics:
26
  - name: Lemma Accuracy
27
  type: accuracy
28
+ value: 0.9359
29
  - task:
30
  name: Dependency Parsing
31
  type: token-classification
32
  metrics:
33
  - name: Labeled Attachment Score
34
  type: f_score
35
+ value: 0.6731
36
  ---
37
 
38
  # grc_dep_web_md
 
46
  | Feature | Description |
47
  | --- | --- |
48
  | **Name** | `grc_dep_web_md` |
49
+ | **Version** | `3.8.1` |
50
  | **spaCy** | `>=3.8.11,<3.9.0` |
51
  | **Default Pipeline** | `senter`, `tok2vec`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `lookup_lemmatizer`, `parser` |
52
  | **Components** | `senter`, `tok2vec`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `lookup_lemmatizer`, `parser` |
 
57
  ## Install
58
 
59
  ```bash
60
+ pip install https://huggingface.co/latincy/grc_dep_web_md/resolve/main/grc_dep_web_md-3.8.1-py3-none-any.whl
61
  ```
62
 
63
  ## Usage
 
66
  import spacy
67
 
68
  nlp = spacy.load("grc_dep_web_md")
69
+ doc = nlp("\u03bc\u1fc6\u03bd\u03b9\u03bd \u1f04\u03b5\u03b9\u03b4\u03b5 \u03b8\u03b5\u1f70 \u03a0\u03b7\u03bb\u03b7\u03ca\u03ac\u03b4\u03b5\u03c9 \u1f08\u03c7\u03b9\u03bb\u1fc6\u03bf\u03c2")
70
 
71
  for token in doc:
72
  print(token.text, token.pos_, token.lemma_, token.dep_)
 
 
 
 
 
73
  ```
74
 
75
  ## Evaluation
 
78
 
79
  | Metric | Score |
80
  | --- | --- |
81
+ | **POS (UPOS) Accuracy** | 91.75 |
82
+ | **TAG (XPOS) Accuracy** | 91.54 |
83
+ | **Morph (UFeats) Accuracy** | 81.32 |
84
+ | **Lemma Accuracy** | 93.59 |
85
+ | **Unlabeled Attachment Score (UAS)** | 75.71 |
86
+ | **Labeled Attachment Score (LAS)** | 67.31 |
87
  | **Sentences F-Score** | 88.18 |
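
The `model-index` metadata stores scores as fractions (for example `0.9175`), while the Evaluation table reports percentages (`91.75`). The following is a minimal sketch of a consistency check between the two, using only the v3.8.1 values shown in this diff. The `PAIRS` list and the `mismatches` helper are hypothetical names introduced for illustration; they are not part of the model card or of spaCy.

```python
# (metadata name, metadata fraction, table percentage), copied from the v3.8.1 side of this diff.
PAIRS = [
    ("POS Accuracy", 0.9175, 91.75),
    ("TAG (XPOS) Accuracy", 0.9154, 91.54),
    ("Lemma Accuracy", 0.9359, 93.59),
    ("Labeled Attachment Score", 0.6731, 67.31),
]


def mismatches(pairs, tol=0.005):
    """Return the names whose fraction * 100 differs from the table value by more than tol."""
    return [name for name, frac, pct in pairs if abs(frac * 100 - pct) > tol]


if __name__ == "__main__":
    bad = mismatches(PAIRS)
    print("all consistent" if not bad else f"inconsistent: {bad}")
```

The tolerance absorbs floating-point rounding only; any real disagreement between the YAML header and the table would show up in the returned list.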
88
 
 
 
89
  ## Training Data
90
 
91
  | Source | Description |
 
97
  ## Components
98
 
99
  - **tok2vec** -- Shared token-to-vector encoder (CNN, width 96)
100
+ - **tagger** -- Fine-grained POS tagger (XPOS, harmonized 16-tag tagset)
101
  - **morphologizer** -- Morphological feature assignment (UPOS + UFeats)
102
  - **trainable_lemmatizer** -- Edit-tree lemmatizer
103
  - **lookup_lemmatizer** -- 1.2M-entry dictionary lemmatizer overlay (CLTK Morpheus + UD + Wiktionary); normalizes grave accents to acute at query time
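
The `lookup_lemmatizer` entry above says that grave accents are normalized to acute at query time. The model's actual implementation is not shown in this diff; the sketch below is one plausible way to do such a normalization with the standard library, assuming it can be expressed as swapping the combining grave accent (U+0300) for the combining acute accent (U+0301). The function name `grave_to_acute` is hypothetical.

```python
import unicodedata

COMBINING_GRAVE = "\u0300"
COMBINING_ACUTE = "\u0301"


def grave_to_acute(text: str) -> str:
    """Replace every grave accent with an acute accent (illustrative sketch only)."""
    # Decompose so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Swap the combining grave for the combining acute; other marks are untouched.
    swapped = decomposed.replace(COMBINING_GRAVE, COMBINING_ACUTE)
    # Recompose into precomposed characters where Unicode defines them.
    return unicodedata.normalize("NFC", swapped)


if __name__ == "__main__":
    # "\u03b8\u03b5\u1f70" is the third word of the usage example (grave on the final alpha).
    print(grave_to_acute("\u03b8\u03b5\u1f70"))
```

Note that NFC recomposition yields the tonos code point (U+03AC) for an accented alpha rather than the oxia code point (U+1F71); whether the lemmatizer's lookup table keys use the same form is an assumption here.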
 
108
 
109
  <details>
110
 
111
+ <summary>View label scheme (1796 labels for 3 components)</summary>
112
 
113
+ **`tagger`**: `adjective`, `adverb`, `conjunction`, `conjunction_adverb`, `conjunction_pronoun`, `determiner`, `interjection`, `noun`, `number`, `particle`, `preposition`, `pronoun`, `proper_noun`, `punc`, `unknown`, `verb`
114
 
115
  **`morphologizer`**: 1749 morphological feature combinations
116