Commit · 1f74af3
Parent(s): 572de91
Add support for global rules
- .claude/skills/optimize-element-descriptions/SKILL.md +2 -2
- README.md +19 -13
- docs/tei-element-descriptions.md +54 -10
- scripts/evaluate_llm.py +21 -37
- tei_annotator/models/schema.py +1 -0
- tei_annotator/prompting/templates/json_enforced.jinja2 +7 -0
- tei_annotator/prompting/templates/text_gen.jinja2 +8 -0
.claude/skills/optimize-element-descriptions/SKILL.md
CHANGED
|
@@ -2,8 +2,7 @@
|
|
| 2 |
name: optimize-element-descriptions
|
| 3 |
description: Iteratively improve TEIElement descriptions in _build_schema() to maximise F1 against the gold standard. Use when annotation quality is low or when evaluation shows missed or spurious spans.
|
| 4 |
disable-model-invocation: true
|
| 5 |
-
argument-hint:
|
| 6 |
-
allowed-tools: Read, Edit, Bash
|
| 7 |
---
|
| 8 |
|
| 9 |
# optimize-element-descriptions
|
|
@@ -59,6 +58,7 @@ Key principles (summary):
|
|
| 59 |
- Add negative constraints: "never tag X as Y"
|
| 60 |
- Include textual triggers (keywords, position) and inline surface-form examples
|
| 61 |
- Prefix critical constraints with `CRITICAL:`
|
|
|
|
| 62 |
|
| 63 |
Only edit descriptions for elements where you identified a clear failure pattern.
|
| 64 |
|
|
|
|
| 2 |
name: optimize-element-descriptions
|
| 3 |
description: Iteratively improve TEIElement descriptions in _build_schema() to maximise F1 against the gold standard. Use when annotation quality is low or when evaluation shows missed or spurious spans.
|
| 4 |
disable-model-invocation: true
|
| 5 |
+
argument-hint: "--max-items N --provider gemini|kisski|all"
|
|
|
|
| 6 |
---
|
| 7 |
|
| 8 |
# optimize-element-descriptions
|
|
|
|
| 58 |
- Add negative constraints: "never tag X as Y"
|
| 59 |
- Include textual triggers (keywords, position) and inline surface-form examples
|
| 60 |
- Prefix critical constraints with `CRITICAL:`
|
| 61 |
+
- If a failure pattern affects **multiple element types**, add the constraint to `TEISchema.rules` instead of duplicating it in each element description — the prompt renders `rules` as a numbered "General Rules" section before all element descriptions.
|
| 62 |
|
| 63 |
Only edit descriptions for elements where you identified a clear failure pattern.
|
| 64 |
|
README.md
CHANGED
|
@@ -100,7 +100,7 @@ API keys for real LLM endpoints go in `.env` (see `.env` for the expected variab
|
|
| 100 |
|
| 101 |
## Quick example
|
| 102 |
|
| 103 |
-
Element descriptions are the primary signal the LLM uses to decide what to annotate and how. See [docs/tei-element-descriptions.md](docs/tei-element-descriptions.md) for
|
| 104 |
|
| 105 |
```python
|
| 106 |
from tei_annotator import (
|
|
@@ -110,18 +110,24 @@ from tei_annotator import (
|
|
| 110 |
)
|
| 111 |
|
| 112 |
# 1. Describe the elements you want to annotate
|
| 113 |
-
schema = TEISchema(
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
|
| 125 |
|
| 126 |
# 2. Wrap your inference endpoint
|
| 127 |
def my_call_fn(prompt: str) -> str:
|
|
|
|
| 100 |
|
| 101 |
## Quick example
|
| 102 |
|
| 103 |
+
Element descriptions are the primary signal the LLM uses to decide what to annotate and how. Cross-element constraints that apply to multiple span types (e.g. "always emit a `surname` span inside an enclosing `author` span") can be placed in `TEISchema.rules` instead of duplicating them in every element description — the prompt builder renders them as a numbered "General Rules" section before the per-element descriptions. See [docs/tei-element-descriptions.md](docs/tei-element-descriptions.md) for full guidelines.
|
| 104 |
|
| 105 |
```python
|
| 106 |
from tei_annotator import (
|
|
|
|
| 110 |
)
|
| 111 |
|
| 112 |
# 1. Describe the elements you want to annotate
|
| 113 |
+
schema = TEISchema(
|
| 114 |
+
rules=[
|
| 115 |
+
# Cross-element constraints stated once, rendered before element descriptions
|
| 116 |
+
"Emit a 'surname' span within every enclosing 'persName' span.",
|
| 117 |
+
],
|
| 118 |
+
elements=[
|
| 119 |
+
TEIElement(
|
| 120 |
+
tag="persName",
|
| 121 |
+
description="a person's name",
|
| 122 |
+
attributes=[TEIAttribute(name="ref", description="authority URI")],
|
| 123 |
+
),
|
| 124 |
+
TEIElement(
|
| 125 |
+
tag="placeName",
|
| 126 |
+
description="a geographical place name",
|
| 127 |
+
attributes=[],
|
| 128 |
+
),
|
| 129 |
+
],
|
| 130 |
+
)
|
| 131 |
|
| 132 |
# 2. Wrap your inference endpoint
|
| 133 |
def my_call_fn(prompt: str) -> str:
|
docs/tei-element-descriptions.md
CHANGED
|
@@ -14,10 +14,10 @@ The LLM is asked to **emit spans** — tuples of *(element name, verbatim text,
|
|
| 14 |
surrounding context)*. It never writes raw XML. Descriptions therefore should
|
| 15 |
be phrased in terms of *emitting a span*, not *wrapping text in a tag*.
|
| 16 |
|
| 17 |
-
| Avoid
|
| 18 |
-
|-------|--------|
|
| 19 |
-
| "Wrap the author name in `<author>`."
|
| 20 |
-
| "Nest `<surname>` inside `<author>`."
|
| 21 |
|
| 22 |
---
|
| 23 |
|
|
@@ -80,10 +80,10 @@ Examples of effective negative constraints:
|
|
| 80 |
|
| 81 |
> "A person's name (or surname alone) that follows 'in' is an editor — emit an
|
| 82 |
> `editor` span, **never** a `title` span."
|
| 83 |
-
|
| 84 |
> "An institutional report name (e.g. 'Amok Internal Report') must be tagged as
|
| 85 |
> `note` with type='report', **NOT** as `orgName` or `title`."
|
| 86 |
-
|
| 87 |
> "A label is always a number or short code — **never** a word or name. An
|
| 88 |
> ALL-CAPS word at the start of an entry is an author surname, not a label."
|
| 89 |
|
|
@@ -101,10 +101,10 @@ span represents semantically.
|
|
| 101 |
|
| 102 |
> "An editor's name typically follows keywords such as 'in', 'ed.', 'éd.',
|
| 103 |
> 'Hrsg.', 'dir.', '(ed.)', '(eds.)'."
|
| 104 |
-
|
| 105 |
> "A label appears at the very start of a bibliographic entry, before any author
|
| 106 |
> or title."
|
| 107 |
-
|
| 108 |
> "The place of publication may appear in parentheses immediately after the
|
| 109 |
> title, e.g. 'Title (City, Region)' — the parenthesised location is the
|
| 110 |
> pubPlace."
|
|
@@ -119,7 +119,7 @@ text looks like:
|
|
| 119 |
> "Typical label forms: a plain number ('17'), a number with a trailing period
|
| 120 |
> ('17.'), a number in square brackets ('[77]', '[ACL30]'), or a compound number
|
| 121 |
> ('5,6')."
|
| 122 |
-
|
| 123 |
> "Institutional report designations — such as 'Amok Internal Report', 'USGS
|
| 124 |
> Open-File Report 97-123', or 'Technical Report No. 5' — must be tagged as
|
| 125 |
> `note`."
|
|
@@ -133,11 +133,54 @@ when surrounding punctuation could reasonably be included:
|
|
| 133 |
|
| 134 |
> "The separator that follows the label (period, dash, or space) is NOT part of
|
| 135 |
> the label."
|
| 136 |
-
|
| 137 |
> "Do not include the surrounding parentheses in the pubPlace span."
|
| 138 |
|
| 139 |
---
|
| 140 |
|
| 141 |
## Quick checklist
|
| 142 |
|
| 143 |
Before finalising a description, ask:
|
|
@@ -149,3 +192,4 @@ Before finalising a description, ask:
|
|
| 149 |
- [ ] Are there positional or keyword triggers that help the model find the span?
|
| 150 |
- [ ] Are edge-case surface forms illustrated with a quoted example?
|
| 151 |
- [ ] Are span boundaries (what's in / what's out) unambiguous?
|
|
|
|
|
|
| 14 |
surrounding context)*. It never writes raw XML. Descriptions therefore should
|
| 15 |
be phrased in terms of *emitting a span*, not *wrapping text in a tag*.
|
| 16 |
|
| 17 |
+
| Avoid | Prefer |
|
| 18 |
+
| --------------------------------------- | ------------------------------------------------------------------------- |
|
| 19 |
+
| "Wrap the author name in `<author>`." | "Emit an `author` span covering the full name text." |
|
| 20 |
+
| "Nest `<surname>` inside `<author>`." | "The `surname` span must fall within the enclosing `author` span's text." |
|
| 21 |
|
| 22 |
---
|
| 23 |
|
|
|
|
| 80 |
|
| 81 |
> "A person's name (or surname alone) that follows 'in' is an editor — emit an
|
| 82 |
> `editor` span, **never** a `title` span."
|
| 83 |
+
>
|
| 84 |
> "An institutional report name (e.g. 'Amok Internal Report') must be tagged as
|
| 85 |
> `note` with type='report', **NOT** as `orgName` or `title`."
|
| 86 |
+
>
|
| 87 |
> "A label is always a number or short code — **never** a word or name. An
|
| 88 |
> ALL-CAPS word at the start of an entry is an author surname, not a label."
|
| 89 |
|
|
|
|
| 101 |
|
| 102 |
> "An editor's name typically follows keywords such as 'in', 'ed.', 'éd.',
|
| 103 |
> 'Hrsg.', 'dir.', '(ed.)', '(eds.)'."
|
| 104 |
+
>
|
| 105 |
> "A label appears at the very start of a bibliographic entry, before any author
|
| 106 |
> or title."
|
| 107 |
+
>
|
| 108 |
> "The place of publication may appear in parentheses immediately after the
|
| 109 |
> title, e.g. 'Title (City, Region)' — the parenthesised location is the
|
| 110 |
> pubPlace."
|
|
|
|
| 119 |
> "Typical label forms: a plain number ('17'), a number with a trailing period
|
| 120 |
> ('17.'), a number in square brackets ('[77]', '[ACL30]'), or a compound number
|
| 121 |
> ('5,6')."
|
| 122 |
+
>
|
| 123 |
> "Institutional report designations — such as 'Amok Internal Report', 'USGS
|
| 124 |
> Open-File Report 97-123', or 'Technical Report No. 5' — must be tagged as
|
| 125 |
> `note`."
|
|
|
|
| 133 |
|
| 134 |
> "The separator that follows the label (period, dash, or space) is NOT part of
|
| 135 |
> the label."
|
| 136 |
+
>
|
| 137 |
> "Do not include the surrounding parentheses in the pubPlace span."
|
| 138 |
|
| 139 |
---
|
| 140 |
|
| 141 |
+
### 8. Use `TEISchema.rules` for cross-element constraints
|
| 142 |
+
|
| 143 |
+
When the same constraint applies to **multiple element types**, put it in
|
| 144 |
+
`TEISchema.rules` rather than copying it into every element description.
|
| 145 |
+
The prompt builder renders `rules` as a numbered **"General Rules"** section
|
| 146 |
+
that appears before all per-element descriptions.
|
| 147 |
+
|
| 148 |
+
Good candidates for `rules`:
|
| 149 |
+
|
| 150 |
+
- Parent–child pairing constraints shared by several elements (e.g. "`surname`
|
| 151 |
+
and `forename` must always appear inside an enclosing `author` or `editor`
|
| 152 |
+
span")
|
| 153 |
+
- Constraints that span the same surface form from both sides (e.g. the rule
|
| 154 |
+
that `orgName` requires an enclosing `author`/`editor` span, stated for both
|
| 155 |
+
`author` and `orgName`)
|
| 156 |
+
- Bibliographic conventions that apply across multiple roles (e.g. "a dash or
|
| 157 |
+
underscore may stand for a repeated author **or editor** name")
|
| 158 |
+
|
| 159 |
+
Keep the individual element `description` focused on element-specific cues
|
| 160 |
+
(triggers, surface forms, boundaries, negative constraints) and let `rules`
|
| 161 |
+
carry the shared structural invariants.
|
| 162 |
+
|
| 163 |
+
**Example** — in `_build_schema()`:
|
| 164 |
+
|
| 165 |
+
```python
|
| 166 |
+
TEISchema(
|
| 167 |
+
rules=[
|
| 168 |
+
"For each person's name, emit an 'author' or 'editor' span covering "
|
| 169 |
+
"the full name AND separate 'surname', 'forename', or 'orgName' spans "
|
| 170 |
+
"for the individual name parts within that span.",
|
| 171 |
+
"Never emit 'surname', 'forename', or 'orgName' without a corresponding "
|
| 172 |
+
"enclosing 'author' or 'editor' span.",
|
| 173 |
+
],
|
| 174 |
+
elements=[
|
| 175 |
+
TEIElement(tag="author", description="Names appearing at the start …"),
|
| 176 |
+
TEIElement(tag="surname", description="The inherited (family) name …"),
|
| 177 |
+
# 'surname' description no longer repeats the parent-span constraint
|
| 178 |
+
],
|
| 179 |
+
)
|
| 180 |
+
```
|
| 181 |
+
|
| 182 |
+
---
|
| 183 |
+
|
| 184 |
## Quick checklist
|
| 185 |
|
| 186 |
Before finalising a description, ask:
|
|
|
|
| 192 |
- [ ] Are there positional or keyword triggers that help the model find the span?
|
| 193 |
- [ ] Are edge-case surface forms illustrated with a quoted example?
|
| 194 |
- [ ] Are span boundaries (what's in / what's out) unambiguous?
|
| 195 |
+
- [ ] Are cross-element constraints factored into `TEISchema.rules` rather than duplicated across descriptions?
|
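The last checklist item can be checked mechanically before moving constraints into `TEISchema.rules`. The following is a minimal sketch, not part of `tei_annotator`: the helper name `find_shared_sentences` and the placeholder `'<TAG>'` are assumptions made for illustration, and the two descriptions are abbreviated versions of the pre-commit `surname` and `forename` texts. It reports sentences that recur across element descriptions once each element's own quoted tag name is masked.

```python
from __future__ import annotations

import re
from collections import defaultdict


def find_shared_sentences(descriptions: dict[str, str], min_tags: int = 2) -> dict[str, list[str]]:
    """Return sentences that occur in the descriptions of at least `min_tags` elements.

    Hypothetical helper, not part of tei_annotator. Each element's own quoted
    tag name is replaced by a placeholder so that "never emit a 'surname' ..."
    and "never emit a 'forename' ..." count as the same sentence.
    """
    tags_by_sentence: dict[str, set[str]] = defaultdict(set)
    for tag, text in descriptions.items():
        normalised = text.replace(f"'{tag}'", "'<TAG>'")
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", normalised.strip()):
            if sentence:
                tags_by_sentence[sentence].add(tag)
    return {s: sorted(tags) for s, tags in tags_by_sentence.items() if len(tags) >= min_tags}


# Abbreviated versions of the pre-commit 'surname' and 'forename' descriptions.
descriptions = {
    "surname": (
        "The inherited (family) name of a person. "
        "Never emit a 'surname' span without a corresponding 'author' or 'editor' span."
    ),
    "forename": (
        "The given (first) name or initials of a person. "
        "Never emit a 'forename' span without a corresponding 'author' or 'editor' span."
    ),
}
shared = find_shared_sentences(descriptions)
for sentence, tags in shared.items():
    print(tags, "->", sentence)
```

Any sentence this reports is a candidate for a single entry in `rules`. The sentence split is deliberately naive and will break on abbreviations such as 'ed.', so treat the output as a hint rather than a verdict.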
scripts/evaluate_llm.py
CHANGED
|
@@ -137,6 +137,19 @@ def _build_schema():
|
|
| 137 |
return TEIAttribute(name=name, description=desc, allowed_values=allowed)
|
| 138 |
|
| 139 |
return TEISchema(
|
| 140 |
elements=[
|
| 141 |
TEIElement(
|
| 142 |
tag="label",
|
|
@@ -156,16 +169,8 @@ def _build_schema():
|
|
| 156 |
tag="author",
|
| 157 |
description=(
|
| 158 |
"Name(s) of the author(s) of the cited work. "
|
| 159 |
-
"Emit a separate 'author' span for each distinct author — never merge multiple "
|
| 160 |
-
"authors into a single span. "
|
| 161 |
-
"Each 'author' span covers the full name text of one author. "
|
| 162 |
-
"Also emit separate 'surname', 'forename', or 'orgName' spans for the "
|
| 163 |
-
"individual name parts; those spans must fall within the 'author' span's text. "
|
| 164 |
-
"When an organisation is the author, emit both an 'author' span and an "
|
| 165 |
-
"'orgName' span covering the same text — never emit 'orgName' alone in that role. "
|
| 166 |
"Names appearing at the start of a bibliographic entry before the title and "
|
| 167 |
-
"date are authors.
|
| 168 |
-
"In a bibliography, a dash or underscore may stand for a repeated author name."
|
| 169 |
),
|
| 170 |
allowed_children=['surname', 'forename', 'orgName'],
|
| 171 |
attributes=[],
|
|
@@ -174,53 +179,32 @@ def _build_schema():
|
|
| 174 |
tag="editor",
|
| 175 |
description=(
|
| 176 |
"Name of an editor of the cited work. "
|
| 177 |
-
"Emit an 'editor' span covering the full name text; also emit separate "
|
| 178 |
-
"'surname', 'forename', or 'orgName' spans for the individual name parts — "
|
| 179 |
-
"those spans must fall within the 'editor' span's text. "
|
| 180 |
"An editor's name typically follows keywords such as 'in', 'ed.', 'éd.', "
|
| 181 |
"'Hrsg.', 'dir.', '(ed.)', '(eds.)'. "
|
| 182 |
"CRITICAL: A person's name (or surname alone) that follows 'in' is an editor — "
|
| 183 |
-
"emit an 'editor' span (plus name-part spans), never a 'title' span.
|
| 184 |
-
"In a bibliography, a dash or underscore may stand for a repeated editor name."
|
| 185 |
),
|
| 186 |
allowed_children=['surname', 'forename', 'orgName'],
|
| 187 |
attributes=[],
|
| 188 |
),
|
| 189 |
TEIElement(
|
| 190 |
tag="surname",
|
| 191 |
-
description=(
|
| 192 |
-
"The inherited (family) name of a person. "
|
| 193 |
-
"Always emit together with an enclosing 'author' or 'editor' span covering "
|
| 194 |
-
"the full name — never emit a 'surname' span without a corresponding "
|
| 195 |
-
"'author' or 'editor' span."
|
| 196 |
-
),
|
| 197 |
allowed_children=[],
|
| 198 |
attributes=[],
|
| 199 |
),
|
| 200 |
TEIElement(
|
| 201 |
tag="forename",
|
| 202 |
-
description=(
|
| 203 |
-
"The given (first) name or initials of a person. "
|
| 204 |
-
"Always emit together with an enclosing 'author' or 'editor' span covering "
|
| 205 |
-
"the full name — never emit a 'forename' span without a corresponding "
|
| 206 |
-
"'author' or 'editor' span."
|
| 207 |
-
),
|
| 208 |
allowed_children=[],
|
| 209 |
attributes=[],
|
| 210 |
),
|
| 211 |
TEIElement(
|
| 212 |
tag="orgName",
|
| 213 |
-
description=
|
| 214 |
-
"Name of an organisation. "
|
| 215 |
-
"When the organisation is an author or editor of the cited work, you MUST emit "
|
| 216 |
-
"both the 'orgName' span and an enclosing 'author' (or 'editor') span covering "
|
| 217 |
-
"the same text. For example, if 'Acme Research Group' is an author, emit an "
|
| 218 |
-
"'author' span AND an 'orgName' span both covering 'Acme Research Group'. "
|
| 219 |
-
"Never emit 'orgName' alone when the organisation acts as author or editor."
|
| 220 |
-
),
|
| 221 |
allowed_children=[],
|
| 222 |
attributes=[],
|
| 223 |
-
),
|
| 224 |
TEIElement(
|
| 225 |
tag="title",
|
| 226 |
description="Title of the cited work.",
|
|
@@ -261,8 +245,8 @@ def _build_schema():
|
|
| 261 |
tag="biblScope",
|
| 262 |
description=(
|
| 263 |
"Scope reference within the cited item (page range, volume, issue). "
|
| 264 |
-
"Emit a separate biblScope span for volume and issue.
|
| 265 |
-
|
| 266 |
allowed_children=[],
|
| 267 |
attributes=[
|
| 268 |
attr(
|
|
|
|
| 137 |
return TEIAttribute(name=name, description=desc, allowed_values=allowed)
|
| 138 |
|
| 139 |
return TEISchema(
|
| 140 |
+
rules=[
|
| 141 |
+
"For each person's name, emit an 'author' or 'editor' span covering the full name "
|
| 142 |
+
"AND separate 'surname', 'forename', or 'orgName' spans for the individual name "
|
| 143 |
+
"parts within that span.",
|
| 144 |
+
"Never emit 'surname', 'forename', or 'orgName' without a corresponding enclosing "
|
| 145 |
+
"'author' or 'editor' span.",
|
| 146 |
+
"When an organisation acts as author or editor, emit BOTH an 'orgName' span AND an "
|
| 147 |
+
"enclosing 'author' (or 'editor') span covering the same text.",
|
| 148 |
+
"Emit a separate 'author' span for each distinct author — never merge multiple "
|
| 149 |
+
"authors into a single span.",
|
| 150 |
+
"In a bibliography, a dash or underscore may stand for a repeated author or editor "
|
| 151 |
+
"name — tag it as 'author' or 'editor' accordingly.",
|
| 152 |
+
],
|
| 153 |
elements=[
|
| 154 |
TEIElement(
|
| 155 |
tag="label",
|
|
|
|
| 169 |
tag="author",
|
| 170 |
description=(
|
| 171 |
"Name(s) of the author(s) of the cited work. "
|
| 172 |
"Names appearing at the start of a bibliographic entry before the title and "
|
| 173 |
+
"date are authors."
|
|
|
|
| 174 |
),
|
| 175 |
allowed_children=['surname', 'forename', 'orgName'],
|
| 176 |
attributes=[],
|
|
|
|
| 179 |
tag="editor",
|
| 180 |
description=(
|
| 181 |
"Name of an editor of the cited work. "
|
| 182 |
"An editor's name typically follows keywords such as 'in', 'ed.', 'éd.', "
|
| 183 |
"'Hrsg.', 'dir.', '(ed.)', '(eds.)'. "
|
| 184 |
"CRITICAL: A person's name (or surname alone) that follows 'in' is an editor — "
|
| 185 |
+
"emit an 'editor' span (plus name-part spans), never a 'title' span."
|
|
|
|
| 186 |
),
|
| 187 |
allowed_children=['surname', 'forename', 'orgName'],
|
| 188 |
attributes=[],
|
| 189 |
),
|
| 190 |
TEIElement(
|
| 191 |
tag="surname",
|
| 192 |
+
description="The inherited (family) name of a person.",
|
| 193 |
allowed_children=[],
|
| 194 |
attributes=[],
|
| 195 |
),
|
| 196 |
TEIElement(
|
| 197 |
tag="forename",
|
| 198 |
+
description="The given (first) name or initials of a person.",
|
| 199 |
allowed_children=[],
|
| 200 |
attributes=[],
|
| 201 |
),
|
| 202 |
TEIElement(
|
| 203 |
tag="orgName",
|
| 204 |
+
description="Name of an organisation.",
|
| 205 |
allowed_children=[],
|
| 206 |
attributes=[],
|
| 207 |
+
),
|
| 208 |
TEIElement(
|
| 209 |
tag="title",
|
| 210 |
description="Title of the cited work.",
|
|
|
|
| 245 |
tag="biblScope",
|
| 246 |
description=(
|
| 247 |
"Scope reference within the cited item (page range, volume, issue). "
|
| 248 |
+
"Emit a separate biblScope span for volume and issue."
|
| 249 |
+
),
|
| 250 |
allowed_children=[],
|
| 251 |
attributes=[
|
| 252 |
attr(
|
tei_annotator/models/schema.py
CHANGED
|
@@ -22,6 +22,7 @@ class TEIElement:
|
|
| 22 |
@dataclass
|
| 23 |
class TEISchema:
|
| 24 |
elements: list[TEIElement] = field(default_factory=list)
|
|
|
|
| 25 |
|
| 26 |
def get(self, tag: str) -> TEIElement | None:
|
| 27 |
for elem in self.elements:
|
|
|
|
| 22 |
@dataclass
|
| 23 |
class TEISchema:
|
| 24 |
elements: list[TEIElement] = field(default_factory=list)
|
| 25 |
+
rules: list[str] = field(default_factory=list)
|
| 26 |
|
| 27 |
def get(self, tag: str) -> TEIElement | None:
|
| 28 |
for elem in self.elements:
|
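The one-line change to `TEISchema` is backward compatible because the new field has a default. The sketch below illustrates this under stated assumptions: `TEIElement` is reduced to two fields (the real class has more, such as `attributes` and `allowed_children`), and the body of `get` beyond the `for` line visible in the diff is a guess at the obvious implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TEIElement:
    # Simplified stand-in for the real class, which has more fields.
    tag: str
    description: str = ""


@dataclass
class TEISchema:
    elements: list[TEIElement] = field(default_factory=list)
    # Field added by this commit. default_factory gives every instance its
    # own empty list, so existing TEISchema(elements=[...]) calls still work.
    rules: list[str] = field(default_factory=list)

    def get(self, tag: str) -> TEIElement | None:
        # Assumed body: only the loop header is visible in the diff.
        for elem in self.elements:
            if elem.tag == tag:
                return elem
        return None


# A schema built the old way has no rules.
legacy = TEISchema(elements=[TEIElement(tag="title", description="Title of the cited work.")])

# A schema built the new way carries cross-element constraints.
with_rules = TEISchema(
    elements=[TEIElement(tag="surname", description="The inherited (family) name of a person.")],
    rules=["Never emit 'surname' without a corresponding enclosing 'author' or 'editor' span."],
)

print(legacy.rules)           # empty list
print(len(with_rules.rules))  # one rule
```

Using `field(default_factory=list)` rather than a bare `[]` default matters here: dataclasses reject mutable defaults, and a factory ensures rules appended to one schema never leak into another.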
tei_annotator/prompting/templates/json_enforced.jinja2
CHANGED
|
@@ -1,6 +1,13 @@
|
|
| 1 |
You are a TEI XML annotation assistant.
|
| 2 |
|
| 3 |
## TEI Schema
|
| 4 |
{% for elem in schema.elements %}
|
| 5 |
- `{{ elem.tag }}`: {{ elem.description }}{% if elem.attributes %} (attributes: {% for attr in elem.attributes %}`{{ attr.name }}`{% if not loop.last %}, {% endif %}{% endfor %}){% endif %}
|
| 6 |
{% endfor %}
|
|
|
|
| 1 |
You are a TEI XML annotation assistant.
|
| 2 |
|
| 3 |
## TEI Schema
|
| 4 |
+
{% if schema.rules %}
|
| 5 |
+
### General Rules
|
| 6 |
+
|
| 7 |
+
{% for rule in schema.rules %}
|
| 8 |
+
{{ loop.index }}. {{ rule }}
|
| 9 |
+
{% endfor %}
|
| 10 |
+
{% endif %}
|
| 11 |
{% for elem in schema.elements %}
|
| 12 |
- `{{ elem.tag }}`: {{ elem.description }}{% if elem.attributes %} (attributes: {% for attr in elem.attributes %}`{{ attr.name }}`{% if not loop.last %}, {% endif %}{% endfor %}){% endif %}
|
| 13 |
{% endfor %}
|
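To see what the added template block produces, the logic can be approximated in plain Python. This is a sketch only: `render_schema_section` is a hypothetical function, elements are passed as simple `(tag, description)` pairs with attributes omitted, and the exact blank lines in the real output depend on how the project configures Jinja2 whitespace handling, which the diff does not show.

```python
def render_schema_section(rules, elements):
    """Approximate the '## TEI Schema' part of the prompt as plain text.

    Hypothetical stand-in for the Jinja2 template; `elements` is a list of
    (tag, description) pairs rather than TEIElement objects.
    """
    lines = ["## TEI Schema"]
    if rules:  # mirrors {% if schema.rules %}: no heading when there are no rules
        lines.append("### General Rules")
        lines.append("")
        # loop.index in Jinja2 is 1-based, hence start=1.
        for index, rule in enumerate(rules, start=1):
            lines.append(f"{index}. {rule}")
    for tag, description in elements:
        lines.append(f"- `{tag}`: {description}")
    return "\n".join(lines)


prompt_part = render_schema_section(
    rules=[
        "Never emit 'surname', 'forename', or 'orgName' without a corresponding "
        "enclosing 'author' or 'editor' span.",
        "Emit a separate 'author' span for each distinct author.",
    ],
    elements=[("surname", "The inherited (family) name of a person.")],
)
print(prompt_part)
```

Because the rules section is guarded by the `if`, schemas that define no rules should render the same element list as before the commit, without an empty "General Rules" heading.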
tei_annotator/prompting/templates/text_gen.jinja2
CHANGED
|
@@ -1,6 +1,14 @@
|
|
| 1 |
You are a TEI XML annotation assistant. Your task is to identify named entities and spans in the source text and annotate them with TEI XML tags.
|
| 2 |
|
| 3 |
## TEI Schema
|
| 4 |
|
| 5 |
The following TEI elements are in scope:
|
| 6 |
{% for elem in schema.elements %}
|
|
|
|
| 1 |
You are a TEI XML annotation assistant. Your task is to identify named entities and spans in the source text and annotate them with TEI XML tags.
|
| 2 |
|
| 3 |
## TEI Schema
|
| 4 |
+
{% if schema.rules %}
|
| 5 |
+
### General Rules
|
| 6 |
+
|
| 7 |
+
{% for rule in schema.rules %}
|
| 8 |
+
{{ loop.index }}. {{ rule }}
|
| 9 |
+
{% endfor %}
|
| 10 |
+
{% endif %}
|
| 11 |
+
### Element Descriptions
|
| 12 |
|
| 13 |
The following TEI elements are in scope:
|
| 14 |
{% for elem in schema.elements %}
|