Update README.md
README.md CHANGED
@@ -28,7 +28,7 @@ OneKE is a new bilingual knowledge extraction large model developed jointly by Z
## How is OneKE trained?
OneKE mainly focuses on schema-generalizable information extraction. Because existing extraction instruction data suffers from non-standard formats, noisy data, and limited diversity, OneKE adopts techniques such as normalization and cleaning of extraction instructions, hard negative sample collection, and schema-based batched instruction construction, as shown in the illustration. For more details, refer to the paper "[IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus](https://arxiv.org/abs/2402.14710)" [[GitHub](https://github.com/zjunlp/IEPile)].

The zero-shot generalization comparison results of OneKE with other large models are as follows:
* `NER-en`: CrossNER_AI, CrossNER_literature, CrossNER_music, CrossNER_politics, CrossNER_science
@@ -268,7 +268,7 @@ split_num_mapper = {
```
Since predicting all schemas in the label set at once is too challenging and does not scale well, OneKE uses a batched approach during training: it splits the schemas queried by each instruction into groups, asking about a fixed number of schemas at a time. If the label set of a piece of data is too long, it is split into multiple instructions, which the model answers in turn.
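To make the splitting concrete, here is a minimal sketch of the chunking arithmetic (the `split_num_mapper` batch sizes below are illustrative assumptions, not the repository's actual values):

```python
import math

# Assumed per-task batch sizes; the README defines the real table in the
# `split_num_mapper` snippet referenced in the hunk header above.
split_num_mapper = {"NER": 6, "RE": 4, "EE": 4}

# A NER label set with 10 entity types and a batch size of 6 is split into
# ceil(10 / 6) = 2 instructions: one asking about 6 types, one about the other 4.
labels = ["person", "organization", "location", "product", "event",
          "award", "algorithm", "metric", "task", "field"]
split_num = split_num_mapper["NER"]
batches = [labels[i:i + split_num] for i in range(0, len(labels), split_num)]

assert len(batches) == math.ceil(len(labels) / split_num) == 2
print(batches[0])  # first 6 entity types
print(batches[1])  # remaining 4 entity types
```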
**Schema format**:
@@ -281,7 +281,7 @@ EEA: [{"event_type": "Finance/Trading - Interest Rate Hike", "arguments": ["Time
```
Below is a simple Batched Instruction Generation script:
```python
def get_instruction(language, task, schema, input):
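    # NOTE: the diff hunk ends at the `def` line above. What follows is a
    # minimal sketch of the batching logic described earlier, not the
    # verbatim repository script. It assumes `import json`, the
    # `split_num_mapper` table shown above, and a hypothetical
    # `instruction_mapper` dict mapping task+language to instruction text.
    instructions = []
    split_num = split_num_mapper[task]
    # Chunk the label set into groups of at most `split_num` schemas.
    split_schemas = [schema[i:i + split_num] for i in range(0, len(schema), split_num)]
    for split_schema in split_schemas:
        # Emit one self-contained JSON instruction per schema chunk.
        instructions.append(json.dumps(
            {"instruction": instruction_mapper[task + language],
             "schema": split_schema,
             "input": input},
            ensure_ascii=False,
        ))
    return instructions
```

Under these assumptions, `get_instruction("en", "NER", labels, text)` would return one JSON instruction string per schema chunk, which the model then answers in turn.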
@@ -359,7 +359,7 @@ for split_schema in split_schemas:
<details>
<summary><b>Event Extraction (EE) Description Instructions</b></summary>
```json
{
@@ -407,7 +407,7 @@ for split_schema in split_schemas:
<details>
<summary><b>Knowledge Graph Construction (KGC) Description Instructions</b></summary>
```json
{