---
language:
- en
metrics:
- f1
base_model:
- FacebookAI/roberta-base
tags:
- parsing
- hashing
- unsupervised
---
## On Eliciting Syntax from Language Models via Hashing
## Model Details
This repository contains the implementation of [**Parserker v2**](https://aclanthology.org/2024.emnlp-main.479/), a
hashing-based unsupervised parser trained on the Penn Treebank dataset using only the raw text (no syntactic annotations
or tree labels).
## Usage
### Requirements
`pip install transformers torch nltk torchrua`
### Demo
```python
from nltk import TreePrettyPrinter
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True lets transformers run the custom model and tokenizer code shipped in the repository.
model = AutoModel.from_pretrained("yehzw/parserker", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("yehzw/parserker", trust_remote_code=True)
model.eval()

# Tokenize a batch of raw sentences.
words, input_ids, duration = tokenizer([
    "The quick brown fox jumps over the lazy dog",
    "The man who you met yesterday is my teacher",
    "The boy saw the girl with a telescope",
    "The dog on the hill barked at the man who laughed",
])

# Parse the batch, then build and print one tree per sentence.
for w, s in zip(words, model.parse(input_ids, duration).tolist()):
    t = model.to_tree(w, s)
    t = TreePrettyPrinter(t).text()
    print(t)
```
### Output Examples
The quick brown fox jumps over the lazy dog
```
                                 854D
             _____________________|_________
            |                              DF41
            |                      _________|____
           955F                   |             DD59
  __________|_____                |     _________|____
 |               DC45             |    |             D457
 |      __________|____           |    |     _________|____
 |     |              DE45        |    |    |             DECD
 |     |           ____|____      |    |    |          ____|____
103B  C404       DC05      D60D  9300 C995 D0B7      DC8D      DE8D
 |     |          |         |     |    |    |         |         |
The  quick      brown      fox  jumps over the       lazy      dog
```
The man who you met yesterday is my teacher
```
                              C50D
       ________________________|____________
      |                                    D558
      |                    _________________|___________
      |                  9718                           |
      |          _________|____                         |
      |         |             C718                     5D52
      |         |     _________|____                ____|____
     965F       |    |             DF00            |        DF5D
  ____|____     |    |          ____|_______       |     ____|______
103B      C60D 1799 4719      D300         47BC   7192 5895        CE0D
 |         |    |    |         |            |      |    |           |
The       man  who  you       met       yesterday  is   my       teacher
```
The boy saw the girl with a telescope
```
                    C14C
       ______________|_________
      |                       DF41
      |          ______________|____
      |         |                  C54D
      |         |          _________|____
      |         |         |             9D59
      |         |         |          ____|____
     165F       |        D657       |        9E55
  ____|____     |     ____|____     |     ____|_______
103B      C20D D100 D0B7      C60D 3991 9817         CE8D
 |         |    |    |         |    |    |            |
The       boy  saw  the       girl with  a        telescope
```
The dog on the hill barked at the man who laughed
```
                                    C50D
                 ____________________|__________
                |                              DF40
                |                     __________|_________
               C54D                  |                   DD19
       _________|____                |      ______________|____
      |             9D09             |     |                  C5CD
      |          ____|____           |     |          _________|_________
     965F       |        D65F        |     |        D657                97D8
  ____|____     |     ____|____      |     |     ____|____           ____|______
103B      C20D 2F99 D0B3      C60D  D300  6395 D0B7      C60D      1799        CF89
 |         |    |    |         |     |     |    |         |         |           |
The       dog   on  the       hill barked  at  the       man       who       laughed
```
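Every node in the trees above carries a four-digit hexadecimal label. The sketch below is one way to inspect those labels without loading the model: it groups the words from the example outputs by the label printed above them and counts the differing bits between two labels. It assumes each label is the hexadecimal rendering of a 16-bit hash code, which this card does not state explicitly. The names `TAGGED`, `group_by_label`, and `hamming` are illustrative and not part of the model's API.

```python
from collections import defaultdict

# (word, label) pairs copied by hand from the word-level rows of the
# four example outputs above; they are not produced by this snippet.
TAGGED = [
    ("The", "103B"), ("quick", "C404"), ("brown", "DC05"), ("fox", "D60D"),
    ("jumps", "9300"), ("over", "C995"), ("the", "D0B7"), ("lazy", "DC8D"),
    ("dog", "DE8D"),
    ("The", "103B"), ("man", "C60D"), ("who", "1799"), ("you", "4719"),
    ("met", "D300"), ("yesterday", "47BC"), ("is", "7192"), ("my", "5895"),
    ("teacher", "CE0D"),
    ("The", "103B"), ("boy", "C20D"), ("saw", "D100"), ("the", "D0B7"),
    ("girl", "C60D"), ("with", "3991"), ("a", "9817"), ("telescope", "CE8D"),
    ("The", "103B"), ("dog", "C20D"), ("on", "2F99"), ("the", "D0B3"),
    ("hill", "C60D"), ("barked", "D300"), ("at", "6395"), ("the", "D0B7"),
    ("man", "C60D"), ("who", "1799"), ("laughed", "CF89"),
]


def group_by_label(pairs):
    """Map each label to the set of distinct lowercased words that received it."""
    groups = defaultdict(set)
    for word, label in pairs:
        groups[label].add(word.lower())
    return groups


def hamming(a, b):
    """Count differing bits between two hex labels (assumes they encode bit strings)."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


groups = group_by_label(TAGGED)

# Show only labels shared by more than one distinct word.
for label, words in sorted(groups.items()):
    if len(words) > 1:
        print(label, sorted(words))

# Compare the labels printed above "man" and "boy".
print(hamming("C60D", "C20D"))
```

In these four sentences, `man`, `girl`, and `hill` share one label, and `met` and `barked` share another, so the labels appear to behave like induced word categories. That is an observation about these examples only, not a claim about the model in general.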
## Citation
```bib
@inproceedings{wang-utiyama-2024-eliciting,
title = "On Eliciting Syntax from Language Models via Hashing",
author = "Wang, Yiran and
Utiyama, Masao",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.479/",
doi = "10.18653/v1/2024.emnlp-main.479",
pages = "8412--8427"
}
```