---
language:
- en
metrics:
- f1
base_model:
- FacebookAI/roberta-base
tags:
- parsing
- hashing
- unsupervised
---
## On Eliciting Syntax from Language Models via Hashing
## Model Details
This repository contains the implementation of [**Parserker v2**](https://aclanthology.org/2024.emnlp-main.479/), a
hashing-based unsupervised parser trained on the Penn Treebank dataset using only the raw text (no syntactic annotations
or tree labels).
## Usage
### Requirements
`pip install transformers torch nltk torchrua`
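Before running the demo, you may want to confirm that these packages can be imported in the active environment. The sketch below uses only the standard library. It assumes that each package's import name matches the name passed to `pip`.

```python
import importlib.util

# Import names for the packages in the pip command above
# (assumption: each import name matches its pip distribution name).
REQUIRED = ("transformers", "torch", "nltk", "torchrua")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All demo dependencies are installed.")
```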
### Demo
```python
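from nltk import TreePrettyPrinter
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True lets transformers load the custom parser and tokenizer
# classes that ship with the model repository.
model = AutoModel.from_pretrained("yehzw/parserker", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("yehzw/parserker", trust_remote_code=True)
model.eval()  # switch to evaluation mode (e.g. disables dropout)

# The custom tokenizer returns the words of each sentence, their input ids,
# and a `duration` value that model.parse expects alongside the ids.
words, input_ids, duration = tokenizer([
    "The quick brown fox jumps over the lazy dog",
    "The man who you met yesterday is my teacher",
    "The boy saw the girl with a telescope",
    "The dog on the hill barked at the man who laughed",
])

# Parse the whole batch, then build a tree for each sentence and pretty-print it.
for w, s in zip(words, model.parse(input_ids, duration).tolist()):
    t = model.to_tree(w, s)
    print(TreePrettyPrinter(t).text())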
```
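The metadata above lists F1 as the metric. For unsupervised constituency parsers, this usually means unlabeled span F1. The following pure-Python sketch parses a bracketed tree, collects its constituent spans, and scores them against a right-branching baseline. It is an illustration, not the authors' evaluation script. The bracketed string is transcribed by hand from the first output example. The demo passes the result of `model.to_tree` to `TreePrettyPrinter`, which suggests it returns an `nltk.Tree`. If so, `str(t)` should yield text in the same bracketed format. Dropping single-word and whole-sentence spans follows a common convention, which is not necessarily the one used in the paper.

```python
import re

# Bracketed form of the first output example (labels are the model's codes).
EXAMPLE = (
    "(854D (955F (103B The) (DC45 (C404 quick) (DE45 (DC05 brown) (D60D fox))))"
    " (DF41 (9300 jumps) (DD59 (C995 over) (D457 (D0B7 the)"
    " (DECD (DC8D lazy) (DE8D dog))))))"
)


def parse_bracketed(text):
    """Parse "(LABEL child ...)" into nested lists; words stay as plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos + 1]          # tokens[pos] is "("
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:                        # a word
                children.append(tokens[pos])
                pos += 1
        pos += 1                         # consume ")"
        return [label, *children]

    return node()


def constituent_spans(tree):
    """Return the non-trivial (start, end) spans of `tree` (end exclusive) and its length.

    Single-word spans and the whole-sentence span are dropped, a common
    convention for unlabeled F1 in unsupervised parsing.
    """
    spans = set()

    def walk(t, start):
        if isinstance(t, str):           # a word covers one position
            return start + 1
        end = start
        for child in t[1:]:
            end = walk(child, end)
        spans.add((start, end))
        return end

    n = walk(tree, 0)
    return {(i, j) for i, j in spans if j - i > 1 and (i, j) != (0, n)}, n


def right_branching_spans(n):
    """Non-trivial spans of a right-branching tree over n words."""
    return {(i, n) for i in range(1, n - 1)}


def span_f1(pred, gold):
    """Unlabeled span F1 between two sets of spans."""
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


pred, n = constituent_spans(parse_bracketed(EXAMPLE))
print(sorted(pred))
print(f"F1 vs. right-branching: {span_f1(pred, right_branching_spans(n)):.3f}")  # 0.571
```

The two trees share the four spans inside "jumps over the lazy dog", which gives 4 of 7 spans in each tree. If you replace `right_branching_spans(n)` with spans from a gold treebank tree, the same code performs an ordinary per-sentence evaluation.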
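### Output Examples
Every node in the trees below carries the four-digit hexadecimal label that the model assigns to its span.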
The quick brown fox jumps over the lazy dog
```
                     854D
        ______________|_______________
       |                             DF41
       |                         _____|______
      955F                      |           DD59
  _____|______                  |       _____|_____
 |           DC45               |      |          D457
 |        ____|_____            |      |       ____|____
 |       |         DE45         |      |      |        DECD
 |       |        __|___        |      |      |       __|___
103B    C404    DC05   D60D    9300   C995   D0B7   DC8D   DE8D
 |       |       |      |       |      |      |      |      |
The    quick   brown   fox    jumps   over   the    lazy   dog
```
The man who you met yesterday is my teacher
```
                   C50D
     _______________|________________
    |                               D558
    |                 _______________|_______________
    |               9718                             |
    |           _____|______                         |
    |          |           C718                     5D52
    |          |       _____|_____               ____|_____
   965F        |      |          DF00           |         DF5D
  __|___       |      |       ____|____         |       ___|____
103B   C60D   1799   4719   D300      47BC     7192   5895     CE0D
 |      |      |      |      |         |        |      |        |
The    man    who    you    met    yesterday    is     my    teacher
```
The boy saw the girl with a telescope
```
             C14C
     _________|_________
    |                  DF41
    |           ________|________
    |          |                C54D
    |          |          _______|________
    |          |         |               9D59
    |          |         |           _____|_____
   165F        |        D657        |          9E55
  __|___       |       __|___       |       ____|____
103B   C20D   D100   D0B7   C60D   3991   9817      CE8D
 |      |      |      |      |      |      |         |
The    boy    saw    the    girl   with    a     telescope
```
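This sentence has the classic prepositional-phrase attachment ambiguity. In the tree above, "with a telescope" (node 9D59) first combines with "the girl" (node C54D) and only then joins "saw", which corresponds to noun attachment. The sketch below reads this off programmatically. The tree is transcribed by hand into nested tuples, a representation chosen for this illustration and not an API of the model.

```python
# The third output example, transcribed by hand as (label, child, ...) tuples;
# leaves are plain words.
TREE = ("C14C",
        ("165F", ("103B", "The"), ("C20D", "boy")),
        ("DF41", ("D100", "saw"),
         ("C54D", ("D657", ("D0B7", "the"), ("C60D", "girl")),
          ("9D59", ("3991", "with"),
           ("9E55", ("9817", "a"), ("CE8D", "telescope"))))))


def leaves(tree):
    """Words of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


def parent_spans(tree):
    """Map each constituent's (start, end) word span to its parent's span."""
    parent_of = {}

    def walk(node, start):
        if isinstance(node, str):            # a word occupies one position
            return start + 1
        end, child_spans = start, []
        for child in node[1:]:
            child_end = walk(child, end)
            if not isinstance(child, str):   # record constituents, not bare words
                child_spans.append((end, child_end))
            end = child_end
        for span in child_spans:
            parent_of[span] = (start, end)
        return end

    walk(tree, 0)
    return parent_of


words = leaves(TREE)
parent_of = parent_spans(TREE)

# "with a telescope" runs from "with" to the end of this sentence.
pp = (words.index("with"), len(words))
parent_start, _ = parent_of[pp]
# In this binary tree the PP is the right child, so the words between the
# parent's start and the PP form its sister constituent.
print("PP attaches to:", " ".join(words[parent_start:pp[0]]))  # the girl
```

A sister constituent that contains "saw" would indicate verb attachment instead.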
The dog on the hill barked at the man who laughed
```
                           C50D
             _______________|________________
            |                               DF40
            |                         _______|_______
           C54D                      |              DD19
     _______|_______                 |        _______|________
    |              9D09              |       |               C5CD
    |           ____|____            |       |          ______|_______
   965F        |        D65F         |       |        D657           97D8
  __|___       |       __|___        |       |       __|___        ___|____
103B   C20D   2F99   D0B3   C60D    D300    6395   D0B7   C60D   1799     CF89
 |      |      |      |      |       |       |      |      |      |        |
The    dog     on    the    hill   barked    at    the    man    who    laughed
```
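Related words tend to receive identical or nearly identical labels across these examples. For instance, "man", "girl", and "hill" are all `C60D`, "dog" and "boy" are both `C20D`, and "the" appears as both `D0B7` and `D0B3`. Each label has four hexadecimal digits, which is 16 bits. Suppose the labels are treated as binary hash codes. This is our reading of "hashing-based" and is not stated in this card. Under that assumption, Hamming distance is a natural way to compare them, as in this minimal sketch:

```python
def hamming(a: str, b: str) -> int:
    """Count differing bits between two equal-length hexadecimal codes."""
    if len(a) != len(b):
        raise ValueError("codes must have the same number of hex digits")
    return bin(int(a, 16) ^ int(b, 16)).count("1")


# Preterminal labels copied from the output examples above.
pairs = [
    ("dog/boy", "C20D", "man/girl/hill", "C60D"),
    ("the", "D0B7", "the (before 'hill')", "D0B3"),
    ("The", "103B", "the", "D0B7"),
    ("dog/boy", "C20D", "The", "103B"),
]
for word_a, code_a, word_b, code_b in pairs:
    print(f"{word_a} {code_a} vs {word_b} {code_b}: {hamming(code_a, code_b)} bit(s)")
```

The four printed distances are 1, 1, 5, and 8 bits. With only four sentences, these numbers are suggestive rather than systematic.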
## Citation
```bib
@inproceedings{wang-utiyama-2024-eliciting,
title = "On Eliciting Syntax from Language Models via Hashing",
author = "Wang, Yiran and
Utiyama, Masao",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.479/",
doi = "10.18653/v1/2024.emnlp-main.479",
pages = "8412--8427"
}
```