| --- |
| language: |
| - en |
| metrics: |
| - f1 |
| base_model: |
| - FacebookAI/roberta-base |
| tags: |
| - parsing |
| - hashing |
| - unsupervised |
| --- |
| |
| ## On Eliciting Syntax from Language Models via Hashing |
|
|
| ## Model Details |
|
|
| This repository contains the implementation of [**Parserker v2**](https://aclanthology.org/2024.emnlp-main.479/), a |
| hashing-based unsupervised parser trained on the Penn Treebank dataset using only the raw text (no syntactic annotations |
| or tree labels). |
|
|
| ## Usage |
|
|
| ### Requirements |
|
|
| `pip install transformers torch nltk torchrua` |
|
|
| ### Demo |
|
|
| ```python |
| from nltk import TreePrettyPrinter |
| from transformers import AutoModel, AutoTokenizer |
| |
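| # trust_remote_code=True lets transformers load the custom model and tokenizer classes shipped with the checkpoint |
| model = AutoModel.from_pretrained("yehzw/parserker", trust_remote_code=True) |
| tokenizer = AutoTokenizer.from_pretrained("yehzw/parserker", trust_remote_code=True) |
| |
| model.eval()  # evaluation mode: disables dropout |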
| |
| words, input_ids, duration = tokenizer([ |
|     "The quick brown fox jumps over the lazy dog", |
|     "The man who you met yesterday is my teacher", |
|     "The boy saw the girl with a telescope", |
|     "The dog on the hill barked at the man who laughed", |
| ]) |
| |
| for w, s in zip(words, model.parse(input_ids, duration).tolist()): |
|     t = model.to_tree(w, s) |
|     t = TreePrettyPrinter(t).text() |
|     print(t) |
| ``` |
|
|
| ### Output Examples |
|
|
| The quick brown fox jumps over the lazy dog |
|
|
| ``` |
|                                  854D |
|              _____________________|_________ |
|             |                              DF41 |
|             |                      _________|____ |
|            955F                   |             DD59 |
|   __________|_____                |     _________|____ |
|  |               DC45             |    |             D457 |
|  |      __________|____           |    |     _________|____ |
|  |     |              DE45        |    |    |             DECD |
|  |     |           ____|____      |    |    |          ____|____ |
| 103B  C404       DC05      D60D  9300 C995 D0B7      DC8D      DE8D |
|  |     |          |         |     |    |    |         |         | |
| The  quick      brown      fox  jumps over the       lazy      dog |
| ``` |
|
|
| The man who you met yesterday is my teacher |
|
|
| ``` |
|                               C50D |
|        ________________________|____________ |
|       |                                    D558 |
|       |                    _________________|___________ |
|       |                  9718                           | |
|       |          _________|____                         | |
|       |         |             C718                     5D52 |
|       |         |     _________|____                ____|____ |
|      965F       |    |             DF00            |        DF5D |
|   ____|____     |    |          ____|_______       |     ____|______ |
| 103B      C60D 1799 4719      D300         47BC   7192 5895        CE0D |
|  |         |    |    |         |            |      |    |           | |
| The       man  who  you       met       yesterday  is   my       teacher |
| ``` |
|
|
| The boy saw the girl with a telescope |
|
|
| ``` |
|                     C14C |
|        ______________|_________ |
|       |                       DF41 |
|       |          ______________|____ |
|       |         |                  C54D |
|       |         |          _________|____ |
|       |         |         |             9D59 |
|       |         |         |          ____|____ |
|      165F       |        D657       |        9E55 |
|   ____|____     |     ____|____     |     ____|_______ |
| 103B      C20D D100 D0B7      C60D 3991 9817         CE8D |
|  |         |    |    |         |    |    |            | |
| The       boy  saw  the       girl with  a        telescope |
| ``` |
|
|
| The dog on the hill barked at the man who laughed |
|
|
| ``` |
|                                     C50D |
|                  ____________________|__________ |
|                 |                              DF40 |
|                 |                     __________|_________ |
|                C54D                  |                   DD19 |
|        _________|____                |      ______________|____ |
|       |             9D09             |     |                  C5CD |
|       |          ____|____           |     |          _________|_________ |
|      965F       |        D65F        |     |        D657                97D8 |
|   ____|____     |     ____|____      |     |     ____|____           ____|______ |
| 103B      C20D 2F99 D0B3      C60D  D300  6395 D0B7      C60D      1799        CF89 |
|  |         |    |    |         |     |     |    |         |         |           | |
| The       dog   on  the       hill barked  at  the       man       who       laughed |
| ``` |
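
To work with a parse programmatically rather than reading the drawing, the constituents can be listed as labeled word spans. The sketch below is a minimal illustration, not part of this repository: it assumes that `model.to_tree` returns an `nltk.Tree` (the demo passes its result to `TreePrettyPrinter`, which suggests so), `labeled_spans` is a hypothetical helper, and the sample tree is hand-built from the "the lazy dog" subtree of the first example.

```python
from nltk import Tree


def labeled_spans(tree, start=0):
    """Return (label, start, end) for every node in pre-order; end is exclusive."""
    spans = [(tree.label(), start, start + len(tree.leaves()))]
    offset = start
    for child in tree:
        if isinstance(child, Tree):
            spans.extend(labeled_spans(child, offset))
            offset += len(child.leaves())
        else:
            offset += 1  # a plain string leaf covers exactly one word
    return spans


# Hand-built stand-in for a parser output (assumption: a real tree would
# come from model.to_tree and have the same nltk.Tree interface).
tree = Tree.fromstring("(D457 (D0B7 the) (DECD (DC8D lazy) (DE8D dog)))")
words = tree.leaves()

spans = labeled_spans(tree)
for label, begin, end in spans:
    print(label, begin, end, " ".join(words[begin:end]))
```

Each tuple pairs a node label with the half-open word range it covers, which is a convenient form for comparing predicted brackets against a reference.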
|
|
| ## Citation |
|
|
| ```bib |
| @inproceedings{wang-utiyama-2024-eliciting, |
|     title = "On Eliciting Syntax from Language Models via Hashing", |
|     author = "Wang, Yiran and |
|       Utiyama, Masao", |
|     editor = "Al-Onaizan, Yaser and |
|       Bansal, Mohit and |
|       Chen, Yun-Nung", |
|     booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing", |
|     month = nov, |
|     year = "2024", |
|     address = "Miami, Florida, USA", |
|     publisher = "Association for Computational Linguistics", |
|     url = "https://aclanthology.org/2024.emnlp-main.479/", |
|     doi = "10.18653/v1/2024.emnlp-main.479", |
|     pages = "8412--8427" |
| } |
| ``` |