---
language:
- en
metrics:
- f1
base_model:
- FacebookAI/roberta-base
tags:
- parsing
- hashing
- unsupervised
---

## On Eliciting Syntax from Language Models via Hashing

## Model Details

This repository contains the implementation of [**Parserker v2**](https://aclanthology.org/2024.emnlp-main.479/), a hashing-based unsupervised parser. It is trained on the Penn Treebank dataset using only the raw text (no syntactic annotations or tree labels).

## Usage

### Requirements

`pip install transformers torch nltk torchrua`

### Demo

```python
from nltk import TreePrettyPrinter
from transformers import AutoModel, AutoTokenizer

# Both the model and the tokenizer ship custom code, hence trust_remote_code=True.
model = AutoModel.from_pretrained("yehzw/parserker", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("yehzw/parserker", trust_remote_code=True)
model.eval()

# Tokenize a batch of sentences.
words, input_ids, duration = tokenizer([
    "The quick brown fox jumps over the lazy dog",
    "The man who you met yesterday is my teacher",
    "The boy saw the girl with a telescope",
    "The dog on the hill barked at the man who laughed",
])

# Parse the batch, then build and pretty-print one tree per sentence.
for w, s in zip(words, model.parse(input_ids, duration).tolist()):
    t = model.to_tree(w, s)
    t = TreePrettyPrinter(t).text()
    print(t)
```

### Output Examples

The quick brown fox jumps over the lazy dog

```
                                 854D
             _____________________|_________
            |                              DF41
            |                      _________|____
           955F                   |             DD59
  __________|_____                |     _________|____
 |               DC45             |    |             D457
 |      __________|____           |    |     _________|____
 |     |              DE45        |    |    |             DECD
 |     |           ____|____      |    |    |          ____|____
103B  C404       DC05      D60D  9300 C995 D0B7      DC8D      DE8D
 |     |          |         |     |    |    |         |         |
The  quick      brown      fox  jumps over the       lazy      dog
```

The man who you met yesterday is my teacher

```
                              C50D
       ________________________|____________
      |                                    D558
      |                    _________________|___________
      |                  9718                           |
      |          _________|____                         |
      |         |             C718                     5D52
      |         |     _________|____                ____|____
     965F       |    |             DF00            |        DF5D
  ____|____     |    |          ____|_______       |     ____|______
103B      C60D 1799 4719      D300         47BC   7192 5895        CE0D
 |         |    |    |         |            |      |    |           |
The       man  who  you       met       yesterday  is   my       teacher
```

The boy saw the girl with a telescope

```
                    C14C
       ______________|_________
      |                       DF41
      |          ______________|____
      |         |                  C54D
      |         |          _________|____
      |         |         |             9D59
      |         |         |          ____|____
     165F       |        D657       |        9E55
  ____|____     |     ____|____     |     ____|_______
103B      C20D D100 D0B7      C60D 3991 9817         CE8D
 |         |    |    |         |    |    |            |
The       boy  saw  the       girl with  a        telescope
```

The dog on the hill barked at the man who laughed

```
                                    C50D
                 ____________________|__________
                |                              DF40
                |                     __________|_________
               C54D                  |                   DD19
       _________|____                |      ______________|____
      |             9D09             |     |                  C5CD
      |          ____|____           |     |          _________|_________
     965F       |        D65F        |     |        D657                97D8
  ____|____     |     ____|____      |     |     ____|____           ____|______
103B      C20D 2F99 D0B3      C60D  D300  6395 D0B7      C60D      1799        CF89
 |         |    |    |         |     |     |    |         |         |           |
The       dog   on  the       hill barked  at  the       man       who       laughed
```

## Citation

```bib
@inproceedings{wang-utiyama-2024-eliciting,
    title = "On Eliciting Syntax from Language Models via Hashing",
    author = "Wang, Yiran and Utiyama, Masao",
    editor = "Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.479/",
    doi = "10.18653/v1/2024.emnlp-main.479",
    pages = "8412--8427"
}
```