---
language:
  - en
metrics:
  - f1
base_model:
  - FacebookAI/roberta-base
tags:
  - parsing
  - hashing
  - unsupervised
---

## On Eliciting Syntax from Language Models via Hashing

## Model Details

This repository contains the implementation of [**Parserker v2**](https://aclanthology.org/2024.emnlp-main.479/), a
hashing-based unsupervised parser. It is trained on the raw text of the Penn Treebank only, without syntactic
annotations or tree labels.

## Usage

### Requirements

`pip install transformers torch nltk torchrua`

### Demo

```python
from nltk import TreePrettyPrinter
from transformers import AutoModel, AutoTokenizer

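# trust_remote_code=True is required: the model and tokenizer classes ship with the repository.
model = AutoModel.from_pretrained("yehzw/parserker", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("yehzw/parserker", trust_remote_code=True)

# Switch to inference mode (disables dropout).
model.eval()

# The custom tokenizer takes a batch of sentences and returns three values;
# the last two are passed straight to model.parse below.
words, input_ids, duration = tokenizer([
    "The quick brown fox jumps over the lazy dog",
    "The man who you met yesterday is my teacher",
    "The boy saw the girl with a telescope",
    "The dog on the hill barked at the man who laughed",
])

# Parse the batch, then build and print one tree per sentence.
for w, s in zip(words, model.parse(input_ids, duration).tolist()):
    t = model.to_tree(w, s)
    t = TreePrettyPrinter(t).text()
    print(t)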
```

### Output Examples

The quick brown fox jumps over the lazy dog

```
                                 854D                              
             _____________________|_________                        
            |                              DF41                    
            |                      _________|____                   
           955F                   |             DD59               
  __________|_____                |     _________|____              
 |               DC45             |    |             D457          
 |      __________|____           |    |     _________|____         
 |     |              DE45        |    |    |             DECD     
 |     |           ____|____      |    |    |          ____|____    
103B  C404       DC05      D60D  9300 C995 D0B7      DC8D      DE8D
 |     |          |         |     |    |    |         |         |   
The  quick      brown      fox  jumps over the       lazy      dog
```

The man who you met yesterday is my teacher

```
                              C50D                                      
       ________________________|____________                             
      |                                    D558                         
      |                    _________________|___________                 
      |                  9718                           |               
      |          _________|____                         |                
      |         |             C718                     5D52             
      |         |     _________|____                ____|____            
     965F       |    |             DF00            |        DF5D        
  ____|____     |    |          ____|_______       |     ____|______     
103B      C60D 1799 4719      D300         47BC   7192 5895        CE0D 
 |         |    |    |         |            |      |    |           |    
The       man  who  you       met       yesterday  is   my       teacher
```

The boy saw the girl with a telescope

```
                    C14C                                   
       ______________|_________                             
      |                       DF41                         
      |          ______________|____                        
      |         |                  C54D                    
      |         |          _________|____                   
      |         |         |             9D59               
      |         |         |          ____|____              
     165F       |        D657       |        9E55          
  ____|____     |     ____|____     |     ____|_______      
103B      C20D D100 D0B7      C60D 3991 9817         CE8D  
 |         |    |    |         |    |    |            |     
The       boy  saw  the       girl with  a        telescope
```

The dog on the hill barked at the man who laughed

```
                                    C50D                                            
                 ____________________|__________                                     
                |                              DF40                                 
                |                     __________|_________                           
               C54D                  |                   DD19                       
       _________|____                |      ______________|____                      
      |             9D09             |     |                  C5CD                  
      |          ____|____           |     |          _________|_________            
     965F       |        D65F        |     |        D657                97D8        
  ____|____     |     ____|____      |     |     ____|____           ____|______     
103B      C20D 2F99 D0B3      C60D  D300  6395 D0B7      C60D      1799        CF89 
 |         |    |    |         |     |     |    |         |         |           |    
The       dog   on  the       hill barked  at  the       man       who       laughed
```
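
### Reading Spans from a Tree

The demo passes the result of `model.to_tree` to `TreePrettyPrinter`, so the trees are presumably `nltk.Tree` objects. Under that assumption, the sketch below lists the multi-word constituents of a tree as `(label, start, end)` triples, the kind of bracket set that span-based F1 compares. It does not download the model: the tree is rebuilt by hand from the "The boy saw the girl with a telescope" output above, and `labelled_spans` is a hypothetical helper, not part of the repository.

```python
from nltk import Tree


def labelled_spans(tree):
    """Return (label, start, end) for every node covering more than one word.

    Hypothetical helper, not part of the Parserker repository.
    `end` is exclusive; single-word nodes are skipped.
    """
    spans = []

    def visit(node, start):
        # A leaf is a plain string and covers exactly one word.
        if isinstance(node, str):
            return start + 1
        end = start
        for child in node:
            end = visit(child, end)
        if end - start > 1:
            spans.append((node.label(), start, end))
        return end

    visit(tree, 0)
    return spans


# Hand-copied from the third output example above (an assumption about the
# tree's shape as an nltk.Tree, not an actual model call).
tree = Tree.fromstring(
    "(C14C (165F (103B The) (C20D boy))"
    " (DF41 (D100 saw)"
    " (C54D (D657 (D0B7 the) (C60D girl))"
    " (9D59 (3991 with) (9E55 (9817 a) (CE8D telescope))))))"
)

spans = labelled_spans(tree)
for label, start, end in spans:
    print(label, " ".join(tree.leaves()[start:end]))
```

With a real model, the same helper could be applied to each `model.to_tree(w, s)` result in the demo loop, before the tree is converted to text.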

## Citation

```bib
@inproceedings{wang-utiyama-2024-eliciting,
    title = "On Eliciting Syntax from Language Models via Hashing",
    author = "Wang, Yiran  and
      Utiyama, Masao",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.479/",
    doi = "10.18653/v1/2024.emnlp-main.479",
    pages = "8412--8427"
}
```