File size: 17,626 Bytes
6d92a68
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448

# Solving Olympiad Geometry without Human Demonstrations


This repository contains the code necessary to
reproduce DDAR and AlphaGeometry,
the two geometry theorem provers
introduced in the [Nature 2024](https://www.nature.com/articles/s41586-023-06747-5) paper:

*<center>"Solving Olympiad Geometry without Human Demonstrations".</center>*


</br>


<center>
<img alt="fig1" width="800px" src="fig1.svg">
</center>


## Dependencies

For the instructions presented below,
we use Python 3.10.9, and dependencies with their exact
version numbers listed in `requirements.txt`.

Our code depends on `meliad`, which is
not a registered package with `pip`. See instructions below
for how to manually install `meliad`.

Note that one can still run the DDAR solver
without the `meliad` and `sentencepiece` dependencies.

## Run the instructions

All instructions in this `README.md` can be run in one go by:

```

bash run.sh

```

Below, we explain these instructions step-by-step.

## Install dependencies, download weights and vocabulary.

Installation is done in a virtual environment:

```

virtualenv -p python3 .

source ./bin/activate

pip install --require-hashes -r requirements.txt

```

Download weights and vocabulary:

```

bash download.sh

DATA=ag_ckpt_vocab

```

Finally, install `meliad` separately as it is not
registered with `pip`:

```

MELIAD_PATH=meliad_lib/meliad

mkdir -p $MELIAD_PATH

git clone https://github.com/google-research/meliad $MELIAD_PATH

export PYTHONPATH=$PYTHONPATH:$MELIAD_PATH

```

## Set up common flags

Before running the python scripts,
let us first prepare some commonly used flags.
The symbolic engine needs definitions and deduction rules to operate.
These definitions and rules are provided in two text files
`defs.txt` and `rules.txt`.

```shell

DDAR_ARGS=(

  --defs_file=$(pwd)/defs.txt \

  --rules_file=$(pwd)/rules.txt \

);

```

Next, we define the flags relevant to the proof search.
To reproduce the simple examples below,
we use lightweight values for the proof search parameters:

```shell

BATCH_SIZE=2

BEAM_SIZE=2

DEPTH=2



SEARCH_ARGS=(

  --beam_size=$BEAM_SIZE

  --search_depth=$DEPTH

)

```

NOTE: The results in our paper can be obtained by setting
`BATCH_SIZE=32`, `BEAM_SIZE=512`, `DEPTH=16`
as described in section Methods.
To stay under IMO time limits, 4 V100-GPUs and 250 CPU workers
are needed as shown in Extended Data - Figure 1.
Note that we also strip away other memory/speed optimizations
due to internal dependencies and to promote code clarity.

Assume the downloaded checkpoint and vocabulary is placed in `DATA`,
and the installed `meliad` source code is at `MELIAD_PATH`.
We make use of the `gin` library to manage model configurations,
following `meliad` conventions. We now define the flags relevant to the
language model:

```shell

LM_ARGS=(

  --ckpt_path=$DATA \

  --vocab_path=$DATA/geometry.757.model

  --gin_search_paths=$MELIAD_PATH/transformer/configs,$(pwd) \

  --gin_file=base_htrans.gin \

  --gin_file=size/medium_150M.gin \

  --gin_file=options/positions_t5.gin \

  --gin_file=options/lr_cosine_decay.gin \

  --gin_file=options/seq_1024_nocache.gin \

  --gin_file=geometry_150M_generate.gin \

  --gin_param=DecoderOnlyLanguageModelGenerate.output_token_losses=True \

  --gin_param=TransformerTaskConfig.batch_size=$BATCH_SIZE \

  --gin_param=TransformerTaskConfig.sequence_length=128 \

  --gin_param=Trainer.restore_state_variables=False

);

```

TIP: Note that you can still run the DDAR solver
without defining `SEARCH_ARGS` and `LM_ARGS`.
In such case, simply disable the import of the `lm_inference` module
inside `alphageometry.py`.

## Run DDAR

The script loads a problem by reading a list of problems
from a text file and solves the specific problem in the list according
to its name. We pass these two pieces of information through the flags
`--problems_file` and `--problem_name`.
We use `--mode=ddar` to indicate that we want to use the DDAR solver.

Below we showed this solver solving IMO 2000 P1:

```shell

python -m alphageometry \

--alsologtostderr \

--problems_file=$(pwd)/imo_ag_30.txt \

--problem_name=translated_imo_2000_p1 \

--mode=ddar \

"${DDAR_ARGS[@]}"

```

Expect the following output

```shell

graph.py:468] translated_imo_2000_p1

graph.py:469] a b = segment a b; g1 = on_tline g1 a a b; g2 = on_tline g2 b b a; m = on_circle m g1 a, on_circle m g2 b; n = on_circle n g1 a, on_circle n g2 b; c = on_pline c m a b, on_circle c g1 a; d = on_pline d m a b, on_circle d g2 b; e = on_line e a c, on_line e b d; p = on_line p a n, on_line p c d; q = on_line q b n, on_line q c d ? cong e p e q

ddar.py:41] Depth 1/1000 time = 1.7772269248962402

ddar.py:41] Depth 2/1000 time = 5.63526177406311

ddar.py:41] Depth 3/1000 time = 6.883412837982178

ddar.py:41] Depth 4/1000 time = 10.275688409805298

ddar.py:41] Depth 5/1000 time = 12.048273086547852

alphageometry.py:190]

==========================

 * From theorem premises:

A B G1 G2 M N C D E P Q : Points

AG_1 ⟂ AB [00]

BA ⟂ G_2B [01]

G_2M = G_2B [02]

G_1M = G_1A [03]



...

[log omitted]

...



036. ∠QEB = ∠(QP-EA) [46] & ∠(BE-QP) = ∠AEP [55] ⇒  ∠EQP = ∠QPE [56]

037. ∠PQE = ∠EPQ [56] ⇒  EP = EQ



==========================

```

The output first includes a list of relevant premises that it uses,
and then proof steps that gradually build up the proof.
All predicates are numbered to track how they are derived
from the premises, and to show that the proof is fully justified.

TIP: Additionally passing the flag `--out_file=path/to/output/text/file.txt`
will write the proof to a text file.

Running on all problems in `imo_ag_30.txt` will yield solutions to
14 of them, as reported in Table 1 in our paper.

## Run AlphaGeometry:

As a simple example, we load `--problem_name=orthocenter`
from `--problem_file=examples.txt`.
This time, we pass `--mode=alphageometry` to use the AlphaGeometry solver
and pass the `SEARCH_ARGS` and `LM_ARGS` flags.

```shell

python -m alphageometry \

--alsologtostderr \

--problems_file=$(pwd)/examples.txt \

--problem_name=orthocenter \

--mode=alphageometry \

"${DDAR_ARGS[@]}" \

"${SEARCH_ARGS[@]}" \

"${LM_ARGS[@]}"

```

Expect the following output:

```shell

...

[log omitted]

...

training_loop.py:725] Total parameters: 152072288

training_loop.py:739] Total state size: 0

training_loop.py:492] Training loop: creating task for mode beam_search



graph.py:468] orthocenter

graph.py:469] a b c = triangle a b c; d = on_tline d b a c, on_tline d c a b ? perp a d b c

ddar.py:41] Depth 1/1000 time = 0.009987592697143555 branch = 4

ddar.py:41] Depth 2/1000 time = 0.00672602653503418 branch = 0

alphageometry.py:221] DD+AR failed to solve the problem.

alphageometry.py:457] Depth 0. There are 1 nodes to expand:

alphageometry.py:460] {S} a : ; b : ; c : ; d : T a b c d 00 T a c b d 01 ? T a d b c {F1} x00

alphageometry.py:465] Decoding from {S} a : ; b : ; c : ; d : T a b c d 00 T a c b d 01 ? T a d b c {F1} x00

...

[log omitted]

...

alphageometry.py:470] LM output (score=-1.102287): "e : C a c e 02 C b d e 03 ;"

alphageometry.py:471] Translation: "e = on_line e a c, on_line e b d"



alphageometry.py:480] Solving: "a b c = triangle a b c; d = on_tline d b a c, on_tline d c a b; e = on_line e a c, on_line e b d ? perp a d b c"

graph.py:468]

graph.py:469] a b c = triangle a b c; d = on_tline d b a c, on_tline d c a b; e = on_line e a c, on_line e b d ? perp a d b c

ddar.py:41] Depth 1/1000 time = 0.021120786666870117

ddar.py:41] Depth 2/1000 time = 0.033370018005371094

ddar.py:41] Depth 3/1000 time = 0.04297471046447754

alphageometry.py:140]

==========================

 * From theorem premises:

A B C D : Points

BD ⟂ AC [00]

CD ⟂ AB [01]



 * Auxiliary Constructions:

E : Points

E,B,D are collinear [02]

E,C,A are collinear [03]



 * Proof steps:

001. E,B,D are collinear [02] & E,C,A are collinear [03] & BD ⟂ AC [00] ⇒  ∠BEA = ∠CED [04]

002. E,B,D are collinear [02] & E,C,A are collinear [03] & BD ⟂ AC [00] ⇒  ∠BEC = ∠AED [05]

003. A,E,C are collinear [03] & E,B,D are collinear [02] & AC ⟂ BD [00] ⇒  EC ⟂ EB [06]

004. EC ⟂ EB [06] & CD ⟂ AB [01] ⇒  ∠(EC-BA) = ∠(EB-CD) [07]

005. E,C,A are collinear [03] & E,B,D are collinear [02] & ∠(EC-BA) = ∠(EB-CD) [07] ⇒  ∠BAE = ∠CDE [08]

006. ∠BEA = ∠CED [04] & ∠BAE = ∠CDE [08] (Similar Triangles)⇒  EB:EC = EA:ED [09]

007. EB:EC = EA:ED [09] & ∠BEC = ∠AED [05] (Similar Triangles)⇒  ∠BCE = ∠ADE [10]

008. EB:EC = EA:ED [09] & ∠BEC = ∠AED [05] (Similar Triangles)⇒  ∠EBC = ∠EAD [11]

009. ∠BCE = ∠ADE [10] & E,C,A are collinear [03] & E,B,D are collinear [02] & ∠EBC = ∠EAD [11] ⇒  AD ⟂ BC

==========================



alphageometry.py:505] Solved.

```

NOTE: Point `H` is automatically renamed to `D`,
as the LM is trained on synthetic problems
where the points are named alphabetically, and so it expects
the same during test time.

NOTE: In this implementation of AlphaGeometry,
we removed all optimizations that are dependent on
internal infrastructure, e.g.,
parallelized model inference on multi GPUs,
parallelized DDAR on multiple CPUs,
parallel execution of LM and DDAR,
shared pool of CPU workers across different problems, etc.
We also removed some memory/speed optimizations and code
abstractions in favor of code clarity.

As can be seen in the output, initially DDAR failed to solve the problem.
The LM proposes two auxiliary constructions (because `BATCH_SIZE=2`):

* `e = eqdistance e c a b, eqdistance e b a c`, i.e.,
construct `E` as the intersection of circle (center=C, radius=AB) and
circle (center=B, radius=AC). This construction has a score of `-1.186`.
* `e = on_line e a c, on_line e b d`, i.e.,
`E` is the intersection of `AC` and `BD`.
This construction has a higher score (`-1.102287`) than the previous.

Since the second construction has a higher score, DDAR attempted the second
construction first and found the solution right away.
The proof search therefore terminates and there is no second iteration.

## Results

Before attempting to reproduce the AlphaGeometry numbers in our paper,
please make sure to pass all tests in the prepared test suite:

```

bash run_tests.sh

```

NOTE: [Issues#14](https://github.com/google-deepmind/alphageometry/issues/14) reports that although the top beam decodes are still the same, the LM is not giving the same score for different users.

Then, pass the corresponding values for `--problem_file` (column)
and `--mode` (row), and
iterate on all problems to obtain the following results:

<center>

<b>Number of solved problems:</b>

|          | `imo_ag_30.txt`  | `jgex_ag_231.txt` |
|----------|------------------|-------------------|
| `ddar`   | 14               | 198               |
| `alphageometry`     | 25               | 228               |

</center>

## Source code description

Files in this repository include python modules/scripts to run the solvers and
resource files necessary for the script to execute. We listed below
each of them and their description.

| File name              | Description                                                                        |
|------------------------|------------------------------------------------------------------------------------|
| `geometry.py`          | Implements nodes (Point, Line, Circle, etc) in the proof state graph.              |
| `numericals.py`        | Implements the numerical engine in the dynamic geometry environment.               |
| `graph_utils.py`       | Implements utilities for the proof state graph.                                    |
| `graph.py`             | Implements the proof state graph.                                                  |
| `problem.py`           | Implements the classes that represent the problem premises, conclusion, DAG nodes. |
| `dd.py`                | Implements DD and its traceback.                                                   |
| `ar.py`                | Implements AR and its traceback.                                                   |
| `trace_back.py`        | Implements the recursive traceback and dependency difference algorithm.            |
| `ddar.py`              | Implements the combination DD+AR.                                                  |
| `beam_search.py`       | Implements beam decoding of a language model in JAX.                               |
| `models.py`            | Implements the transformer model.                                                  |
| `transformer_layer.py` | Implements the transformer layer.                                                  |
| `decoder_stack.py`     | Implements the transformer decoder stack.                                          |
| `lm_inference.py`      | Implements an interface to a trained LM to perform decoding.                       |
| `alphageometry.py`                | Main script that loads problems, calls DD+AR or AlphaGeometry solver, and prints solutions.   |
| `pretty.py`            | Pretty formating the solutions output by solvers.                                  |
| `*_test.py`            | Tests for the corresponding module.                                                |
| `download.sh`          | Script to download model checkpoints and LM                                        |
| `run.sh`               | Script to execute instructions in README.                                          |
| `run_tests.sh`         | Script to execute the test suite.                                                  |


Resource files:

| Resource file name     | Description                                                                        |
|------------------------|------------------------------------------------------------------------------------|
| `defs.txt`             | Definitions of different geometric construction actions.                           |
| `rules.txt`            | Deduction rules for DD.                                                            |
| `geometry_150M_generate.gin`| Gin config of the LM implemented in meliad.                                   |
| `imo_ag_30.txt`        | Problems in IMO-AG-30.                                                             |
| `jgex_ag_231.txt`      | Problems in JGEX-AG-231.                                                           |



## Citing this work

```bibtex

@Article{AlphaGeometryTrinh2024,

  author  = {Trinh, Trieu and Wu, Yuhuai and Le, Quoc and He, He and Luong, Thang},

  journal = {Nature},

  title   = {Solving Olympiad Geometry without Human Demonstrations},

  year    = {2024},

  doi     = {10.1038/s41586-023-06747-5}

}

```

## Acknowledgements

This research is a collaboration between the Google Brain team
(now Google Deepmind) and
the Computer Science Department of New York University.
We thank Rif A. Saurous, Denny Zhou, Christian Szegedy, Delesley Hutchins,
Thomas Kipf, Hieu Pham, Petar Veličković, Debidatta Dwibedi,
Kyunghyun Cho, Lerrel Pinto, Alfredo Canziani,
Thomas Wies, He He’s research group,
Evan Chen (the USA’s IMO team coach),
Mirek Olsak, Patrik Bak,
and all three Nature's referees for their help and support.

The code of AlphaGeometry communicates with and/or references the following
separate libraries and packages:

*   [Abseil](https://github.com/abseil/abseil-py)
*   [JAX](https://github.com/google/jax/)
*   [matplotlib](https://matplotlib.org/)
*   [NumPy](https://numpy.org)
*   [SciPy](https://scipy.org)
*   [TensorFlow](https://github.com/tensorflow/tensorflow)
*   [Meliad](https://github.com/google-research/meliad)
*   [Flax](https://github.com/google/flax)
*   [Gin](https://github.com/google/gin-config)
*   [T5](https://github.com/google-research/text-to-text-transfer-transformer)
*   [SentencePiece](https://github.com/google/sentencepiece)



We thank all their contributors and maintainers!


## Disclaimer

This is not an officially supported Google product.

This research code is provided "as-is" to the broader research community.
Google does not promise to maintain or otherwise support this code in any way.

## Code License

Copyright 2023 DeepMind Technologies Limited

All software is licensed under the Apache License, Version 2.0 (Apache 2.0);
you may not use this file except in compliance with the Apache 2.0 license.
You may obtain a copy of the Apache 2.0 license at:
https://www.apache.org/licenses/LICENSE-2.0

All other materials are licensed under the Creative Commons Attribution 4.0
International License (CC-BY). You may obtain a copy of the CC-BY license at:
https://creativecommons.org/licenses/by/4.0/legalcode

Unless required by applicable law or agreed to in writing, all software and
materials distributed here under the Apache 2.0 or CC-BY licenses are
distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
either express or implied. See the licenses for the specific language governing
permissions and limitations under those licenses.

## Model Parameters License

The AlphaGeometry checkpoints and vocabulary are made available
under the terms of the Creative Commons Attribution 4.0
International (CC BY 4.0) license.
You can find details at:
https://creativecommons.org/licenses/by/4.0/legalcode