JeppeKlitgaard committed on
Commit
b4c706a
·
verified ·
1 Parent(s): 390d86b

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +23 -0
README.md ADDED
@@ -0,0 +1,23 @@
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - math
5
+ - ocr
6
+ - typst
7
+ - latex
8
+ size_categories:
9
+ - 1M<n<10M
10
+ ---
11
+
12
+ # Typst Image Dataset
13
+
14
+ This dataset was generated with a [fork](https://github.com/JeppeKlitgaard/tex2typ) of [tex2typ] and the [hoang-quoc-trung/fusion-image-to-latex-datasets] dataset, which itself is a compilation of LaTeX labels and images of equations.
15
+
16
+ The hoang-quoc-trung dataset is difficult to work with because its image data is stored in a single large compressed RAR archive, which does not permit efficient random read access. It also appears to contain a large number of corrupted filenames inside the archive, which have been repaired in this dataset.
17
+
18
+ This dataset instead uses the WebDataset format for convenient and efficient storage of the image files and their associated metadata.
19
+
20
+ The code used to generate this dataset can be found here: https://github.com/JeppeKlitgaard/DTU-02456-Deep-Learning-Project (the repository is currently private but should be released after examination; if it has not been, contact me at `huggingface@jeppe.science`).
21
+
22
+ [tex2typ]: https://github.com/ParaN3xus/tex2typ
23
+ [hoang-quoc-trung/fusion-image-to-latex-datasets]: https://huggingface.co/datasets/hoang-quoc-trung/fusion-image-to-latex-datasets
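
Note on the storage format: the README above says the images and metadata are stored as a WebDataset. A WebDataset shard is an ordinary tar archive in which consecutive files that share a basename form one sample. The sketch below reads such a shard with only the Python standard library. It is a minimal illustration, not the author's loading code: the file names (`000000.png`, `000000.json`) and the `typst` metadata field are hypothetical, since the README does not state the actual shard layout.

```python
import io
import tarfile
from typing import Dict, Iterator, Tuple


def split_key(name: str) -> Tuple[str, str]:
    """Split 'dir/000001.meta.json' into ('dir/000001', 'meta.json')."""
    directory, _, filename = name.rpartition("/")
    stem, _, extension = filename.partition(".")
    key = f"{directory}/{stem}" if directory else stem
    return key, extension


def iter_samples(fileobj) -> Iterator[Tuple[str, Dict[str, bytes]]]:
    """Yield (key, {extension: bytes}) for each sample in a tar shard.

    Files are read sequentially, so no random access into the
    archive is needed.
    """
    current_key = None
    files: Dict[str, bytes] = {}
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        for member in archive:
            if not member.isfile():
                continue
            key, extension = split_key(member.name)
            # A change of basename marks the start of the next sample.
            if key != current_key and files:
                yield current_key, files
                files = {}
            current_key = key
            files[extension] = archive.extractfile(member).read()
    if files:
        yield current_key, files


def build_demo_shard() -> io.BytesIO:
    """Build a tiny in-memory shard with hypothetical file names."""
    entries = [
        ("000000.png", b"fake image bytes"),
        ("000000.json", b'{"typst": "x^2"}'),
        ("000001.png", b"more fake image bytes"),
        ("000001.json", b'{"typst": "a/b"}'),
    ]
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in entries:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for key, files in iter_samples(build_demo_shard()):
        print(key, sorted(files))
```

In practice the `webdataset` package's `WebDataset` class performs this grouping (plus decoding and shuffling) for you; the hand-rolled reader is only meant to show why the format suits sequential streaming better than a single RAR archive.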