Official code for the paper ["Systematic Investigation of Strategies Tailored for Low-Resource Settings for Low-Resource Dependency Parsing"](https://arxiv.org/abs/2201.11374).
If you use this code, please cite our paper.
## Requirements
* Python 3.7
* PyTorch 1.1.0
* CUDA 9.0
* Gensim 3.8.1
We assume that you have installed conda beforehand.
```bash
conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0 -c pytorch
pip install gensim==3.8.1
```
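After installation, it can help to confirm that the pinned versions are the ones actually active in the environment. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `version_mismatches` are hypothetical, and it assumes each package exposes a `__version__` attribute.

```python
import importlib
from typing import Dict, Optional

# Versions taken from the Requirements list above.
REQUIRED = {"torch": "1.1.0", "gensim": "3.8.1"}


def installed_version(module_name: str) -> Optional[str]:
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_mismatches(
    required: Dict[str, str], actual: Dict[str, Optional[str]]
) -> Dict[str, str]:
    """Describe every package whose installed version differs from the pin."""
    problems = {}
    for name, wanted in required.items():
        found = actual.get(name)
        if found is None:
            problems[name] = "not installed (need {})".format(wanted)
        # Ignore a local build suffix such as "+cu90" when comparing.
        elif found.split("+")[0] != wanted:
            problems[name] = "found {}, need {}".format(found, wanted)
    return problems


if __name__ == "__main__":
    actual = {name: installed_version(name) for name in REQUIRED}
    for name, message in version_mismatches(REQUIRED, actual).items():
        print("{}: {}".format(name, message))
```

Running the script prints one line per mismatched or missing package and prints nothing when the environment matches the pins.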
## Pretrained embeddings for Sanskrit
* Pretrained FastText embeddings for STBC/VST can be obtained from [here](https://drive.google.com/drive/folders/1SwdEqikTq-N2vOL7QSUX2vqi3faZE7bq?usp=sharing). Make sure that the downloaded `.txt` file is placed in `data/`.
* The main results are reported for systems trained on the combined train and dev splits.
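Before training, a quick check that the embedding file landed in `data/` can save a failed run. This is a minimal sketch under assumptions: the function names are hypothetical, and it assumes the file follows the common FastText text layout in which the first line holds the vocabulary size and the vector dimension. If the file has no such header, `read_header` simply returns `None`.

```python
import os
from typing import List, Optional, Tuple


def find_embedding_files(data_dir: str = "data") -> List[str]:
    """Return sorted paths of the .txt files directly under data_dir."""
    if not os.path.isdir(data_dir):
        return []
    return sorted(
        os.path.join(data_dir, name)
        for name in os.listdir(data_dir)
        if name.endswith(".txt")
    )


def read_header(path: str) -> Optional[Tuple[int, int]]:
    """Return (vocab_size, dim) if the first line is a 'count dim' header, else None."""
    with open(path, encoding="utf-8") as handle:
        parts = handle.readline().split()
    # Assumption: a header line consists of exactly two integers.
    if len(parts) == 2 and all(p.isdigit() for p in parts):
        return int(parts[0]), int(parts[1])
    return None


if __name__ == "__main__":
    for path in find_embedding_files():
        print(path, read_header(path))
```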
## How to train model for Sanskrit
The proposed system has two stages: (1) pretraining and (2) integration. To run both, execute the bash script `run_STBC.sh` or `run_VST.sh` for the respective dataset. These scripts reproduce the results reported in Section 3 and Table 2 of the paper.
```bash
bash run_STBC.sh
```
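To launch training from Python, for example to run both datasets in sequence, the scripts can be wrapped with `subprocess`. This is a minimal sketch rather than part of the repository: `build_command` and `run_dataset` are hypothetical helpers, and the code assumes it is started from the repository root where the two scripts live.

```python
import subprocess
from typing import List

# Script names come from this README; the dictionary keys are labels for this sketch.
SCRIPTS = {"STBC": "run_STBC.sh", "VST": "run_VST.sh"}


def build_command(dataset: str) -> List[str]:
    """Return the bash command that trains on the given dataset."""
    if dataset not in SCRIPTS:
        raise ValueError("unknown dataset: {!r}".format(dataset))
    return ["bash", SCRIPTS[dataset]]


def run_dataset(dataset: str) -> None:
    """Run the training script and raise CalledProcessError if it fails."""
    subprocess.run(build_command(dataset), check=True)
```

Calling `run_dataset("STBC")` followed by `run_dataset("VST")` runs the two pipelines one after the other and stops at the first failure.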
## Citations
```
@misc{sandhan_systematic,
doi = {10.48550/ARXIV.2201.11374},
url = {https://arxiv.org/abs/2201.11374},
author = {Sandhan, Jivnesh and Behera, Laxmidhar and Goyal, Pawan},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
title = {Systematic Investigation of Strategies Tailored for Low-Resource Settings for Low-Resource Dependency Parsing},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}
```
## Acknowledgements
Our ensembled system is built on top of the [DCST implementation](https://github.com/rotmanguy/DCST).