SanDP

File size: 1,788 Bytes

a7b3936

Official code for the paper ["Systematic Investigation of Strategies Tailored for Low-Resource Settings for Low-Resource Dependency Parsing"](https://arxiv.org/abs/2201.11374).
If you use this code please cite our paper.

## Requirements

* Python 3.7 
* Pytorch 1.1.0 
* Cuda 9.0 
* Gensim 3.8.1

We assume that you have installed conda beforehand. 

```
conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0 -c pytorch
pip install gensim==3.8.1
```

## Pretrained embeddings for Sanskrit
* Pretrained FastText embeddings for STBC/VST can be obtained from [here](https://drive.google.com/drive/folders/1SwdEqikTq-N2vOL7QSUX2vqi3faZE7bq?usp=sharing). Make sure that `.txt` file is placed at `data/`
* The main results are reported on the systems trained by combining train and dev splits. 


## How to train model for Sanskrit
To run proposed system: (1) Pretraining (2) Integration, then simply run bash script `run_STBC.sh` or `run_VST.sh` for the respective dataset. With these scripts you will be able to reproduce our results reported in Section-3 and Table 2.

```bash
bash run_STBC.sh

```

## Citations
```
@misc{sandhan_systematic,
  doi = {10.48550/ARXIV.2201.11374},
  url = {https://arxiv.org/abs/2201.11374},
  author = {Sandhan, Jivnesh and Behera, Laxmidhar and Goyal, Pawan},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Systematic Investigation of Strategies Tailored for Low-Resource Settings for Low-Resource Dependency Parsing},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

```

## Acknowledgements
Our ensembled system is built on the top of ["DCST Implementation"](https://github.com/rotmanguy/DCST)