sanganaka
/

SanDP

Model card Files Files and versions

SanDP / ReadMe.md

shivrajanand's picture

Upload folder using huggingface_hub

a7b3936 verified 5 months ago

|

history blame contribute delete

1.79 kB

	Official code for the paper ["Systematic Investigation of Strategies Tailored for Low-Resource Settings for Low-Resource Dependency Parsing"](https://arxiv.org/abs/2201.11374).
	If you use this code please cite our paper.

	## Requirements

	* Python 3.7
	* Pytorch 1.1.0
	* Cuda 9.0
	* Gensim 3.8.1

	We assume that you have installed conda beforehand.

	```
	conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0 -c pytorch
	pip install gensim==3.8.1
	```

	## Pretrained embeddings for Sanskrit
	* Pretrained FastText embeddings for STBC/VST can be obtained from [here](https://drive.google.com/drive/folders/1SwdEqikTq-N2vOL7QSUX2vqi3faZE7bq?usp=sharing). Make sure that `.txt` file is placed at `data/`
	* The main results are reported on the systems trained by combining train and dev splits.


	## How to train model for Sanskrit
	To run proposed system: (1) Pretraining (2) Integration, then simply run bash script `run_STBC.sh` or `run_VST.sh` for the respective dataset. With these scripts you will be able to reproduce our results reported in Section-3 and Table 2.

	```bash
	bash run_STBC.sh

	```

	## Citations
	```
	@misc{sandhan_systematic,
	doi = {10.48550/ARXIV.2201.11374},
	url = {https://arxiv.org/abs/2201.11374},
	author = {Sandhan, Jivnesh and Behera, Laxmidhar and Goyal, Pawan},
	keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
	title = {Systematic Investigation of Strategies Tailored for Low-Resource Settings for Low-Resource Dependency Parsing},
	publisher = {arXiv},
	year = {2022},
	copyright = {Creative Commons Attribution 4.0 International}
	}

	```

	## Acknowledgements
	Our ensembled system is built on the top of ["DCST Implementation"](https://github.com/rotmanguy/DCST)