---
license: mit
---
## GraphCodeBERT CodeSearch Python
- base model: [`microsoft/graphcodebert-base`](https://huggingface.co/microsoft/graphcodebert-base)
- fine-tuned for the Python code-search task, following [GraphCodeBERT CodeSearch](https://github.com/microsoft/CodeBERT/tree/master/GraphCodeBERT/codesearch)
- best MRR (mean reciprocal rank): 0.662
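
For readers unfamiliar with the metric, here is a minimal sketch of how a mean reciprocal rank can be computed from per-query similarity scores. It is not the evaluation code in `run.py`; the helper names `rank_of_correct` and `mean_reciprocal_rank` and the toy scores are assumptions made for illustration only.

```python
from typing import Sequence


def rank_of_correct(scores: Sequence[float], correct_index: int) -> int:
    """Return the 1-based rank of the correct candidate for one query.

    Hypothetical helper: `scores` holds one similarity score per candidate
    code snippet; the rank is 1 plus the number of candidates that score
    strictly higher than the correct one.
    """
    target = scores[correct_index]
    return 1 + sum(1 for s in scores if s > target)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average 1/rank over all queries (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy example: three queries, each scored against three candidates.
toy_scores = [
    ([0.9, 0.1, 0.3], 0),  # correct snippet ranked 1st
    ([0.2, 0.9, 0.5], 2),  # correct snippet ranked 2nd
    ([0.8, 0.7, 0.1], 2),  # correct snippet ranked 3rd
]
ranks = [rank_of_correct(scores, idx) for scores, idx in toy_scores]
print(ranks, mean_reciprocal_rank(ranks))
```

An MRR of 1.0 would mean the correct snippet is always ranked first; lower values mean it tends to appear further down the ranking.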
### Fine-tuning script
```sh
# Run from the GraphCodeBERT/codesearch directory of the microsoft/CodeBERT repository.
lang=python
mkdir -p ./saved_models/$lang
python run.py \
--output_dir=./saved_models/$lang \
--config_name=microsoft/graphcodebert-base \
--model_name_or_path=microsoft/graphcodebert-base \
--tokenizer_name=microsoft/graphcodebert-base \
--lang=$lang \
--do_train \
--train_data_file=dataset/$lang/train.jsonl \
--eval_data_file=dataset/$lang/valid.jsonl \
--test_data_file=dataset/$lang/test.jsonl \
--codebase_file=dataset/$lang/codebase.jsonl \
--num_train_epochs 10 \
--code_length 256 \
--data_flow_length 64 \
--nl_length 128 \
--train_batch_size 16 \
--eval_batch_size 32 \
--learning_rate 2e-5 \
--seed 123456 2>&1| tee saved_models/$lang/train.log
```