---
license: mit
---
## GraphCodeBERT CodeSearch Python
- base model: [microsoft/graphcodebert-base](https://huggingface.co/microsoft/graphcodebert-base)
- fine-tuned for the Python code-search task, following [GraphCodeBERT CodeSearch](https://github.com/microsoft/CodeBERT/tree/master/GraphCodeBERT/codesearch)
- best MRR (mean reciprocal rank): 0.662
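
For readers unfamiliar with the metric, here is a minimal sketch of how a mean reciprocal rank can be computed from a query-by-candidate score matrix. The function name `mean_reciprocal_rank`, the toy scores, and the tie handling (ties count in favour of the gold candidate) are illustrative assumptions, not taken from the training code; the upstream evaluation script may differ in details such as tie handling or rank cut-offs.

```python
def mean_reciprocal_rank(scores, gold_indices):
    """Hypothetical helper: scores[i][j] is the similarity of query i to
    candidate j, and gold_indices[i] is the index of the correct candidate."""
    total = 0.0
    for row, gold in zip(scores, gold_indices):
        # Rank of the gold candidate = 1 + number of candidates scored strictly higher
        rank = 1 + sum(1 for s in row if s > row[gold])
        total += 1.0 / rank
    return total / len(scores)


# Toy data (made up for illustration): 3 queries, 3 candidate snippets each
scores = [
    [0.9, 0.1, 0.3],  # gold = 0, ranked 1st
    [0.2, 0.4, 0.8],  # gold = 1, ranked 2nd
    [0.5, 0.7, 0.6],  # gold = 0, ranked 3rd
]
gold = [0, 1, 0]

mrr = mean_reciprocal_rank(scores, gold)
print(round(mrr, 3))  # (1 + 1/2 + 1/3) / 3 = 0.611
```

An MRR of 0.662 therefore means that, averaged over queries, the reciprocal of the correct snippet's rank is about 0.66; a value of 1.0 would mean every query has its correct snippet in first place.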
### Fine-tuning script
```sh
lang=python
mkdir -p ./saved_models/$lang
python run.py \
--output_dir=./saved_models/$lang \
--config_name=microsoft/graphcodebert-base \
--model_name_or_path=microsoft/graphcodebert-base \
--tokenizer_name=microsoft/graphcodebert-base \
--lang=$lang \
--do_train \
--train_data_file=dataset/$lang/train.jsonl \
--eval_data_file=dataset/$lang/valid.jsonl \
--test_data_file=dataset/$lang/test.jsonl \
--codebase_file=dataset/$lang/codebase.jsonl \
--num_train_epochs 10 \
--code_length 256 \
--data_flow_length 64 \
--nl_length 128 \
--train_batch_size 16 \
--eval_batch_size 32 \
--learning_rate 2e-5 \
--seed 123456 2>&1| tee saved_models/$lang/train.log
```