---
license: mit
---

## GraphCodeBERT CodeSearch Python

- Base model: `microsoft/graphcodebert-base`
- Fine-tuned for the Python code-search task, following [GraphCodeBERT CodeSearch](https://github.com/microsoft/CodeBERT/tree/master/GraphCodeBERT/codesearch)
- Best MRR: 0.662

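MRR (mean reciprocal rank) averages, over all queries, the reciprocal of the rank at which the correct code snippet is retrieved. The following is a minimal sketch of that computation, not the evaluation code used for this model: the function name `mean_reciprocal_rank`, the optional `cutoff` parameter, and the sample ranks are all hypothetical and chosen for illustration.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[int], cutoff: Optional[int] = None) -> float:
    """Average of 1/rank over all queries; ranks are 1-based.

    A rank beyond `cutoff` (if given) counts as a miss and contributes 0.
    Both the name and the cutoff behaviour are assumptions for this sketch.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    total = 0.0
    for rank in ranks:
        if rank < 1:
            raise ValueError("ranks are 1-based")
        if cutoff is None or rank <= cutoff:
            total += 1.0 / rank
    return total / len(ranks)


# Hypothetical ranks of the correct snippet for four queries.
example_ranks = [1, 2, 4, 1]
print(mean_reciprocal_rank(example_ranks))  # (1 + 0.5 + 0.25 + 1) / 4 = 0.6875
```

For scale, a retriever that always ranks the correct snippet first scores 1.0, and one that always ranks it second scores 0.5.
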
### Fine-tuning script

```sh
lang=python
mkdir -p ./saved_models/$lang
python run.py \
    --output_dir=./saved_models/$lang \
    --config_name=microsoft/graphcodebert-base \
    --model_name_or_path=microsoft/graphcodebert-base \
    --tokenizer_name=microsoft/graphcodebert-base \
    --lang=$lang \
    --do_train \
    --train_data_file=dataset/$lang/train.jsonl \
    --eval_data_file=dataset/$lang/valid.jsonl \
    --test_data_file=dataset/$lang/test.jsonl \
    --codebase_file=dataset/$lang/codebase.jsonl \
    --num_train_epochs 10 \
    --code_length 256 \
    --data_flow_length 64 \
    --nl_length 128 \
    --train_batch_size 16 \
    --eval_batch_size 32 \
    --learning_rate 2e-5 \
    --seed 123456 2>&1 | tee saved_models/$lang/train.log
```