---
license: mit
---

## GraphCodeBERT CodeSearch Python

- base model comes from ['microsoft/graphcodebert-base'](https://huggingface.co/microsoft/graphcodebert-base)
- fine-tuned for the Python code-search task based on [GraphCodeBERT CodeSearch](https://github.com/microsoft/CodeBERT/tree/master/GraphCodeBERT/codesearch)
- best MRR: 0.662

### fine tuning script

```sh
lang=python
mkdir -p ./saved_models/$lang
python run.py \
    --output_dir=./saved_models/$lang \
    --config_name=microsoft/graphcodebert-base \
    --model_name_or_path=microsoft/graphcodebert-base \
    --tokenizer_name=microsoft/graphcodebert-base \
    --lang=$lang \
    --do_train \
    --train_data_file=dataset/$lang/train.jsonl \
    --eval_data_file=dataset/$lang/valid.jsonl \
    --test_data_file=dataset/$lang/test.jsonl \
    --codebase_file=dataset/$lang/codebase.jsonl \
    --num_train_epochs 10 \
    --code_length 256 \
    --data_flow_length 64 \
    --nl_length 128 \
    --train_batch_size 16 \
    --eval_batch_size 32 \
    --learning_rate 2e-5 \
    --seed 123456 2>&1 | tee saved_models/$lang/train.log
```
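
For reference, the MRR reported above is the mean reciprocal rank: for each natural-language query, take the reciprocal of the rank at which the correct code snippet is retrieved, then average over all queries. The snippet below is a minimal sketch of that metric in plain Python, not the code from `run.py`. The helper names `rank_of_target` and `mean_reciprocal_rank` and the toy scores are illustrative assumptions, and any cutoff the original evaluation applies to very low ranks is not modelled here.

```python
from typing import Sequence


def rank_of_target(scores: Sequence[float], target_index: int) -> int:
    """Return the 1-based rank of the target among all candidates.

    `scores` holds one similarity score per candidate code snippet
    (higher is better); `target_index` is the position of the correct one.
    Hypothetical helper for illustration only.
    """
    target = scores[target_index]
    # Every candidate that scores strictly higher pushes the target down by one.
    return 1 + sum(1 for s in scores if s > target)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries; a rank of 0 means 'not retrieved'."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r if r > 0 else 0.0 for r in ranks) / len(ranks)


# Toy data (assumed): three queries, each scored against three candidates.
# Row i lists the scores for query i; the correct candidate has index i.
toy_scores = [
    [0.9, 0.2, 0.1],  # correct snippet ranked 1st
    [0.8, 0.5, 0.1],  # correct snippet ranked 2nd
    [0.7, 0.6, 0.3],  # correct snippet ranked 3rd
]
toy_ranks = [rank_of_target(row, i) for i, row in enumerate(toy_scores)]
toy_mrr = mean_reciprocal_rank(toy_ranks)
print(toy_ranks, round(toy_mrr, 4))
```

An MRR of 1.0 would mean the correct snippet is always ranked first, so a value of 0.662 indicates that the correct snippet typically appears at or near the top of the results.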