---
license: mit
---

## GraphCodeBERT CodeSearch Python

- Base model: [`microsoft/graphcodebert-base`](https://huggingface.co/microsoft/graphcodebert-base)
- Fine-tuned for the Python code-search task, following [GraphCodeBERT CodeSearch](https://github.com/microsoft/CodeBERT/tree/master/GraphCodeBERT/codesearch)
- Best MRR (mean reciprocal rank): 0.662
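
For readers unfamiliar with the metric, here is a minimal sketch of how a mean reciprocal rank can be computed. It is an illustration only, not the evaluation code used to produce the number above: the function name `mean_reciprocal_rank`, the convention that a rank of `0` means "correct snippet not retrieved", and the example ranks are all assumptions made for this sketch.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries.

    ranks[i] is the 1-based position of the correct code snippet in the
    ranked results for query i. By the convention assumed here, 0 means
    the correct snippet was not retrieved and contributes 0 to the sum.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r if r > 0 else 0.0 for r in ranks) / len(ranks)


# Hypothetical ranks for four queries: hits at positions 1, 2 and 4, one miss.
example_ranks = [1, 2, 4, 0]
print(mean_reciprocal_rank(example_ranks))  # (1 + 0.5 + 0.25 + 0) / 4 = 0.4375
```

An MRR of 0.662 therefore means that, averaged over the evaluated queries, the reciprocal of the correct snippet's rank is about 0.66; it does not by itself say which rank is most common.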

### Fine-tuning script
```sh
lang=python
mkdir -p ./saved_models/$lang
python run.py \
    --output_dir=./saved_models/$lang \
    --config_name=microsoft/graphcodebert-base \
    --model_name_or_path=microsoft/graphcodebert-base \
    --tokenizer_name=microsoft/graphcodebert-base \
    --lang=$lang \
    --do_train \
    --train_data_file=dataset/$lang/train.jsonl \
    --eval_data_file=dataset/$lang/valid.jsonl \
    --test_data_file=dataset/$lang/test.jsonl \
    --codebase_file=dataset/$lang/codebase.jsonl \
    --num_train_epochs 10 \
    --code_length 256 \
    --data_flow_length 64 \
    --nl_length 128 \
    --train_batch_size 16 \
    --eval_batch_size 32 \
    --learning_rate 2e-5 \
    --seed 123456 2>&1 | tee ./saved_models/$lang/train.log
```