| --- |
| license: mit |
| datasets: |
| - jiebi/CodeConvo |
| language: |
| - en |
| base_model: |
| - BAAI/bge-large-en-v1.5 |
| pipeline_tag: feature-extraction |
| tags: |
| - retrieval |
| --- |
| |
|
|
| # Model Card for IDs-C2I-Enc |
|
|
|
|
| IDs-C2I-Enc is a bi-encoder retrieval model fine-tuned from [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) on the IDs (Internet-Drafts) subset of the CodeConvo dataset. |
|
|
| - **Paper:** [Automated Insights Into GitHub Collaboration Dynamics](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10982058) |
| - **Training Dataset:** [jiebi/CodeConvo](https://huggingface.co/datasets/jiebi/CodeConvo) |
|
|
| ## Uses |
|
|
|
|
| You can load this model through MTEB; the model definition is in `bge_models.py` ([source code](https://github.com/cheop-byeon/mteb-R2Gen/blob/main/mteb/models/bge_models.py)). |
|
|
| To run the IR evaluation task, use `RFCAlign_IR_mteb.py` ([source code](https://github.com/cheop-byeon/mteb-R2Gen/blob/main/RFCAlign_IR_mteb.py)). |
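
For orientation, here is a minimal sketch of how a bi-encoder is typically used for retrieval: the query and each candidate are embedded independently, then candidates are ranked by similarity. Cosine similarity over L2-normalized embeddings is an assumption here, not something this card specifies. `toy_encode` and `KEYWORDS` are hypothetical stand-ins so the example runs without downloading anything; in practice `encode` would wrap the model loaded through MTEB.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(vec: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rank_documents(
    query: str,
    documents: Sequence[str],
    encode: Callable[[str], Vector],
) -> List[Tuple[int, float]]:
    """Return (document index, cosine similarity) pairs, best match first."""
    q = l2_normalize(encode(query))
    scored = []
    for idx, doc in enumerate(documents):
        # Each document is embedded on its own, independently of the query.
        d = l2_normalize(encode(doc))
        scored.append((idx, sum(a * b for a, b in zip(q, d))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in encoder that counts a few keywords.
# It is NOT the real model; replace it with the model's encode function.
KEYWORDS = ("congestion", "header", "timeout")


def toy_encode(text: str) -> List[float]:
    lowered = text.lower()
    return [float(lowered.count(word)) for word in KEYWORDS]


if __name__ == "__main__":
    docs = [
        "Clarify the header field length in section 3",
        "Adjust the congestion window after a timeout",
    ]
    for idx, score in rank_documents("fix congestion timeout handling", docs, toy_encode):
        print(f"{score:.3f}  {docs[idx]}")
```

Because document embeddings do not depend on the query, they can be computed once and cached, which is the main practical advantage of a bi-encoder over a cross-encoder.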
|
|
| ## Training Details |
|
|
| The fine-tuning run can be reproduced with [this script](https://github.com/cheop-byeon/FlagEmbedding/blob/main/examples/finetune/embedder/encoder_only/ft_CodeConvo_encoder.sh). |
|
|
| ## Citation |
|
|
| ```bibtex |
| @article{bian2025automated, |
| title={Automated Insights Into GitHub Collaboration Dynamics}, |
| author={Bian, Jie and Arefev, Nikolay and M{\"u}hlh{\"a}user, Max and Welzl, Michael}, |
| journal={IEEE Access}, |
| volume={13}, |
| pages={85526--85542}, |
| year={2025} |
| } |
| ``` |