---
license: mit
datasets:
- jiebi/CodeConvo
language:
- en
base_model:
- BAAI/bge-large-en-v1.5
pipeline_tag: feature-extraction
tags:
- retrieval
---
# Model Card for IDs-C2I-Enc
IDs-C2I-Enc is a bi-encoder retrieval model fine-tuned from [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) on the IDs (Internet-Drafts) subset of the CodeConvo dataset.
- **Paper:** [Automated Insights Into GitHub Collaboration Dynamics](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10982058)
- **Training Dataset:** [jiebi/CodeConvo](https://huggingface.co/datasets/jiebi/CodeConvo)
## Uses
You can load this model through MTEB ([source code](https://github.com/cheop-byeon/mteb-R2Gen/blob/main/mteb/models/bge_models.py)).
To run the IR evaluation task, use the evaluation script ([source code](https://github.com/cheop-byeon/mteb-R2Gen/blob/main/RFCAlign_IR_mteb.py)).
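For quick experiments outside MTEB, a bi-encoder can be used by embedding queries and candidates separately and ranking the candidates by cosine similarity. The following is a minimal sketch, not the authors' evaluation pipeline. It assumes that the checkpoint is published as `jiebi/IDs-C2I-Enc` and that it loads with `sentence-transformers` the way its base model does; neither is confirmed by this card. The helper names `embed`, `cosine`, and `rank_documents` are hypothetical.

```python
import math


def embed(texts, model_id="jiebi/IDs-C2I-Enc"):
    """Encode texts into embedding vectors.

    Assumption: the repository id and sentence-transformers compatibility
    are guesses based on the base model, not stated in this card.
    """
    # Imported lazily so the ranking helpers below work without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(texts, normalize_embeddings=True)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_emb, doc_embs, top_k=3):
    """Return (index, score) pairs for the top_k most similar documents."""
    scored = [(i, cosine(query_emb, d)) for i, d in enumerate(doc_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A typical call would be `rank_documents(embed([query])[0], embed(documents))`. Whether queries need an instruction prefix, as some BGE models do, depends on the fine-tuning setup and should be checked against the training script.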
## Training Details
You can reproduce the fine-tuning with [this script](https://github.com/cheop-byeon/FlagEmbedding/blob/main/examples/finetune/embedder/encoder_only/ft_CodeConvo_encoder.sh).
## Citation
```bibtex
@article{bian2025automated,
title={Automated Insights Into GitHub Collaboration Dynamics},
author={Bian, Jie and Arefev, Nikolay and M{\"u}hlh{\"a}user, Max and Welzl, Michael},
journal={IEEE Access},
volume={13},
pages={85526--85542},
year={2025}
}
```