---
license: mit
datasets:
- jiebi/CodeConvo
language:
- en
base_model:
- BAAI/bge-large-en-v1.5
pipeline_tag: feature-extraction
tags:
- retrieval
---

# Model Card for IDs-C2I-Enc

IDs-C2I-Enc is a bi-encoder retrieval model fine-tuned from BAAI/bge-large-en-v1.5 specifically for the IDs (Internet-Drafts) subset of the CodeConvo dataset.

- **Paper:** [Automated Insights Into GitHub Collaboration Dynamics](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10982058)
- **Training Dataset:** [jiebi/CodeConvo](https://huggingface.co/datasets/jiebi/CodeConvo)

## Uses

You can load this model with MTEB ([source code](https://github.com/cheop-byeon/mteb-R2Gen/blob/main/mteb/models/bge_models.py)). To run the IR evaluation task, use the evaluation script ([source code](https://github.com/cheop-byeon/mteb-R2Gen/blob/main/RFCAlign_IR_mteb.py)).

## Training Details

You can reproduce the fine-tuning with [this script](https://github.com/cheop-byeon/FlagEmbedding/blob/main/examples/finetune/embedder/encoder_only/ft_CodeConvo_encoder.sh).

## Citation

```bibtex
@article{bian2025automated,
  title={Automated Insights Into GitHub Collaboration Dynamics},
  author={Bian, Jie and Arefev, Nikolay and M{\"u}hlh{\"a}user, Max and Welzl, Michael},
  journal={IEEE Access},
  volume={13},
  pages={85526--85542},
  year={2025}
}
```
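
## Retrieval Sketch

Because this is a bi-encoder, queries and documents are embedded independently and candidates are ranked by vector similarity. The following is a minimal sketch of that ranking step only, using hand-written toy vectors in place of real model embeddings. The function names `cosine` and `rank_documents` are hypothetical and are not part of MTEB, FlagEmbedding, or the linked scripts. Cosine similarity is assumed as the scoring function, and all vectors are assumed to be non-zero.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy 2-d vectors standing in for embeddings produced by the encoder.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
print(rank_documents(query, docs))  # [1, 2, 0]
```

In practice, the query and document vectors would come from the model itself, loaded as described under Uses, rather than being written by hand.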