---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
library_name: peft
license: mit
datasets:
- Roihn/Einstein-Puzzles-Data
language:
- en
---

# Einstein-Puzzles

**Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry** ([arXiv](https://arxiv.org/abs/2510.25595))

*Run Peng\*, Ziqiao Ma\*, Amy Pang, Sikai Li, Zhang Xi-Jia, Yingzhuo Yu, Cristian-Paul Bara, Joyce Chai*

## Model Details

All models are fine-tuned with LoRA (rank 32) for 1 epoch, using a global batch size of 128 and a learning rate of 2e-4 with a cosine decay schedule. Fine-tuning is run with [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), with FlashAttention-2 enabled to speed up training. Training takes approximately 30 minutes on 4 A40 GPUs with 48 GB of memory each.
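
To make the schedule concrete, the following is a minimal sketch of a cosine decay from the stated peak learning rate, plus the per-GPU share of the global batch. It is not the OpenRLHF implementation: the absence of warmup, the floor of `min_lr=0.0`, and the function names `cosine_decay_lr` and `per_gpu_batch_size` are assumptions made for illustration only.

```python
import math

PEAK_LR = 2e-4           # learning rate stated in this model card
GLOBAL_BATCH_SIZE = 128  # global batch size stated in this model card
NUM_GPUS = 4             # number of A40 GPUs stated in this model card


def cosine_decay_lr(step: int, total_steps: int,
                    peak_lr: float = PEAK_LR, min_lr: float = 0.0) -> float:
    """Learning rate at `step`, decaying from peak_lr to min_lr along a half cosine.

    Assumption: no warmup phase and min_lr = 0; the actual training run may differ.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay on the curve's endpoints.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def per_gpu_batch_size(global_batch_size: int = GLOBAL_BATCH_SIZE,
                       num_gpus: int = NUM_GPUS) -> int:
    """Samples each GPU contributes per optimizer step
    (micro-batch size times gradient-accumulation steps)."""
    if global_batch_size % num_gpus != 0:
        raise ValueError("global batch size must be divisible by the number of GPUs")
    return global_batch_size // num_gpus
```

The total number of optimizer steps depends on the dataset size, which is not stated here, so `total_steps` must be supplied by the caller. How the per-GPU share is split between micro-batch size and gradient accumulation is likewise not specified in this card.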

This repo provides the fine-tuned model with the full set of capabilities: information providing, information seeking, and chain-of-thought reasoning.
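
Since the metadata lists `peft` as the library and `meta-llama/Llama-3.1-8B-Instruct` as the base model, loading the adapter would typically look like the sketch below. The function name `load_model` and the choice of `bfloat16` are assumptions, and `adapter_id` is left as a parameter because this card does not state the repository ID of the adapter.

```python
BASE_MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # from the model card metadata


def load_model(adapter_id: str):
    """Load the base model and attach the LoRA adapter found at `adapter_id`.

    `adapter_id` is a Hub repo ID or local path supplied by the caller;
    it is not specified in this model card.
    """
    # Heavy dependencies are imported lazily so that importing this module
    # does not require torch, transformers, or peft to be installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: bf16 weights to reduce memory use
    )
    # Wrap the base model with the fine-tuned LoRA weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_model` downloads the base model, which requires accepting the Llama 3.1 license on the Hugging Face Hub and being logged in. The returned model can then be moved to a GPU and used with `generate` as with any causal language model.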

## Citation

```bibtex
@misc{peng2025communicationverificationllmagents,
      title={Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry},
      author={Run Peng and Ziqiao Ma and Amy Pang and Sikai Li and Zhang Xi-Jia and Yingzhuo Yu and Cristian-Paul Bara and Joyce Chai},
      year={2025},
      eprint={2510.25595},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.25595},
}
```