base_model:
  - meta-llama/Llama-3.1-8B-Instruct
library_name: peft
license: mit
datasets:
  - Roihn/Einstein-Puzzles-Data
language:
  - en

Einstein-Puzzles

Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry (arXiv)

Run Peng*, Ziqiao Ma*, Amy Pang, Sikai Li, Zhang Xi-Jia, Yingzhuo Yu, Cristian-Paul Bara, Joyce Chai

Model Details

All models are fine-tuned with LoRA at rank 32, using a global batch size of 128 and a learning rate of 2e-4 with a cosine decay schedule, for 1 epoch. Fine-tuning is run with OpenRLHF, and FlashAttention-2 is used to speed up training. Training takes approximately 30 minutes on 4 A40 GPUs with 48 GB of memory each.
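To make the schedule and batch size above concrete, here is a minimal sketch in plain Python. It is not the OpenRLHF implementation: the absence of warmup, the decay floor of 0, and the per-GPU micro-batch size are assumptions, since the card states only the peak learning rate, the cosine decay, the global batch size, and the GPU count.

```python
import math

PEAK_LR = 2e-4            # from the model card
GLOBAL_BATCH_SIZE = 128   # from the model card
NUM_GPUS = 4              # from the model card


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, min_lr=0.0):
    """Learning rate at `step` under plain cosine decay.

    Assumptions (not stated in the card): no warmup, decay down to `min_lr`.
    """
    # Clamp progress to [0, 1] so steps outside the range stay well-defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def grad_accum_steps(micro_batch_per_gpu,
                     global_batch=GLOBAL_BATCH_SIZE,
                     num_gpus=NUM_GPUS):
    """Gradient-accumulation steps needed to reach the global batch size.

    `micro_batch_per_gpu` is a hypothetical value; the card does not give it.
    """
    per_step = micro_batch_per_gpu * num_gpus
    if global_batch % per_step != 0:
        raise ValueError("global batch must be a multiple of micro_batch * num_gpus")
    return global_batch // per_step
```

For example, with a hypothetical micro-batch of 4 sequences per GPU, `grad_accum_steps(4)` gives the number of accumulation steps needed on 4 GPUs to reach a global batch of 128.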

This repository provides the fine-tuned model with the full set of capabilities: information providing, information seeking, and chain-of-thought reasoning.
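Since the card metadata lists `peft` as the library and `meta-llama/Llama-3.1-8B-Instruct` as the base model, the adapter can presumably be loaded along the following lines. This is a sketch rather than an official usage snippet: `ADAPTER_ID` is a placeholder to be replaced with this repository's id or a local path, `load_model` is a hypothetical helper name, and the dtype setting is an assumption.

```python
BASE_MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # from the card metadata
ADAPTER_ID = "path-or-repo-id-of-this-adapter"      # placeholder, not a real id


def load_model(adapter_id=ADAPTER_ID, base_model_id=BASE_MODEL_ID):
    """Load the base model and attach the LoRA adapter (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require the libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    # torch_dtype="auto" keeps the dtype stored in the checkpoint (assumed choice).
    base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype="auto")
    # Wrap the base model with the LoRA weights from the adapter location.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_model()` downloads the base weights, which requires access to the gated Llama 3.1 repository on the Hugging Face Hub.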

Citation

@misc{peng2025communicationverificationllmagents,
      title={Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry}, 
      author={Run Peng and Ziqiao Ma and Amy Pang and Sikai Li and Zhang Xi-Jia and Yingzhuo Yu and Cristian-Paul Bara and Joyce Chai},
      year={2025},
      eprint={2510.25595},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.25595}, 
}