Einstein-Puzzles
Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry (arXiv)
Run Peng*, Ziqiao Ma*, Amy Pang, Sikai Li, Zhang Xi-Jia, Yingzhuo Yu, Cristian-Paul Bara, Joyce Chai
Model Details
For all model fine-tuning, we employ LoRA with a rank of 32, training with a global batch size of 128 and a learning rate of 2e-4 under a cosine decay schedule for 1 epoch. Fine-tuning is conducted with OpenRLHF, and FlashAttention-2 is used to speed up training. The process takes approximately 30 minutes on 4 A40 GPUs with 48 GB of memory each.
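For reference, below is a minimal sketch of an equivalent LoRA configuration using Hugging Face PEFT and Transformers (the actual training used OpenRLHF). The rank, global batch size, learning rate, schedule, and epoch count mirror the numbers above; the LoRA alpha, target modules, per-device batch split, and output path are illustrative assumptions, and the dataset/trainer setup is omitted.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

base = "meta-llama/Llama-3.1-8B-Instruct"  # base model, per the model tree below
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # FlashAttention-2, as stated above
)

lora_config = LoraConfig(
    r=32,           # LoRA rank, as stated above
    lora_alpha=64,  # assumption: alpha is not reported in the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed; a common choice for Llama
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Global batch size 128 = 4 GPUs x per-device batch 8 x gradient accumulation 4 (assumed split)
args = TrainingArguments(
    output_dir="einstein-puzzles-lora",  # illustrative path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    bf16=True,
)
```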
This repo provides the fine-tuned model with full capabilities for information providing, information seeking, and chain-of-thought reasoning.
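A minimal loading sketch, assuming the repo ships merged weights loadable with `AutoModelForCausalLM` (if it instead ships a PEFT adapter, use `peft.AutoPeftModelForCausalLM`). The example prompt is illustrative; the exact task format used in the paper may differ.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Roihn/Einstein-Puzzles-Model"  # repo id from the model tree below
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Illustrative collaborative-puzzle prompt
messages = [{"role": "user", "content": "Let's solve the puzzle together. I know the clue: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```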
Citation
Model tree for Roihn/Einstein-Puzzles-Model

Base model: meta-llama/Llama-3.1-8B → meta-llama/Llama-3.1-8B-Instruct → this model (fine-tuned)