This Working Group emerged from discussions at <a href="https://sites.google.com/view/social-sims-with-llms" target="_blank" rel="noopener noreferrer">LLM-based Social Simulation Workshop</a> at <a href="https://colmweb.org/">COLM</a> as a way to grow a vibrant research community with prodcutive research norms, e.g. as outlined in the pre-print, <a href="papers/puelmatouzel_CloseEvalGap.pdf" target="_blank" rel="noopener noreferrer">Time to Close The Validation Gap in LLM Social Simulations</a> by members in the <a href="www.complexdatalab.com" target="_blank" rel="noopener noreferrer">Complex Data Lab</a>.
136
+
This Working Group emerged from discussions at <a href="https://sites.google.com/view/social-sims-with-llms" target="_blank" rel="noopener noreferrer">LLM-based Social Simulation Workshop</a> at <a href="https://colmweb.org/">COLM</a> as a way to grow a vibrant research community with productive research norms, e.g. as outlined in the pre-print, <a href="papers/puelmatouzel_CloseEvalGap.pdf" target="_blank" rel="noopener noreferrer">Time to Close The Validation Gap in LLM Social Simulations</a> by members in the <a href="www.complexdatalab.com" target="_blank" rel="noopener noreferrer">Complex Data Lab</a>.
137
137
</p>
138
138
</div>
139
139
</div>
@@ -188,15 +188,16 @@ <h4>[DATE Y/M/D]</h4>
188
188
--><!-- END TALK TEMPLATE -->
189
189
<h4>2026/03/10</h4>
190
190
<li>
191
-
<b><a href="[PAPER LINK]">Testing and Improving Multi-Agent LLM Cooperation</a></b>
191
+
<b><a href="https://drive.google.com/file/d/1UNVlGqzhnh2BNpviwctUlvue33MjN34k/view">Evaluating Cooperation in LLM Social Groups through Self-Organizing Leadership</a></b>
192
192
<br>
193
-
Presenter: <u><a href="https://zhijing-jin.com/" target="_blank" rel="noopener noreferrer">Zhijing Jin</a></u>, University of Toronto
193
+
Presenter: <u><a href="https://www.cs.toronto.edu/~rfaulk/" target="_blank" rel="noopener noreferrer">Ryan Faulkner</a></u>, University of Toronto/DeepMind
Zhijing Jin (she/her) is an Assistant Professor at the University of Toronto and a Research Scientist at the Max Planck Institute. She serves as a CIFAR AI Chair, an ELLIS advisor, and a faculty member at the Vector Institute and the Schwartz Reisman Institute. She co-chairs the ACL Ethics Committee and the ACL Year-Round Mentorship. Her research focuses on Causal Reasoning with LLMs and AI Safety in Multi-Agent LLMs. She has published over 80 papers and has received the ELLIS PhD Award, three Rising Star awards, and two Best Paper awards at NeurIPS 2024 Workshops.
199
+
Ryan is a Computer Scientist and Machine Learning researcher with a background in reinforcement learning and foundation models. He has worked as a Research Engineer at Google DeepMind for the past decade and is also a PhD student at the University of Toronto, advised by Zhijing Jin. At GDM, he works in the Concordia group led by Joel Leibo. At a high level, his current research focuses on multi-agent systems, LLMs, and social learning. In this context, he is interested in memory mechanisms, agent theory of mind, collective decision-making, and simulating political systems.
200
+
200
201
</div>
201
202
</div>
202
203
<br>
@@ -209,11 +210,7 @@ <h4>2026/03/10</h4>
209
210
</a>
210
211
<div class="collapse" id="20260310-abstract">
211
212
<div class="card card-body">
212
-
While progress has been made in evaluating single-agent LLMs for persona modeling, the behavior of these models within multi-agent groups remains underexplored. This presentation outlines a research series dedicated to closing this gap by testing LLM cooperation through autonomous social simulations. Specifically, we ask: what happens when personas are tasked to interact and cooperate?
213
-
<br>
214
-
To answer this, we introduce a suite of simulation environments (GovSim, MoralSim, and SanctSim) designed to stress-test persona interaction. These environments simulate high-stakes scenarios, such as the tragedy of the commons and ethical trade-offs, allowing us to investigate whether simulated societies can autonomously negotiate social order and how personas with differing ethical constraints navigate social dilemmas.
215
-
<br>
216
-
Our findings highlight implications for persona modeling. We show that agents exhibit a functional "theory of mind," capable of inferring the identities of their interlocutors and strategically adapting their behavior, sometimes exploiting specific model vulnerabilities. Furthermore, we discuss a counterintuitive phenomenon where advanced reasoning capabilities lead to exploitative behaviors that humans typically avoid, highlighting a significant misalignment between agent optimization and human social norms.
213
+
Governing common-pool resources requires agents to develop enduring strategies through cooperation and self-governance to avoid collective failure. While foundation models have shown potential for cooperation in these settings, existing multi-agent research provides little insight into whether structured leadership and election mechanisms can improve collective decision-making. The absence of this organizational feature, ubiquitous in human society, is a significant shortcoming of current methods. In this work, we directly address whether leadership and elections can improve social welfare and cooperation through multi-agent simulation with LLMs. We present a new framework that simulates leadership through elected personas and candidate-driven agendas, and we carry out an empirical study of LLMs under controlled governance conditions. Our experiments demonstrate that structured leadership can improve social welfare scores by 55.4% and survival time by 128.6% across a range of high-performing LLMs. By constructing an agent social graph, we compute centrality metrics to assess the social influence of leader personas, and we analyze the rhetorical and cooperative tendencies revealed by sentiment analysis of leader utterances. This work lays the foundation for developing prosocial, self-governing multi-agent systems capable of navigating complex resource dilemmas.
217
214
</div>
218
215
</div>
219
216
</li>
@@ -252,6 +249,40 @@ <h4>2026/03/24</h4>
252
249
</div>
253
250
</li>
254
251
252
+
<br>
253
+
254
+
<h4>2026/04/14</h4>
255
+
<li>
256
+
<b><a href="[PAPER LINK]">Testing and Improving Multi-Agent LLM Cooperation</a></b>
257
+
<br>
258
+
Presenter: <u><a href="https://zhijing-jin.com/" target="_blank" rel="noopener noreferrer">Zhijing Jin</a></u>, University of Toronto
Zhijing Jin (she/her) is an Assistant Professor at the University of Toronto and a Research Scientist at the Max Planck Institute. She serves as a CIFAR AI Chair, an ELLIS advisor, and a faculty member at the Vector Institute and the Schwartz Reisman Institute. She co-chairs the ACL Ethics Committee and the ACL Year-Round Mentorship. Her research focuses on Causal Reasoning with LLMs and AI Safety in Multi-Agent LLMs. She has published over 80 papers and has received the ELLIS PhD Award, three Rising Star awards, and two Best Paper awards at NeurIPS 2024 Workshops.
265
+
</div>
266
+
</div>
267
+
<br>
268
+
<!-- <a href="[RECORDING LINK - ADD AFTER TALK]"><img src="https://img.shields.io/badge/Youtube-Recording-orange"></a> -->
While progress has been made in evaluating single-agent LLMs for persona modeling, the behavior of these models within multi-agent groups remains underexplored. This presentation outlines a research series dedicated to closing this gap by testing LLM cooperation through autonomous social simulations. Specifically, we ask: what happens when personas are tasked to interact and cooperate?
278
+
<br>
279
+
To answer this, we introduce a suite of simulation environments (GovSim, MoralSim, and SanctSim) designed to stress-test persona interaction. These environments simulate high-stakes scenarios, such as the tragedy of the commons and ethical trade-offs, allowing us to investigate whether simulated societies can autonomously negotiate social order and how personas with differing ethical constraints navigate social dilemmas.
280
+
<br>
281
+
Our findings highlight implications for persona modeling. We show that agents exhibit a functional "theory of mind," capable of inferring the identities of their interlocutors and strategically adapting their behavior, sometimes exploiting specific model vulnerabilities. Furthermore, we discuss a counterintuitive phenomenon where advanced reasoning capabilities lead to exploitative behaviors that humans typically avoid, highlighting a significant misalignment between agent optimization and human social norms.
Martin Weiss is Co-Founder of Tiptree Systems, a startup building AI agents that help ML researchers find, create, and share knowledge more efficiently. Tiptree is deployed to researchers at many top-tier institutions, including Mila, ELLIS, and MIT. Martin holds a PhD in AI from Mila, where he studied under Hugo Larochelle and Chris Pal. Before his PhD, he was an early employee at YesGraph, a social graph startup acquired by Lyft.
This talk examines three converging crises. First, the decoupling of control from comprehension: we can increasingly predict and manipulate systems without understanding why they work. Second, the collapse of the generator-verifier gap: AI makes it trivial to produce the aesthetics of deep thought, which makes peer review more difficult because we can no longer rely on easy-to-verify signals of work quality. Third, the credit assignment gap: our academic reward systems optimize for publication metrics, not for the increase in understanding that a new paper produces.
332
+
</div>
333
+
</div>
334
+
</li>
335
+
336
+
<br>
337
+
277
338
<h4>2026/02/17</h4>
278
339
<li>
279
340
<b><a href="https://arxiv.org/abs/2510.25003">Emergent Coordinated Behaviors in Networked LLM Agents: Modeling the Strategic Dynamics of Information Operations</a></b>