Paper addressing multi-agent AI problems wins top accolade

The winning paper ‘Counterfactual Multi-Agent Policy Gradients’ (COMA) presents a method which could soon make it possible to deploy learning multi-agent systems in the real world.

COMA differs from a lot of artificial intelligence research by focussing on multi-agent problems, rather than single-agent settings and two-player games. There are many challenging multi-agent problems to tackle, ranging from self-driving cars to drones and even social interactions. In many of these applications a number of independent entities need to be able to take independent actions based on local observations in order to achieve a common goal.

For example, in a fleet of search-and-rescue drones each drone typically needs to be able to decide on its best course of action using only local information. This is commonly referred to as ‘decentralised execution’. However, the policies can often be designed in a centralised fashion, for example when they are trained using a simulator which has access to the observations and actions of all agents. The research team believes that this domain of centralised training and decentralised execution is one of the key avenues for successfully developing and deploying multi-agent systems in the real world.

One of the great challenges when training multi-agent policies is the credit assignment problem. Just as in a football team, the reward achieved depends on the actions of all of the different agents. Given that all agents are constantly improving their policies, it is difficult for any given agent to evaluate the impact of its individual actions on the overall performance of the team. To address this issue, the research team (Computer Science’s Jakob N. Foerster, Gregory Farquhar (CDT in AIMS) and Professor Shimon Whiteson, with Engineering Science’s Triantafyllos Afouras (CDT in AIMS) and Nantas Nardelli) developed the COMA method. In the paper, the researchers model the problem setting of StarCraft unit-management as a challenging cooperative multi-agent problem. The team’s training method outperforms existing methods and achieves high win rates against the StarCraft bot.
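
The credit assignment idea can be made concrete with a small sketch. This article does not describe COMA's mechanics, so the code below is an illustration based on the counterfactual baseline in the published paper, not the team's implementation. For one agent, it compares the value of the action actually taken with the average value that agent would have obtained under its own policy, while the other agents' actions are held fixed. The function name and the numbers are hypothetical.

```python
def counterfactual_advantage(q_values, policy, taken_action):
    """Estimate one agent's contribution to the team outcome.

    q_values[i]: team value if this agent had taken action i while every
                 other agent's action stayed fixed (assumed to come from a
                 centralised critic available during training).
    policy[i]:   probability that this agent's policy picks action i.
    """
    # Baseline: the value this agent would expect on average under its policy.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    # Advantage: how much better (or worse) the chosen action was than that.
    return q_values[taken_action] - baseline


# Hypothetical numbers for a single agent with three possible actions.
q_values = [1.0, 3.0, 2.0]
policy = [0.5, 0.25, 0.25]
advantage = counterfactual_advantage(q_values, policy, taken_action=1)
print(advantage)
```

A positive result suggests the chosen action helped the team more than the agent's typical behaviour would have, which is the kind of per-agent signal that a shared team reward alone does not provide.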

The team’s certificate will be presented at AAAI-18 on 6 February.

Article originally published on the Department of Computer Science website: http://www.cs.ox.ac.uk/news/1448-full.html on 17 January 2018.