Ian Gemp

Ian Gemp

External Links

Inspired by Sameer Singh; Powered by Bootstrap


Publications (grouped by year): DeepMind (2019-), UMass / Internships (2013-18), Northwestern / UTexas (2006-11).
  • L. Marris, I. Gemp, G. Piliouras. Equilibrium-Invariant Embedding, Metric Space, and Fundamental Set of 2x2 Normal-Form Games. arXiv. 2023
    Equilibrium solution concepts of normal-form games, such as Nash equilibria, correlated equilibria, and coarse correlated equilibria, describe the joint strategy profiles from which no player has incentive to unilaterally deviate. They are widely studied in game theory, economics, and multiagent systems. Equilibrium concepts are invariant under certain transforms of the payoffs. We define an equilibrium-inspired distance metric for the space of all normal-form games and uncover a distance-preserving equilibrium-invariant embedding. Furthermore, we propose an additional transform which defines a better-response-invariant distance metric and embedding. To demonstrate these metric spaces we study 2x2 games. The equilibrium-invariant embedding of 2x2 games has an efficient two variable parameterization (a reduction from eight), where each variable geometrically describes an angle on a unit circle. Interesting properties can be spatially inferred from the embedding, including: equilibrium support, cycles, competition, coordination, distances, best-responses, and symmetries. The best-response-invariant embedding of 2x2 games, after considering symmetries, rediscovers a set of 15 games, and their respective equivalence classes. We propose that this set of game classes is fundamental and captures all possible interesting strategic interactions in 2x2 games. We introduce a directed graph representation and name for each class. Finally, we leverage the tools developed for 2x2 games to develop game theoretic visualizations of large normal-form and extensive-form games that aim to fingerprint the strategic interactions that occur within.
                title={Equilibrium-Invariant Embedding, Metric Space, and Fundamental Set of $2\times2 $ Normal-Form Games},
                author={Marris, Luke, Ian Gemp, and Georgios Piliouras},
                journal={arXiv preprint arXiv:2304.09978},
  • K. Du, I. Gemp, Y. Wu, Y. Wu. AlphaSnake: Policy Iteration on a Nondeterministic NP-hard Markov Decision Process. AAAI Student Program. 2023
    Reinforcement learning has recently been used to approach well-known NP-hard combinatorial problems in graph theory. Among these problems, Hamiltonian cycle problems are exceptionally difficult to analyze, even when restricted to individual instances of structurally complex graphs. In this paper, we use Monte Carlo Tree Search (MCTS), the search algorithm behind many state-of-the-art reinforcement learning algorithms such as AlphaZero, to create autonomous agents that learn to play the game of Snake, a game centered on properties of Hamiltonian cycles on grid graphs. The game of Snake can be formulated as a single-player discounted Markov Decision Process (MDP) where the agent must behave optimally in a stochastic environment. Determining the optimal policy for Snake, defined as the policy that maximizes the probability of winning - or win rate - with higher priority and minimizes the expected number of time steps to win with lower priority, is conjectured to be NP-hard. Performance-wise, compared to prior work in the Snake game, our algorithm is the first to achieve a win rate over 0.5 (a uniform random policy achieves a win rate <2.57×10^15), demonstrating the versatility of AlphaZero in approaching NP-hard environments.
                title={Alpha{S}nake: Policy Iteration on a Nondeterministic {NP}-hard {M}arkov Decision Process},
                author={Du, Kevin and Gemp, Ian and Wu, Yi and Wu, Yingying},
                booktitle={AAAI 2023 Student Abstract and Poster Program},
  • I. Gemp, C. Chen, B. McWilliams. The Symmetric Generalized Eigenvalue Problem as a Nash Equilibrium. ICLR. 2023
    The symmetric generalized eigenvalue problem (SGEP) is a fundamental concept in numerical linear algebra. It captures the solution of many classical machine learning problems such as canonical correlation analysis, independent components analysis, partial least squares, linear discriminant analysis, principal components and others. Despite this, most general solvers are prohibitively expensive when dealing with streaming data sets (i.e., minibatches) and research has instead concentrated on finding efficient solutions to specific problem instances. In this work, we develop a game-theoretic formulation of the top-k SGEP whose Nash equilibrium is the set of generalized eigenvectors. We also present a parallelizable algorithm with guaranteed asymptotic convergence to the Nash. Current state-of-the-art methods require O(d2k) runtime complexity per iteration which is prohibitively expensive when the number of dimensions (d) is large. We show how to modify this parallel approach to achieve O(dk) runtime complexity. Empirically we demonstrate that this resulting algorithm is able to solve a variety of SGEP problem instances including a large-scale analysis of neural network activations.
                title={The Symmetric Generalized Eigenvalue Problem as a {N}ash Equilibrium},
                author={Gemp, Ian and Chen, Charlie and McWilliams, Brian},
  • L. Marris, I. Gemp, T. Anthony, A. Tacchetti, Y. Bachrach. Turbocharging Solution Concepts: Solving NEs, CEs and CCEs with Neural Equilibrium Solvers. Neurips. 2022
    Solution concepts such as Nash Equilibria, Correlated Equilibria, and Coarse Correlated Equilibria are useful components for many multiagent machine learning algorithms. Unfortunately, solving a normal-form game could take prohibitive or non-deterministic time to converge, and could fail. We introduce the Neural Equilibrium Solver which utilizes a special equivariant neural network architecture to approximately solve the space of all games of fixed shape, buying speed and determinism. We define a flexible equilibrium selection framework, that is capable of uniquely selecting an equilibrium that minimizes relative entropy, or maximizes welfare. The network is trained without needing to generate any supervised training data. We show remarkable zero-shot generalization to larger games. We argue that such a network is a powerful component for many possible multiagent algorithms.
                title={Turbocharging Solution Concepts: Solving NEs, CEs and CCEs with Neural Equilibrium Solvers},
                author={Marris, Luke and Gemp, Ian and Anthony, Thomas and Tacchetti, Andrea and Liu, Siqi and Tuyls, Karl},
  • I. Gemp, T. Anthony, J. Kramár, T. Eccles, A. Tacchetti, Y. Bachrach. Designing All-pay Auctions using Deep Learning and Multi-agent Simulation. Nature: Scientific Reports. 2022
    We propose a multi-agent learning approach for designing crowdsourcing contests and All-Pay auctions. Prizes in contests incentivise contestants to expend effort on their entries, with different prize allocations resulting in different incentives and bidding behaviors. In contrast to auctions designed manually by economists, our method searches the possible design space using a simulation of the multi-agent learning process, and can thus handle settings where a game-theoretic equilibrium analysis is not tractable. Our method simulates agent learning in contests and evaluates the utility of the resulting outcome for the auctioneer. Given a large contest design space, we assess through simulation many possible contest designs within the space, and fit a neural network to predict outcomes for previously untested contest designs. Finally, we apply mirror ascent to optimize the design so as to achieve more desirable outcomes. Our empirical analysis shows our approach closely matches the optimal outcomes in settings where the equilibrium is known, and can produce high quality designs in settings where the equilibrium strategies are not solvable analytically.
                title={Designing all-pay auctions using deep learning and multi-agent simulation},
                author={Gemp, Ian and Anthony, Thomas and Kramar, Janos and Eccles, Tom and Tacchetti, Andrea and Bachrach, Yoram},
                journal={Scientific Reports},
                publisher={Nature Publishing Group}
  • L. Marris, M. Lanctot, I. Gemp, S. Omidshafie, S. McAleer, J. Connor, K. Tuyls, T. Graepel. Game Theoretic Rating in N-player General-sum Games with Equilibria. arXiv. 2022
    Rating strategies in a game is an important area of research in game theory and artificial intelligence, and can be applied to any real-world competitive or cooperative setting. Traditionally, only transitive dependencies between strategies have been used to rate strategies (e.g. Elo), however recent work has expanded ratings to utilize game theoretic solutions to better rate strategies in non-transitive games. This work generalizes these ideas and proposes novel algorithms suitable for N-player, general-sum rating of strategies in normal-form games according to the payoff rating system. This enables well-established solution concepts, such as equilibria, to be leveraged to efficiently rate strategies in games with complex strategic interactions, which arise in multiagent training and real-world interactions between many agents. We empirically validate our methods on real world normal-form data (Premier League) and multiagent reinforcement learning agent evaluation.
                title={Game Theoretic Rating in N-player general-sum games with Equilibria},
                author={Marris, Luke and Lanctot, Marc and Gemp, Ian and Omidshafiei, Shayegan and McAleer, Stephen and Connor, Jerome and Tuyls, Karl and Graepel, Thore},
                journal={arXiv preprint arXiv:2210.02205},
  • E. Van der Pol, I. Gemp, Y. Bachrach, R. Everett. Stochastic Parallelizable Eigengap Dilation for Large Graph Clustering. ICML Workshop: Topology, Algebra, and Geometry in Machine Learning. 2022
    Large graphs commonly appear in social networks, knowledge graphs, recommender systems, life sciences, and decision making problems. Summarizing large graphs by their high level properties is helpful in solving problems in these settings. In spectral clustering, we aim to identify clusters of nodes where most edges fall within clusters and only few edges fall between clusters. This task is important for many downstream applications and exploratory analysis. A core step of spectral clustering is performing an eigendecomposition of the corresponding graph Laplacian matrix (or equivalently, a singular value decomposition, SVD, of the incidence matrix). The convergence of iterative singular value decomposition approaches depends on the eigengaps of the spectrum of the given matrix, i.e., the difference between consecutive eigenvalues. For a graph Laplacian corresponding to a well-clustered graph, the eigenvalues will be non-negative but very small (much less than ) slowing convergence. This paper introduces a parallelizable approach to dilating the spectrum in order to accelerate SVD solvers and in turn, spectral clustering. This is accomplished via polynomial approximations to matrix operations that favorably transform the spectrum of a matrix without changing its eigenvectors. Experiments demonstrate that this approach significantly accelerates convergence, and we explain how this transformation can be parallelized and stochastically approximated to scale with available compute.
                title={Stochastic Parallelizable Eigengap Dilation for Large Graph Clustering},
                author={van der Pol, Elise and Gemp, Ian and Bachrach, Yoram and Everett, Richard},
                journal={arXiv preprint arXiv:2207.14589},
  • Z. Yu, J. Zhang, Z. Wen, A. Tacchetti, M. Wang, I. Gemp. Teamwork Reinforcement Learning with Concave Utilities. ICLR Workshop: Gamification and Multiagent Solutions. 2022
    Complex reinforcement learning (RL) tasks often require a divide-and-conquer approach, where a large task is divided into pieces and solved by individual agents. In this paper, we study a teamwork RL setting where individual agents make decisions on disjoint subsets (blocks) of the state space and have private interests (reward functions), while the entire team aims to maximize a general long-term team utility function and may be subject to constraints. This team utility, which is not necessarily a cumulative sum of rewards, is modeled as a nonlinear function of the team's joint state-action occupancy distribution. By leveraging the inherent duality of policy optimization, we propose a min-max multi-block policy optimization framework to decompose the overall problem into individual local tasks. This enables a federated teamwork mechanism where a team lead coordinates individual agents via reward shaping, and each agent solves her local task defined only on their local state subset. We analyze the convergence of this teamwork policy optimization mechanism and establish an O(1/T) convergence rate to the team's joint optimum. This mechanism allows team members to jointly find the global socially optimal policy while keeping their local privacy.
                title={Teamwork Reinforcement Learning with Concave Utilities},
                author={Yu, Zheng and Zhang, Junyu and Wen, Zheng and Tacchetti, Andrea and Wang, Mengdi and Gemp, Ian},
                booktitle={ICLR 2022 Workshop on Gamification and Multiagent Solutions},
  • I. Gemp, T. Anthony, Y. Bachrach, A. Bhoopchand, K. Bullard, J. Connor, V. Dasagi, B. De Vylder, E. Duéñez-Guzmán, R. Elie, R. Everett, D. Hennes, E. Hughes, M. Khan, M. Lanctot, K. Larson, G. Lever, S. Liu, L. Marris, K. R. McKee, P. Muller, J. Pérolat, F. Strub, A. Tacchetti, E. Tarassov, Z. Wang, K. Tuyls. Developing, Evaluating and Scaling Learning Agents in Multi-agent Environments. AI Communications. 2022
    The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in deep reinforcement learning to explore multi-agent systems in complex environments and use these benchmarks to advance our understanding. Here, we summarise the recent work of our team and present a taxonomy that we feel highlights many important open challenges in multi-agent research.
                title={Developing, evaluating and scaling learning agents in multi-agent environments},
                author={Gemp, Ian and Anthony, Thomas and Bachrach, Yoram and Bhoopchand, Avishkar and Bullard, Kalesha and Connor, Jerome and Dasagi, Vibhavari and De Vylder, Bart and Du{\'e}{\~n}ez-Guzm{\'a}n, Edgar A and Elie, Romuald and Everett, Richard and Hennes, Daniel and Hughes, Edward and Khan, Mina and Lanctot, Marc and Larson, Kate and Lever, Guy and Liu, Siqi and Marris, Luke and McKee, Kevin R. and Muller, Paul and Pérolat, Julien and Tacchetti, Andrea and Tarassov, Eugene and Wang, Zhe and Tuyls, Karl},
                journal={AI Communications},
                publisher={IOS Press}
  • I. Gemp, B. McWilliams, C. Vernade, T. Graepel. EigenGame Unloaded: When playing games is better than optimizing. ICLR. 2022
    We build on the recently proposed EigenGame that views eigendecomposition as a competitive game. EigenGame's updates are biased if computed using minibatches of data, which hinders convergence and more sophisticated parallelism in the stochastic setting. In this work, we propose an unbiased stochastic update that is asymptotically equivalent to EigenGame, enjoys greater parallelism allowing computation on datasets of larger sample sizes, and outperforms EigenGame in experiments. We present applications to finding the principal components of massive datasets and performing spectral clustering of graphs. We analyze and discuss our proposed update in the context of EigenGame and the shift in perspective from optimization to games.
              author={Gemp, Ian and McWilliams, Brian and Vernade, Claire and Graepel, Thore},
              title = {EigenGame Unloaded: When playing games is better than optimizing},
              year = {2022},
              journal = {ICLR},
  • I. Gemp, K. R. McKee, R. Everett, E. Duéñez-Guzmán, Y. Bachrach, D. Balduzzi, A. Tacchetti. D3C: Reducing the Price of Anarchy in Multi-Agent Learning. AAMAS. 2022
    In multiagent systems, the complex interaction of fixed incentives can lead agents to outcomes that are poor (inefficient) not only for the group, but also for each individual. Price of anarchy is a technical, game-theoretic definition that quantifies the inefficiency arising in these scenarios -- it compares the welfare that can be achieved through perfect coordination against that achieved by self-interested agents at a Nash equilibrium. We derive a differentiable, upper bound on a price of anarchy that agents can cheaply estimate during learning. Equipped with this estimator, agents can adjust their incentives in a way that improves the efficiency incurred at a Nash equilibrium. Agents do so by learning to mix their reward (equiv. negative loss) with that of other agents by following the gradient of our derived upper bound. We refer to this approach as D3C. In the case where agent incentives are differentiable, D3C resembles the celebrated Win-Stay, Lose-Shift strategy from behavioral game theory, thereby establishing a connection between the global goal of maximum welfare and an established agent-centric learning rule. In the non-differentiable setting, as is common in multiagent reinforcement learning, we show the upper bound can be reduced via evolutionary strategies, until a compromise is reached in a distributed fashion. We demonstrate that D3C improves outcomes for each agent and the group as a whole on several social dilemmas including a traffic network exhibiting Braess's paradox, a prisoner's dilemma, and several multiagent domains.
              author={Gemp, Ian and McKee, Kevin R. and Everett, Richard and Duéñez-Guzmán, Edgar A. and Bachrach, Yoram and Balduzzi, David and Tacchetti, Andrea},
              title = {D3C: Reducing the Price of Anarchy in Multi-Agent Learning},
              year = {2022}
              journal = {AAMAS},
  • I. Gemp, R. Savani, M. Lanctot, Y. Bachrach, T. Anthony, R. Everett, A. Tacchetti, T. Eccles, J. Kramár. Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent. AAMAS. 2022
    Nash equilibrium is a central concept in game theory. Several Nash solvers exist, yet none scale to normal-form games with many actions and many players, especially those with payoff tensors too big to be stored in memory. In this work, we propose an approach that iteratively improves an approximation to a Nash equilibrium through joint play. It accomplishes this by tracing a previously established homotopy that defines a continuum of equilibria for the game regularized with decaying levels of entropy. This continuum asymptotically approaches the limiting logit equilibrium, proven by McKelvey and Palfrey (1995) to be unique in almost all games, thereby partially circumventing the well-known equilibrium selection problem of many-player games. To encourage iterates to remain near this path, we efficiently minimize average deviation incentive via stochastic gradient descent, intelligently sampling entries in the payoff tensor as needed. Monte Carlo estimates of the stochastic gradient from joint play are biased due to the appearance of a nonlinear max operator in the objective, so we introduce additional innovations to the algorithm to alleviate gradient bias. The descent process can also be viewed as repeatedly constructing and reacting to a polymatrix approximation to the game. In these ways, our proposed approach, average deviation incentive descent with adaptive sampling (ADIDAS), is most similar to three classical approaches, namely homotopy-type, Lyapunov, and iterative polymatrix solvers. The lack of local convergence guarantees for biased gradient descent prevents guaranteed convergence to Nash, however, we demonstrate through extensive experiments the ability of this approach to approximate a unique Nash in normal-form games with as many as seven players and twenty one actions (several billion outcomes) that are orders of magnitude larger than those possible with prior algorithms.
              author={Gemp, Ian and Savani, Rahul and Lanctot, Marc and Bachrach, Yoram and Anthony, Thomas and Everett, Richard and Tacchetti, Andrea and Eccles, Tom and Kramár, János},
              title = {Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent},
              year = {2022}
              journal = {AAMAS},
  • Y. Bachrach, I. Gemp, M. Garnelo, J. Kramár, T. Eccles, D. Rosenbaum, T. Graepel. A Neural Network Auction For Group Decision Making Over a Continuous Space. IJCAI-21 Demonstrations Track. 2021
    We propose a system for conducting an auction over locations in a continuous space. It enables participants to express their preferences over possible choices of location in the space, selecting the location that maximizes the total utility of all agents. We prevent agents from tricking the system into selecting a location that improves their individual utility at the expense of others by using a pricing rule that gives agents no incentive to misreport their true preferences. The system queries participants for their utility in many random locations, then trains a neural network to approximate the preference function of each participant. The parameters of these neural network models are transmitted and processed by the auction mechanism, which composes these into differentiable models that are optimized through gradient ascent to compute the final chosen location and charged prices.
              author={Bachrach, Yoram and Gemp, Ian and Garnelo, Marta and Kramar, Janos and Eccles, Tom and Rosenbaum, Dan and Graepel, Thore},
              title={A Neural Network Auction For Group Decision Making Over a Continuous Space},
              year = {2021}
              journal = {IJCAI-21 Demonstrations Track},
  • I. Gemp, B. McWilliams, C. Vernade, T. Graepel. EigenGame: PCA as a Nash Equilibrium. ICLR (Outstanding Paper Award). 2021
    We present a novel view on principal component analysis (PCA) as a competitive game in which each approximate eigenvector is controlled by a player whose goal is to maximize their own utility function. We analyze the properties of this PCA game and the behavior of its gradient based updates. The resulting algorithm which combines elements from Oja's rule with a generalized Gram-Schmidt orthogonalization is naturally decentralized and hence parallelizable through message passing. We demonstrate the scalability of the algorithm with experiments on large image datasets and neural network activations. We discuss how this new view of PCA as a differentiable game can lead to further algorithmic developments and insights.
              author={Gemp, Ian and McWilliams, Brian and Vernade, Claire and Graepel, Thore},
              title = {EigenGame: PCA as a Nash Equilibrium},
              year = {2021},
              journal = {ICLR},
  • R. Patel, M. Garnelo, I. Gemp, C. Dyer, Y. Bachrach. Game-theoretic Vocabulary Selection via the Shapley Value and Banzhaf Index. NAACL. 2021
    The input vocabulary and their learned representations are crucial to the performance of neural NLP models. Using the full vocabulary results in less explainable and more memory intensive models, with the embedding layer often constituting the majority of model parameters. It is thus common to use a smaller vocabulary to lower memory requirements and construct more interpertable models. We propose a vocabulary selection method that views words as members of a team trying to maximize the model’s performance. We apply power indices from cooperative game theory, including the Shapley value and Banzhaf index, that measure the relative importance of individual team members in accomplishing a joint task. We approximately compute these indices to identify the most influential words. Our empirical evaluation examines multiple NLP tasks, including sentence and document classification, question answering and textual entailment. We compare to baselines that select words based on frequency, TF-IDF and regression coefficients under L1 regularization, and show that this game-theoretic vocabulary selection outperforms all baselines on a range of different tasks and datasets.
                title={Game-theoretic Vocabulary Selection via the Shapley Value and Banzhaf Index},
                author={Patel, Roma and Garnelo, Marta and Gemp, Ian and Dyer, Chris and Bachrach, Yoram},
                booktitle={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  • I. Gemp, K. R. McKee, R. Everett, E. Duéñez-Guzmán, Y. Bachrach, D. Balduzzi, A. Tacchetti. D3C: Reducing the Price of Anarchy in Multi-Agent Learning. Neurips Workshop: Cooperative AI. 2020
    Even in simple multi-agent systems, fixed incentives can lead to outcomes that are poor for the group and each individual agent. We propose a method, D3C, for online adjustment of agent incentives that reduces the loss incurred at a Nash equilibrium. Agents adjust their incentives by learning to mix their incentive with that of other agents, until a compromise is reached in a distributed fashion. We show that D3C improves outcomes for each agent and the group as a whole on several social dilemmas including a traffic network with Braess's paradox, a prisoner's dilemma, and several reinforcement learning domains.
              author={Gemp, Ian and McKee, Kevin R. and Everett, Richard and Duéñez-Guzmán, Edgar A. and Bachrach, Yoram and Balduzzi, David and Tacchetti, Andrea},
              title = {D3C: Reducing the Price of Anarchy in Multi-Agent Learning},
              year = {2020}
              journal = {Neurips Workshop: Cooperative AI},
  • T. Anthony, T. Eccles, A. Tacchetti, J. Kramár, I. Gemp, T. Hudson, N. Porcel, M. Lanctot, J. Pérolat, R. Everett, S. Singh, T. Graepel, Y. Bachrach. Learning to Play No-Press Diplomacy with Best Response Policy Iteration. Neurips (Spotlight). 2020
    Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and Starcraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.
              author={Anthony, Thomas and Eccles, Tom and Tacchetti, Andrea and Kramár, János and Gemp, Ian and Hudson, Thomas C. and Porcel, Nicolas and Lanctot, Marc and Pérolat, Julien and Everett, Richard and Singh, Satinder and Graepel, Thore and Bachrach, Yoram},
              title = {Learning to Play No-Press Diplomacy
    with Best Response Policy Iteration},
              year = {2020}
              journal = {Neurips},
  • K. R. McKee, I. Gemp, B. McWilliams, E. Duéñez-Guzmán, E. Hughes, J. Leibo. Social Diversity and Social Preferences in Mixed-Motive Reinforcement Learning. AAMAS. 2020
    Recent research on reinforcement learning in pure-conflict and pure-common interest games has emphasized the importance of population heterogeneity. In contrast, studies of reinforcement learning in mixed-motive games have primarily leveraged homogeneous approaches. Given the defining characteristic of mixedmotive games—the imperfect correlation of incentives between group members—we study the effect of population heterogeneity on mixed-motive reinforcement learning. We draw on interdependence theory from social psychology and imbue reinforcement learning agents with Social Value Orientation (SVO), a flexible formalization of preferences over group outcome distributions. We subsequently explore the effects of diversity in SVO on populations of reinforcement learning agents in two mixed-motive Markov games. We demonstrate that heterogeneity in SVO generates meaningful and complex behavioral variation among agents similar to that suggested by interdependence theory. Empirical results in these mixed-motive dilemmas suggest agents trained in heterogeneous populations develop particularly generalized, high-performing policies relative to those trained in homogeneous populations.
              author={McKee, Kevin and Gemp, Ian and McWilliams, Brian and Duéñez-Guzmán, Edgar and Hughes, Edward and Leibo, Joel},
              title = {Social Diversity and Social Preferences in Mixed-Motive Reinforcement Learning},
              year = {2020}
              journal = {AAMAS},
  • D. Balduzzi, W. Czarnecki, T. Anthony, I. Gemp, E. Hughes, J. Leibo, G. Piliouras, T. Graepel. Smooth Markets: A Basic Mechanism for Organizing Gradient-Based Learners. ICLR. 2020
    With the success of modern machine learning, it is becoming increasingly importantly to understand and control how learning algorithms interact. Unfortunately, negative results from game theory show there is little hope of understanding or controlling general n-player games. We therefore introduce smooth markets (SM-games), a class of n-player games with pairwise zero sum interactions. SM-games codify a common design pattern in machine learning that includes (some) GANs, adversarial training, and other recent algorithms. We show that SM-games are amenable to analysis and optimization using first-order methods.
              author={Balduzzi, David and Czarnecki, Wojciech and Anthony, Thomas and Gemp, Ian and Hughes, Edward and Leibo, Joel and Pilouras, George and Graepel, Thore},
              title = {Smooth Markets: A Basic Mechanism for Organizing Gradient-Based Learners},
              year = {2020}
              journal = {ICLR},
  • I. Gemp, B. McWilliams. The Unreasonable Effectiveness of Adam on Cycles. Neurips Workshop: Bridging Game Theory & Deep Learning. 2019
    Generative adversarial networks (GANs) are state of the art generative models for images and other domains. Training GANs is difficult, although not nearly as difficult as expected given theoretical results on finding a Nash (PPAD complete) and our understanding of dynamical systems. Several new algorithms and techniques have been proposed to stabilize GAN training, but nearly all employ Adam or RMSProp. In fact, training a GAN with SGD instead of Adam often fails. Here, we aim to understand how Adam circumvents some of the difficulties associated with GAN training. To this end, we study Adam in the context of a cycle problem. The cycle problem is a canonical equilibrium problem for which naive optimization approaches, e.g., simultaneous SGD, fail. Understanding how Adam works in this context helps reveal reasons for its unexpected success.
              author={Gemp, Ian and McWilliams, Brian},
              title = {The Unreasonable Effectiveness of Adam on Cycles},
              year = {2019}
              journal = {Neurips Workshop: Bridging Game Theory & Deep Learning},
  • I. Gemp, R. Nallapati, R. Ding, F. Nan, B. Xiang. Weakly Semi-Supervised Neural Topic Models. ICLR Workshop: Learning with Limited Labeled Data. 2019
    We consider the problem of topic modeling in a weakly semi-supervised setting. In this scenario, we assume that the user knows a priori a subset of the topics she wants the model to learn and is able to provide a few exemplar documents for those topics. In addition, while each document may typically consist of multiple topics, we do not assume that the user will identify all its topics exhaustively. Recent state-of-the-art topic models such as NVDM, referred to herein as Neural Topic Models (NTMs), fall under the variational autoencoder framework. We extend NTMs to the weakly semi-supervised setting by using informative priors in the training objective. After analyzing the effect of informative priors, we propose a simple modification of the NVDM model using a logit-normal posterior that we show achieves better alignment to user-desired topics versus other NTM models.
              author={Gemp, Ian and Nallapati, Ramesh and Ding, Ran and Nan, Feng and Xiang, Bing},
              title = {Weakly Semi-Supervised Neural Topic Models},
              year = {2019}
              journal = {ICLR Workshop: Learning with Limited Labeled Data},
  • I. Gemp. From Optimization to Equilibration: Understanding an Emerging Paradigm in Artificial Intelligence and Machine Learning. UMass Amherst Thesis. 2018
    Many existing machine learning (ML) algorithms cannot be viewed as gradient descent on some single objective. The solution trajectories taken by these algorithms naturally exhibit rotation, sometimes forming cycles, a behavior that is not expected with (full-batch) gradient descent. However, these algorithms can be viewed more generally as solving for the equilibrium of a game with possibly multiple competing objectives. Moreover, some recent ML models, specifically generative adversarial net- works (GANs) and its variants, are now explicitly formulated as equilibrium problems. Equilibrium problems present challenges beyond those encountered in optimization such as limit-cycles and chaotic attractors and are able to abstract away some of the difficulties encountered when training models like GANs.

    In this thesis, I aim to advance our understanding of equilibrium problems so as to improve state-of-the-art in GANs and related domains. In the following chapters, I will present work on
    1. designing a no-regret framework for solving monotone equilibrium problems in online or streaming settings (with applications to Reinforcement Learning),
    2. ensuring convergence when training a GAN to fit a normal distribution to data by Crossing-the-Curl,
    3. improving state-of-the-art image generation with techniques derived from theory,
    4. and borrowing tools from dynamical systems theory for analyzing the complex dynamics of GAN training.
              author={Gemp, Ian},
              title = {From Optimization to Equilibration: Understanding an Emerging Paradigm in Artificial Intelligence and Machine Learning},
              year = {2018}
              journal = {UMass Amherst Technical Report},
  • I. Gemp, S. Mahadevan. Global Convergence to the Equilibrium of GANs using Variational Inequalities. arXiv preprint arXiv:1808.01531. 2018
    In optimization, the negative gradient of a function denotes the direction of steepest descent. Furthermore, traveling in any direction orthogonal to the gradient maintains the value of the function. In this work, we show that these orthogonal directions that are ignored by gradient descent can be critical in equilibrium problems. Equilibrium problems have drawn heightened attention in machine learning due to the emergence of the Generative Adversarial Network (GAN). We use the framework of Variational Inequalities to analyze popular training algorithms for a fundamental GAN variant: the Wasserstein Linear-Quadratic GAN. We show that the steepest descent direction causes divergence from the equilibrium, and guaranteed convergence to the equilibrium is achieved through following a particular orthogonal direction. We call this successful technique Crossing-the-Curl, named for its mathematical derivation as well as its intuition: identify the game's axis of rotation and move "across" space in the direction towards smaller "curling".
              author={Gemp, Ian and Mahadevan, Sridhar},
              title = {Global Convergence to the Equilibrium of GANs using Variational Inequalities},
              year = {2018}
              journal = {arXiv preprint arXiv:1808.01531},
  • B. Liu, I. Gemp, M. Ghavamzadeh, J. Liu,S. Mahadevan, M. Petrik. Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity. Journal of Artificial Intelligence Research. 2018
    In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function. We also conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, and do not provide any finite-sample analysis. We also propose an accelerated algorithm, called GTD2-MP, that uses proximal ``mirror maps'' to yield an improved convergence rate. The results of our theoretical analysis imply that the GTD family of algorithms are comparable and may indeed be preferred over existing least squares TD methods for off-policy learning, due to their linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods.
              title={Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity},
              author={Liu, Bo and Gemp, Ian and Ghavamzadeh, Mohammad and Liu, Ji and Mahadevan, Sridhar and Petrik, Marek},
              journal={Journal of Artificial Intelligence Research},
  • I. Gemp, M. Parente, S. Mahadevan. Inverting VAEs for Improved Generative Accuracy. NIPS Workshop: Advances in Approximate Bayesian Inference. 2017
    Recent advances in semi-supervised learning with deep generative models have shown promise in generalizing from small labeled datasets ($\mathbf{x}_l,\mathbf{y}_l$) to large unlabeled ones ($\mathbf{x}_u$). When the codomain ($\mathbf{y}$) has known structure, a large un\emph{featured} dataset ($\mathbf{y}_u$) is potentially available. We develop a parameter-efficient, deep semi-supervised generative model for the purpose of exploiting this untapped data source. Empirical results show improved performance in disentangling variable semantics as well as improved discriminative prediction on a new MNIST task.
              author={Gemp, Ian and Parente, Mario and Mahadevan, Sridhar},
              title = {Inverting VAEs for Improved Generative Accuracy},
              year = {2017}
              journal = {NIPS Workshop: Advances in Approximate Bayesian Inference},
  • I. Gemp, S. Mahadevan. Online Monotone Games. arXiv preprint arXiv:1710.07328. 2017
    Algorithmic game theory (AGT) focuses on the design and analysis of algorithms for interacting agents, with interactions rigorously formalized within the framework of games. Results from AGT find applications in domains such as online bidding auctions for web advertisements and network routing protocols. Monotone games are games where agent strategies naturally converge to an equilibrium state. Previous results in AGT have been obtained for convex, socially-convex, or smooth games, but not monotone games. Our primary theoretical contributions are defining the monotone game setting and its extension to the online setting, a new notion of regret for this setting, and accompanying algorithms that achieve sub-linear regret. We demonstrate the utility of online monotone game theory on a variety of problem domains including variational inequalities, reinforcement learning, and generative adversarial networks.
              author={Gemp, Ian and Mahadevan, Sridhar},
              title = {Online Monotone Games},
              year = {2017}
              journal = {arXiv preprint arXiv:1710.07328},
  • I. Gemp, M. Parente, I. Durugkar, D. Dyar, S. Mahadevan Inverting Variational Autoencoders for Improved Generative Accuracy. arXiv preprint arXiv:1608.05983. 2017
    Recent advances in semi-supervised learning with deep generative models have shown promise in generalizing from small labeled datasets (x,y) to large unlabeled ones (x). In the case where the codomain has known structure, a large unfeatured dataset (y) is potentially available. We develop a parameter-efficient, deep semi-supervised generative model for the purpose of exploiting this untapped data source. Empirical results show improved performance in disentangling latent variable semantics as well as improved discriminative prediction on Martian spectroscopic and handwritten digit domains.
              author={Gemp, Ian and Parente, Mario and Durugkar, Ishan and Dyar, Darby and Mahadevan, Sridhar},
              title = {Inverting Variational Autoencoders for Improved Generative Accuracy},
              year = {2017}
              journal = {arXiv preprint arXiv:1608.05983},
  • I. Durugkar, I. Gemp, S. Mahadevan Generative Multi-Adversarial Networks. ICLR. 2017
    Generative adversarial networks (GANs) are a framework for producing a generative model by way of a two-player minimax game. In this paper, we propose the Generative Multi-Adversarial Network (GMAN), a framework that extends GANs to multiple discriminators. In previous work, the successful training of GANs requires modifying the minimax objective to accelerate training early on. In contrast, GMAN can be reliably trained with the original, untampered objective. We explore a number of design perspectives with the discriminator role ranging from formidable adversary to forgiving teacher. Image generation tasks comparing the proposed framework to standard GANs demonstrate GMAN produces higher quality samples in a fraction of the iterations when measured by a pairwise GAM-type metric.
              author={Durugkar, Ishan and Gemp, Ian and Mahadevan, Sridhar},
              title = {Generative Multi-Adversarial Networks},
              year = {2017}
              journal = {ICLR},
  • I. Gemp, G. Theocharous, M. Ghavamzadeh. Automated Data Cleansing through Meta-Learning. IAAI Challenge Paper. 2017
    Data preprocessing or cleansing is one of the biggest hurdles in industry for developing successful machine learning applications. The process of data cleansing includes data imputation, feature normalization & selection, dimensionality reduction, and data balancing applications. Currently such preprocessing is manual. One approach for automating this process is meta-learning. In this paper we experiment with state of the art meta-learning methodologies and identify the inadequacies and research challenges for solving such a problem.
              author={Gemp, Ian and Theocharous, Georgios and Ghavamzadeh, Mohammad},
              title = {Automated Data Cleansing through Meta-Learning},
              year = {2017}
              journal = {IAAI},
  • I. Gemp, S. Mahadevan. Online Monotone Optimization. arXiv preprint arXiv:1608.07888. 2016
    This paper presents a new framework for analyzing and designing no-regret algorithms for dynamic (possibly adversarial) systems. The proposed framework generalizes the popular online convex optimization framework and extends it to its natural limit allowing it to capture a notion of regret that is intuitive for more general problems such as those encountered in game theory and variational inequalities. The framework hinges on a special choice of a system-wide loss function we have developed. Using this framework, we prove that a simple update scheme provides a no-regret algorithm for monotone systems. While previous results in game theory prove individual agents can enjoy unilateral no-regret guarantees, our result proves monotonicity sufficient for guaranteeing no-regret when considering the adjustments of multiple agent strategies in parallel. Furthermore, to our knowledge, this is the first framework to provide a suitable notion of regret for variational inequalities. Most importantly, our proposed framework ensures monotonicity a sufficient condition for employing multiple online learners safely in parallel.
              author={Gemp, Ian and Mahadevan, Sridhar},
              title = {Online Monotone Optimization},
              year = {2016}
              journal = {arXiv preprint arXiv:1608.07888},
  • I. Gemp. Exploring the Dynamics of Variational Inequality Games with Non-Concave Utilities. NIPS Workshop: Learning, Inference, and Control of Multi-Agent Systems. 2015
    Variational inequality (VI) theory has proven useful in modeling and analyzing a variety economic markets. However, in order to ensure the analysis is tractable, models are usually constrained to an unrealistic regime of concave utilities and monotone operators undermining the reliability of real-world conclusions such as the uniqueness and location of equilibria. We argue that machine learning can help address this issue. In this paper, we ignore typical monotonicity requirements and construct a generic, yet more realistic market model possessing several desirable qualities. We then borrow a tool from dynamical systems to cope with our model’s lack of theoretical guarantees. Additionally, in order to handle the large size of standard VI game formulations, we further enhance the tool to accommodate more sophisticated numerical algorithms and propose a heuristic for efficient use of generated trajectories. We illustrate these enhancements by applying the resulting pipeline in the context of cloud services.
              author={Gemp, Ian},
              title = {Exploring the Dynamics of Variational Inequality Games with Non-Concave Utilities},
              year = {2015}
              booktitle={NIPS Workshop: Learning, Inference, and Control of Multi-Agent Systems},
  • I. Gemp, S. Mahadevan. Finding Equilibria in Large Games using Variational Inequalities. AAAI Spring Symposium. 2015
    In this paper, we explore an approach to computational game theory based on variational inequalities (VIs). VIs represent a comprehensive framework that provides a way to model and analyze both cooperative and non-cooperative games. Given the potentially large size of real-world games, suitable algorithms must be designed that can scale gracefully with the dimension of the problems (e.g., number of players). In this paper, we explore the effectiveness of novel Runge-Kutta methods on finding equilibrium solutions to two real-world games defined by oligopolistic economies.
              author={Gemp, Ian and Mahadevan, Sridhar},
              title = {Finding Equilibria in Large Games using Variational Inequalities},
              year = {2015}
              booktitle={AAAI Spring Symposium},
  • I. Gemp, S. Mahadevan, B. Liu. Solving Large Sustainable Supply Chain Networks using Variational Inequalities. AAAI Workshop: Computational Sustainability. 2015
    In this paper, we explore a new approach to computational sustainability based on variational inequalities (VIs). Our challenge is to compute the steady state behaviors of networks of sustainable supply chains with possibly conflicting objectives. VIs provide a way to model large networks with numerous conflicting goals. Given the size of real-world networks, suitable algorithms must be selected that can scale with the dimension of the problems. In this paper, we explore the effectiveness of novel Runge-Kutta methods on finding equilibrium solutions to two real-world sustainable supply chain problems.
              author={Gemp, Ian and Mahadevan, Sridhar and Liu, Bo},
              title = {Solving Large Sustainable Supply Chain Networks using Variational Inequalities},
              year = {2015}
              booktitle={AAAI Workshop: Computational Sustainability},
  • I. Gemp, S. Mahadevan. Modeling Context in Cognition using Variational Inequalities. AAAI Fall Symposium. 2014
    Important aspects of human cognition, like creativity and play, involve dealing with multiple divergent views of objects, goals, and plans. We argue in this paper that the current model of optimization that drives much of modern machine learning research is far too restrictive a paradigm to mathematically model the richness of human cognition. Instead, we propose a much more flexible and powerful framework of equilibration, which not only generalizes optimization, but also captures a rich variety of other problems, from game theory, complementarity problems, network equilibrium problems in economics, and equation solving. Our thesis is that creative activity involves dealing not with a single objective function, which optimization requires, but rather balancing multiple divergent and possibly contradictory goals. Such modes of cognition are better modeled using the framework of variational inequalities (VIs). We provide a brief review of this paradigm for readers unfamiliar with the underlying mathematics, and sketch out how VIs can account for creativity and play in human and animal cognition.
              author={Gemp, Ian and Mahadevan, Sridhar},
              title = {Modeling Context in Cognition using Variational Inequalities},
              year = {2014}
              booktitle={AAAI Fall Symposium},
  • S. Mahadevan, B. Liu, P. Thomas, W. Dabney, S. Giguere, N. Jacek, I. Gemp, J. Liu. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces. arXiv preprint arXiv:1405.6757. 2014
    In this paper, we set forth a new vision of reinforcement learning developed by us over the past few years, one that yields mathematically rigorous solutions to longstanding important questions that have remained unresolved: (i) how to design reliable, convergent, and robust reinforcement learning algorithms (ii) how to guarantee that reinforcement learning satisfies pre-specified "safety" guarantees, and remains in a stable region of the parameter space (iii) how to design "off-policy" temporal difference learning algorithms in a reliable and stable manner, and finally (iv) how to integrate the study of reinforcement learning into the rich theory of stochastic optimization. In this paper, we provide detailed answers to all these questions using the powerful framework of proximal operators. The key idea that emerges is the use of primal dual spaces connected through the use of a Legendre transform. This allows temporal difference updates to occur in dual spaces, allowing a variety of important technical advantages. The Legendre transform elegantly generalizes past algorithms for solving reinforcement learning problems, such as natural gradient methods, which we show relate closely to the previously unconnected framework of mirror descent methods. Equally importantly, proximal operator theory enables the systematic development of operator splitting methods that show how to safely and reliably decompose complex products of gradients that occur in recent variants of gradient-based temporal difference learning. This key technical innovation makes it possible to finally design "true" stochastic gradient methods for reinforcement learning. Finally, Legendre transforms enable a variety of other benefits, including modeling sparsity and domain geometry. Our work builds extensively on recent work on the convergence of saddle-point algorithms, and on the theory of monotone operators.
              author={Mahadevan, Sridhar and Liu, Bo and Thomas, Philip and Dabney, Will and Giguere, Steve and Jacek, Nicholas and Gemp, Ian and Liu, Ji},
              title = {Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces},
              year = {2014},
              journal={arXiv preprint arXiv:1405.6757},
  • I. Gemp, R. Carthew, S. Hilgenfeldt. Cadherin-dependent Cell Morphology in an Epithelium: constructing a quantitative dynamical model. PLoS Computational Biology. 2011
    Cells in the Drosophila retina have well-defined morphologies that are attained during tissue morphogenesis. We present a computer simulation of the epithelial tissue in which the global interfacial energy between cells is minimized. Experimental data for both normal cells and mutant cells either lacking or misexpressing the adhesion protein N-cadherin can be explained by a simple model incorporating salient features of morphogenesis that include the timing of N-cadherin expression in cells and its temporal relationship to the remodeling of cell-cell contacts. The simulations reproduce the geometries of wild-type and mutant cells, distinguish features of cadherin dynamics, and emphasize the importance of adhesion protein biogenesis and its timing with respect to cell remodeling. The simulations also indicate that N-cadherin protein is recycled from inactive interfaces to active interfaces, thereby modulating adhesion strengths between cells.
              title={Cadherin-dependent cell morphology in an epithelium: constructing a quantitative dynamical model},
              author={Gemp, Ian M and Carthew, Richard W and Hilgenfeldt, Sascha},
              journal={PLoS computational biology},
              publisher={Public Library of Science}
  • G. Duncan, K. Rabl, I. Gemp, R. Heidelberger, W.B. Thoreson. Quantitative Analysis of Synaptic Release at the Photoreceptor Synapse. Biophysical Journal. 2010
    Exocytosis from the rod photoreceptor is stimulated by submicromolar Ca2+ and exhibits an unusually shallow dependence on presynaptic Ca2+. To provide a quantitative description of the photoreceptor Ca2+ sensor for exocytosis, we tested a family of conventional and allosteric computational models describing the final Ca2+-binding steps leading to exocytosis. Simulations were fit to two measures of release, evoked by flash-photolysis of caged Ca2+: exocytotic capacitance changes from individual rods and postsynaptic currents of second-order neurons. The best simulations supported the occupancy of only two Ca2+ binding sites on the rod Ca2+ sensor rather than the typical four or five. For most models, the on-rates for Ca2+ binding and maximal fusion rate were comparable to those of other neurons. However, the off-rates for Ca2+ unbinding were unexpectedly slow. In addition to contributing to the high-affinity of the photoreceptor Ca2+ sensor, slow Ca2+ unbinding may support the fusion of vesicles located at a distance from Ca2+ channels. In addition, partial sensor occupancy due to slow unbinding may contribute to the linearization of the first synapse in vision.
              title={Quantitative analysis of synaptic release at the photoreceptor synapse},
              author={Duncan, Gabriel and Rabl, Katalin and Gemp, Ian and Heidelberger, Ruth and Thoreson, Wallace B},
              journal={Biophysical journal},