In the late 1990s, mathematicians and engineers became increasingly interested in understanding how spontaneous synchronicity occurs in nature. Specifically, there’s a book called Sync by Steven Strogatz (Applied Math, Cornell) which poetically describes how thousands of fireflies on the beaches of Thailand spontaneously blink in locked phase. Similar synchronicity occurs in schools of fish and flocks of birds, where the group moves as a collective governed only by local coordination rules. The science of synchronicity gave rise to a provocative question for engineers in the 2000s: can we devise local coordination rules such that a collection of arbitrary agents (birds, fish, sensors, etc.) synchronize their behavior?

This question was first probed in the math and physics communities in the 1970s, in the context of Kuramoto oscillators and other mathematical models of electrical phenomena. Somewhat in parallel, control theorists John Tsitsiklis and Dimitri Bertsekas connected these ideas to numerical optimization in the 1980s. Then, a famous series of papers beginning with “Coordination of Groups of Mobile Autonomous Agents Using Nearest Neighbor Rules” (Jadbabaie et al., IEEE TAC 2003) proposed a protocol in which each agent maintains a local state vector and updates it by computing a weighted average of its neighbors’ states. Thereby, through only local communication, we obtain *globally emergent* behavior: every agent converges to a common state, namely the average of all agents’ initial states. This observation yielded a burgeoning field of networked control and distributed signal processing broadly referred to as **consensus**.

Numerous works built on top of the consensus protocol, primarily focusing on the case where agents not only seek to reach consensus but also minimize a convex function that is a sum of local functions, each available only to one agent. Famously, the paper “Distributed Subgradient Methods for Multi-Agent Optimization” (Nedić & Ozdaglar, IEEE TAC 2009) gave rise to the field of *consensus optimization*. In this work, the authors proposed a distributed version of gradient descent that, through only queries to agents’ local functions and weighted averaging with their neighbors, allows agents to approximately reach the minimizer of a common global convex objective.
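The weighted-averaging consensus protocol described above can be sketched in a few lines. Here is a minimal illustration, assuming a hypothetical four-agent ring network with a doubly stochastic mixing matrix (the matrix entries are my choice, not taken from any of the cited papers):

```python
import numpy as np

# Doubly stochastic mixing matrix for a 4-agent ring:
# W[i, j] > 0 only if agents i and j are neighbors (or i == j).
W = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])

x = np.array([1.0, 3.0, 5.0, 7.0])  # each agent's initial local state
for _ in range(100):
    x = W @ x  # each agent replaces its state with a weighted neighbor average

print(x)  # all entries approach the network-wide average, 4.0
```

Because `W` is doubly stochastic and the ring is connected, repeated averaging drives every agent’s state to the average of the initial states, despite each agent communicating only with its two neighbors.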

This algorithm was later reinterpreted by the optimization community as a penalty-method approximation of a constrained optimization problem, and therefore as inferior to methods that solve the problem exactly through Lagrange duality. The rest of the story of consensus optimization is about relaxing the conditions on the communication network through which agents exchange information, weakening the technical assumptions on the objective to be minimized (from convexity to non-convexity, bounded gradients to unbounded gradients), and improving convergence rates by adapting acceleration and Newton’s method to decentralized settings.
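A minimal sketch of the distributed subgradient idea, under my own illustrative assumptions: quadratic local objectives f_i(x) = ½(x − b_i)², a fixed four-agent ring with a doubly stochastic mixing matrix, and a diminishing stepsize. The global minimizer of the sum is the mean of the b_i, which no single agent can see:

```python
import numpy as np

# Doubly stochastic mixing matrix for a hypothetical 4-agent ring.
W = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])

b = np.array([0.0, 2.0, 4.0, 10.0])  # agent i privately holds b[i]
x = np.zeros(4)                      # local iterates, one per agent

for k in range(1, 5001):
    grad = x - b          # gradient of f_i at x[i]; uses only local data
    x = W @ x - grad / k  # mix with neighbors, then take a local gradient step

print(x)  # all iterates approach mean(b) = 4.0, the global minimizer
```

The update combines the consensus averaging step with a local (sub)gradient step; the diminishing stepsize 1/k lets the iterates agree and approach the global minimizer, even though each agent ever evaluates only its own objective.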

Through all of these developments, consensus presumes that agents reaching a common decision is the goal. However, we can easily envision technological applications in which this is not the case. Consider, for instance, an autonomous traffic network in which the sensory feed of a car and that of a camera are both relevant to related inference tasks, but it is actively harmful (i.e., degrades local mean-square error) for the two to arrive at identical statistical model parameters. This phenomenon holds more generally in heterogeneous networks, such as teams of ground and aerial robots or IoT deployments of smart devices in the home: the heterogeneity may lie in agents’ observed data or in the inference tasks they are solving.

For these modern networked systems of intelligent smart devices, there is an open question of how to design the system so that the whole is more than the sum of its parts. How does one agent exploit the relevant information of another while ignoring the irrelevant? This motivates interesting questions that bridge decentralized optimization and game theory: when to communicate, what to communicate, and for which statistical learning tasks communication helps. Two of my recent papers (listed below) take a small step in this direction from an optimization perspective; however, much more remains to be done that incorporates ideas from game theory.

A. Koppel, B. Sadler, and A. Ribeiro, “Proximity without Consensus in Online Multi-Agent Optimization,” in IEEE Trans. Signal Process., Mar. 2017.

H. Pradhan, A. S. Bedi, A. Koppel, and K. Rajawat, “Exact Decentralized Online Nonparametric Optimization,” in IEEE Global Conf. on Signals and Info. Processing (To appear), Anaheim, CA, Nov. 26-28, 2018.