
How Long Does It Take to Spread a Rumor?
Gossip Algorithms in Distributed Systems

August 2023

Section 0: Note For the Reader


Section 1: Gossip Algorithms in Distributed Systems

We say a system is *distributed* if it runs on multiple servers that communicate by sending messages over a network. Although distributed systems can be challenging to program correctly, they can offer better performance and resiliency than single-node systems. In the background, many distributed systems use **gossip algorithms** to share information quickly across large networks.

In a gossip algorithm, one node broadcasts an update to all other nodes through multiple rounds of point-to-point messaging. A node spreads an update by sending a message to (or “gossiping with”) a randomly selected neighbor once per round. In subsequent rounds, the selected node is “activated” and begins gossiping with its neighbors. We allow this process to continue until all nodes have received a message. For the remainder of our discussion, I’ll ignore many of the challenges associated with building real distributed systems and represent them as *undirected graphs*.

**Comment:** If you’re familiar with distributed systems you’ve likely heard of (or even used!) applications that run a variation of gossip. Notably, the distributed databases Riak, Apache Cassandra, and DynamoDB all use gossip algorithms for communicating cluster membership and node failure detection.
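To make the round-based model concrete, here’s a minimal sketch of the push-gossip process on a complete graph (the function name and structure are my own, not from any gossip library):

```python
import random

def simulate_push_gossip(n: int) -> int:
    """Simulate push gossip on an n-node complete graph.
    Returns the number of rounds until every node holds the update."""
    updated = {0}  # the update originates at node 0
    rounds = 0
    while len(updated) < n:
        rounds += 1
        newly_updated = set()
        for node in updated:
            # Each updated node gossips with one uniformly random neighbor.
            peer = random.randrange(n - 1)
            if peer >= node:
                peer += 1
            newly_updated.add(peer)
        # Nodes updated this round begin gossiping next round.
        updated |= newly_updated
    return rounds

# Example: sample the distribution of R_g for a 64 node system.
samples = [simulate_push_gossip(64) for _ in range(1_000)]
print(sum(samples) / len(samples), max(samples))
```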


Consider a distributed database with nodes storing an identical set $S$. At any moment, any node may receive an update that modifies its copy of $S$. Following the update, the node stores a different set $S'$. Until this change is propagated to all nodes, those storing $S$ and those storing $S'$ will return different results for identical operations.

To understand the scope of this problem, we must understand the performance of the gossip algorithm. It’s clear that the number of rounds $R_g$ required for gossip to complete is random, but what can we say about its *distribution*? In the following sections, we’ll calculate bounds on $R_g$ by studying undirected complete graphs. By the end of this discussion, we’ll have a general answer to the following question:

Given an $n$ node system, what’s the probability gossip fails to complete within $x$ rounds?

**Comment:** A real number $x$ is an upper bound of a set $\mathbf{R}$ if $x \geq r$ for all $r \in \mathbf{R}$. For our purposes, the *least upper bound* is the smallest such upper bound; a similar definition applies for the greatest lower bound. For a given system, let $\mathbf{R}$ represent the set of all values $R_g$ may take. We’ll focus on finding the least upper bound ($u_0$) and greatest lower bound ($l_0$) of this set.

As you’ll soon see, $R_g$ is *unbounded from above*, so $u_0$ doesn’t exist. Instead, we’ll aim to find an alternative to the upper bound: some value $r_k \in \mathbf{R}$ that we can prove is larger than a significant portion of $\mathbf{R}$.


Section 2: Lower & Upper Bounds on The Gossip Process

Lower Bounds on Arbitrary Graphs

We’ll begin with a brief look at arbitrary graphs. Any results from this general case will also apply to our analysis of complete graphs.

**Definition:** The *diameter* of a graph is the maximum shortest-path distance between any pair of nodes. For a graph $\mathbf{G}$ with vertex set $\mathbf{V}$:

\begin{equation} \text{Diam}(\mathbf{G}) = \max\{\, d(v_i, v_j) \mid v_i, v_j \in \mathbf{V} \,\} \end{equation}

where $d \colon \mathbf{V} \times \mathbf{V} \to \mathbb{R}$ is a distance function defined on the graph. We’ll assume graphs are *unweighted*, so $d(v_i, v_j)$ is simply the number of edges on the shortest path between $v_i$ and $v_j$.

On an **arbitrary graph** with an update initialized at $v_0$, we can see that $R_g \geq \max\{\, d(v_0, v_j) \mid v_j \in \mathbf{V} \,\}$. In the worst case, $v_0$ is one of the pair of nodes that defines the diameter. In that event, we have $R_g \geq \text{Diam}(\mathbf{G})$.

This inequality simply asserts that an update cannot travel further from its origin node than rounds have elapsed. If $v_0$ is five hops from its furthest peer, we can be certain the algorithm does not complete in four rounds. This is a fine observation, but we can construct examples where $\text{Diam}(\mathbf{G})$ is well below the minimum of $R_g$. One such example is a **wheel graph**.


Lower Bounds on Wheel Graphs

From looking at an $n$ node wheel graph, we can see $R_g \geq n - 1$ regardless of the location of $v_0$. There are just two cases to consider: the update originates at the central hub, or at a node on the outer cycle.

**Comment:** A wheel graph is among the worst possible configurations for gossip. Do you see why this is the case?

Assume an $n$ node wheel graph with the central node as the update origin. With $j \in [1, n)$ nodes updated, the probability the central node gossips to a non-updated node is $(n - j)/n$. If you’ve taken a course on probability, you may recognize this as the coupon collector’s problem. The expected number of rounds to complete the gossip process is *exactly the same* as the solution to this well-studied problem:

\begin{equation} E(R_g) = nH_n = n\sum_{k=1}^{n}\frac{1}{k} \approx n\ln(n) + 0.5772\,n + \frac{1}{2} \end{equation}

The expectation of $R_g$ doesn’t give any insight into the worst-case performance of the algorithm. We can approximate the right tail of the distribution by studying a single node’s state after $t$ rounds.

\begin{equation} \begin{aligned} \Pr(\text{not updated}) &= (1 - 1/n)^t \\ &\leq e^{-t/n} && \text{using the fact } (1 - 1/n)^k \leq e^{-k/n} \\ &= e^{-(n\ln(n) + cn)/n} && \text{setting } t = n\ln(n) + cn \\ &= \frac{1}{ne^{c}} \end{aligned} \end{equation}

Finally, we union bound over $n$ nodes and find the probability that $R_g \geq n\ln(n) + cn$ is at most $n \cdot \frac{1}{ne^{c}} = e^{-c}$. For large $n$, this bound nicely relates the completion probability to the number of rounds beyond the expectation.
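A quick empirical check of this analysis (a sketch that, like the calculation above, only counts the hub’s messages; note the hub actually picks among its $n - 1$ neighbors, which the text approximates as $n$):

```python
import math
import random

def hub_rounds(n: int) -> int:
    """Rounds until the hub of an n-node wheel has messaged every rim node,
    counting only the hub's own messages (as in the analysis above)."""
    reached, rounds = set(), 0
    while len(reached) < n - 1:
        reached.add(random.randrange(n - 1))  # hub picks a uniform neighbor
        rounds += 1
    return rounds

n, trials, c = 64, 5_000, 2.0
samples = [hub_rounds(n) for _ in range(trials)]
tail = sum(r >= n * math.log(n) + c * n for r in samples) / trials
print(f"mean: {sum(samples) / trials:.1f}, nH_n: {n * (math.log(n) + 0.5772):.1f}")
print(f"P(R_g >= n ln(n) + {c}n): {tail:.4f} <= e^-c: {math.exp(-c):.4f}")
```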

Using specific information about a graph allows us to produce a tighter bound than we could otherwise. In the next section we’ll apply this thinking to produce a lower bound on complete graphs.


Lower Bounds on Complete Graphs

Let’s turn our focus to *complete graphs*. As with wheel graphs, the diameter is not a tight lower bound for $R_g$. Let’s run a simulation and see if it reveals any interesting patterns.

At right, I show the results from 10,000 iterations of the gossip process in a 64 node system. The first plot shows the updated nodes over time, the second shows the distribution of completion times, and the third shows the round-over-round growth rate.

If we had a reliable way to describe this probability distribution as a function of $n$, our job would be done. Unfortunately, we’re not there yet.

Let’s define $U_t$ as the number of updated nodes after $t$ rounds. Notice that near the beginning of the process $U_t$ nearly doubles in each round. Because we send just one message per node-round, doubling $U_t$ is the optimal outcome for any round. As $U_t$ increases, message efficiency decreases and we expect less round-over-round growth.

[Figures: Updated Nodes (Total By Round); Distribution of Gossip Completion (Rounds); Round-Over-Round Growth (%)]

Though incredibly unlikely, we *could* double $U_t$ in each round through the entire gossip process. If this were to happen, the process would complete in $\lceil \log_2(n) \rceil$ rounds. This gives us our greatest lower bound on $R_g$: gossip will never complete before $\lceil \log_2(n) \rceil$ rounds! The graph at right simulates a system with eight nodes completing gossip in $\lceil \log_2(8) \rceil = 3$ rounds.

**Comment:** Consider that it requires at least $n - 1$ messages to update $n - 1$ nodes. The function $M(j, n)$ below calculates the maximum number of messages sent through $j$ rounds in an $n$ node system; $j = \lceil \log_2(n) \rceil$ is the smallest value that satisfies $M(j, n) \geq n - 1$ for $n \geq 2$.

\begin{equation} M(j, n) = \sum_{k=0}^{j-1} \min(2^k, n) \end{equation}

With a randomized algorithm, the probability that gossip completes in $\lceil \log_2(n) \rceil$ rounds is *very small* when $n$ is large. For example, a $2^k$ node system can complete in $k$ rounds only if $2^{k-1}$ nodes are already updated through $k - 1$ rounds and the $k^{th}$ round is executed perfectly. Conditioning on the former, the probability of the latter event is $\prod_{x=1}^{2^{k-1}} x/2^k$. Even at small values of $k$ this event is very improbable (e.g. $k = 3$ gives $\approx 0.58\%$ success probability).
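Both claims are easy to check numerically. A small sketch:

```python
import math

def max_messages(j: int, n: int) -> int:
    """M(j, n): the most messages that can be sent through j rounds when
    the number of senders at most doubles each round."""
    return sum(min(2 ** k, n) for k in range(j))

# The smallest j with M(j, n) >= n - 1 is ceil(log2(n)).
for n in [2, 5, 8, 64, 1000]:
    j = next(j for j in range(1, n + 1) if max_messages(j, n) >= n - 1)
    assert j == math.ceil(math.log2(n))

# Probability a 2^k node system executes the final round perfectly,
# given 2^(k-1) nodes were updated through the first k - 1 rounds.
k = 3
print(math.prod(x / 2 ** k for x in range(1, 2 ** (k - 1) + 1)))  # ~0.0059
```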

We could design a *deterministic* algorithm that always finishes in $\lceil \log_2(n) \rceil$ rounds (ignoring unreliable networks, failed nodes, etc.). This choice is probably not ideal in most distributed systems, as it trades fault tolerance for a relatively minor performance improvement.


Demonstrating Non-Existence of an Upper Bound

Earlier I claimed we’d aim to find an upper bound on $R_g$. This was misleading: no such bound exists. For systems with $n > 2$, we can show that $R_g$ is unbounded from above.

Consider a three node system with two updated nodes. For gossip to complete in the next round, one of the updated nodes must message the third. Because this is a randomized algorithm, there is a small probability the first two nodes gossip between themselves indefinitely.

**Definition:** The probability mass function (pmf) of the geometric distribution is given by:

\begin{equation} f(X = r) = (1 - p)^{r-1} \, p, \quad r \in \mathbb{N} \end{equation}

The probability a round passes without a message sent to the third node is $(1/2)^2 = 1/4$. Because peers are chosen independently, the distribution of the number of rounds until the third node is updated is geometric with parameter $p = 3/4$. Although the pmf of $\text{Geom}(3/4)$ decreases quite quickly, the distribution is memoryless and assigns non-zero probability to all $r \in \mathbb{N}$.
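A tiny sketch of this three node endgame confirms the $\text{Geom}(3/4)$ waiting time:

```python
import random

def rounds_until_third_updated(trials: int = 100_000) -> float:
    """Average rounds until one of two updated nodes in a 3-node clique
    messages the third (each picks one of its 2 peers uniformly)."""
    total = 0
    for _ in range(trials):
        rounds = 1
        # A round fails iff both updated nodes pick each other: prob (1/2)^2.
        while random.random() < 0.25:
            rounds += 1
        total += rounds
    return total / trials

print(rounds_until_third_updated())  # ~1.333, the mean of Geom(3/4)
```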

Do we stop our analysis here and say we’re limited to “on an undirected complete graph, $\Pr(R_g = r) > 0$ for all $r \in [\lceil \log_2(n) \rceil, \infty)$”?

No, surely we can do better. In the following sections we’ll provide an alternative to the upper bound by estimating how often $R_g$ exceeds any given value.


Section 3: Calculating Expected Time to Gossip

So far, we’ve found a lower bound and shown the non-existence of an upper bound. Before finding an alternative to the upper bound, we’ll find the expectation of $R_g$. In some literature, gossip algorithms are referred to as “epidemic algorithms” and analyzed using models from epidemiology. We’ll use a similar model to find the expected number of updated nodes as a function of rounds, $u(t)$.

Let’s model $du/dt$ as a function of the updated nodes, the non-updated nodes, and the “contact rate” between any pair of an updated and a non-updated node. On a complete graph there are $(n - u)u$ edges between such pairs. Instead of a contact rate, we’ll treat $\beta$ as the update probability per edge-round.

\begin{equation} \frac{du}{dt} = \beta(n - u)u \end{equation}

To match the literature, we’ll use $\beta = 1/n$ as the contact rate, but we should have some reservations about this choice. Think about it: does it imply a faster or slower gossip process than the one we described previously?

A constant $\beta$ implies any updated node will update any non-updated node with fixed probability (e.g. $1/n$). This is an *overestimate* of the contact rate and implies a faster completion time: it ignores collisions and the one-message-per-node-round cap we’ve imposed. It’s an OK approximation for the moment, but better estimates are available.

\begin{equation} \frac{du}{dt} = \beta(n - u)u, \qquad u(0) = 1 \end{equation}

If we solve for $u(t)$ and set $t = k\ln(n)$, we get an equation that allows us to draw some nice conclusions about the average performance of the algorithm.

\begin{equation} \begin{aligned} u(t) &= \frac{ne^{\beta n t}}{e^{\beta n t} + n - 1} \\ &= \frac{ne^{t}}{e^{t} + n - 1} && \beta = 1/n \\ &= \frac{n^{k+1}}{n^{k} + n - 1} && \text{setting } t = k\ln(n) \end{aligned} \end{equation}

When $n$ is large and $t = 2\ln(n)$, $u(t) \approx n - 1$. With this, we can say that after $2\ln(n)$ rounds almost all nodes have received the update: $E(R_g) \approx 2\ln(n)$.
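Evaluating the closed form shows how sharply the logistic solution saturates (a sketch under the $\beta = 1/n$ model):

```python
import math

def u(t: float, n: int) -> float:
    """Closed-form solution of du/dt = (n - u)u/n with u(0) = 1."""
    return n * math.exp(t) / (math.exp(t) + n - 1)

for n in [64, 1024, 10**6]:
    t = 2 * math.log(n)
    print(f"n={n}: u(2 ln n) = {u(t, n):.2f} (n - 1 = {n - 1})")
```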

**Comment:** For a moment, we’ll ignore the fact that $2\ln(n)$ is an underestimate of $E(R_g)$ for any given $n$. We can estimate the total messages sent by integrating $u(t)$.

\begin{equation} \begin{aligned} \int \frac{ne^{t}}{e^{t} + n - 1} \, dt &= n\ln(e^t + n - 1) \\ &\approx nk\ln(n) - n\ln(n) && \text{evaluated from } 0 \text{ to } k\ln(n) \\ &= n(k - 1)\ln(n), \quad k > 1 \end{aligned} \end{equation}

For a 128 node system (taking $k = 2$), this gives us an expectation of $\approx 622$ messages. Does this seem lightweight to you?

[Figure: Expected Rounds To Complete]

The choice of $\beta$ is quite important, and we can do better by letting it depend on the current state of the system. By considering message collisions, we’ll produce a more conservative estimate. From the perspective of a single non-updated node, the probability of being updated by *any* one of $u$ updated nodes is:

\begin{equation} \beta = \big(1 - (1 - 1/n)^u\big)u^{-1} \approx \big(1 - e^{-u/n}\big)u^{-1} \end{equation}

Here $\big(1 - (1 - 1/n)^u\big)$ represents the probability of being updated by at least one node; we divide by $u$ to get the average update rate per edge. Using this value as $\beta$ gives:

\begin{equation} \begin{aligned} \frac{du}{dt} &= \big(1 - e^{-u/n}\big)(n - u) \\ &\approx \left(\frac{u}{n} - \frac{u^2}{2n^2}\right)(n - u) && \text{by Taylor expansion} \end{aligned} \end{equation}

We can solve this equation as well, but the solution is not quite as nice. If we set $u(t) = n - 1$, how much longer do we expect the gossip process to take? For the 64 node system simulated earlier, our prior analysis gives $\sim 8.32$ rounds but the more precise analysis gives $\sim 11.78$ rounds. From the simulations run in **Section 2**, 11.78 seems like a much more reasonable estimate than 8.32. How did this happen?
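To reproduce these numbers, we can integrate the Taylor-approximated model numerically; a sketch using a simple Euler step:

```python
import math

def rounds_to_complete(n: int, dt: float = 1e-4) -> float:
    """Integrate du/dt = (u/n - u^2/(2n^2))(n - u) from u(0) = 1 to u = n - 1."""
    u, t = 1.0, 0.0
    while u < n - 1:
        u += (u / n - u**2 / (2 * n**2)) * (n - u) * dt
        t += dt
    return t

n = 64
print(f"with collisions:     {rounds_to_complete(n):.2f}")  # ~11.8
print(f"ignoring collisions: {2 * math.log(n):.2f}")        # ~8.32
```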

[Figures: Expected Number of Nodes Updated (By Round); Expected Number of Rounds To Complete]

As it turns out, we have not stumbled onto a decades-long error in the theoretical computer science community. These numbers are quite different, but we can show the two models agree asymptotically. To do so, we’ll solve both equations for $t$ and compare the ratio as $n \to \infty$.

Using the original model that ignores collisions and the message cap, we get $t \approx 2\ln(n)$ as expected.

\begin{equation} n - 1 = \frac{ne^{t}}{e^{t} + n - 1} \implies t \approx \ln(n^2) = 2\ln(n) \end{equation}

To solve the equation with collisions and a message cap, we’ll treat $n$ and $n - 1$ as equivalent and solve the following equation:

\begin{equation} n - 1 = \frac{n\left(2ne^t - n\left(\sqrt{n^2 + 2ne^t} - n\right)\right)}{n^2 + 2ne^t} \implies t \approx \ln(n^3/2) \end{equation}

When $n$ is large, the ratio of these two estimates approaches $1.5$. The key result from this diversion is that the result obtained using $\beta = 1/n$ is of the same order (technically $\Theta(\ln(n))$; see: Landau notation) as the result obtained from the more rigorous analysis. Let’s revise our statement from earlier: $E(R_g) \approx \ln(n^3/2)$.

[Figure: Ratio of Rounds To Complete, Include/Ignore Collisions]

Section 4: Calculating Variance of Time To Gossip

We’re still in search of a way to describe the right tail of the distribution of $R_g$. In this section we’ll calculate the distribution of the *time* the system spends with exactly $u$ nodes updated. This approach will confirm our previous result about the expected completion time and provide the variance for the entire process.

Waiting on Updates

As in the previous section, we’ll assume each of the $(n - u)u$ edges sends a message with probability $\beta = 1/n$ per round. Because peer selection is randomized, we may treat the time to send a message on any edge as exponentially distributed with mean $n$ rounds (i.e. $\textit{Exp}(1/n)$).

Regardless of the number of rounds elapsed, the waiting time until $u$ is incremented is distributed as the minimum of $(n - u)u$ independent exponential variables with parameter $1/n$.

Using order statistics, we can show that the *minimum* value from a sample of $k$ exponential random variables with parameter $\lambda$ is exponentially distributed with parameter $k\lambda$. In our case, the time until another node is updated is distributed as $\textit{Exp}\left(\frac{u(n - u)}{n}\right)$.

**Definition:** The probability density function (pdf) and cumulative distribution function (cdf) of the exponential distribution are given by:

\begin{equation} \begin{aligned} f(x) &= \lambda e^{-\lambda x} && \textit{probability density} \\ F(x) &= 1 - e^{-\lambda x} && \textit{cumulative distribution} \end{aligned} \end{equation}

An exponential distribution with parameter $\lambda$ has mean $1/\lambda$ and variance $1/\lambda^2$.

**Definition:** The $k^{th}$ order statistic is the $k^{th}$ smallest value from a sample of $n$ random variables. If we know the distribution used to generate a sample, we can determine the distribution of any order statistic of that sample. In general, the cdf of the $k^{th}$ order statistic is:

\begin{equation} F_{X_{(k)}}(x) = \sum_{j=k}^{n} \binom{n}{j} \left[F_X(x)\right]^j \left[1 - F_X(x)\right]^{n-j} \end{equation}

Each term calculates the probability that exactly $j$ of the samples fall at or below $x$ and $n - j$ fall above it. When $k = 1$ we have a special case that allows us to simplify this sum to:

\begin{equation} F_{X_{(1)}}(x) = 1 - \left[1 - F_X(x)\right]^{n} \end{equation}

And when the sample is exponential, we get:

\begin{equation} \begin{aligned} F_{X_{(1)}}(x) &= 1 - \left[1 - \big(1 - e^{-\lambda x}\big)\right]^{n} \\ &= 1 - e^{-n\lambda x} \end{aligned} \end{equation}

This is exactly the cumulative distribution of $\textit{Exp}(n\lambda)$.
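A quick simulation agrees (a sketch with an arbitrary $\lambda$ and $k$):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, k, samples = 0.5, 10, 100_000

# Minimum of k draws from Exp(lam), sampled many times.
minima = rng.exponential(scale=1 / lam, size=(samples, k)).min(axis=1)

print(minima.mean())  # ~1 / (k * lam) = 0.2, the mean of Exp(k * lam)
print(minima.var())   # ~1 / (k * lam)^2 = 0.04
```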


We’ve completed the challenging work of formulating the problem; now we must compute the expectation and variance on each interval. Let’s define $\tau_u$ as the first time the system has at least $u$ nodes updated. We’ll then confirm that $E(\tau_n) \approx E(R_g)$, since our new framing of the problem should preserve the expectation of $R_g$.

\begin{equation} \begin{aligned} E(\tau_n) &= \sum_{u=1}^{n-1} \frac{n}{u(n - u)} \\ &= \sum_{u=1}^{n-1} \frac{1}{u} + \frac{1}{n - u} && \textit{as partial fractions} \\ &= 2H_{n-1} \\ &\approx 2\ln(n) + 2(0.5772) && \textit{approx. of harmonic series} \\ &\approx 2\ln(n) \end{aligned} \end{equation}

Excellent, just as expected! Because the waiting times are independent variables, we can sum the variances of all waits to get $\textit{Var}(\tau_n)$.

\begin{equation} \begin{aligned} \text{Var}(\tau_n) &= \sum_{u=1}^{n-1} \left(\frac{n}{u(n - u)}\right)^{2} \\ &= \sum_{u=1}^{n-1} \frac{1}{u^{2}} + \frac{1}{(n - u)^{2}} + \frac{2}{u(n - u)} && \textit{as partial fractions} \\ &\approx \sum_{u=1}^{n-1} \frac{1}{u^{2}} + \frac{1}{(n - u)^{2}} && \textit{the cross term is } O(\ln(n)/n) \\ &\approx 2\pi^2/6 \end{aligned} \end{equation}

The last approximation uses the fact that the sum of inverse squares (see: the Basel problem) converges to $\pi^2/6$. When $n$ is sufficiently large we can approximate $\sum_{u=1}^{n-1}\left(1/u^2 + 1/(n - u)^2\right)$ as $2\pi^2/6$. This is a very interesting sum by itself, but it also gives us a very useful result: it suggests the variance is bounded!
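Computing the sums directly shows both limits (a sketch; note the expectation keeps a constant offset of $2(0.5772)$ relative to $2\ln(n)$):

```python
import math

def expected_tau(n: int) -> float:
    return sum(n / (u * (n - u)) for u in range(1, n))

def var_tau(n: int) -> float:
    return sum((n / (u * (n - u))) ** 2 for u in range(1, n))

for n in [100, 10_000, 1_000_000]:
    print(f"n={n}: E={expected_tau(n):.3f} (2 ln n = {2 * math.log(n):.3f}), "
          f"Var={var_tau(n):.3f} (pi^2/3 = {math.pi ** 2 / 3:.3f})")
```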

**Comment:** The variance of $\tau_n$ is almost entirely comprised of the variance waiting on the first and last few nodes. Consider an $n$ node system with one updated node. In the discrete case, we’d expect the waiting time to update a second node to be tightly concentrated around one round.

However, the distribution for the first wait is $\textit{Exp}\left(\frac{n-1}{n}\right)$. This distribution provides a mean that matches our intuition, but also produces $\textit{Var}(\tau_2) \approx 1$. Does this seem reasonable? Counter-intuitively, adding the first additional node is responsible for almost a third of the reported variance.


In the last section we showed $\ln(n^3/2)$ is a more reasonable estimate for $E(R_g)$ than $2\ln(n)$. Can we show the variance is still bounded when $du/dt = \big(1 - e^{-u/n}\big)(n - u)$?

\begin{equation} \begin{aligned} \left((n - u)\left(\frac{u}{n} - \frac{u^2}{2n^2}\right)\right)^{-1} &= \frac{1}{u - 2n} - \frac{2}{u - n} + \frac{1}{u} \\ \left((n - u)\left(\frac{u}{n} - \frac{u^2}{2n^2}\right)\right)^{-2} &= \frac{1}{(u - 2n)^2} + \frac{4}{(u - n)^2} + \frac{3}{nu} + \frac{1}{u^2} - \frac{3}{n(u - 2n)} \end{aligned} \end{equation}

\begin{equation} \text{Var}(\tau_n) = \sum_{u=1}^{n-1} \frac{1}{(u - 2n)^2} + \frac{4}{(u - n)^2} + \frac{3}{nu} + \frac{1}{u^2} - \frac{3}{n(u - 2n)} \approx 5\pi^2/6 \end{equation}

We can! So what are the implications of having bounded variance? We confirmed that $E(\tau_n)$ increases slowly with $n$, and we showed $\tau_n$ stays near the expectation even for very large $n$. If we use gossip to disseminate information in massive systems, we don’t need to worry about worst-case performance becoming orders of magnitude worse than the expectation.

We should be a bit careful about this estimate, though. We can show that the second-degree Taylor series *overestimates* the variance when $u$ is near $n$. It’s possible this would improve with a better approximation, but we’ll proceed for now.

We’re still missing a tidy answer to the question posed in the first section. If we had the distribution of $\tau_n$, we could find its cumulative distribution function and we’d have a neat final answer. Unfortunately, we’ll have to rely on one more approximation before moving on.

**Comment:** Having a probability distribution for gossip time would be a very satisfying result. The sum of $k$ exponentials with the same parameter follows the Erlang distribution, a specific case of the gamma distribution. Is there an argument we can make that allows us to just use the gamma?

Chebyshev’s Inequality bounds the probability a random variable deviates from its mean by more than a given amount. It’s an incredibly general result and requires only the variance of the distribution.

P(|Xμ|c)Var(X)c2\begin{equation} P(|X - \mu| \geq c) \leq \frac{\text{Var}(X)}{c^2} \end{equation}

By construction, Chebyshev’s inequality provides another *overestimate* of the probability that gossip takes more than $x$ rounds to finish. Consequently, given an $n$ node system, the probability that gossip fails to complete within $x$ rounds is no more than:

\begin{equation} p \leq \frac{5\pi^2}{6\big(x - 3\ln(n)\big)^2} \end{equation}

For today, this is a satisfactory answer. I’d encourage you to try a few values. What is an upper bound on the 99th percentile of gossip time on a 1M node distributed system?
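Inverting the bound answers that question directly (a sketch; the $3\ln(n)$ mean and $5\pi^2/6$ variance come from the sections above):

```python
import math

def rounds_upper_bound(n: int, p: float) -> float:
    """Smallest x such that the Chebyshev bound guarantees gossip exceeds
    x rounds with probability at most p."""
    return 3 * math.log(n) + math.sqrt(5 * math.pi ** 2 / (6 * p))

# 99th percentile bound for a 1M node system: ~70 rounds.
print(rounds_upper_bound(1_000_000, 0.01))
```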

**Comment:** This is an OK answer. We set out to understand the risk of large divergences when $n$ is very large, but what if we *really* needed a distribution? I suggest two options:

  • The maximum-entropy distribution given a minimum ($\log_2(n)$) and a mean ($3\ln(n)$) is a Pareto distribution. Can you solve for the shape parameter of the Pareto to find a closed-form distribution that we *know* will overestimate the weight on the tails?

  • If we want a more realistic (rather than more conservative) estimate, we might use the method of moments to fit the mean and variance of this distribution to an Erlang distribution. The Erlang distribution is given by the sum of $k$ exponentials with a fixed parameter $\lambda$. Although our waiting times have mixed parameters, can some approximation using a single $k$ and $\lambda$ get close?

[Figure: Second Degree Taylor Series Overestimates Variance]


Section 5: Gossip Algorithms For the Real World

Finally, I’d like to mention a few variations of the idealized gossip algorithm that can make it more suitable for real-world distributed systems. This section does not build on any previous results, but instead answers a few questions about making gossip more robust.

Stopping The Gossip Process

We haven’t described a mechanism for **stopping** gossip. In distributed systems we communicate information about remote state by passing messages, so are we meant to send another round of messages to signal that gossip has completed?

Not necessarily! From the previous section, we expect gossip to complete in $\sim \ln(n^3/2)$ rounds. If we set $3k\ln(n)$ as a cap on the number of rounds gossiped per node, for some $k > 1$, all nodes will almost certainly be updated *before* the algorithm stops, and we can set $k$ even higher to achieve a given success probability. However, there are less message-intensive solutions. Demers et al. demonstrate we can achieve the same completeness guarantees if we deactivate nodes with probability $1/n$ after each message they send to an already updated node.
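Here’s a sketch of this stopping rule; the deactivation probability is left as a parameter so you can reproduce the failure mode shown in the figure (the round-synchronous semantics are my own simplification):

```python
import random

def gossip_with_deactivation(n: int, deactivation_prob: float) -> bool:
    """Push gossip where a sender deactivates with the given probability after
    messaging an already-updated peer. Returns True if every node is updated
    before all senders deactivate."""
    updated, active = {0}, {0}
    while active:
        next_active, newly_updated = set(), set()
        for node in active:
            peer = random.randrange(n - 1)
            if peer >= node:  # uniform choice among the other n - 1 nodes
                peer += 1
            if peer in updated:  # wasted message: maybe deactivate the sender
                if random.random() >= deactivation_prob:
                    next_active.add(node)
            else:
                newly_updated.add(peer)
                next_active.update((node, peer))
        updated |= newly_updated
        active = next_active
    return len(updated) == n

n, trials = 64, 200
for prob in (1 / n, 0.5):
    failures = sum(not gossip_with_deactivation(n, prob) for _ in range(trials))
    print(f"deactivation probability {prob:.3f}: {failures / trials:.1%} failures")
```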

The graph at right illustrates what happens when the deactivation probability is set *too high*. When set properly, the failure percentage is near zero; in this example, the algorithm fails in ~20% of attempts.


Multiple Peers Per Round

In **Section 3** we assumed $\beta = 1/n$, and in **Section 4** we revised this value to be a bit more realistic. What about redoing those calculations with $\beta = k/n,\ k > 1$? This suggests a faster gossip time at the cost of some additional message overhead. Can we simply select multiple peers per round? What about broadcasting a single update to *all* nodes in the network in a single round?

Gossiping multiple point-to-point messages per round can be an effective way to improve the speed of the gossip algorithm. However, broadcasting an update to all nodes from a single site (“single-round broadcast”) is normally not the best choice in large distributed systems.


We’ve mostly focused on idealized models of distributed systems, but in practice we must account for unreliable networks and incomplete information. Creating a reliable broadcast from point-to-point messages helps us be resilient to both of these failure modes. We *can* construct reliable broadcast from single-round broadcast, but it comes at a cost of significantly more messages transmitted when networks are unreliable.

The graphs above illustrate how reliable broadcast built from point-to-point messaging and single-round broadcast perform when faced with unreliable networks.


Spreading Multiple Updates

Highly correlated traffic is a common occurrence in real systems. Imagine a system that receives twenty updates in quick succession. Should we initialize twenty unique gossip processes or is there a more efficient way to spread this information?

One solution is to use pull-based messaging. With pull-based difference resolution, every node contacts one peer per round, and we mark a node updated once it has *contacted* an updated node. When there are many active updates, it’s likely that a node contacts a peer with at least one new update.

However, if the system receives updates infrequently and far apart, pulling creates additional messaging overhead with relatively little benefit.


The average performance of pull-based messaging is a bit faster than the standard (push) algorithm. We can approximate the probability a node has not been updated after $i$ rounds as a recurrence relation for both push and pull-based gossip.

\begin{equation} \begin{aligned} p_{i+1} &= p_i\big(1 - 1/n\big)^{n(1 - p_i)} && \textit{push} \\ p_{i+1} &= p_i^2 && \textit{pull} \end{aligned} \end{equation}

In the earlier rounds, the performance of the two algorithms is close because $p_i \approx \big(1 - 1/n\big)^{n(1 - p_i)}$ when $p_i$ is near one. When $p$ is smaller, the pull-based probability decreases to zero quadratically, while the push-based probability decreases by a constant factor of $\frac{1}{e}$ per round.
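Iterating the two recurrences side by side makes the difference at the tail obvious (a sketch starting from $p_0 = 1 - 1/n$):

```python
n = 1024
p_push = p_pull = 1 - 1 / n  # probability a given node is not yet updated

for i in range(1, 21):
    p_push *= (1 - 1 / n) ** (n * (1 - p_push))
    p_pull *= p_pull  # squared: both the node and its pulled peer must miss
    print(f"round {i:2d}: push={p_push:.3e}  pull={p_pull:.3e}")
```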

It has been demonstrated that by using push in the beginning and switching to pull toward the end, both the total messages sent and the time to finish gossip can be improved compared to either algorithm in isolation.

[Figures: Single Node Update Probability - Push & Pull-Based Gossip (By Round); Diff. Single Node Update Probability - Push & Pull-Based Gossip (Round-over-Round)]

Section 6: Closing Remarks

This concludes our tour of gossip algorithms. Let’s do a brief review:

  • Gossip spreads an update through repeated rounds of randomized point-to-point messaging, and the number of rounds $R_g$ it takes is itself random.
  • On a complete graph, gossip can never finish before $\lceil \log_2(n) \rceil$ rounds, and no finite upper bound on $R_g$ exists.
  • Modeling the process with epidemic-style differential equations gives $E(R_g) \approx \ln(n^3/2)$ once collisions and the message cap are accounted for.
  • The variance of the completion time is bounded by a constant, and Chebyshev’s inequality turns that into a usable tail bound.
  • Practical variations (stopping rules, multiple peers per round, and pull-based messaging) make gossip robust enough for real systems.

Thanks for reading! I’ve enjoyed studying gossip algorithms the past few weeks because they’re easy to understand and easy to implement, but analyzing them involves pulling together bits of mathematics from many areas of study. If you’re an engineer reading this, hopefully it’s gotten you to think of the analysis of algorithms as a bit more than a set of facts about complexity. If you’re a math enthusiast, hopefully you’ve learned a bit more about distributed computer systems and can apply the techniques mentioned in this post to your own problems.


References

Course & Lecture Notes

Course Notes: Ayalvadi Ganesh, Complex Networks

Course Notes: Indranil Gupta, Distributed Systems

Course Notes: Martin Kleppmann, Concurrent and Distributed Systems

Course Notes: Sun He, Gossip Algorithms

Talks & Presentations

Dahlia Malkhi on Randomized Gossip Methods

Patterns of Distributed Systems: Gossip Dissemination

Martin Kleppmann on Distributed Broadcast

Research

Haobo Yu, Lee Breslau, and Scott Shenker. 1999. A scalable Web cache consistency architecture. SIGCOMM Comput. Commun. Rev. 29, 4 (Oct. 1999), 163–174

Alan Demers, Dan Greene, Carl Hauser, Wes Irish, John Larson, Scott Shenker, Howard Sturgis, Dan Swinehart, and Doug Terry. 1987. Epidemic algorithms for replicated database maintenance. In Proceedings of the sixth annual ACM Symposium on Principles of distributed computing (PODC ’87). Association for Computing Machinery, New York, NY, USA, 1–12.

Fan, C. Kenneth, Bjorn Poonen, and George Poonen. “How to Spread Rumors Fast.” Mathematics Magazine 70, no. 1 (1997): 40–42

Pittel, Boris. “On Spreading a Rumor.” SIAM Journal on Applied Mathematics, vol. 47, no. 1, 1987, pp. 213–23.

R. Karp, C. Schindelhauer, S. Shenker and B. Vocking, “Randomized rumor spreading,” Proceedings 41st Annual Symposium on Foundations of Computer Science, Redondo Beach, CA, USA, 2000, pp. 565-574, doi: 10.1109/SFCS.2000.892324.

A. Das, I. Gupta and A. Motivala, “SWIM: scalable weakly-consistent infection-style process group membership protocol,” Proceedings International Conference on Dependable Systems and Networks, Washington, DC, USA, 2002, pp. 303-312, doi: 10.1109/DSN.2002.1028914.

Benjamin Doerr and Marvin Künnemann. 2014. Tight analysis of randomized rumor spreading in complete graphs. In Proceedings of the Meeting on Analytic Algorithmics and Combinatorics. Society for Industrial and Applied Mathematics, USA, 82–91.

Frieze, A.M., & Grimmett, G.R. (1985). The shortest-path problem for graphs with random arc-lengths. Discret. Appl. Math., 10, 57-77.