Section 1: Gossip Algorithms in Distributed Systems
We say a system is distributed if it runs on multiple servers that communicate by sending messages over a network. Although distributed systems can be challenging to program correctly, they can offer better performance and resiliency than single-node systems. In the background, many distributed systems use gossip algorithms to coordinate sharing information quickly over large networks.
If you're familiar with distributed systems you've likely heard of (or even used!) applications that run a variation of gossip. Notably, the distributed databases Riak, Apache Cassandra, and DynamoDB all use gossip algorithms for communicating information about cluster membership and node failure detection.
Figure: Randomized Gossip on Complete Graph
In a gossip algorithm, one node broadcasts an update to all other nodes through multiple rounds of point-to-point messaging. A node spreads an update by sending a message to (or "gossiping with") a randomly selected neighbor once per round. In subsequent rounds, the selected node is "activated" and begins gossiping with its own neighbors. We allow this process to continue until all nodes have received a message. For the remainder of this post, I'll ignore many of the challenges associated with building real distributed systems and represent them as graphs.
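To make the process concrete, here's a minimal Python simulation of randomized push gossip on a complete graph (the function and parameter names are my own invention, not from any library):

```python
import math
import random

def push_gossip_rounds(n, seed=0):
    """Simulate randomized push gossip on a complete graph of n nodes.

    Each round, every updated node sends the update to one uniformly
    random peer. Returns the number of rounds until all nodes hold it.
    """
    rng = random.Random(seed)
    updated = {0}  # node 0 is the origin of the update
    rounds = 0
    while len(updated) < n:
        for node in list(updated):  # snapshot: new nodes gossip next round
            peer = rng.randrange(n - 1)  # pick any node other than the sender
            if peer >= node:
                peer += 1
            updated.add(peer)
        rounds += 1
    return rounds

rounds = push_gossip_rounds(100)
assert rounds >= math.ceil(math.log2(100))  # doubling every round is the best case
```

Running this repeatedly with different seeds gives a feel for how variable the completion time is, which is exactly the question the rest of the post explores.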
Consider a distributed database with n nodes storing an identical set S. At any moment, any node may receive an update that modifies its copy of S. Following the update, the node stores a different set S'. Until this change is propagated to all nodes, those storing S and those storing S' will return different results for identical operations. To understand the magnitude of this problem, we must understand the performance of the gossip algorithm. It's clear that the number of rounds (R) required for gossip to complete is variable, but what can we say about its distribution?
Section 2: Lower & Upper Bounds on The Gossip Process
We'll begin with a brief look at arbitrary graphs. Any results from this general case should also apply to our analysis on complete graphs. On an arbitrary graph with an update initialized at node v0, we can see that R is at least the distance from v0 to its furthest node. In the worst case, v0 is one of the pair of nodes that define the diameter. In that event, we have R >= diam(G).
The diameter of a graph is the maximum shortest-path distance between any pair of nodes. For a graph with a set of vertices V:

diam(G) = max over u, v in V of d(u, v)

Where d(u, v) is a distance function defined on the graph. We'll assume graphs are unweighted, so d(u, v) is the number of edges on the shortest path between u and v.
Figure: Randomized Gossip on Wheel Graph
I calculate an upper bound for gossiping on a star graph in a separate post.
Figure: Deterministic Gossip
We can simulate a system completing gossip in ceil(log2(n)) rounds. With randomized gossip, the probability the algorithm completes in ceil(log2(n)) rounds shrinks toward zero as n increases.
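The deterministic schedule is simple enough to sketch directly (helper name invented): each round every updated node messages a distinct non-updated node, so the updated count doubles until it reaches n.

```python
import math

def deterministic_gossip_rounds(n):
    """Each round, every updated node messages a distinct non-updated
    node, so the number of updated nodes doubles (capped at n)."""
    updated, rounds = 1, 0
    while updated < n:
        updated = min(2 * updated, n)
        rounds += 1
    return rounds

for n in (2, 10, 1000):
    assert deterministic_gossip_rounds(n) == math.ceil(math.log2(n))
```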
Figure: Rounds to Gossip Is Unbounded
This inequality asserts that an update cannot travel further from its origin node than the number of rounds that have elapsed. If v0 is five hops from its furthest peer, we can be certain the algorithm does not complete in four rounds. This is a fine observation, but we can construct examples where diam(G) is well below the minimum of R. One such example is a star graph.
From looking at an n-node star graph, we can see R >= n - 1 regardless of the location of v0. There are just two cases to consider:
The central node is the origin node. This node must send at least n - 1 messages to update the outer nodes, and it can send only one per round.
An outer node is the origin node. This node sends one message to the central node, and the central node must then send at least n - 2 messages to update the remaining nodes.
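A quick simulation on a star graph backs up this bound (a Python sketch with invented names; node 0 plays the hub and nodes 1..n-1 are leaves):

```python
import random

def star_gossip_rounds(n, origin=0, seed=0):
    """Randomized gossip on a star graph: node 0 is the hub, nodes
    1..n-1 are leaves. Each round, every updated node messages one
    random neighbor (a leaf's only neighbor is the hub)."""
    rng = random.Random(seed)
    updated = {origin}
    rounds = 0
    while len(updated) < n:
        for node in list(updated):
            if node == 0:
                updated.add(rng.randrange(1, n))  # hub picks a random leaf
            else:
                updated.add(0)  # leaves can only message the hub
        rounds += 1
    return rounds

assert star_gossip_rounds(8, origin=0) >= 7  # hub origin: at least n - 1 rounds
assert star_gossip_rounds(8, origin=3) >= 7  # leaf origin: at least n - 1 rounds
```

Because the hub picks leaves at random (possibly re-messaging updated ones), the simulated round counts typically exceed the n - 1 lower bound.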
Using specific information about a graph allows us to produce a tighter bound than we could otherwise. Let's turn our focus to complete graphs. As with star graphs, the diameter is not a tight lower bound for R.
Let's define u(r) as the number of updated nodes after r rounds. Intuitively, at the beginning of the process we expect u(r) to nearly double in each round. Because we send just one message per node per round, doubling is the optimal outcome for any round. As u(r) increases, message efficiency decreases and we expect less round-over-round growth.
Though it is incredibly unlikely, if we doubled in each round through the entire gossip process, the process would complete in ceil(log2(n)) rounds. This gives us a greatest lower bound on R: gossip will never complete before ceil(log2(n)) rounds!
We could design a deterministic algorithm that always finishes in ceil(log2(n)) rounds (ignoring unreliable networks, failed nodes, etc.), but this choice is likely not ideal in most distributed systems as it trades fault tolerance for a relatively minor performance improvement.
Now let's consider an upper bound for R. For systems with more than two nodes, R is unbounded from above. Consider a three-node system with two updated nodes. For gossip to complete in the next round, either of the updated nodes must message the third. Because peers are chosen independently, the number of rounds until the third node is updated is geometrically distributed. Although exceedingly unlikely, this gives us a non-zero probability of not finishing for every finite number of rounds. Although we don't have a strict upper bound, in the following sections we'll provide a probabilistic upper bound on R by estimating how often it exceeds any given threshold.
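We can check the geometric behavior with a quick simulation (a sketch with invented names; each of the two updated nodes picks the third peer with probability 1/2, so the per-round probability that neither does is 1/4):

```python
import random

def rounds_to_update_last_node(trials=20000, seed=0):
    """Three-node system with two nodes updated. Each round both updated
    nodes independently pick one of their two peers; the third node is
    updated as soon as either picks it. Returns the mean rounds needed."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        rounds = 1
        # P(neither updated node picks the third) = (1/2) * (1/2) = 1/4
        while rng.random() < 0.25:
            rounds += 1
        total += rounds
    return total / trials

mean = rounds_to_update_last_node()
# A geometric distribution with p = 3/4 has mean 1/p = 4/3
assert abs(mean - 4 / 3) < 0.05
```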
Section 3: Calculating Expected Time to Gossip
Before finding a probabilistic upper bound we'll consider the expected value of R. In some literature, gossip algorithms are referred to as "epidemic algorithms" and analyzed using models from epidemiology. We'll use a similar model to find the expected number of updated nodes as a function of rounds, u(r).
Let's model the growth of u(r) as a function of updated nodes (u), non-updated nodes (n - u), and the "contact rate" (beta) between any pair of an updated and a non-updated node. On a complete graph there are u(n - u) edges between these pairs, giving du/dr = beta * u * (n - u). Instead of a "contact" rate, we'll treat beta as the message probability per edge per round.
Instead, we can set beta to be dependent on the current state of the system. From the perspective of a non-updated node, the probability p of being updated by one of u updated nodes is:

p = 1 - (1 - 1/(n - 1))^u
Where p represents the probability of being updated by at least one node, and dividing by u gives the average update rate per edge, beta = p / u. Using a second-degree Taylor expansion of (1 - 1/(n - 1))^u gives:

beta ≈ 1/(n - 1) - (u - 1)/(2(n - 1)^2)
As in the simpler case, we can find u(r) and then solve for the round at which u(r) = n. I will admit that I needed to poke at Wolfram a while to resolve this.
The key result from this exercise is that the answer using the state-dependent beta has the same order as the result obtained from a more rigorous analysis. When n is large, the ratio of the two is just a constant.
To match the literature, we'll use beta = 1/(n - 1) as the contact rate, but we should have some minor reservations about this choice. A constant beta implies any updated node will update any non-updated node with a fixed probability (e.g. 1/(n - 1)). This is an overestimate of the contact rate and implies a faster completion time. This value necessarily ignores collisions and the one message per node per round cap we've imposed.
If we solve for u(r) and set beta = 1/(n - 1), we get an equation that allows us to draw some nice conclusions about the average performance of the algorithm:

u(r) = n / (1 + (n - 1) * e^(-beta * n * r))
When n is large and r is on the order of log(n), the exponential term vanishes. With this, we can say that after roughly 2 ln(n) rounds almost all nodes have received the update, u(2 ln(n)) ≈ n.
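We can sanity-check this numerically. The snippet below (function name mine) assumes the standard logistic solution u(r) = n / (1 + (n - 1) e^(-beta n r)) of du/dr = beta * u * (n - u) with u(0) = 1:

```python
import math

def expected_updated(r, n):
    """Logistic solution of du/dr = beta * u * (n - u) with u(0) = 1
    and a constant contact rate beta = 1/(n - 1)."""
    beta = 1 / (n - 1)
    return n / (1 + (n - 1) * math.exp(-beta * n * r))

n = 100_000
u = expected_updated(2 * math.log(n), n)
assert u / n > 0.999  # after ~2 ln(n) rounds nearly every node is updated
```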
Section 4: Calculating Variance of Time To Gossip
We're still in search of a way to describe the right tail of the distribution of R. In this section we'll calculate the distribution of the time the system spends with exactly u nodes updated. This approach will confirm our previous result about the expected completion time and provide the variance for the entire process.
As in the previous section, we'll assume each of the u(n - u) edges sends a message with probability 1/(n - 1) per round. Because peer selection is randomized, we may treat the time to send a message on any edge as exponentially distributed with a mean of n - 1 rounds (i.e. Exp(1/(n - 1))).
Regardless of the number of rounds elapsed, the waiting time until u is incremented is distributed as the minimum of u(n - u) independent exponential variables with parameter 1/(n - 1).
Using order statistics we can show that the minimum of a sample of m exponential random variables with parameter lambda is exponentially distributed with parameter m * lambda. In our case, the time until another node is updated is distributed as Exp(u(n - u)/(n - 1)).
Here each state means "1 node updated, n - 1 nodes unchanged", "2 nodes updated, n - 2 nodes unchanged", etc. We can sum the expected waiting times in each of these states up to "n nodes updated" to find an alternative estimate for E[R].
We now must compute the expectation and variance (in rounds) of the time spent in each state. Let's define T_k as the first time the system has at least k nodes updated. We'll then confirm that E[T_n] matches our earlier result. Our new framing of the problem should preserve our expectation of roughly 2 ln(n) rounds.
Excellent, just as expected! Because the waiting times are independent variables, we can sum the variances of all waits to get Var(T_n).
The last approximation uses the fact that the sum of inverse squares (see: the Basel problem) converges to pi^2/6. When n is sufficiently large we can approximate Var(T_n) as pi^2/3. This is an interesting sum by itself, but it also gives us a quick, useful result: variance is bounded even as n grows large!
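Both sums are easy to check numerically. Here's a small sketch (names mine) that sums the per-state waits Exp(u(n - u)/(n - 1)) described above:

```python
import math

def gossip_time_moments(n):
    """Sum the per-state expected waits and variances. In the state with
    u updated nodes the wait is Exp(u * (n - u) / (n - 1)), so its mean
    is (n - 1) / (u * (n - u)) and its variance is that mean squared."""
    mean = var = 0.0
    for u in range(1, n):
        wait = (n - 1) / (u * (n - u))
        mean += wait
        var += wait ** 2
    return mean, var

mean, var = gossip_time_moments(10_000)
assert abs(mean - 2 * math.log(10_000)) < 1.5  # E[T_n] grows like 2 ln(n)
assert abs(var - math.pi ** 2 / 3) < 0.01      # Var(T_n) approaches pi^2/3
```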
This is an OK answer. We set out to understand the risk of large divergences when n is very large, but what if we needed a full distribution? I'd suggest the following:
- The maximum-entropy distribution given a minimum and a mean follows the Pareto distribution. Can we solve for the shape parameter of the Pareto to produce a distribution that will overestimate the weight on the tails?
With any luck (focus, time, etc.), I will have a follow-up post on this soon.
In a diversion off the last section, I showed the state-dependent beta may be a more reasonable estimate than the constant 1/(n - 1). We can show the variance is still bounded under that choice as well.
So what are the implications of having bounded variance? We confirmed that E[R] increases slowly with n and showed R stays near the expectation even for very large n. If we use gossip to disseminate information in massive systems, we don't need to worry about worst-case performance becoming orders of magnitude worse than the expectation. If we had the full distribution of R we could find the cumulative distribution function and get a neat final answer, but for today the mean and variance will suffice.
Section 5: Gossip Algorithms For the Real World
Finally, Iβd like to mention a few variations of the idealized gossip algorithm that can make it more suitable for real-world distributed systems. This section does not build on any previous results, but instead answers a few questions about making gossip more robust.
Figure: Stopping the Gossip Process - Deactivate After Message to Updated Node
The graph above illustrates what happens when the deactivation probability is set poorly. When set properly, the failure percentage is near zero. In this example, the algorithm fails in ~20% of attempts.
We haven't yet described a mechanism for stopping gossip. In distributed systems we communicate information about remote state by passing messages, so are we meant to send another round of messages to signal that gossip has completed?
Not necessarily! From the previous section, we expect that gossip completes in roughly 2 ln(n) rounds. If we set a cap of c * log(n) on the number of rounds gossiped per node, we can show that for some constant c, all nodes will be updated (with high probability) before the algorithm stops. Luckily, there are solutions that are less message intensive. Demers demonstrates we can achieve the same completeness guarantees if we deactivate nodes with probability 1/k after each message sent to an already-updated node.
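Here's a rough simulation of that deactivation rule (my own sketch, not Demers' implementation; all names and parameters are invented):

```python
import random

def gossip_with_deactivation(n, k, seed=0):
    """Push gossip where a sender deactivates with probability 1/k after
    messaging a peer that already has the update. Returns the fraction
    of nodes updated once no active senders remain."""
    rng = random.Random(seed)
    updated = {0}
    active = {0}
    while active:
        for node in list(active):
            peer = rng.randrange(n - 1)  # random peer other than the sender
            if peer >= node:
                peer += 1
            if peer in updated:
                if rng.random() < 1 / k:
                    active.discard(node)  # stop gossiping with probability 1/k
            else:
                updated.add(peer)
                active.add(peer)
    return len(updated) / n

# Average coverage over a few seeds; the residual un-updated
# fraction should be small for a modest k.
coverage = sum(gossip_with_deactivation(500, k=4, seed=s) for s in range(20)) / 20
assert coverage > 0.9
```

Larger k trades extra messages for a smaller chance of leaving nodes un-updated, which is the tuning knob the figure above explores.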
Broadcast to All Peers Per Round
Figure: Spreading Multiple Updates
The graphs above illustrate how reliable broadcast built from point-to-point messaging and single-round broadcast perform when faced with unreliable networks. Point-to-point gossip is successful with far fewer wasted messages.
In the earlier sections we assumed one message per node per round and then revised these values to be a bit more realistic. What about redoing those calculations with more messages per round? We'll try to achieve faster gossip time at the cost of some additional message overhead. Can we simply select multiple peers per round? What about broadcasting a single update to all n nodes in the network in a single round? Gossiping multiple point-to-point messages per round can be an effective way to improve the speed of the gossip algorithm. However, broadcasting an update to all nodes from a single site ("single-round broadcast") is normally not the best choice in large distributed systems.
In practice we must account for unreliable networks and incomplete information. Creating a reliable broadcast from point-to-point messages helps us be resilient to both of these failure modes. We can also construct reliable broadcast from single-round broadcast, but it comes at the cost of significantly more messages transmitted when networks are unreliable.
Imagine a system that receives twenty updates in quick succession. Should we initialize twenty unique gossip processes, or is there a more efficient way to spread this information? One solution is to use pull-based messaging. With pull-based difference resolution, every node contacts one peer per round and we mark a node updated if it has contacted an updated node. When there are many active updates, it's very likely that any node contacts a peer with at least one new update. If the system receives updates infrequently and far apart, pulling creates additional messaging overhead with relatively little benefit.
Pull-Based Gossip
The average performance of pull-based messaging is a bit faster than the standard (push) algorithm. We can approximate the probability p_r that a node has not been updated after r rounds with a recurrence relationship for both push- and pull-based gossip.
In the earlier rounds, the performance of the two algorithms is close because most nodes are still un-updated and both recurrences shrink p_r at similar rates. When p_r is smaller, the pull-based algorithm decreases to zero quadratically (p_(r+1) = p_r^2, since a node stays un-updated only if the peer it pulls from is also un-updated) while the push-based method decreases by a constant factor (p_(r+1) ≈ p_r / e).
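The two recurrences are easy to compare numerically. A small sketch (helper names mine; the push recurrence assumes roughly n(1 - p) updated nodes each push to a given node with probability 1/(n - 1)):

```python
def push_step(p, n):
    """One push round: a node stays un-updated if none of the ~n(1 - p)
    updated nodes picks it, each with probability 1/(n - 1)."""
    return p * (1 - 1 / (n - 1)) ** (n * (1 - p))

def pull_step(p, n):
    """One pull round: a node stays un-updated only if the peer it
    contacts is also un-updated."""
    return p * p

n = 1_000_000
p_push = p_pull = 0.1  # endgame: 90% of nodes already hold the update
for _ in range(4):
    p_push = push_step(p_push, n)
    p_pull = pull_step(p_pull, n)

assert p_pull < p_push  # pull converges quadratically in the endgame
```

After four endgame rounds the pull recurrence is many orders of magnitude below the push recurrence, which is exactly the motivation for switching strategies late in the process.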
It has been demonstrated that by using push in the beginning and then pull towards the end, the total messages sent and the time to finish gossip can both be improved compared to either algorithm in isolation.
Section 6: Closing Remarks
This concludes our tour of gossip algorithms. Letβs do a brief review:
Gossip is used to spread information in a distributed system and is resilient to incomplete membership information and unreliable networks.
Gossiping with point-to-point messaging starts relatively slowly and has a minimum of ceil(log2(n)) rounds.
The number of rounds needed to complete the gossip process is unbounded from above.
Gossip scales well as the number of nodes in a system increases, with bounded variance in completion time.
Variations can be made on the idealized gossip algorithm to make it more suitable for use in real distributed systems.
References & Further Reading