About two weeks ago I released a preliminary draft of my work on Consus and the complete work-in-progress source code. I have received a few questions about what that means for HyperDex, and why I have chosen to switch my attention from HyperDex to Consus. In this post, we will walk through the reasons for focusing on Consus instead of HyperDex.
While working on HyperDex, the team received many emails inquiring about HyperDex and its capabilities. Beyond business or support emails, the most commonly asked question was, "Can HyperDex support replication across multiple data centers?" HyperDex has no good answer to this question. It was built to provide strong consistency to applications and was designed for a single data center. These design decisions led to an architecture that, if blindly transplanted into a geo-replicated setting, would perform poorly.
Consus was designed to support geo-replication from day one. In this post we will look at the design goals of Consus and how these goals informed its overall architecture. We will also look at how experience with HyperDex informed the design of Consus.
Consus is designed with some specific goals in mind:
- Geo-replication: Data should be replicated across multiple geographically distributed data centers. This is the primary goal of Consus.
- Data center symmetry: Data centers should be treated equally by the system. No one data center should be elevated in importance above the others.
- Better fault tolerance: Failure should be the normal case path. The system should use protocols that can continue in the face of failed or transiently unavailable servers.
Consus supports geo-replication across multiple data centers with strong transactional guarantees. This allows any data center to crash or become partitioned away without any loss of correctness for the system. Whereas other geo-replicated systems often offer weaker guarantees, Consus provides strict serializability. This is a matter of providing a usable fault tolerance guarantee: If data that has been "committed" by the system can later be lost, the programmer must shoulder a significantly larger burden to write robust and correct systems.
Providing a usable fault tolerance guarantee when geo-replicating requires inter-data center communication, and that communication is the dominant cost. Even with modern networks, moving data across the country can incur tens to hundreds of milliseconds of latency. In contrast, a modern SSD can perform a synchronous write in less than a millisecond and a round trip within a data center is at most a few milliseconds. Consequently, Consus' design actively favors intra-data center activity to avoid inter-data center communication.
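To make the trade-off concrete, here is a back-of-the-envelope cost model. It is only a sketch: the three latency constants are illustrative assumptions chosen to fall within the ranges mentioned above, the function name is hypothetical, and the two operations being compared are made up for the example. None of this is measured from Consus.

```python
# Illustrative latency figures (assumptions, not measurements of Consus).
WAN_ROUND_TRIP_MS = 80.0   # cross-country round trip: tens to hundreds of ms
LAN_ROUND_TRIP_MS = 1.0    # round trip inside a single data center
SSD_SYNC_WRITE_MS = 0.5    # synchronous SSD write: under a millisecond


def estimated_latency_ms(wan_round_trips, lan_round_trips, sync_writes):
    """Sum the cost of the sequential steps on an operation's critical path."""
    return (wan_round_trips * WAN_ROUND_TRIP_MS
            + lan_round_trips * LAN_ROUND_TRIP_MS
            + sync_writes * SSD_SYNC_WRITE_MS)


# Hypothetical operation A: little local work, two wide-area round trips.
chatty_wan = estimated_latency_ms(wan_round_trips=2, lan_round_trips=2, sync_writes=1)

# Hypothetical operation B: five times the local work, one wide-area round trip.
chatty_lan = estimated_latency_ms(wan_round_trips=1, lan_round_trips=10, sync_writes=5)

print(f"two WAN round trips: {chatty_wan} ms")
print(f"one WAN round trip:  {chatty_lan} ms")
```

Under these assumed numbers, the operation that does far more work inside the data center still finishes well ahead of the one that takes an extra wide-area round trip, which is the intuition behind favoring intra-data center activity.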
Unlike Consus, HyperDex's design assumed that it would be deployed in a single data center. This leads to a different set of trade-offs. Round trips between data centers are not considered at all; instead, the system focuses on providing high throughput in a single data center by optimizing the intra-data center communication patterns. If HyperDex were deployed naively across multiple data centers, its assumptions regarding fast and largely symmetric messaging costs would not hold and the system would perform poorly.
Data Center Symmetry
Consus' design ensures that all data centers are treated the same by the system. This ensures that there is no Data Center Football carried by the network operations team that contains a Big Red Button for data center failover. Every data center is equal in the eyes of the system so no matter which data center is lost, the system will react the same without operator intervention.
This symmetry yields an intellectually elegant implementation. There is no bifurcation of the code into active and passive code paths, and therefore no mechanism for switching between them. Building such a switch without a period of unavailability is quite an engineering challenge, and it gets much harder if one is trying to maintain high performance and strong guarantees during the change. A purely symmetric design has one major code path that is followed in all situations, whether all data centers are functional or not.
The value of this approach goes beyond a simplified implementation: it also keeps the system more available under failure. Because there are no special actions to take when a data center fails, the system can keep making progress in case of a failure; in fact, the Paxos protocol employed by Consus does not require that a data center's failure even be made known to the system for a majority of the data centers to make progress.
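The majority condition can be sketched in a few lines. This is the generic majority-quorum rule, not code from Consus, and the function names are hypothetical. Note that the check takes no argument saying which data center failed: every data center is interchangeable, so only the count of reachable ones matters.

```python
def majority(total_data_centers):
    """Smallest number of data centers that forms a majority."""
    return total_data_centers // 2 + 1


def can_make_progress(total_data_centers, reachable_data_centers):
    """True if the reachable data centers still form a majority.

    No failure notification is involved: the protocol simply proceeds
    once enough data centers have responded.
    """
    return reachable_data_centers >= majority(total_data_centers)


# With three data centers, losing any one of them leaves a majority of two.
print(can_make_progress(total_data_centers=3, reachable_data_centers=2))
# Losing two of three does not.
print(can_make_progress(total_data_centers=3, reachable_data_centers=1))
```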
The biggest improvement Consus makes over HyperDex is its approach to fault tolerance. Every component of Consus is designed to avoid a single point of failure by replicating data and state. Consus uses a variant of Paxos or quorum replication for each of these replicated components. This means that a complete server failure will not hold up progress of the system. The system can lazily replace the failed server in the background. In contrast, HyperDex's value-dependent chaining requires reconfiguring the chains in order to allow a write to make progress after a failure. The consequence of HyperDex's design is that a failure or transient latency spike is much more likely to trigger reconfiguration within the system and to introduce periods of high latency during the reconfiguration.
A more subtle result of the use of Paxos instead of chain-based protocols is that Consus hides transient latency anomalies. The benefit of a protocol that uses a quorum to achieve its outcome is that the request completes as soon as the first quorum of servers has handled it. For any given request, a different minority of servers can exhibit high latency without impacting the latency of the overall request. HyperDex's chaining protocol incurs lower replication cost for the same degree of fault tolerance, but cannot tolerate latency anomalies in the same way that Consus can.
Finally, because Consus is able to transparently mask failures, it can take the coordination of replica sets off the critical path. In both Consus and HyperDex a replicated coordinator maintains group membership for the system. In HyperDex, this coordinator is on the critical path for failure recovery and must issue a new configuration after each failure to enable value-dependent chaining to route around the failure. Consus employs a similar replicated coordinator, but keeps the configuration out of the critical path. Any failure can be handled in the background while Paxos transparently masks the failure. Ultimately both Consus and HyperDex can continue in the face of a failure, but Consus does a better job of masking a failure and keeping latency consistent during a failure.
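A toy model illustrates the difference. It is a deliberate simplification with hypothetical function names and made-up latencies: a quorum request is modeled as finishing when the majority-th fastest replica responds, while a chained write is modeled as passing through every replica in sequence. Real deployments of either protocol involve more detail than this, and a chain would use fewer replicas for the same fault tolerance.

```python
def quorum_latency(replica_latencies_ms):
    """Latency of a request that completes once a majority has responded."""
    quorum = len(replica_latencies_ms) // 2 + 1
    # The request finishes when the quorum-th fastest replica answers.
    return sorted(replica_latencies_ms)[quorum - 1]


def chain_latency(replica_latencies_ms):
    """Latency of a write that must pass through every replica in order."""
    return sum(replica_latencies_ms)


# Three replicas, one of which is suffering a transient latency spike.
latencies_ms = [1, 2, 250]

print("quorum:", quorum_latency(latencies_ms), "ms")
print("chain: ", chain_latency(latencies_ms), "ms")
```

In this model the quorum request ignores the slow replica entirely, while the chained write absorbs the full spike. Which replica is slow can change from request to request without affecting the quorum result, as long as the slow replicas remain a minority.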
What Happens to HyperDex?
Consus provides a set of features and trade-offs that are generally superior to HyperDex. It provides geo-replication and can tolerate strictly more modes of failure than HyperDex can, even when deployed in a single data center. In practice, this means that Consus upholds the same consistency guarantees as HyperDex and strictly stronger fault tolerance guarantees.
For this reason, I believe that Consus' design is technically superior to HyperDex and have switched my attention to developing Consus full time. The dependencies of HyperDex are reused to form a solid foundation on which to build Consus. Further, experience with HyperDex and its limitations has informed the design of Consus, especially with regard to availability and latency during failure.
HyperDex is not actively maintained as I feel that building Consus will be more useful in the long term. I have made publicly available all the build scripts and other maintenance tools used to build and distribute HyperDex to make it easier for others to pick up and maintain the project if they choose to do so.
For Consus, the same build scripts are publicly available from the start, enabling others to contribute to Consus and keep up with its development. Further, the Consus website is also made publicly available. This is a deliberate decision to encourage those looking to work with either HyperDex or Consus to be able to contribute to Consus, whether the contributions be source code, documentation, testing, or web page improvements.
There has been some speculation on Hacker News and other places about the state of HyperDex, and there are some GitHub issues asking about the state of HyperDex. I would like to ask people to refrain from speculating about what happened with the project, as such speculation makes a number of assumptions using incomplete information.
There was a period of "radio silence" from me regarding HyperDex during which I was no longer associated with the research group or commercial effort behind the project, but did not yet have a sufficient prototype of Consus to discuss the work publicly. During this time, HyperDex seemed to be dead; GitHub issues were left unresolved, emails were largely unanswered, and there was no public commit activity. Because I was developing Consus in private, I was not able to devote any time to HyperDex.
As pointed out on Hacker News, the lack of communication regarding the project was not something unique to the transition period between HyperDex and Consus. In addition to being a full-time Ph.D. student, I was also the sole supporter of both the open source and commercial sides of HyperDex, while also developing the project further. For a project with 100K+ lines of code, this is quite the burden for one person to shoulder. I was learning as I went and clearly had much to learn about growing a community around a large project.
For Consus, I am looking to learn from this experience and try to build a community to help manage the project. The entire project and the pieces necessary to keep it going are published and open source from the start. My hope is that this will help build a community around the project that will make it easier to avoid the pitfalls that held HyperDex back from wider adoption.
Consus is the logical successor to HyperDex.
In this post, we walked through the technical contributions of Consus that make it sufficiently superior to HyperDex that I have switched my focus from developing HyperDex to developing Consus. In short, Consus takes into account lessons learned developing HyperDex and makes a different set of design decisions to achieve efficient geo-replicated transactions.
- HyperDex paper: The HyperDex paper explains the design and implementation of HyperDex. While the implementation has progressed far beyond what is described in the paper, the fundamental design decisions are largely the same.
- Consus paper: The Consus paper explains the design of Consus. The implementation is not yet complete, but the paper outlines the ideas of the system and the current state of the implementation.